OECD releases framework to evaluate AI capabilities against human skills

The OECD’s AI Capability Indicators represent a shift from traditional performance benchmarks toward a framework rooted in human abilities. By evaluating AI systems against criteria such as language use, social reasoning, and problem solving, the indicators aim to offer a clearer picture of what current technologies can and cannot do.

The Organisation for Economic Co-operation and Development (OECD) introduced a structured framework to assess the capabilities of artificial intelligence (AI) systems in comparison to human abilities. The report, titled Introducing the OECD AI Capability Indicators, presents a set of nine indicators designed to provide evidence-based evaluations of AI performance across various cognitive and physical domains.

The indicators were developed as part of the OECD’s Artificial Intelligence and Future of Skills (AIFS) initiative, a long-term project aimed at informing education, labour, and innovation policies. The work draws on contributions from over 50 experts in computer science, psychology, and educational research, and has been released in beta form for consultation with researchers and policymakers.

Measuring AI in human terms

The OECD framework focuses on nine domains: language, social interaction, problem solving, creativity, metacognition and critical thinking, knowledge and memory, vision, manipulation, and robotic intelligence. Each domain is assessed on a five-level scale: Level 1 covers tasks widely considered solved by existing AI, while Level 5 represents performance equivalent to that of humans in real-world contexts.
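
To make the structure of the framework concrete, the minimal sketch below models the nine domains and the five-level scale in Python. The domain names and the example ratings (language at Level 3, manipulation and robotic intelligence at Level 2) come from the report; the `CapabilityRating` class and its validation are illustrative assumptions, not part of any OECD tooling.

```python
from dataclasses import dataclass

# The nine capability domains named in the OECD report.
DOMAINS = [
    "language",
    "social interaction",
    "problem solving",
    "creativity",
    "metacognition and critical thinking",
    "knowledge and memory",
    "vision",
    "manipulation",
    "robotic intelligence",
]

@dataclass
class CapabilityRating:
    """One domain rated on the report's five-level scale."""
    domain: str
    level: int  # 1 = widely considered solved by existing AI; 5 = human-equivalent

    def __post_init__(self) -> None:
        if self.domain not in DOMAINS:
            raise ValueError(f"Unknown domain: {self.domain}")
        if not 1 <= self.level <= 5:
            raise ValueError("Level must be between 1 and 5")

# Illustrative profile using ratings mentioned in the report.
example_profile = [
    CapabilityRating("language", 3),
    CapabilityRating("manipulation", 2),
    CapabilityRating("robotic intelligence", 2),
]
```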

The assessment does not aim to predict technological developments but instead offers a standardised method for describing current capabilities based on existing systems. The indicators draw on formal tests, benchmark data, expert judgement, and peer review.

For example, large language models such as GPT-4o are rated at Level 3 for language, reflecting their ability to handle multiple modalities and demonstrate advanced semantic understanding. They fall short of higher ratings, however, because of issues such as hallucinations and limited dynamic learning. In contrast, capabilities in areas such as robotic intelligence and manipulation remain at Level 2, owing to constraints in adaptability and generalisation.

A tool for education and employment analysis

One of the report’s key purposes is to support policy decisions by mapping AI capabilities to the skill demands of occupations and educational goals. For instance, the indicators help identify which human tasks in teaching might be supplemented or restructured as AI systems become more capable. This approach allows for more precise analysis than previous methods that focused narrowly on whether jobs or tasks could be automated.

The framework also enables the identification of gaps between AI performance and human roles. These gaps can inform planning in workforce development, curriculum reform, and public investment in technology infrastructure.
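
As a rough illustration of this kind of gap analysis, the sketch below compares hypothetical skill demands for a teaching role against AI capability levels on the same five-level scale. The demand values and the `capability_gaps` helper are invented for illustration; only the language rating of Level 3 is taken from the report.

```python
# Hypothetical skill demands for a teaching role, expressed on the same
# five-level scale as the AI capability indicators (values invented).
teaching_demands = {
    "language": 4,
    "social interaction": 4,
    "problem solving": 3,
}

# Current AI capability levels for the same domains (language Level 3
# per the report; the other values are illustrative).
ai_levels = {
    "language": 3,
    "social interaction": 2,
    "problem solving": 3,
}

def capability_gaps(demands: dict[str, int], levels: dict[str, int]) -> dict[str, int]:
    """Return, per domain, how far AI capability falls short of the demand."""
    return {
        domain: demands[domain] - levels.get(domain, 0)
        for domain in demands
        if demands[domain] > levels.get(domain, 0)
    }

print(capability_gaps(teaching_demands, ai_levels))
# {'language': 1, 'social interaction': 2}
```

Domains where the gap is zero or negative are candidates for AI support, while large positive gaps point to skills that remain distinctly human for now.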

Limitations and future work

The OECD acknowledges several limitations in the current version of the indicators. In many domains, especially those involving social reasoning or physical dexterity, there is limited benchmarking data. The report emphasises that these indicators should not be treated as definitive rankings but as a foundation for systematic monitoring and analysis.

Planned next steps include the development of new benchmark tests, formal expert surveys, and regular updates to the indicators. An online repository has also been launched to gather additional evidence and encourage contributions from researchers.

A reference for policy, not prediction

The report positions the OECD as an independent body capable of providing transparent assessments of AI systems’ performance. Unlike industry-led benchmarks or purely technical evaluations, the OECD indicators are designed to communicate developments in AI in a way that is accessible to policymakers and relevant to broader societal outcomes.

By grounding AI assessment in human capabilities rather than in benchmarks alone, the framework offers a way to track progress and respond to developments without relying on speculative or promotional narratives. It also provides a basis for evaluating claims about artificial general intelligence (AGI) with reference to empirical data rather than abstract definitions.

The report is available on the OECD’s website, along with accompanying documentation and data collection tools. Feedback from researchers, educators, and policy practitioners will inform the final version of the indicators, expected to be released following further review.
