Minbook
KO
Re-reading the Stanford AI Index 2026 — Why It Feels Weaker Than Last Year

Re-reading the Stanford AI Index 2026 — Why It Feels Weaker Than Last Year

MJ · · 8 min read

Reading the 2026 AI Index left a recurring impression — this year's edition lands softer than last year's. This piece chases that impression. Side by side: the report's headline indicators, and the shifts that landed outside it during the eight weeks before publication.

What kept coming back while reading

Stanford HAI released the 2026 AI Index Report in mid-April. Ninth edition. More than 500 pages, packed with quantitative data. The headline message is simple: the technology is accelerating, while the systems around it lag behind.

I look forward to this report every year. This time, the same impression kept coming back while reading. It feels weaker than last year.

There are plenty of new numbers. The arguments are still there. But the sense I had last year — that one report could put the AI landscape in front of me at a glance — is dimmer this time.

This piece is in two parts. The first walks through what the report actually says, in numbers. The second lists what the report could not catch — the seismic changes from Q1 and Q2 of 2026. Then a closing reflection on why the report feels weaker, which is really a question about cadence.


Part 1 — What the report says

Acceleration is real

AI capability indicators are not flat. They are still climbing.

MetricLast yearThis year
SWE-bench Verified (coding benchmark)~60%nearly 100%
Generative AI adoption inside organizationsmid 50s%88%
College students using generative AI~65%~80%
AI agents on computer-use tasks~12%~66%

A handful of indicators jumped by significant margins in a single year. Coding benchmarks are saturating. The leap of agent computer-use success from 12% to 66% reflects how fast 2025 agent research moved. That said, the 66% still means roughly one failure in three.

The US-China gap has effectively closed

At the performance frontier, the gap between the United States and China has practically vanished. As of March 2026, Anthropic’s lead is reported at about 2.7%. South Korea ranks first in AI patents per capita.

What was a forecast last year is a record this year. The change matters less than the fact that it has now been formalized.

A jagged frontier

The report introduces a phrase: jagged frontier. AI capability does not improve uniformly. The variance across tasks is large.

A model that wins International Mathematical Olympiad gold can read an analog clock with only 50% accuracy. SWE-bench coding is near saturation, while computer-use sits at 66%.

The takeaway for AI strategy is concrete. The right question is not “is AI capable?” but “is AI capable for this specific task?” Each task needs its own check.

Responsible AI keeps falling behind

The gap on the safety and governance side widened.

  • AI incident reports: 233 last year, 362 this year (+55%)
  • Confirmed trade-off: pushing safety up tends to drag accuracy down
  • Country-level trust in AI regulation: the US sits at the bottom of the surveyed group at 31%

Safety issues are now showing up in the data, which is itself progress. But against capability growth that runs at multiples, +55% in safety infrastructure is too slow. The gap is widening, not closing.

The economic paradox

MetricValue
2025 US AI investment~$285.9B
2025 China AI investmentabout 1/23 of the US
Inbound AI researchers to the USdown ~89% YoY
US developers aged 22-25 employmentdown ~20%
Demand for senior (55+) developersrising
Customer support and SW productivity+14–26%

Money is pouring in while talent is leaving. Productivity is rising while entry-level hiring is falling. These opposite-direction movements running side by side are the actual face of the 2025 AI economy.

The expert-public divide

The last big indicator is perception. 73% of AI experts expect AI to have a positive impact on employment. Only 23% of the general public agrees. A 50 percentage-point gap. Insider and outsider views are now provably split. The report devotes several chapters to what this means for policy and product.

Quantitatively, the 2026 AI Index is still a rich document. SWE-bench saturation, the agent jump, the closing US-China gap, the jagged frontier, the widening safety gap, the entry-hiring shock, the expert-public split. Any one of those seven could anchor a strategy report.

By the time you finish reading, though, a different impression sets in. Many of these numbers are confirmations of things you already knew. For anyone who tracked AI through 2025, the surprise is small.


Part 2 — What the report misses

This is the heart of the piece. Stanford’s AI Index typically uses a data cutoff sometime between late December and early January. The report comes out in April. The three-to-four month gap between cutoff and publication matters much more in 2026 than it used to.

In the eight weeks from late Q1 through early Q2 2026, the field saw at least six events that count as structural shifts. The report includes none of them.

An agent orchestration paradigm reversal

In the first week of April, Anthropic released three things in 96 hours (related post). On April 4, Claude subscriptions were blocked from third-party agentic tools. On April 7, Claude Mythos Preview was unveiled, distributed only to selected partners without a general API. On April 8, Claude Managed Agents went into public beta, with Anthropic hosting the agent runtime directly.

Around the same time, the Advisor Tool was added to the official documentation (related post). The pattern — small model drives, large model advises — is a reversal of the traditional planner-executor structure.

The agent success rate of 12 → 66% in the report comes from data that predates this shift. The orchestration realignment of Q1 and Q2 2026 is absent. It will appear in the next edition (April 2027), by which point it will be a settled topic rather than a live one.

Agent benchmark credibility cracks

In mid-April, UC Berkeley RDI published “Exploiting the most prominent AI agent benchmarks.” It showed that major agent benchmarks are adversarially exploitable. Scoring high on these benchmarks no longer maps cleanly onto having higher capability.

Around the same time, agent-as-judge benchmarks like AJ-Bench started appearing in volume. Evaluation itself now needs to be orchestrated.

The benchmark numbers the report leans on were measured before this benchmarks-in-flux moment. The next edition will likely add a section on benchmark methodology in crisis.

Consulting becomes AI infrastructure

Late in April, OpenAI announced it was rolling out Codex to enterprises with seven global consulting firms. McKinsey released a report on agent AI in marketing.

The report’s employment chapter centers on developers and customer support. But in Q2 2026, consulting is one of the fastest-reshaping fields in AI. Not AI as a tool for consulting — consulting as a distribution channel for AI. That is the reversal. The report cannot cover it because the announcements landed after cutoff.

A model price pyramid stabilizes

In early 2026, the price ratios between Opus, Sonnet, and Haiku stabilized at roughly 5x and 19x. This ratio is what makes patterns like the Advisor Tool economically viable.

The report’s investment and pricing chapter covers macro investment volume ($285.9B), but it does not address how model pricing structure now drives orchestration architecture. That structure stabilized in late Q1 2026.

Claude Opus 4.7 and 1M context

On April 17, Anthropic released Claude Opus 4.7. The headline features were a 1M-token context window and stronger reasoning. A 1M context window redraws the line between fine-tuning and in-context learning. Research like SKILL0 only stands up after this model shipped.

The report covers data from before Opus 4.7. Agent design patterns shaped by 1M context are not in it.

Gartner’s 2028 security forecast

Gartner published an early 2026 forecast that 25% of enterprise generative AI applications will face five or more security incidents per year by 2028. The number has since become one of the most-cited stats in enterprise AI conversations.

The report’s “AI incident reports: 362” figure is a 2025 datapoint. Gartner’s projected 2028 incidents run into the thousands. The scale jump is implicit nowhere in the report.

Eight weeks of negative space

The six shifts above all reshape the 2026 AI landscape. Laid out month by month, the flow becomes clearer.

timeline
    title 2026 changes by month
    section 2025-12 / 2026-01
        AI Index cutoff
    section 2026-02
        Model price pyramid stabilizes
    section 2026-03
        Agent orchestration papers accelerate
    section 2026-04
        Apr 4 : Anthropic blocks third-party harnesses
        Apr 7 : Claude Mythos Preview
        Apr 8 : Managed Agents public beta
        Apr 13 : Berkeley benchmark exploit report
        Apr 14 : AI Index 2026 published
        Apr 17 : Claude Opus 4.7 released
        Apr 22 : OpenAI Codex × 7 consulting firms

The publication date (April 14) itself is already before Opus 4.7 (April 17). For the report to function as a complete description of the current AI landscape, that current would have to be moving more slowly than it actually is.


Why it feels slow

The report’s quality has not declined. The perception has. The reason sits in the production cycle.

From 2018 through 2023, an annual cycle was enough. Major AI changes happened on roughly an annual rhythm. New model generations cycled in 12 to 18 months. Benchmark refreshes once a year could keep pace.

From 2024, the spacing between major events shortened. GPT-4, Claude 3 and 3.5, Llama 3, Gemini 1.5 — all of them landed at intervals well under a year. Even so, model releases were the main events, and the report could continue to do its job by quantifying performance changes.

From 2026, the type of change shifted. The headline events are no longer model releases but structural reversals — Advisor, Agent-as-judge, on-the-job learning. The agenda moved from benchmark refresh to benchmark credibility. Pricing structure, not technical metrics, is starting to determine architecture.

An annual report cannot cover these three classes of change. Structural reversals in particular only matter if caught at the moment they happen. An annual report can only catalog them after the fact.

The argument is not to scrap the report. Annual depth and comparability still have value. But the report needs companion cadences.

ProductCadenceRole
AI Index annualOnce a yearComparable baseline
AI Index Quarterly DeltaOnce a quarterCaptures the past three months of structural shift
AI Index PulseOnce a monthShort updates on benchmarks, pricing, adoption

Stanford HAI does not need to run all three. The annual report can stay as is, while quarterly and monthly cadences come from elsewhere — the Anthropic Economic Index, Hugging Face leaderboards, private analytics firms. What is no longer true, as of 2026, is the assumption that one annual report can capture the full landscape.


How to read the report

A few practical adjustments make the 2026 AI Index more useful.

Read it as a snapshot. Use it as a quantitative anchor for late 2025 through January 2026. Do not extrapolate it as the current landscape.

Trust the trend directions. Treat the absolute numbers as already aging. The agent success rate of 66% reflects late 2025; the Q2 2026 number is likely higher.

Maintain a separate list of what the report did not cover. The six shifts above are this year’s list. Next year, the quality of the report can be judged in part by how much of this list it absorbs.

The perception data — experts at 73% versus the public at 23% — is the most durable part of the report. Those numbers move slowly. Social change has a longer half-life than structural change.


Closing — the report’s shelf life

The 2026 AI Index is not a weak report. Compiling more than 500 pages of quantitative data once a year is itself a meaningful exercise. What is no longer reasonable is the expectation that one report will cover the AI landscape on its own.

Through 2023, one report was enough. In 2024 and 2025, one report plus a curated set of model releases. From 2026, the report is a baseline, with the live landscape assembled from other channels.

A yearly report’s freshness is highest just after publication and falls off quickly within three months. April publication means that without companion sources, the picture starts to misfire after July. Better to plan for that shelf life upfront.

The closing claim of this piece is this. The 2026 AI Index still has value. The problem is not that the value shrank. The fraction of the landscape one report of this size can cover has shrunk.

The “feels weak” sensation comes from this. The report stayed the same. The AI landscape kept expanding past its edges.


References

Share

Related Posts