US vs China frontier model Elo over time

A chart from the U.S. Center for AI Standards and Innovation tracks frontier model capability (Elo score) against release date from January 2024 through May 2026, with separate trend lines for the US and Chinese ecosystems.

What the chart shows

US trajectory (low to high): OpenAI GPT-4o (Jan 2024, ~100 Elo) → Anthropic 3.6 Sonnet → OpenAI o1 → OpenAI o3-mini → OpenAI o3 → Anthropic Opus 4 → OpenAI GPT-5 → OpenAI GPT-5.2 → OpenAI GPT-5.4 (~1100) → Anthropic Opus 4.6 (~1100) → OpenAI GPT-5.5 (~1250, current peak).

China trajectory: DeepSeek R1 (Jan 2025, ~150) → Alibaba Qwen3 → DeepSeek R1-0528 → Alibaba QwQ → DeepSeek V3.1 → Kimi K2-Thinking → Kimi K2.5 → DeepSeek V4 Pro (~800, current peak).

Observations worth keeping

  • Similar slopes, staggered starts. Both lines rise at roughly the same rate, so the popular narrative of China "catching up faster" is not visible in the slopes. What the chart shows instead is a starting-point lag: the Chinese frontier sits roughly 6–9 months behind on the time axis.
  • Current capability gap is about 450 Elo. The US line reaches ~1250 (OpenAI GPT-5.5) by mid-2026, while the top Chinese model (DeepSeek V4 Pro) sits around 800, a difference of 1250 − 800 = 450.
  • Elo is not absolute capability. A leaderboard score is a coarse compression of behavior across tasks. Two models with similar Elo can perform very differently on specific workloads.
  • Linear extrapolation does not close the gap. If you draw both trends forward at their current slopes, the distance stays roughly constant rather than narrowing.
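
The extrapolation point can be checked with a few lines of arithmetic. The sketch below is illustrative only: the peak values (~1250 and ~800) are the approximate figures quoted above, while the slope of 40 Elo per month and the 10-point slope advantage are hypothetical placeholders, not values read from the chart. The function names are likewise invented for this example.

```python
def projected_gap(gap_now, us_slope, cn_slope, months_ahead):
    """Elo gap after `months_ahead` months if both trends stay linear."""
    return gap_now + (us_slope - cn_slope) * months_ahead


def months_to_close(gap_now, us_slope, cn_slope):
    """Months until the gap reaches zero, or None if it never closes."""
    closing_rate = cn_slope - us_slope
    if closing_rate <= 0:
        return None  # equal or slower trailing slope: the gap never closes
    return gap_now / closing_rate


# Approximate peaks quoted in the text: US ~1250, China ~800.
GAP_NOW = 1250 - 800

# HYPOTHETICAL slope in Elo per month; an assumption, not a chart value.
SLOPE = 40.0

# Equal slopes: the gap is unchanged a year out and never closes.
print(projected_gap(GAP_NOW, SLOPE, SLOPE, 12))       # 450.0
print(months_to_close(GAP_NOW, SLOPE, SLOPE))         # None

# Hypothetical case: the trailing line gains 10 Elo/month more than the leader.
print(months_to_close(GAP_NOW, SLOPE, SLOPE + 10.0))  # 45.0
```

Under a linear model only the difference between the two slopes matters. With equal slopes the gap stays at its current value whatever the slope is, and it narrows only if the trailing line is the steeper one.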

Caveats

This is a public-leaderboard, public-release view. Closed lab models, internal evaluation snapshots, and unreleased systems are not in the picture. Whatever conclusions you draw from the chart should be held loosely on that basis.