AI Notes — April 30

Intel and the CPU narrative for the agent era

LLM inference itself — matrix multiplies, KV cache, attention — is firmly a GPU job. Running it on CPUs would be orders of magnitude slower. So the chain "inference inflection point → buy more CPUs" doesn't hold up directly. But what the article is actually pointing at is the peripheral workloads around inference, and those genuinely are CPU-heavy and tightly coupled to the agentic AI wave.

First, agent and tool-use execution environments. Coding agents — Claude Code, Devin, and the rest — generate code that has to actually run in sandboxes: compile, test, container start, file I/O. All CPU. A single agent task can spin up dozens of sandboxes, each backed by CPU cores. OpenAI, Anthropic, and the coding companies are scaling out this sandbox infrastructure at huge volume.

Second, RL training environments and rollouts. RL gyms and SWE-bench-style environments parallelize thousands of episodes; the model inference runs on GPU but each environment step runs on CPU. The more central RL becomes, the more the CPU/GPU ratio bends back toward CPU.
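
The split described above (one batched model call per step, many environment steps per model call) can be sketched as a toy rollout loop. This is only an illustration: `ToyEnv`, `batched_policy`, and `rollout` are hypothetical names, the environment does trivial arithmetic instead of running code or physics, and a real system would spread the `step` calls across many CPU worker processes.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in for a CPU-bound environment (sandbox, simulator)."""
    state: int = 0
    steps: int = 0

    def step(self, action: int) -> tuple[int, float, bool]:
        # In a real gym this is where CPU time goes: running tests,
        # stepping physics, doing file I/O.
        self.state += action
        self.steps += 1
        reward = 1.0 if self.state % 2 == 0 else 0.0
        done = self.steps >= 5  # fixed episode length for the sketch
        return self.state, reward, done


def batched_policy(observations: list[int]) -> list[int]:
    """Hypothetical stand-in for one batched forward pass (the GPU part)."""
    return [1 if obs % 2 else 2 for obs in observations]


def rollout(num_envs: int) -> dict[str, int]:
    """Run all environments to completion and count both kinds of work."""
    envs = [ToyEnv() for _ in range(num_envs)]
    observations = [env.state for env in envs]
    active = list(range(num_envs))
    policy_calls = 0
    env_steps = 0
    while active:
        # One model call serves every active environment at once.
        actions = batched_policy([observations[i] for i in active])
        policy_calls += 1
        still_active = []
        for i, action in zip(active, actions):
            # One environment step per episode per iteration (the CPU part).
            obs, _reward, done = envs[i].step(action)
            env_steps += 1
            observations[i] = obs
            if not done:
                still_active.append(i)
        active = still_active
    return {"policy_calls": policy_calls, "env_steps": env_steps}
```

With 1,000 toy environments of five steps each, `rollout(1000)` makes 5 batched policy calls and 5,000 environment steps. That is the shape of the argument: model calls scale with episode length, while environment steps scale with episode length times the number of parallel episodes.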

Third, serving infrastructure itself: tokenization, request routing, batching, KV cache management, retrieval (vector DB, BM25), safety filters, logging. Every GPU server needs a stack of CPUs in front and behind it just to be kept fed.

Fourth, the refresh cycle argument: the COVID-era 2020–2021 CPU buildout is hitting natural end-of-life, and for the past two years budgets have all gone to GPUs with CPUs on bare-minimum maintenance. Even with mild demand growth, a backlog refresh can produce a real CPU crunch window. That part is fairly solid.

So the more accurate framing: LLM inference itself doesn't eat CPU, but the AI system as a whole — especially once it goes agentic — does, and the past two years severely underinvested in that layer. The "compute demand up a million times in two years" line has froth, but "agent era CPU demand curve is bending up" is directionally correct, just nowhere near as steep as GPUs.

Who the major CPU players actually are

Datacenter CPU is more crowded than the GPU side. Three tiers.

Tier 1: x86 duopoly. Intel (Xeon) and AMD (EPYC) still ship most servers outside hyperscalers. AMD has gone from under 5% datacenter share to roughly half of datacenter CPU revenue in five or six years, riding EPYC's lead in core count, efficiency, and price/performance. Intel is one of the biggest losers of this AI cycle (process trailing TSMC, Xeon losing share in the cloud) and is restructuring under Lip-Bu Tan: splitting off foundry and refocusing on products.

Tier 2: ARM, the real disruption variable. Hyperscaler in-house ARM CPUs are the most important silicon trend of the past five years.

AWS Graviton: earliest and most aggressive. New AWS capacity is now majority Graviton; Lambda, ECS, RDS default to it.
Google Axion: announced 2024, GCP migrating internal workloads onto it.
Microsoft Cobalt: Azure's own design, paired with Maia (AI accelerator).
NVIDIA Grace: paired tightly with GPUs (Grace Hopper, Grace Blackwell). NVIDIA wants to ship GPU + tightly-coupled CPU as one box.
Ampere Computing: the main independent commercial ARM server CPU vendor. Oracle Cloud uses them heavily, but they're getting squeezed by hyperscaler in-house silicon.

The ARM camp eats directly from Intel and AMD's plate. When hyperscalers can design their own, fab at TSMC, and skip x86 licensing and channel margins — and get better efficiency — there's no reason not to.

Tier 3: China and specialty. Alibaba T-Head (Yitian 710) — internal cloud, ARM. Huawei Kunpeng — domestic enterprise/government. Marvell and Broadcom — don't sell general-purpose CPUs but help hyperscalers with custom ARM silicon (Marvell reportedly involved in parts of Google and Amazon's programs).

Back to AI inference-adjacent CPU demand: if you're OpenAI or Anthropic and you don't design your own CPUs, the fastest way to scale sandbox and RL environment capacity is renting from AWS / GCP / Azure, which means the demand largely flows through Graviton / Axion / Cobalt, not necessarily Intel's P&L. Intel and AMD benefit directly from enterprise on-prem and from the cloud capacity that still runs on merchant x86 rather than in-house silicon. So Lip-Bu Tan's number deserves a discount.

Rough probability ranking of the real winners: AMD ≈ hyperscaler in-house ARM > Intel > Ampere. Intel is "base intact, share keeps getting nibbled" — not "rocketing on the AI tailwind."

How much GPU/CPU does one humanoid robot actually need

Two completely different stories: onboard, and cloud.

Onboard compute

What's actually packed into the chest cavity of a humanoid robot is far less than you'd think. Current mainstream:

NVIDIA Jetson: 80%+ of humanoid robots use this. Jetson Orin (last gen, 275 TOPS) is the most common today; Jetson Thor (mass production 2025, 2070 FP4 TFLOPS, designed specifically for humanoids) is becoming the new standard. Figure, Boston Dynamics, Agility, 1X, and Apptronik are all on this line.
Tesla Optimus: in-house silicon derived from FSD/Dojo lineage.
Domestic Chinese line: Unitree, AgiBot mix Jetson Orin with Horizon and Cambricon NPUs, scattered supply due to export controls.

A typical configuration: one main SoC (Jetson Thor class) for VLA model inference and visual perception, with GPU and CPU integrated, basically half a workstation. Plus several microcontrollers / real-time control CPUs (ARM Cortex-R / x86) for joint servo, force feedback, and safety logic; these loops need millisecond-level latency and are far too small to host an LLM. There is no standalone CPU server onboard.

So "how many GPUs and CPUs per robot" — the main compute is one Jetson Thor with GPU + CPU + NPU all in one package, not the discrete-stack pattern of a datacenter. Power budget is around 100W because the battery only holds so much.

Cloud training

This is the heavy compute:

Training VLA foundation models (Figure's Helix, Physical Intelligence's π0, Google's RT-2 line): the industry norm is hundreds to a few thousand H100/H200 GPUs. Figure hasn't published numbers, but its Microsoft and NVIDIA relationships suggest it's running on Azure H100/B200 clusters.
Training simulation/RL policy: another GPU layer for massively parallel rollouts.
Tesla: Dojo + H100 hybrid, in-house supply.

Simulation — this is the CPU hog

Back to the CPU thread, this is where robotics companies actually burn CPU cores:

Isaac Sim / MuJoCo / Genesis-style simulators run thousands of parallel episodes for RL policy training. Physics simulation is primarily CPU-bound (NVIDIA is pushing GPU-accelerated Isaac, but the physics engine bottleneck still skews CPU).
A real RL cluster is typically thousands to tens of thousands of CPU cores plus a few hundred GPUs for the visual rollout part.

Vendor map

Onboard main compute: NVIDIA (Jetson Orin / Thor) dominates; Tesla uses in-house silicon; domestic Chinese vendors use Horizon / Cambricon as fillers.
Joint real-time control: NXP, TI, ST, Infineon microcontrollers.
Training GPU: NVIDIA (H100/H200/B200/GB200), small share for AMD MI300 and in-house ASICs.
Simulation CPU: AMD EPYC dominant, Intel Xeon and ARM (Graviton) following.
Storage and network: Pure Storage, VAST, NetApp; NVIDIA Spectrum/Mellanox for networking.

So one Figure unit means: one Jetson Thor-class SoC (~$3,000–5,000) onboard. Behind it, Figure is burning thousands of H100s continuously on training (amortized per unit is the real cost). Plus tens of thousands of CPU cores for sim/RL, likely AWS/Azure rented.
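
The "amortized per unit" point is simple arithmetic, sketched below. Every input in the usage note is a made-up placeholder, not a Figure number: the source gives no GPU-hour price, training duration, or fleet size, and the function name is hypothetical.

```python
def amortized_training_cost_per_unit(
    gpu_count: int,
    gpu_hour_usd: float,
    training_hours: float,
    units_shipped: int,
) -> float:
    """Spread total GPU rental cost across the robots it supports.

    All four inputs are assumptions supplied by the caller; nothing here
    comes from published vendor figures.
    """
    if units_shipped <= 0:
        raise ValueError("units_shipped must be positive")
    total_usd = gpu_count * gpu_hour_usd * training_hours
    return total_usd / units_shipped
```

For example, with placeholder inputs of 2,000 GPUs at $2.50 per GPU-hour running for 8,760 hours (one year) and spread over 10,000 units, `amortized_training_cost_per_unit(2000, 2.50, 8760, 10_000)` gives $4,380 per unit. The result is extremely sensitive to fleet size, which is why per-unit training cost only looks small once shipments are large.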

The genuinely scarce resource isn't GPU or CPU — it's "an edge SoC small enough to live in a robot but powerful enough to run a VLA model." Jetson Thor is the de facto monopoly there for 2025–2026. That's also why Mobileye's acquisition of Mentee is interesting: it's a bet that the autonomous-driving SoC supply chain can move sideways into humanoid and challenge NVIDIA on onboard.

Mayo Clinic's REDMOD: pancreatic cancer detection up to three years early

At Mayo Clinic, a model called REDMOD was tested on nearly 2,000 historical CT scans previously reviewed and marked normal. It still identified early signs of pancreatic cancer in 73% of cases, sometimes up to three years before diagnosis. Around the two-year mark it detected roughly three times more cases than the radiologists had.

The comparison on the same set of scans: AI sensitivity of 73% vs 39% for senior radiologists. Pancreatic cancer's five-year survival is under 15% largely because about 85% of cases are caught after the cancer has spread. If this clears prospective clinical validation (Mayo's AI-PACED study is underway), it's a life-saving level of progress.

Stripe's agent-payment stack: four protocols and a chain

Stripe's release isn't a single product; it's four protocols and one chain, an infrastructure bet on the AI agent economy.

1. Machine Payments Protocol (MPP). Open standard co-drafted with Tempo, first published March 18. Core idea:

Let agents and services negotiate payment directly over HTTP, no traditional card-network intermediary.
Revives HTTP 402. The "Payment Required" status code has been reserved in the HTTP spec since the 1990s and has gone largely unused for about 30 years. MPP turns it on: your API request comes in, the service returns 402 plus a challenge ("you need to pay $0.01 to continue"), the agent wallet returns a signed token, payment goes through, data returns.
Supports microtransactions: fractions of a cent per API call. Card rails can't do this economically (the fixed per-transaction fee alone is around 30 cents).
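
The 402 handshake described above can be sketched as an in-memory exchange. This is not the MPP wire format: the field names, the fixed nonce, and the HMAC-signed "token" are all assumptions chosen to keep the example self-contained, and `service` and `agent_fetch` are hypothetical functions rather than part of any Stripe or Tempo SDK.

```python
import hashlib
import hmac
import json

# Hypothetical demo secret standing in for real wallet credentials.
WALLET_KEY = b"demo-wallet-key"
# Hypothetical challenge the toy service issues for every unpaid request.
CHALLENGE = {"amount_usd": "0.01", "nonce": "demo-nonce-1"}


def sign(challenge: dict) -> str:
    """Sign a challenge with HMAC-SHA256 (a stand-in for a real payment proof)."""
    body = json.dumps(challenge, sort_keys=True).encode()
    return hmac.new(WALLET_KEY, body, hashlib.sha256).hexdigest()


def service(request: dict) -> dict:
    """Toy paid endpoint: 402 plus a challenge until a valid token is attached."""
    payment = request.get("payment")
    if payment is None:
        return {"status": 402, "challenge": dict(CHALLENGE)}
    valid = payment.get("challenge") == CHALLENGE and hmac.compare_digest(
        sign(CHALLENGE), payment.get("signature", "")
    )
    if not valid:
        return {"status": 402, "challenge": dict(CHALLENGE)}
    return {"status": 200, "data": "paid response body"}


def agent_fetch(path: str, max_price_usd: float) -> dict:
    """Agent side: request, read the 402 challenge, pay if within budget, retry."""
    first = service({"path": path})
    if first["status"] != 402:
        return first
    challenge = first["challenge"]
    if float(challenge["amount_usd"]) > max_price_usd:
        # The agent refuses to pay more than its configured limit.
        return {"status": 402, "error": "price above agent limit"}
    token = {"challenge": challenge, "signature": sign(challenge)}
    return service({"path": path, "payment": token})
```

Calling `agent_fetch("/data", max_price_usd=0.05)` completes the round trip and returns status 200, while a limit of 0.001 stops at the challenge. The design point is that pricing and payment ride on the same HTTP request/response cycle, so no separate checkout step is needed.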

2. Universal Commerce Protocol (UCP). A multi-party standard that Stripe joined as a Tech Council member, with broader scope: cross-platform checkout interop, identity linking, order tracking, secure payment-token exchange. It competes directly with ACP (Agentic Commerce Protocol), which OpenAI backs and which Stripe itself co-developed with OpenAI. Google sides with UCP (Stripe also partnered with Google so you can buy from inside Gemini); OpenAI sides with ACP. This is the protocol war being fought right now in 2026.

3. Tempo — a payment-purpose Layer 1 chain. Incubated by Stripe and Paradigm, launched in March, claims $5B in on-chain volume:

Designed for high-frequency payments. No native gas token — unusual, since most L1s have one.
Designed for large-scale agent-to-agent settlement with stablecoins as the base settlement unit.
Think "Ethereum alternative redesigned for payments."

4. Shared Payment Tokens. The mechanism closest to what the agent product page is showing:

You (human) authorize a token to your agent — e.g. "under $10, hotel bookings only, valid 24 hours."
Agent uses the token at any Stripe-connected merchant.
Agent never sees your real card number or bank account.
Revocable at any time.

Same idea as Apple Pay / Google Pay's virtual card numbers, but made programmable along limit, purpose, and time dimensions for the agent case.
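
The three programmable dimensions (limit, purpose, time) plus revocation can be sketched as a small data structure. This is a minimal illustration, not Stripe's token format: `ScopedToken`, its field names, and the `"hotel_booking"` category string are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ScopedToken:
    """Hypothetical agent spending token with limit, purpose, and expiry."""
    limit_cents: int        # spend must stay under this amount
    allowed_category: str   # the only purchase purpose permitted
    expires_at: datetime    # token is invalid at or after this time
    revoked: bool = False   # the human can flip this at any time

    def authorize(self, amount_cents: int, category: str, now: datetime) -> bool:
        """Return True only if every constraint on the token is satisfied."""
        if self.revoked or now >= self.expires_at:
            return False
        if category != self.allowed_category:
            return False
        return amount_cents < self.limit_cents


# The example from the text: under $10, hotel bookings only, valid 24 hours.
issued_at = datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc)
token = ScopedToken(
    limit_cents=1000,
    allowed_category="hotel_booking",
    expires_at=issued_at + timedelta(hours=24),
)
```

Here `token.authorize(850, "hotel_booking", issued_at + timedelta(hours=1))` passes, while a $12 charge, a different category, a request after 24 hours, or any request after `token.revoked = True` is refused. The merchant only ever sees the token and the verdict, never the underlying card.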

5. Plus stablecoins and "agentic tokens." The docs mention upcoming stablecoin and "agentic token" support. The latter is vague; it looks like co-issued products with Mastercard and Visa for AI agent contexts.

Stack the five layers and the picture is clear: if you're a developer building an agent that buys things on a user's behalf, you used to integrate Stripe, then Alipay/WeChat/PayPal, then write your own fraud check, your own user authorization/revocation, then handle cross-border settlement. Now Stripe gives you a packaged stack: Link Wallet for user authorization, MPP for the agent↔service handshake, UCP for cross-platform interop, Tempo for large-scale settlement, Shared Tokens for credential safety.

Stripe's 2026 thesis: a large fraction of future economic transactions will be agent-initiated, and agent-initiated payments aren't a 1990s "credit card + checkout form" pattern — they need a new protocol layer. Stripe wants to be the de facto standard at that layer, the way it became the de facto developer payment integration in the 2010s.

Pangram's AI text detection

Per Max, Pangram claims a false-positive rate of 1 in 10,000, so when Pangram says something is AI-generated, you can be highly confident it is. It is not infallible: short text, heavily humanized content, or output from very new models can slip through. But when it does flag a text, Pangram claims 98.99% accuracy that the text is AI-written. Max noted that the previous wave of "AI detectors" like GPTZero became a punchline because of constant false positives (e.g. flagging the Declaration of Independence as AI), and says that era is over.
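
How much confidence a flag deserves depends on more than the false-positive rate: it also depends on how much AI text is in the pool being checked and how much of it the detector catches. A rough Bayes sketch follows. Only the 1/10,000 false-positive rate comes from the note above; the recall value and the AI-share values in the usage note are assumptions for illustration, and `flag_precision` is a hypothetical helper.

```python
def flag_precision(false_positive_rate: float, recall: float, ai_share: float) -> float:
    """Probability that a flagged text is really AI-written (Bayes' rule).

    false_positive_rate: fraction of human texts wrongly flagged.
    recall: fraction of AI texts that get flagged (assumed by the caller).
    ai_share: fraction of all checked texts that are AI-written (assumed).
    """
    true_flags = recall * ai_share
    false_flags = false_positive_rate * (1.0 - ai_share)
    return true_flags / (true_flags + false_flags)
```

With an assumed recall of 0.9 and 10% of checked texts being AI-written, a 1/10,000 false-positive rate gives a flag precision of roughly 0.999. If only 0.1% of texts are AI-written, the same detector's precision falls to roughly 0.90, and a detector with a 1% false-positive rate at the 10% share lands near 0.91. That is the practical difference between the older detectors and the claimed rate: the lower the false-positive rate, the longer a flag stays trustworthy as AI text gets rarer in the pool.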