AI Notes — May 10

Code with Claude: Managed Agents

The pitch from this week's Code with Claude developer event: in the future, you accomplish a goal by giving Claude just two things — an outcome and a budget. That's the direction behind the new Managed Agents feature: Claude wrapped in a cloud computer that you can spin up, scale, and manage as needed. Anthropic is taking on the infrastructure layer that kills most agent products, and making it scale for agents running 24/7.

Interview: Anthropic platform team (Every podcast)

An interview with two leads from Anthropic's platform team — Angela (product) and Caitlin (engineering). The key points:

1. How the platform evolved

The earliest "platform" was just a completion endpoint: send a prompt, get a reply. Then came tool calls, chat sessions, and now stateful sessions. The overall direction is stateless → stateful and low-level primitives → higher-level abstractions, all toward one goal: let users get the best result with the least work.

2. What Managed Agents is, and who it's for

It builds on the Messages API and built-in tools (code execution sandbox, web search, etc.), packaging the infrastructure Anthropic has rebuilt several times into an off-the-shelf solution. There are two target users. The first is people inside companies building automation or internal platforms, anything from an end-to-end software-dev platform to a small flow that lets legal auto-review marketing copy. The second is developers integrating agents into products they ship externally: the product is still highly custom, but the infra engineering isn't worth rebuilding yourself.

One host raised the worry of lock-in: his team runs a claude -p loop on a Mac mini and fears falling behind Claude Code's new features. Caitlin's answer: Anthropic's own first-party products (Claude Code, co-worker, etc.) are built on the same platform, so divergence should shrink over time.

3. Harness and model path dependence

The old approach was to build a very generic harness that could hot-swap models. Angela argues that's becoming outdated: labs are training models in increasingly different directions, so harness and model become more tightly coupled. You still want redundancy and will still use other models, but the hot-swap unit rises to the level of the "agent" (harness + model), not the model alone. Her example: while building the same memory feature, they tried several harness designs and the eval results varied enormously. There's a lot of alpha in harness engineering itself. The cost is path dependence: the core primitive you choose deeply shapes which capabilities the model develops. Anthropic bet on the file system and skills, so the models keep getting better at exactly those.
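To make the "hot-swap the agent, not the model" idea concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only (the Agent protocol, HarnessedAgent, and run_with_fallback are not part of any Anthropic API); the point is simply that callers depend on a whole agent, so the fallback is another harness + model pair rather than a different model behind the same harness.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Agent(Protocol):
    """Anything that turns a task into a result; callers see nothing else."""

    def run(self, task: str) -> str: ...


@dataclass
class HarnessedAgent:
    """Hypothetical bundle: a model plus the harness tuned for it."""

    name: str
    model: Callable[[str], str]          # stand-in for a real model call
    build_prompt: Callable[[str], str]   # harness logic specific to this model

    def run(self, task: str) -> str:
        return self.model(self.build_prompt(task))


def run_with_fallback(agents: list[Agent], task: str) -> str:
    """Try each whole agent in order; the swap unit is the agent."""
    errors = []
    for agent in agents:
        try:
            return agent.run(task)
        except RuntimeError as exc:
            errors.append(exc)
    raise RuntimeError(f"all agents failed: {errors}")
```

Each agent keeps its own prompt-building logic, so swapping to the backup never forces one model through a harness that was designed around another.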

4. What users think is hard vs. what actually is

Users think the hard part is harness engineering (prompt caching, context-window management). The actual wall is infrastructure: keeping sandboxes alive (a dropped sandbox kills the agent), running servers for long stretches, storing transcripts, sandboxing for security, and scaling. Managed Agents is built to solve that layer for you.

5. The levels of agent applications

Personal productivity tools are everywhere, but team-level agents are where the real complexity and leverage live: multiple agents collaborating, end-to-end process automation, and the need for a platform abstraction above the single agent. Vercel's Guillermo Rauch called this organizational form an "internal AI software factory." Concrete case: a legal-review agent for marketing copy can't be solved with a skill alone, because it needs human-in-the-loop review, multi-person collaboration, and cross-session runs. Once it is built, marketing and legal users don't edit prompts directly; they interact with the agent through another Claude. In other words, "managed agents all the way down."

6. Multi-agent orchestration patterns

You can assemble different strategy architectures: advisor/executor separation, adversarial (one generates, one critiques), divide-and-merge, best-of-N, swarm collaboration. Different architectures suit different tasks — swarm for bug hunting, divide-and-conquer for deep research. The more Lego-like the primitives, the higher up the stack you can hill-climb.
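Three of these patterns can be sketched as plain function composition. This is an illustrative sketch only: the Agent and Scorer type aliases and all three helper functions are hypothetical, and an "agent" here is just a callable from task to answer, not a real Managed Agents object.

```python
from typing import Callable

Agent = Callable[[str], str]      # hypothetical: task in, answer out
Scorer = Callable[[str], float]   # hypothetical: higher score is better


def best_of_n(agents: list[Agent], task: str, score: Scorer) -> str:
    """Best-of-N: run every candidate agent, keep the top-scoring answer."""
    return max((agent(task) for agent in agents), key=score)


def generate_and_critique(generator: Agent, critic: Callable[[str], str],
                          task: str, max_rounds: int = 3) -> str:
    """Adversarial pair: the critic returns "" to accept, else feedback."""
    draft = generator(task)
    for _ in range(max_rounds):
        feedback = critic(draft)
        if not feedback:
            break
        draft = generator(f"{task}\nFeedback: {feedback}")
    return draft


def divide_and_merge(split: Callable[[str], list[str]], worker: Agent,
                     merge: Callable[[list[str]], str], task: str) -> str:
    """Divide-and-merge: fan subtasks out to a worker, then combine."""
    return merge([worker(part) for part in split(task)])
```

Because each pattern takes agents as arguments, they nest: a best-of-N candidate can itself be a generate-and-critique pair, which is the Lego-like composition the interview describes.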

7. Measuring agent success

Beyond standard evals, they prefer verifiable outcomes — for a code agent, did the PR get merged? The ultimate vision: users provide just two parameters — a verifiable outcome plus a budget — and the system handles everything else.
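The "verifiable outcome plus a budget" idea reduces to a small control loop. The sketch below is an assumption about how one might model it, not how Anthropic implements it: Attempt, RunResult, and run_until_verified are hypothetical names, the agent is any callable that reports its own cost, and verify stands in for a check such as "did the PR get merged?".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    output: str
    cost: float  # hypothetical unit: dollars or tokens spent on this attempt


@dataclass
class RunResult:
    output: Optional[str]  # None if no attempt passed verification
    spent: float
    attempts: int


def run_until_verified(agent: Callable[[str], Attempt],
                       verify: Callable[[str], bool],
                       goal: str, budget: float) -> RunResult:
    """Retry until the outcome verifies or the budget is spent."""
    spent, attempts = 0.0, 0
    while spent < budget:
        attempt = agent(goal)
        if attempt.cost <= 0:
            # Guard against a free attempt looping forever.
            raise ValueError("each attempt must report a positive cost")
        spent += attempt.cost
        attempts += 1
        if verify(attempt.output):
            return RunResult(attempt.output, spent, attempts)
    return RunResult(None, spent, attempts)
```

Cost is only known after an attempt finishes, so the final attempt can overshoot the budget. That is one small illustration of why the budget side is harder to hit exactly than the outcome side.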

8. Agent expiration / retirement

A real problem. They built skills to help with model upgrades and migrations, treating a model change as a proper "breaking change." The most AGI-pilled players will run agents to monitor whether their own agents have gone stale.

9. The one-year vision

Angela: get close to that "outcome + budget" world, where Claude understands itself, picks its own model, spins up sub-agents, writes its own harness, and users stop worrying about architecture choice and prompt engineering. Within a year the outcome part may be reachable; hitting the budget will still involve some error. Caitlin: that world demands extreme platform scalability, with agents always running and rebuilding themselves, so token throughput, long-running requests, and weird load shapes must never become bottlenecks.