GPT-5.5 ships
GPT-5.5 dropped today. It's fast enough to use continuously, friendly enough to collaborate with, and decisive enough to ship complex engineering work. On writing it beats every OpenAI model from the past year, and it tops the new "Senior Engineer benchmark" — which measures how well a model can rewrite a messy production codebase the way a senior would. Models that feel both easier and more powerful are rare.
Speed is the most obvious change. Head-to-head, GPT-5.5 is much faster than Opus 4.7, with a low-friction sense of fluency that makes it easier to iterate, stay in sync, and rely on day-to-day. It also spends more time planning and reviewing, asks more questions, and double-checks its own work before continuing — especially at higher reasoning settings.
Beyond benchmarks: GPT-5.5 uses about 40% fewer tokens than 5.4. It's more expensive per token, but the net effect is roughly a 20% increase in cost per task, for a model that's smarter and faster.
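The claim is easy to sanity-check with normalized numbers. The 2x per-token price below is an implied multiplier (the only one consistent with "40% fewer tokens, ~20% higher net cost"), not a figure from OpenAI:

```python
# Sanity-check: 40% fewer tokens, ~20% higher net cost per task.
old_tokens = 1.0                       # normalize GPT-5.4 token usage to 1
old_price = 1.0                        # normalize GPT-5.4 $/token to 1

new_tokens = old_tokens * (1 - 0.40)   # ~40% fewer tokens
new_price = old_price * 2.0            # implied per-token price multiplier

net_cost = new_tokens * new_price
print(net_cost)                        # → 1.2, i.e. ~20% more per task
```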
AI-native companies: Latent Space × Unsupervised Learning
swyx (founder of the AI Engineer community, now at Cognition) and host Jacob Effron go deep on the 2026 AI ecosystem. Core points:
① What AI engineers are obsessing about
Top topics: harness engineering, context engineering, and skills (skill packs). Skills have become the minimum-viable packaging format for an agent — basically a markdown file plus a few scripts. That's a sign infrastructure is finally stabilizing.
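For concreteness, a minimal skill pack in the shape the ecosystem has converged on. The file name, frontmatter fields, and contents here are illustrative, not a normative spec; check your agent framework's docs for the exact schema:

```markdown
<!-- my-skill/SKILL.md — illustrative layout -->
---
name: changelog-writer
description: Drafts a changelog entry from a git diff. Use when the
  user asks for release notes or a changelog.
---

# Changelog writer

1. Run `scripts/collect_diff.sh` to gather the diff since the last tag.
2. Summarize user-facing changes; skip refactors and test-only edits.
3. Output sections in Added / Changed / Fixed order.
```

The whole "package" is this markdown file plus the scripts it references, which is exactly why it works as a minimum-viable format: any agent that can read files and run shell commands can consume it.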
② Application companies outlast infrastructure companies
AI infra companies have to disrupt themselves every year (LangChain / LangGraph being the textbook case). App companies (Sierra, Lara) play the role of "outsourced AI team" — they only need to keep up with the frontier model and customer stickiness is much higher.
③ The "agent lab" playbook
Top coding agent companies (Cursor, Cognition) start on frontier models and, once they have enough domain data, train their own proprietary models to cut cost and latency. For high-frequency, well-defined tasks like search and code completion, this is real value — not theater.
④ Open source + alternative chips
swyx flipped from his earlier pessimism on open source. One reason: non-NVIDIA chips (Cerebras, Taalas, etc.) are now hitting thousands of tokens per second, and every 10x jump in speed unlocks a new product experience.
One correction: in December 2025, NVIDIA acquired Groq's core assets for ~$20B — the largest deal in NVIDIA's history. The structure is unusual: NVIDIA took the tech IP and the core leadership (including founder Jonathan Ross), but the Groq "shell" continues as an independent company under the CFO. The FTC has flagged it as one of several "merger in disguise" deals under investigation. In March 2026 NVIDIA released a new chip based on Groq LPU technology, claiming 35x the inference throughput of Blackwell on trillion-parameter models.
Taalas: bake the model into silicon
The "Talu" mentioned in the podcast is actually Taalas, a Toronto chip startup with a radical thesis: don't build a general GPU — bake the AI model's weights directly into the transistors, turning the model into a hardware circuit instead of software running on hardware.
The numbers are wild: their HC1 chip claims 14,000 tokens/sec — 10x faster than Cerebras, ~100x faster than a GPU, at only 200 watts (vs. 120–600 kW for a GPU rack). The cost is flexibility: each chip is custom-tailored to a specific model, and the first product is locked to Llama 3.1 8B. They closed a $219M round in February.
This is exactly why swyx pairs "open source + alternative silicon": companies like Taalas can only run open source models (you need the weights to bake them in), so the rise of open source ecosystems and non-NVIDIA hardware is structurally bound together.
An analogy
Standard GPU: a chef who walks to the pantry, grabs ingredients, cooks, walks back, grabs more — most of the time is spent moving things. The real bottleneck in AI inference isn't compute — it's data movement. The technical name is the memory bandwidth wall.
Taalas: burn the model weights straight into the chip's circuitry — like burning the recipe into the chef's muscle memory. The data is already where it's needed; pick it up and use it. No movement, so it's tens or hundreds of times faster.
The cost is obvious: this chef only knows one dish. Switching models means re-tooling the chip — about two months. It's a classic flexibility-for-speed trade: enormously valuable for shipping one model at scale, useless for research workflows that swap models constantly.
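The bandwidth wall can be made concrete with a back-of-envelope roofline: in single-stream decoding, each token requires streaming roughly all the model weights through memory once, so bandwidth caps tokens/sec. The numbers below are illustrative, not from the podcast:

```python
def max_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       bandwidth_tb_s: float) -> float:
    """Roofline ceiling on decode speed when weight traffic is the bottleneck."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# An 8B model in fp16 (2 bytes/param) streaming over ~3.3 TB/s of HBM:
print(round(max_tokens_per_sec(8, 2, 3.3)))  # → 206 tokens/s ceiling
```

Batching and quantization move the ceiling, but not by orders of magnitude for a single stream. Keeping the weights in on-chip circuitry removes the weight traffic entirely, which is how four-digit tokens/sec figures become plausible.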
⑤ The coding war is in "capability exploration" mode
Anthropic's Claude Code is around $2.5B ARR, OpenAI and Cursor each around $2B — markets that didn't exist a year ago. The whole industry is in "spend more, get rewarded more" mode. Efficiency optimization hasn't started yet.
⑥ 2026 in one line
swyx's call: 2025 was the year of the coding agent; 2026 is the year coding agents break out and eat everything else. The chain: software ate the world → coding agents ate software → coding agents eat the world.
⑦ Consumer AI hits a ceiling, coding AI keeps accelerating
ChatGPT user growth has plateaued — looks more like the entire consumer AI category hitting a frequency-and-product-design ceiling than competitors stealing share. Coding AI, on the other hand, is genuinely a daily-active category.
⑧ Traditional SaaS is being eaten
swyx's company spent $200K on annual event management software; he thinks an AI-built custom version would cost $2K. The biggest internal blocker is team adoption — there's a real cultural rift between AI natives and traditionalists.
⑨ "Dark factory" is the next frontier
The industry has accepted "zero humans writing code." The more radical next frontier: zero humans reviewing code — models write and ship directly, forcing companies to rebuild test and validation systems from the ground up.
⑩ Memory is the slowest scaling axis
Context length went from 4K to 1M tokens over three years, and even with a million tokens most real workflows didn't change. Memory and personalization will be the key bottleneck of the next AI phase — and the main thing users will pick products on.
⑪ The "Good Will Hunting" problem
The closing image is striking: today's models are like Matt Damon in Good Will Hunting — they've read everything but never lived. Fei-Fei Li's "spatial intelligence" thesis points at the same gap: the model knows the word "table" but doesn't know what a table feels like. World model research exists to close that gap.
OpenAI Privacy Filter: a small model that runs in your browser
OpenAI open-sourced a genuinely useful model called Privacy Filter — 1.5B parameters, only 50M active, small enough to run fully offline in the browser. Xenova built a web demo. This kind of small model has real practical value — you could use it to filter privacy-sensitive inputs (like medical data) locally before sending the cleaned text to a cloud-based AI tool.
Voice input plus cursor-aware reference
"Here, fix THIS thing" — you don't have to say what "this" is. Claude infers from where your cursor was at the time. This combination of indexical reference plus visual context noticeably upgrades the voice-driven remote work flow.
Kimi K2.6: Opus at home (if you have a data center)
Kimi K2.6 dropped. Open source heavyweights continue to close the gap with closed-source flagships.
Medical and infra signals
OpenAI released a clinician/medical model alongside the new Workspace Agents. Together AI reported usage growing from 30B to 300T tokens/month year over year, a large-scale signal of expanding inference demand. Epoch AI revised down operational power at Stargate Abilene to ~0.3 GW today and pushed the full 1.2 GW milestone to Q4 2026; frontier compute deployment remains hard to track.
World ID 4.0: human verification
World launched World ID 4.0, the new version of its proof-of-human system, designed for an internet flooded with AI-generated content. The platform uses an iris-scanning device called the Orb to mint a unique cryptographic identity. Over 18M users across 160 countries are already enrolled.
The bigger signal is adoption — new partners include Tinder, Zoom, DocuSign, Shopify, Okta, AWS, and Vercel. Tinder is adding verified-human badges, Zoom is testing deepfake checks for video calls, DocuSign is planning human verification for signatures.
noscroll: an AI that doomscrolls X for you
"X has the best information on the internet and the worst incentives and culture. noscroll is the AI that doomscrolls it for you and texts you just the things that matter. No feed. No brainrot. No ragebait. Just signal."
Remotion + Claude Code: (almost) one-shot product video
Austin Tedesco, Every's head of growth, spent days battling Remotion (an open-source framework for building videos programmatically in React) and Claude Code, then settled on a workflow he reuses every time he needs a launch or feature demo:
- Step 1: Screen-record yourself using the product. All you need is raw footage of yourself clicking through features in real time.
- Step 2: Send the recording to a model (Austin prefers Opus) and have it draft a storyboard. The recording grounds the model in how the UI actually works and what the copy actually says — preventing the most common cause of fake-looking launch videos: plausible-but-hallucinated labels and features.
- Step 3: Iterate on the storyboard. Go back and forth with the model until the hook, pacing, and beat-by-beat plan feel right.
- Step 4: Hand the storyboard to a coding agent and have it build the video in Remotion. With the screen recording and the matching storyboard, the first full render is usually publishable. It's not a true one-shot, but it saves a lot of time.