Claude Design may box in creativity, an unavoidable downside of AI tools

The prompt-response loop works well for extending or revising existing design systems. But it's not great for creating something new from scratch — design is "50% exploration," as Lucas puts it. In Figma you start from a blank canvas, and your output is shaped by a sequence of decisions — dragging a shape, snapping it to a grid, tweaking a drop shadow, comparing three variants side by side. Claude Design turns open-ended exploration into reaction to something already generated.

The DeleteMe-style privacy angle

Ever look yourself up and find databases of personal information you didn't consent to? That information should belong to you, not to data brokers who make money selling it without permission. Services like DeleteMe use privacy experts to remove your data from hundreds of broker sites, send a detailed report showing exactly what was cleared, and monitor those sites continuously throughout the year. The interesting bit is that data-broker removal is becoming a common product category.

Two practical rules from the piece:

  • Take stock of every AI app your employees have connected to a work account. Then turn on two-factor authentication everywhere it isn't already on.
  • Before you ship anything an AI built — even a weekend prototype — ask the generator one question: "What is this app exposing to the public internet, and should it be?" If you can't get a clear answer, don't ship.
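The second rule can be made concrete. Here is a minimal sketch of an exposure audit over a hypothetical list of service configs — the fields and thresholds are illustrative, not any real tool's API — flagging anything bound to a public interface or shipped without authentication:

```python
# Hypothetical pre-ship audit: flag what a prototype exposes publicly.
# The service-config format below is an assumption for illustration.

PUBLIC_BINDINGS = {"0.0.0.0", "::"}  # binds that accept outside traffic


def audit(services):
    """Return human-readable warnings for risky service configs."""
    warnings = []
    for svc in services:
        if svc["host"] in PUBLIC_BINDINGS:
            warnings.append(
                f"{svc['name']}: listening on all interfaces (port {svc['port']})"
            )
        if not svc.get("auth_required", False):
            warnings.append(f"{svc['name']}: no authentication configured")
    return warnings


services = [
    {"name": "admin-dashboard", "host": "0.0.0.0", "port": 8080, "auth_required": False},
    {"name": "db-api", "host": "127.0.0.1", "port": 5432, "auth_required": True},
]
for w in audit(services):
    print("WARN:", w)
```

If the generator can't answer the exposure question, a check like this at least surfaces the obvious cases before launch.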

GPT-Image-2 released

Benchmarks show a big jump, especially on practical image tasks: Arena reports GPT-Image-2 as #1 across all Image Arena leaderboards — 1512 on text-to-image, 1513 on single-image editing, 1464 on multi-image editing — with a striking +242 Elo lead over the next model on text-to-image.

Independent reactions center on the same theme: this isn't just prettier art, it's a more usable model for UI, prototypes, documents, productivity visuals, and reference-driven design loops. The most interesting systems-level implication is that image generation is becoming the front-end for coding agents: generate a UI spec as an image, then have Codex or another code agent implement against that visual reference.
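That image-as-spec loop can be sketched in a few lines. The generation step here is a stub and the prompt format is an assumption — the real calls would go through the respective image-model and code-agent APIs:

```python
# Sketch of an image-as-spec pipeline: an image model renders a UI mock,
# then a code agent implements against it. The stubbed call and prompt
# wording below are placeholders, not real API signatures.


def generate_ui_mock(description: str) -> str:
    """Stub for the image-generation step; would call an image model."""
    return f"mock://{description.replace(' ', '-')}.png"


def build_agent_prompt(image_ref: str, stack: str) -> str:
    """Compose the instruction handed to the code agent."""
    return (
        f"Implement the UI shown in {image_ref} as a {stack} component. "
        "Match layout, spacing, and typography from the reference image."
    )


mock = generate_ui_mock("pricing page with three tiers")
prompt = build_agent_prompt(mock, "React + Tailwind")
print(prompt)
```

The design choice worth noting: the image becomes the contract between the two agents, so the spec stays reviewable by humans even when both ends are automated.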

ChatGPT Images 2.0 — reasoning built into the generator

OpenAI launched ChatGPT Images 2.0, a major upgrade that pushes image generation beyond pretty pictures and into real creative work. The biggest shift is the built-in reasoning and thinking capability: before generating, the model can spend time interpreting prompts, planning outputs, checking details, and even pulling in web context when needed. It behaves more like a creative assistant than a standard image generator.

With reasoning enabled, ChatGPT Images 2.0 can create up to eight image variations from a single prompt while keeping characters, objects, and visual style consistent across every output. OpenAI says this makes longer-form visual storytelling and multi-asset campaigns far easier to produce.

Other upgrades:

  • Much better text rendering in Japanese, Korean, Chinese, Hindi, and Bengali.
  • More current world knowledge, updated through December 2025.
  • Stronger style replication — texture, lighting, composition, and mood.
  • Especially useful for branding work: logos, product shots, campaigns, consistent business visuals.

Riley Brown's demo is a good illustration of what the new reasoning loop unlocks: given a task involving a book, the model autonomously searches for the real book's barcode and adds it to the generated image. That's the point — not just pixels, but the agentic behavior of gathering context before drawing.

Hugging Face's ml-intern: the strongest open agent-in-the-loop release in this batch

HF introduced ml-intern, an open-source agent that automates the post-training research loop: reading papers, following citation graphs, collecting and reformatting datasets, launching training jobs, evaluating runs, and iterating on failures.
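The loop HF describes is a classic train-evaluate-revise control loop. A toy sketch, with every step function standing in for the real paper search, data prep, and training jobs (the scores and recipe fields are invented for illustration):

```python
# Skeleton of the post-training research loop ml-intern automates.
# The helpers are toy stand-ins: each revision improves the score,
# so the loop terminates. Nothing here is ml-intern's actual code.


def train(recipe):
    # Toy model: quality improves with each recipe revision.
    return {"score": 0.10 + 0.11 * recipe["iteration"]}


def evaluate(model):
    return model["score"]


def revise(recipe, score):
    # Iterate on failures: adjust the recipe and try again.
    new = dict(recipe)
    new["iteration"] += 1
    return new


def run_loop(target, max_iters=5):
    recipe = {"iteration": 0}
    history = []
    for _ in range(max_iters):
        score = evaluate(train(recipe))
        history.append(round(score, 2))
        if score >= target:
            break
        recipe = revise(recipe, score)
    return history


print(run_loop(0.32))  # three passes: 0.1 -> 0.21 -> 0.32
```

The interesting part of the real system is that `revise` is itself an LLM agent reading papers and ablation results, not a fixed rule.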

The reported examples are notable because they're end-to-end loops, not just coding demos:

  • GPQA scientific reasoning improved from 10% → 32% in under 10 hours on Qwen3-1.7B
  • A healthcare setup reportedly beat Codex on HealthBench by 60%
  • A math setup wrote a full GRPO script and recovered from reward collapse via ablations

Community tests quickly showed it can autonomously fine-tune and publish artifacts back to the Hub (example: a SAM fine-tuning run).

SEO and GEO — autonomous agents for search distribution

RankAI positions itself as an autonomous agent that handles SEO and GEO (Generative Engine Optimization) to pull buyers from both Google and ChatGPT. Interesting category. Here's how the tech stack likely breaks down:

  • Content generation layer. Almost certainly GPT-4 / Claude / Gemini APIs plus custom prompt engineering and content templates to batch-produce SEO-conformant articles. Standard playbook for this kind of tool.
  • Keyword database. They claim "billions of keywords" — this typically comes from buying third-party data (SEMrush, Ahrefs APIs) or scraping Google Suggest and People Also Ask, then clustering and classifying by intent.
  • CMS integration. WordPress REST API, Shopify API, Webflow API — direct push via official endpoints for "auto-publish." Not technically hard, standard integration work.
  • GEO (Generative Engine Optimization). This is their differentiator. The core idea: embed structured data (Schema.org), FAQ format, and clear entity relationships into content so LLMs are more likely to cite these pages during retrieval-augmented generation (RAG) or in training data. Essentially prompt design for AI search.
  • Monitoring and feedback loop. Connect Google Search Console API to pull click and impression data; when a page underperforms, trigger a rewrite. Classic evaluate-optimize automation pipeline.
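The GEO bullet is the easiest to make concrete. Schema.org's FAQPage type in JSON-LD is the standard format for marking up question-answer content; a minimal builder looks like this (the example question text is invented):

```python
import json


def faq_jsonld(pairs):
    """Build a Schema.org FAQPage JSON-LD block from (question, answer) pairs."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [
                {
                    "@type": "Question",
                    "name": q,
                    "acceptedAnswer": {"@type": "Answer", "text": a},
                }
                for q, a in pairs
            ],
        },
        indent=2,
    )


snippet = faq_jsonld(
    [("What is GEO?", "Optimizing content so LLM-based search engines cite it.")]
)
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

The bet is that clean entity structure like this survives chunking and retrieval better than free-form prose, so it is more likely to be quoted by an AI answer engine.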
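The monitoring bullet reduces to a simple decision rule over Search Console-style rows. The thresholds and row format here are assumptions, not RankAI's actual logic:

```python
# Sketch of the evaluate-optimize loop: pick pages to rewrite from
# click/impression data. Thresholds are illustrative guesses.

MIN_IMPRESSIONS = 500  # enough traffic to judge the page at all
MIN_CTR = 0.02         # below this click-through rate, flag for rewrite


def pages_to_rewrite(rows):
    flagged = []
    for row in rows:
        if row["impressions"] < MIN_IMPRESSIONS:
            continue  # not enough data yet; skip
        ctr = row["clicks"] / row["impressions"]
        if ctr < MIN_CTR:
            flagged.append(row["page"])
    return flagged


rows = [
    {"page": "/pricing", "clicks": 4, "impressions": 900},    # CTR ~0.4%: rewrite
    {"page": "/blog/geo", "clicks": 60, "impressions": 800},  # CTR 7.5%: fine
    {"page": "/new-post", "clicks": 1, "impressions": 40},    # too little data
]
print(pages_to_rewrite(rows))  # ['/pricing']
```

In the real pipeline, flagged pages would be fed back into the content-generation layer with the underperforming queries attached as context.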

Google: AI now generates 75% of its new code

Reported milestone from Google — AI is now responsible for 75% of newly written code internally. One more data point in the trend where the ratio of AI-authored to human-authored code keeps climbing across top-tier engineering orgs.