AI Writing Is Harder Than You Think

Most discussions about AI and writing assume a simple loop: enter a prompt, get text, done. But Katie Parrott describes a more involved process: before writing, an agent "interviews" her; she goes back and forth with it repeatedly over structure; a team of AI reviewers named "Hemingway" and "Hitchcock" handles revisions; and a final read-through flags anything that still sounds machine-generated.

Data Annotation: Small Models with Specialized Data Can Beat Large Models

A 4-billion-parameter model recently beat one 60 times its size by training on the right financial data. Shutterstock and News Corp are making hundreds of millions licensing data to AI labs, with contracts growing 20% annually. The key questions: what makes your company's proprietary data valuable, and should you license it, train on it yourself, or do both?

AI Healthcare Pilot

A fascinating staged deployment approach: the first 250 AI prescription renewals are reviewed by a physician, and the AI must agree with that physician more than 98% of the time before proceeding independently. The next 1,000 require 99%+ agreement. Then monitoring shifts to randomized monthly testing. This is how you responsibly deploy AI in high-stakes domains.
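
The gating logic described above is simple enough to sketch. A minimal Python sketch, with hypothetical function names and only the thresholds mentioned in the article (the real system's exact rules are not public):

```python
def gate_passed(agreements: list[bool], threshold: float) -> bool:
    """True if the AI agreed with the reviewing physician often enough."""
    return sum(agreements) / len(agreements) > threshold

def deployment_stage(history: list[bool]) -> str:
    """Decide which review regime applies given the agreement history.

    Phase 1: first 250 renewals, all physician-reviewed, need >98% agreement.
    Phase 2: next 1,000 renewals, need 99%+ agreement.
    After that: randomized monthly spot checks.
    """
    if len(history) < 250:
        return "phase-1: every decision physician-reviewed"
    if not gate_passed(history[:250], 0.98):
        return "halted: phase-1 gate failed"
    if len(history) < 1250:
        return "phase-2: every decision physician-reviewed"
    if not gate_passed(history[250:1250], 0.99):
        return "halted: phase-2 gate failed"
    return "monitoring: randomized monthly spot checks"
```

The useful property is that each gate must be cleared in order; a failure at any stage halts autonomous operation rather than degrading it.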

Using Claude for Tax Filing: Issues Found

Two friction points discovered from personal use:

  • Extremely slow. Claude controls the browser via screenshots, and most pages don't fit in a single capture, forcing repeated scroll → screenshot → read cycles. This consumes a lot of time, and clicking is also error-prone.
  • Should pause and ask, but doesn't. Without knowing my bank account info, it chose to mail a check instead of selecting online bank payment. It should have stopped to ask.

GLM-5.1 Architecture: MoE, RMSNorm, MLA, DSA

GLM-5.1 is a 744B parameter MoE model from Zhipu (Z.ai) — MIT license. Architecture overview: Token Embedding → 78 repeated Blocks → Final RMSNorm → Linear output → predict next token. Each block has two major components: Attention and FFN/MoE.
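
The layer layout above can be sketched as a toy forward pass. Everything here is illustrative (tiny sizes, stub sublayers, NumPy instead of a real framework), not GLM-5.1's actual code:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # Rescale each vector by its root-mean-square magnitude.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * gain

def forward(token_ids, params):
    x = params["embedding"][token_ids]          # Token Embedding
    for blk in params["blocks"]:                # repeated Blocks (78 in GLM-5.1)
        x = x + blk["attn"](x)                  # Attention sublayer (stub)
        x = x + blk["ffn"](x)                   # FFN / MoE sublayer (stub)
    x = rms_norm(x, params["gain"])             # Final RMSNorm
    return x @ params["lm_head"]                # Linear -> next-token logits

# Toy usage: vocab of 10, d_model of 4, 2 blocks with do-nothing sublayers.
rng = np.random.default_rng(0)
params = {
    "embedding": rng.normal(size=(10, 4)),
    "blocks": [{"attn": lambda x: 0 * x, "ffn": lambda x: 0 * x}] * 2,
    "gain": np.ones(4),
    "lm_head": rng.normal(size=(4, 10)),
}
logits = forward(np.array([1, 2, 3]), params)
print(logits.shape)  # (3, 10): one logit per vocabulary entry, per token
```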

RMSNorm: Normalizes values after each layer to prevent gradients from exploding or vanishing. Like an audio compressor — levels out the signal to prevent clipping.
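
A minimal NumPy sketch of the operation itself (in a real model the per-dimension gain is learned):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide each vector by its root-mean-square magnitude.

    Unlike LayerNorm, it does not subtract the mean; it only rescales,
    then applies a learned per-dimension gain.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.array([3.0, -4.0])            # RMS = sqrt((9 + 16) / 2) ~= 3.5355
out = rms_norm(x, gain=np.ones(2))
print(out)  # roughly [0.8485, -1.1314]: same direction, unit-RMS scale
```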

MLA (Multi-head Latent Attention): DeepSeek's invention, inherited by GLM-5.1. Instead of storing full K and V, compresses them into a small "latent vector" and decompresses when needed. Like JPEG compression — store compressed, decompress to display. Enables 202k context window.
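
The compress-then-decompress idea fits in a few lines. Sizes and projection names here are illustrative, not GLM-5.1's real dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq = 64, 8, 5   # toy sizes for illustration

# Learned projections (random here): compress hidden states to a small
# latent, then decompress the latent back into full K and V when needed.
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)

h = rng.normal(size=(seq, d_model))   # hidden states for 5 tokens

latent_cache = h @ W_down             # only this small tensor is cached
K = latent_cache @ W_up_k             # K reconstructed on the fly
V = latent_cache @ W_up_v             # V reconstructed on the fly

# The cache holds seq * d_latent floats instead of 2 * seq * d_model:
print(latent_cache.size, 2 * seq * d_model)  # 40 vs 640
```

The memory saving is exactly the ratio of latent size to full K+V size, which is what makes very long context windows affordable.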

DSA (DeepSeek Sparse Attention): Computes attention only between "meaningful token pairs," cutting the cost well below the usual O(n²). MLA + DSA together: MLA solves memory, DSA solves compute, which is what makes long contexts feasible.
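
A toy illustration of the sparsity idea, assuming simple top-k selection in place of DeepSeek's learned pair selection (this toy still scores all pairs before discarding; the real system avoids that, which is where the compute savings come from):

```python
import numpy as np

def sparse_attention(Q, K, V, k=4):
    """Toy sparse attention: each query attends only to its k highest-scoring
    keys instead of all n, so value mixing scales with n*k rather than n^2.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Mask everything outside each row's top-k scores.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving (top-k) entries only.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
n, d = 16, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = sparse_attention(Q, K, V, k=4)
print(out.shape)  # (16, 8): one output vector per query
```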

MoE (Mixture of Experts): The FFN is split into 256 expert networks, and a router decides which experts each token is sent to. GLM-5.1 has 744B total parameters but activates only 40B per inference: the compute cost of a 40B model with the knowledge capacity of 744B.
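
A toy router sketch. Sizes are illustrative (16 experts instead of 256), and each "expert" here is a single linear map rather than a full FFN:

```python
import numpy as np

def moe_layer(x, router_w, experts, top_k=2):
    """Toy MoE: the router scores every expert per token, but only the
    top_k experts actually run, so compute tracks top_k, not the total
    expert count."""
    logits = x @ router_w                        # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                     # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * experts[e](x[t])    # only chosen experts run
    return out

rng = np.random.default_rng(2)
d, n_experts = 8, 16
router_w = rng.normal(size=(d, n_experts))
weights = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in weights]
y = moe_layer(rng.normal(size=(4, d)), router_w, experts)
print(y.shape)  # (4, 8): same shape as the input, per-token expert mixing
```

The 744B-total / 40B-active split is this same pattern at scale: all experts hold knowledge, but each token pays for only the few the router selects.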

GLM-5.1 Community Assessment

  • SWE-Bench Pro: 58.4 — first open-source model to top this benchmark, beating GPT-5.4 (57.7) and Claude Opus 4.6 (57.3)
  • Long autonomous coding is genuinely impressive — 8-hour continuous plan→execute→test→fix→optimize loops
  • Trained entirely on Huawei Ascend 910B with MindSpore — no NVIDIA GPUs
  • Weak point: not a generalist. General reasoning "clearly weaker than Claude and GPT." Local deployment painful: 3-5 tok/s on M4 Max with IQ2_M quantization
  • The "beats Claude Opus 4.6" claim is misleading — only true on SWE-Bench Pro, one benchmark

CLAUDE.md: The Folder Is the Agent

An interesting pattern for agent-native development: a repo folder with accumulated institutional knowledge that any new agent inherits automatically. Reading order: CLAUDE.md first, then architecture docs, then system reports, then agent prompts.

The author built a dispatch layer: a Ruby daemon that watches a directory for task requests. When a task arrives, the daemon creates a lead agent; the lead breaks the task into subtasks written as files; the daemon picks up those files and spawns worker agents, which report back by writing files of their own. The daemon checks status every 60 seconds; no custom networking or agent-to-agent protocol is needed.
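
The author's daemon is Ruby; here is a minimal Python sketch of the same file-based protocol, with hypothetical directory names and message format:

```python
import json
import time
from pathlib import Path

TASKS = Path("tasks")      # hypothetical directory names, not the author's
RESULTS = Path("results")

def poll_once(spawn_worker):
    """One polling pass: claim new task files, spawn a worker per task,
    and collect any result files workers have written."""
    for task_file in sorted(TASKS.glob("*.json")):
        task = json.loads(task_file.read_text())
        spawn_worker(task)                        # e.g. launch an agent process
        task_file.rename(task_file.with_suffix(".claimed"))
    return [json.loads(p.read_text()) for p in RESULTS.glob("*.json")]

def run_daemon(spawn_worker, interval=60):
    """The whole protocol is files plus a timer: no networking required."""
    while True:
        for result in poll_once(spawn_worker):
            print("worker reported:", result.get("status"))
        time.sleep(interval)
```

The appeal of the design is that the filesystem is the message bus: tasks, claims, and reports are all just files, so any agent that can read and write files can participate.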

Two daily commands: /hey (morning review of all agent progress reports) and /orchestrate "Fix issue #1765" (drop a task in, agent auto-decomposes, works, submits PR, human reviews).