These are notes, thoughts, progress logs, and ideas I'm still forming. Some will age well. Some won't — that's the point.

AI Notes — May 19

Meta moves 7,000 employees into AI-native divisions days before layoffs. Humanoid robots as four interconnected systems. When marginal cost goes to zero, value migrates rather than disappears.

AI Notes — May 18

Drone warfare: Yaroslav Azhnyuk on FPV drones as the new god of war, China's 4 billion-unit drone capacity, why rare earths are a hard constraint, and what the West has not yet built.

AI Notes — May 17

US vs China frontier model Elo over time. Similar slopes, staggered starts — current gap around 450 Elo and not closing on linear extrapolation.

AI Notes — May 16

Skepticism about Figure vs broader conviction about robotics acceleration. Sales as a steady role. Cerebras IPO. Codex remote control.

AI Notes — May 15

Abridge: from AI medical scribe to clinical intelligence layer. Three strategic phases: save time, save money, save lives. Why prototypes beat PRDs.

AI Notes — May 14

A classic data-leakage training failure: a sepsis-prediction model that cheated with future information and collapsed in real hospitals.

AI Notes — May 13

Use your longest agent run as a difficulty proxy. Perplexity's skill design. Blackwell as the reference platform for large-MoE serving. Hassabis on AI for health.

AI Notes — May 12

Few-shot vs zero-shot prompting explained, with selection advice. Plus full-duplex multimodal interaction.

AI Notes — May 11

Jack Clark's Import AI: a 60%+ chance of human-out-of-the-loop AI R&D. Forward Deployed Engineers. What Chinese AI and robotics companies are building.

AI Notes — May 10

Code with Claude: Managed Agents — accomplish a goal by giving Claude an outcome and a budget. Anthropic platform team on harness and model path dependence.

AI Notes — May 9

Anthropic's valuation in light of secondary-market and media reports. Notes on Figure Robotics.

AI Notes — May 8

Why the data industry is still immature: $1M+ per RL environment, hundreds of millions a year, and labs still prefer to build in-house. Anthropic's Dreaming: agents that review past sessions, rewrite memory, and learn between runs as a team. GPT-Realtime-Translate goes live with 70+ input languages to 13 outputs.

AI Notes — May 7

Claude rate limits doubled after the SpaceX compute deal. Harvey's LAB benchmark covers 1,200 long-horizon legal-agent tasks. Genesis AI ships GENE-26.5 with 100x cheaper data collection hardware. Figure HQ tour: ~1M hours of pre-training data, sim-to-real zero-shot, 50-200 Hz onboard inference, and a high-DOF hand built to learn from human videos. Hugging Face's Reachy Mini App Store points at a desktop-robot category.

AI Notes — May 6

AI for science: o3 cuts a multi-day physics calculation to 11 minutes. Anthropic JV with Blackstone/H&F/Goldman and OpenAI's Deployment Company push model-makers downstream into B2B consulting. GPT-5.5 Instant becomes the ChatGPT default with memory sources exposed. RL infra shifts from single-shot rewards to long-running action systems; Anthropic Orbit and Manus point to a new proactive-assistant category.

AI Notes — May 5

From software that gives you tools to software that delivers results. The next-gen data supplier playbook: outcome delivery, lifecycle management, productized service tiers, pricing tied to model metrics. Meta acquires ARI to bet on robotics as a training strategy. Model × harness × context wins: prompt and middleware swaps move gpt-5.2-codex 13.7 points on Terminal-Bench 2.0.

AI Notes — May 4

Cyber psychosis: builders shipping 163 commits a day, vibe-coding straight to production. What AI cannot copy — premium subscription letters, boutique consulting, curated brands, members clubs, anyone bearing legal responsibility. Cursor's Composer 2: continued pretraining before RL adds 17.1 CursorBench points. Keep Rate as the behavioral north-star. Why PMs become loop designers and product taste is cost judgment. Defending against AI cyberattacks.

Data Annotation Industry Report

Market landscape, company profiles, pricing models, technical trends and pain points across the AI data annotation industry. Field research and analysis (Chinese).

AI Notes — May 3

AI-native organizations: why companies see zero gains while individuals get 15-40% faster. Three rebuild patterns: subsidiary spin-off, internal Pods, and laying off everyone who codes. End-to-end ownership, trait-based teams, and context as the moat. Cursor's UI/UX lead on software as concept stacks. Why fine-tuning became continued pretraining, and the new pre/mid/post-training pipeline. Bad data, taste at scale, and benchmark leakage.

AI Notes — May 2

Agent orchestration as a while-loop of tool calls, in five steps. LLM-era distillation: data distillation and CoT distillation. PMs writing only the roadmap and letting Claude do everything else. The six layers of AI products. GPT-5.5, Grok 4.3, DeepSeek V4 Pro and the closing open/closed gap. Six places synthetic data can't replace human annotation.

AI Notes — May 1

Coding agent shootout: Claude Code, Claude Design, Cursor, Codex on a single landing-page brief. nanochat depth scaling and the FP8 training trick. Cursor SDK leads Terminal-Bench 2.0. Why Apache 2.0 actually matters for enterprise. 2023–2025 AI value flowed to infrastructure: VR NVL72 economics and the neocloud margin compression.

AI Notes — April 30

Why the agent-era CPU narrative is real but smaller than the GPU one. The CPU player map: AMD vs Intel vs hyperscaler ARM vs Ampere. How much GPU/CPU one humanoid robot actually needs, with Jetson Thor as the de facto onboard monopoly. Mayo's REDMOD catches pancreatic cancer up to 3 years early. Stripe's four-protocol agent payment stack.

AI Notes — April 29

NVIDIA Nemotron 3 Nano Omni: 30B/A3B multimodal MoE, 256K context, 9x throughput. Mini-SGLang prefix matching with the radix tree. Unsloth LoRA: merged vs non-merged tradeoffs. Mimicking Dream of the Red Chamber style with a 167MB adapter. TRL DPO end-to-end.

AI Notes — April 28

Sakana's 7B Conductor orchestrates frontier models, hits 83.9% on LiveCodeBench. OpenAI's AI-first phone targeting 2028. GUI Agent annotation needs a totally different paradigm. YC Summer 2026 RFS: 14 directions betting AI is now infrastructure, not feature.

AI Notes — April 27

Medical-LLM refactor: 4 findings on overnight runs and multi-format interference. Architectural breakdown of Gemma 4, Qwen 3.6, GLM-5.1, Kimi K2.6 and DeepSeek V4-Pro. Anthropic's Project Deal: Opus agents close better trades than Haiku.

AI Notes — April 26

SkillsBench vs our skillrank — a postmortem on seven mistakes: LLM-as-judge instead of deterministic verifiers, pairwise instead of pass/fail, no with/without baseline, and too much time on infra.

Where Sages Agree

A book on where four wisdom traditions converge — Zen, Confucianism, Stoicism, and Adlerian psychology — on what it means to live well in an anxious age.

AI Notes — April 25

DeepSeek-V4 vs Flash Attention vs MHA — algorithmic vs architectural innovation. CSA/HCA shrinks KV cache 5-10x via low-rank latent compression. GPT-Image 2 + Seedance 2.0 short-film workflow.

AI Notes — April 24

GPT-5.5 ships: faster, cheaper on net, smarter. swyx on AI-native: skills as the agent unit, app companies outlast infra, Taalas bakes models into silicon. World ID 4.0 hits Tinder, Zoom, DocuSign.

AI Notes — April 23

Shopify at ~100% internal AI use, critique loops over parallel agents, Tangle/Tangent/SimGym. MacAskill on AI character as the most underrated lever. mini-sglang RadixAttention vs nano-vllm: 7311 tok/s on a single 3090.

AI Notes — April 22

Claude Design locks in creativity. GPT-Image-2 tops Image Arena by +242 Elo. ChatGPT Images 2.0 adds reasoning before drawing. RankAI's SEO+GEO stack. Google: 75% of new code is AI.

AI Notes — April 21

RLVR explained via DeepSeek-R1. Hermes agent patterns: stateless units, structured failure traces, directory-scoped AGENTS.md. Alex Imas on the post-commodity economy.

AI Notes — April 20

Generative Agents (Smallville), OASIS large-scale social simulation, and Love First Know Later — three papers mapping the theoretical base for persona products like Halo.

AI Notes — April 19

Claude Code terminal shortcuts (Shift+Tab, Esc, @). Fengtian's workflow: two Max plans + voice input + Agent Team mode = 10x productivity.

AI Notes — April 18

Claude Design pipeline: Pinterest inspiration → AI-generated background and character → Seedance 2.0 animation → motionsites.ai template → Landbook layouts.

AI Notes — April 17

Overseeing agents is the future, not writing code. Deep dive into nano-vllm attention, preempt, prefix caching. McKinsey on the agentic organization.

AI Notes — April 16

Energy-Based Models: not new — Hopfield Networks, Boltzmann Machines, diffusion models all trace back here. Yann LeCun's bet against autoregressive LLMs.

AI Notes — April 15

Local model rankings from Reddit, how to steer AI toward your design style with images, the 2026 AI engineer roadmap, Karpathy on the AI capability gap.

AI Notes — April 14

nano-vLLM deep dive: prefill vs decode, KV cache, PagedAttention, continuous batching. Plus Notion's Model Behavior Engineer role and software factory design.

AI Notes — April 13

GLM-5.1 architecture explained (MoE, MLA, DSA). Using Claude for tax filing: what broke. AI writing is harder than it looks. The folder-as-agent pattern.

AI Notes — April 12

A quiet day. Sometimes letting ideas settle is the work.

AI Notes — April 11

Consultant-style agent coordination: cheap executor + expensive advisor. Haiku + Opus doubles BrowseComp scores vs Haiku alone.

AI Notes — April 10

Meta's Muse Spark: 10x efficiency over Llama 4, 16 hidden tools in meta.ai. Two thoughts: AI tools as games, vibe coding as web fiction.

AI Notes — April 9

Mythos scores 93.9% on SWE-bench — a nuclear weapon. Picotron distributed training: DP naive vs bucket, AFAB vs 1F1B pipeline schedules.

AI Notes — April 8

Moltbook: AI theater or genuine emergence? Nebius $46B in signed contracts. Ryan Lopopolo on harness engineering and zero human-written code.

AI Notes — April 7

Why changing one character in an image is harder than generating a cyberpunk city. Full diffusion model walkthrough with math and code.

AI Notes — April 6

Claude's Cowork feature supports Computer Use across devices — control a remote machine's browser without touching your own.

The Force That Keeps Me Moving

A simple number changed everything. Thirty thousand days. That's roughly how many days a human life has. This realization reshaped how I live, work, and think about time.

Why I'm Building in Public

The decision to document everything publicly wasn't easy. Here's why I chose transparency over polish, and what I hope to gain from it.

What Steplify Taught Me About Product-Market Fit

My startup failed. But the lessons about listening to users, timing, and the gap between conviction and validation are worth more than any success.