research
Notes from the lab.
Long-form writing on agents, infrastructure, and evaluation — published as we ship. Each note is a stake in the ground we are willing to be wrong about, in public.
- Agents3 min read
Long-horizon tool use without context collapse
What we learned from running coding agents for 1,000+ turns: the surprising failure mode isn't reasoning, it's memory shape.
May 12, 2026
- Infrastructure3 min read
Edge-native inference and the cost of cold paths
Where Workers AI wins, where it doesn't, and the architectural patterns that survive a 100× traffic spike at 3am.
Apr 28, 2026
- Evaluation3 min read
Why agent benchmarks lie — and what to measure instead
Most agent leaderboards measure the wrong thing. A practitioner's framework for evals that correlate with production reliability.
Mar 17, 2026