Writing
2026
- May 17 Compressing the KV cache without retraining, and what mostly doesn't work
- May 15 Trying to push sparse attention further, and what didn't work
- May 13 90% of attention is slack in pretrained transformers
- May 11 Knowledge in weights, not in prompts
- Apr 23 A runtime for agent-authored apps
- Apr 22 Sandboxing agent-generated code with disposable unikernels
- Apr 8 Matching frontier LLMs with diverse small ensembles
- Apr 6 Code optimization with LLM-imagined ideas and evolutionary search