Writing

2026

May 30 Searching for tiny-cache LLMs
May 17 Compressing the KV cache without retraining, and what mostly doesn't work
May 15 Trying to push sparse attention further, and what didn't work
May 13 90% of attention is slack in pretrained transformers
May 11 Knowledge in weights, not in prompts
Apr 23 A runtime for agent-authored apps
Apr 22 Sandboxing agent-generated code with disposable unikernels
Apr 8 Matching frontier LLMs with diverse small ensembles
Apr 6 Code optimization with LLM-imagined ideas and evolutionary search

© 2026 Jason Normore