<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Jason Normore</title><description>Software developer. Founder &amp; CTO at Mantle. Writing about LLM systems, commerce infrastructure, and the craft of building products.</description><link>https://jasonnormore.com/</link><item><title>Compressing the KV cache without retraining, and what mostly doesn&apos;t work</title><link>https://jasonnormore.com/blog/compressing-the-kv-cache/</link><guid isPermaLink="true">https://jasonnormore.com/blog/compressing-the-kv-cache/</guid><description>10× KV reduction without retraining is reachable on a pretrained transformer. The mechanism isn&apos;t what you&apos;d expect — summaries work not because they preserve information but because they prevent gap-induced recency bias.</description><pubDate>Sun, 17 May 2026 00:00:00 GMT</pubDate></item><item><title>Trying to push sparse attention further, and what didn&apos;t work</title><link>https://jasonnormore.com/blog/projection-sparsity-and-what-didnt-work/</link><guid isPermaLink="true">https://jasonnormore.com/blog/projection-sparsity-and-what-didnt-work/</guid><description>The natural follow-up to CSA: if 90% of attention reads are skippable, can the projection machinery be trimmed too? The hypothesis survives, the obvious method doesn&apos;t, and the gap between them is the interesting part.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate></item><item><title>90% of attention is slack in pretrained transformers</title><link>https://jasonnormore.com/blog/calibrated-sparse-attention/</link><guid isPermaLink="true">https://jasonnormore.com/blog/calibrated-sparse-attention/</guid><description>A small training-free procedure that measures per-(layer, head) attention diffuseness, allocates a budget proportional to it, and applies it as a top-k mask at inference. 
Result: dense quality at ~10% of the attention reads, across seven model/context/corpus combinations.</description><pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate></item><item><title>Knowledge in weights, not in prompts</title><link>https://jasonnormore.com/blog/knowledge-in-weights-not-prompts/</link><guid isPermaLink="true">https://jasonnormore.com/blog/knowledge-in-weights-not-prompts/</guid><description>The frontier-agent answer to personalization is more prompt — RAG, memory injection, larger context windows. A hybrid Mamba+attention base plus a small LoRA adapter lets the model just learn you instead.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate></item><item><title>A runtime for agent-authored apps</title><link>https://jasonnormore.com/blog/a-runtime-for-agent-authored-apps/</link><guid isPermaLink="true">https://jasonnormore.com/blog/a-runtime-for-agent-authored-apps/</guid><description>Agents are great at one-shot work and bad at durable work. cue is a small daemon that closes the gap — actions, triggers, addressable URLs, each invocation sandboxed in a fresh unikernel.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Sandboxing agent-generated code with disposable unikernels</title><link>https://jasonnormore.com/blog/sandboxing-agent-code-with-disposable-unikernels/</link><guid isPermaLink="true">https://jasonnormore.com/blog/sandboxing-agent-code-with-disposable-unikernels/</guid><description>Coding agents need to run code. The options today are all bad. 
unitask is a small tool that runs each call in a fresh unikernel under declarative policy — code in, runs, returns, destroyed.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Matching frontier LLMs with diverse small ensembles</title><link>https://jasonnormore.com/blog/matching-frontier-llms-with-diverse-small-ensembles/</link><guid isPermaLink="true">https://jasonnormore.com/blog/matching-frontier-llms-with-diverse-small-ensembles/</guid><description>An OpenAI-compatible ensemble proxy that lands at GPT-5 accuracy on a 150-case benchmark at one-thirteenth of the cost. The catch is that diversity, not model count, does the work.</description><pubDate>Wed, 08 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Code optimization with LLM-imagined ideas and evolutionary search</title><link>https://jasonnormore.com/blog/code-optimization-with-llms-and-evolution/</link><guid isPermaLink="true">https://jasonnormore.com/blog/code-optimization-with-llms-and-evolution/</guid><description>Pairing LLM idea generation with genetic algorithms for code optimization — and the synergies hill-climbing can&apos;t reach.</description><pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate></item></channel></rss>