How we cut RAG latency 4× without changing the model
Caching, batching and a different retriever — the three changes that took p95 from 2.4s to 600ms in production.
Notes from engineers shipping AI in production across ASEAN. No think-pieces. Just what worked and what didn’t.
Caching, batching and a different retriever — the three changes that took p95 from 2.4s to 600ms in production.
A walk-through of the actual evidence pack we submitted, mapped clause-by-clause to MAS FEAT principles.
Under 100k chunks, BM25 plus a re-ranker beats half the vector pipelines we’ve inherited. Here’s the math.
Daily diary of a four-week sprint with an ASEAN insurer — including the use case we declined to take into production.
Twelve months of red-team logs across client engagements. The five injection patterns that account for 80% of attempts.
Once a month. The post that did the most work, plus what we shipped. No sponsor slot, no newsletter swap.