Long-context inference

Stop paying repeatedly for the same long prompt prefill.

Long-context workloads make cache economics easier to see: stable instruction blocks, RAG context, and agent memory repeat across requests, so they can produce meaningful reuse whenever the caching policy makes that reuse safe.

RAG context

Cache stable retrieved context when freshness and tenant boundaries are explicit.
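
One way to make freshness and tenant boundaries explicit is to put them into the cache key and the expiry rule. The sketch below is a toy in-memory cache, not any product's API: `PrefixCache`, `make_key`, the `doc_version` field, and the TTL value are all hypothetical names chosen for illustration.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy in-memory cache for reusable context state (hypothetical)."""

    ttl_seconds: float
    _entries: dict = field(default_factory=dict)

    @staticmethod
    def make_key(tenant_id: str, doc_id: str, doc_version: str, text: str) -> str:
        # Tenant and version are part of the key, so one tenant never hits
        # another tenant's entry and a new document version never reuses
        # state computed for an old one.
        h = hashlib.sha256()
        for part in (tenant_id, doc_id, doc_version, text):
            data = part.encode("utf-8")
            # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def put(self, key: str, value, now=None) -> None:
        now = time.time() if now is None else now
        self._entries[key] = (now, value)

    def get(self, key: str, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > self.ttl_seconds:
            # Entry is older than the freshness window: drop it and miss.
            del self._entries[key]
            return None
        return value
```

With this shape, identical text retrieved for two tenants produces two different keys, and an entry older than `ttl_seconds` is treated as a miss rather than served stale.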

Agent memory

Separate durable memory from volatile session context so invalidation remains understandable.
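
A minimal sketch of that separation, assuming prompts are assembled as plain strings: the stable parts go first and form the cacheable prefix, while the volatile session context goes last. `PromptParts` and its method names are hypothetical, introduced only for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    instructions: str     # stable instruction block
    durable_memory: str   # long-lived facts that change rarely
    session_context: str  # volatile content that changes every turn

    def cacheable_prefix(self) -> str:
        # Only stable parts go into the prefix, in a fixed order.
        return self.instructions + "\n\n" + self.durable_memory

    def full_prompt(self) -> str:
        # Volatile content is appended after the prefix, never inside it.
        return self.cacheable_prefix() + "\n\n" + self.session_context

    def prefix_fingerprint(self) -> str:
        # Changes only when instructions or durable memory change.
        return hashlib.sha256(self.cacheable_prefix().encode("utf-8")).hexdigest()
```

Under this layout the fingerprint stays the same from turn to turn as the session context changes, and it changes only when durable memory or the instructions are edited, which keeps the invalidation rule easy to state.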

Latency budget

Use cache-hit estimates to decide where time-to-first-token (TTFT) improvement matters most.
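
As a rough illustration, an estimated hit rate can be combined with measured hit and miss latencies to get an expected TTFT per route, and routes can then be ranked by expected saving. The `Route` type, its field names, and the numbers in the usage note are made up for this sketch; they are not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    hit_rate: float      # estimated fraction of requests with a cached prefix
    ttft_miss_ms: float  # TTFT when the full prompt must be prefilled
    ttft_hit_ms: float   # TTFT when the prefix is served from cache
    budget_ms: float     # latency budget for this route

    def expected_ttft_ms(self) -> float:
        # Weighted average of the hit and miss cases.
        return self.hit_rate * self.ttft_hit_ms + (1 - self.hit_rate) * self.ttft_miss_ms

    def expected_saving_ms(self) -> float:
        # Average TTFT reduction per request compared with no caching.
        return self.hit_rate * (self.ttft_miss_ms - self.ttft_hit_ms)

    def within_budget(self) -> bool:
        return self.expected_ttft_ms() <= self.budget_ms


def rank_by_saving(routes):
    """Return routes ordered from largest to smallest expected saving."""
    return sorted(routes, key=lambda r: r.expected_saving_ms(), reverse=True)
```

For example, a hypothetical route with a 0.5 hit rate, 2000 ms miss TTFT, and 400 ms hit TTFT has an expected TTFT of 1200 ms and an expected saving of 800 ms per request, which would rank it ahead of a route that saves only 100 ms.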