estimate_kv_cache_savings
Estimate reused prompt tokens, avoided prefill time, and rough GPU spend saved.
Remote MCP endpoint
The endpoint is designed for paid token access. It exposes structured tools that help an agent reason about KV cache economics, rollout safety, and long-context reuse.
Hosted checkout activates when a payment URL is configured.
Estimate reused prompt tokens, avoided prefill time, and rough GPU spend saved.
Create a rollout checklist covering vLLM, SGLang, storage tiers, cache invalidation, and observability.
Produce cache rules that an agent workflow can read before routing long-context requests.