Cache sizing
Ask for reused prompt tokens, avoided prefill time, and GPU-hour impact.
MCP examples
These examples show the intended shape of the paid MCP endpoint. Use a real paid token when access is active.
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "estimate_kv_cache_savings",
"arguments": {
"input": "28k cached tokens, 1200 requests per day, 55 percent hit rate"
}
}
}Ask for reused prompt tokens, avoided prefill time, and GPU-hour impact.
Ask for staging checks, cache tiers, invalidation, and observability.
Ask for agent-readable cache rules before production routing.