MCP examples

Call cache-estimation tools from an agent workflow.

These examples show the intended shape of the paid MCP endpoint. Use a real paid token when access is active.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "estimate_kv_cache_savings",
    "arguments": {
      "input": "28k cached tokens, 1200 requests per day, 55 percent hit rate"
    }
  }
}

Cache sizing

Ask for reused prompt tokens, avoided prefill time, and GPU-hour impact.

Rollout planning

Ask for staging checks, cache tiers, invalidation, and observability.

Policy export

Ask for agent-readable cache rules before production routing.