Estimate
Quantify repeated prompt tokens, avoided prefill work, and rough GPU-hour savings.
KV cache layer for LLM inference
Estimate prompt-token reuse, choose cache tiers, and generate an MCP-ready cache policy for vLLM, SGLang, long-context inference, and agent workflows.
What changes
The site is built around a practical estimate, trace review, rollout checklist, and agent-readable policy output. That gives a real user something to do on the first screen instead of a parked-domain sales block.
Choose cache tiers, retention policy, invalidation rules, and observability checks.
Expose a paid MCP endpoint and cache policy that agents can reference safely.
Inspect hit-rate, long-context pressure, and miss patterns before production routing.
Map cache tiers, retention windows, and hit-rate targets for prompt reuse and long-context workloads.
Prepare an LMCache-compatible rollout checklist for vLLM services and inference routers.
Protect long prompts, RAG context, and agent memory from avoidable repeated prefill cost.
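As a rough illustration of the estimate described above, the arithmetic can be sketched in a few lines of Python. This is not the site's actual calculator. Every field name and the example numbers (request volume, reusable fraction, hit rate, prefill throughput) are assumptions chosen for illustration, and the model ignores cache lookup, transfer, and storage overhead.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All fields are hypothetical inputs, not values taken from the site.
    requests_per_day: int          # inference requests per day
    prompt_tokens: int             # average prompt length in tokens
    reusable_fraction: float       # share of each prompt that repeats a shared prefix (0..1)
    hit_rate: float                # share of repeated tokens actually served from cache (0..1)
    prefill_tokens_per_sec: float  # assumed prefill throughput of a single GPU


def estimate_savings(w: Workload) -> dict:
    """Rough daily estimate of repeated tokens, avoided prefill, and GPU-hours saved."""
    total_prompt_tokens = w.requests_per_day * w.prompt_tokens
    repeated_tokens = total_prompt_tokens * w.reusable_fraction
    avoided_prefill_tokens = repeated_tokens * w.hit_rate
    # Prefill time that no longer has to be computed, converted to hours.
    gpu_hours_saved = avoided_prefill_tokens / w.prefill_tokens_per_sec / 3600
    return {
        "total_prompt_tokens": total_prompt_tokens,
        "repeated_tokens": repeated_tokens,
        "avoided_prefill_tokens": avoided_prefill_tokens,
        "gpu_hours_saved": gpu_hours_saved,
    }


example = Workload(
    requests_per_day=100_000,
    prompt_tokens=4_000,
    reusable_fraction=0.6,
    hit_rate=0.8,
    prefill_tokens_per_sec=10_000.0,
)
result = estimate_savings(example)
print(f"GPU-hours saved per day: {result['gpu_hours_saved']:.2f}")  # prints 5.33
```

The estimate is only as good as the measured reusable fraction and hit rate, which is why a trace review should come before any production routing decision.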
MCP-ready
LMCache Space exposes a server card, a JSON-RPC-style MCP endpoint, and structured tool definitions so that a paid agent workflow can request cache estimates or rollout checks.
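For illustration, an agent calling one of this section's tools over MCP would send a JSON-RPC 2.0 request using the standard `tools/call` method. The sketch below only builds the request body. The tool name `estimate_kv_cache_savings` comes from this page, but the argument names are hypothetical, since the tool's input schema is not shown here, and no endpoint URL is assumed.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body for an MCP `tools/call` invocation."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(payload, indent=2)


# Argument names below are assumptions for illustration only.
body = build_tool_call(
    "estimate_kv_cache_savings",
    {
        "requests_per_day": 100_000,
        "avg_prompt_tokens": 4_000,
        "reusable_fraction": 0.6,
    },
)
print(body)
```

In practice an agent would first read the server's published tool definitions to learn the real argument schema, rather than hard-coding field names as done here.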
estimate_kv_cache_savings for prompt-token reuse and GPU spend estimates.
plan_lmcache_rollout for staging, cache tiers, and observability checks.
generate_mcp_cache_policy for agent-readable cache rules.
Pricing
One cache-readiness review, sizing estimate, and MCP policy export for a single inference service.
$294 due today, covers one year, renews automatically until canceled.
Repeated KV cache sizing, rollout evidence, and MCP cache-policy exports for a platform team.
$894 due today, covers one year, renews automatically until canceled.
Portfolio cache governance, multi-cluster rollout planning, and trace review for production teams.
$2394 due today, covers one year, renews automatically until canceled.