KV cache layer for LLM inference

Plan an LMCache-compatible cache layer before GPU spend gets silly.

Estimate prompt-token reuse, choose cache tiers, and generate an MCP-ready cache policy for vLLM, SGLang, long-context inference, and agent workflows.

  • KV cache sizing
  • vLLM and SGLang
  • Long-context reuse
  • MCP policy export
Independent product workspace for LMCache-compatible deployments. Built for platform teams, inference operators, and agent builders. Hosted checkout activates after a payment URL is configured.
[Illustration: a KV cache layer moving data between GPU, CPU, SSD, and remote storage.]

Usable preview

KV cache savings estimator

{
  "product": "LMCache Space",
  "status": "ready_for_input",
  "output": [
    "Estimate repeated prompt-token reuse and prefill time avoided.",
    "Plan cache tiers across GPU memory, CPU RAM, SSD, and remote storage.",
    "Generate an MCP-ready cache policy for agent workflows."
  ]
}

What changes

Cache planning becomes a workflow, not a guess.

The site is built around four outputs: a practical estimate, a trace review, a rollout checklist, and an agent-readable policy. A visitor has something to do on the first screen instead of reading a parked-domain sales block.

1. Estimate: Quantify repeated prompt tokens, avoided prefill work, and rough GPU-hour savings.

2. Stage: Choose cache tiers, retention policy, invalidation rules, and observability checks.

3. Connect: Expose a paid MCP endpoint and cache policy that agents can reference safely.

4. Review: Inspect hit-rate, long-context pressure, and miss patterns before production routing.
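
The Estimate step comes down to simple arithmetic. The sketch below is a rough illustration, not the product's actual estimator: the `Workload` fields, the `estimate_savings` name, and the assumption that avoided prefill time scales linearly with reused tokens are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical inputs for a back-of-the-envelope estimate."""
    requests_per_hour: float       # requests served per hour
    prompt_tokens: int             # average prompt length in tokens
    reuse_fraction: float          # share of prompt tokens seen before (0..1)
    hit_rate: float                # share of reusable tokens served from cache (0..1)
    prefill_tokens_per_sec: float  # measured prefill throughput of one GPU


def estimate_savings(w: Workload) -> dict:
    """Return prompt tokens reused and GPU-hours of prefill avoided per hour of traffic."""
    total_tokens = w.requests_per_hour * w.prompt_tokens
    reused_tokens = total_tokens * w.reuse_fraction * w.hit_rate
    # Assumption: prefill cost is linear in token count, so skipped tokens
    # translate directly into skipped GPU seconds.
    gpu_seconds_saved = reused_tokens / w.prefill_tokens_per_sec
    return {
        "prompt_tokens_per_hour": total_tokens,
        "reused_tokens_per_hour": reused_tokens,
        "gpu_hours_saved_per_hour": gpu_seconds_saved / 3600,
    }


if __name__ == "__main__":
    example = Workload(
        requests_per_hour=1000,
        prompt_tokens=8000,
        reuse_fraction=0.5,
        hit_rate=0.8,
        prefill_tokens_per_sec=10_000,
    )
    print(estimate_savings(example))
```

Real prefill cost grows faster than linearly at long context lengths, so treat the result as a lower-fidelity planning number and replace `prefill_tokens_per_sec` with a value measured on your own hardware.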

KV cache layer

Map cache tiers, retention windows, and hit-rate targets for prompt reuse and long-context workloads.
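
Mapping tiers starts with the KV footprint of a single token. For a standard transformer that stores one key and one value vector per layer and per KV head, the sketch below computes bytes per token and how many tokens each tier could hold. The model shape and the tier budgets are illustrative assumptions, and the helper names are hypothetical.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: key + value for every layer and KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def tokens_per_tier(tier_gib: dict, bytes_per_token: int) -> dict:
    """Number of whole tokens whose KV entries fit in each tier's budget (GiB)."""
    return {
        tier: int(gib * 1024**3 // bytes_per_token)
        for tier, gib in tier_gib.items()
    }


if __name__ == "__main__":
    # Illustrative shape: 32 layers, 8 KV heads (grouped-query attention),
    # head dimension 128, 16-bit values.
    per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
    # Hypothetical budgets reserved for cached KV data in each tier.
    budgets_gib = {"gpu": 8, "cpu": 64, "ssd": 512}
    print(per_token, tokens_per_tier(budgets_gib, per_token))
```

Dividing a tier's token capacity by the average reusable prompt length gives a first guess at how many distinct contexts that tier can retain.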

vLLM KV cache

Prepare an LMCache-compatible rollout checklist for vLLM services and inference routers.

Long-context inference

Protect long prompts, RAG context, and agent memory from avoidable repeated prefill cost.
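
One way to see how much prefill is repeated is to replay a prompt trace and count tokens whose full prefix has appeared before. The sketch below is a simplified illustration that hashes fixed-size token chunks as a chain, so a chunk only counts as reusable when everything before it also matched. The `reuse_fraction` name and the chunk size are assumptions, and this is not a description of how LMCache, vLLM, or SGLang key their caches internally.

```python
import hashlib
from typing import Iterable, Sequence


def reuse_fraction(prompts: Iterable[Sequence[int]], chunk_size: int = 256) -> float:
    """Fraction of prompt tokens covered by a previously seen chunked prefix."""
    seen = set()
    total = 0
    reused = 0
    for tokens in prompts:
        total += len(tokens)
        running = hashlib.sha256()  # chained hash over all chunks so far
        # Only whole chunks are considered; a trailing partial chunk is never reusable.
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
            chunk = list(tokens[start:start + chunk_size])
            running.update(repr(chunk).encode("utf-8"))
            key = running.hexdigest()
            if key in seen:
                reused += chunk_size
            else:
                seen.add(key)
    return reused / total if total else 0.0


if __name__ == "__main__":
    trace = [
        [1, 2, 3, 4, 5, 6, 7, 8],  # first request: nothing cached yet
        [1, 2, 3, 4, 9, 9, 9, 9],  # shares the first chunk with the request above
    ]
    print(reuse_fraction(trace, chunk_size=4))
```

The result depends heavily on chunk size and on trace order, so it is best read as an upper bound on prefix reuse for a cache that never evicts.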

MCP-ready

Agent-readable cache policy and diagnostics.

LMCache Space exposes a server card, a JSON-RPC-style MCP endpoint, and structured tool definitions so a paid agent workflow can request cache estimates or rollout checks.

MCP tools

  • estimate_kv_cache_savings for prompt-token reuse and GPU spend estimates.
  • plan_lmcache_rollout for staging, cache tiers, and observability checks.
  • generate_mcp_cache_policy for agent-readable cache rules.
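
As an illustration of what calling one of these tools might look like, the sketch below builds a JSON-RPC 2.0 `tools/call` request in the shape MCP uses. The tool name comes from the list above. The argument names are hypothetical, because the page does not publish the tool's input schema, and no endpoint URL is assumed, so the request is only constructed, not sent.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, indent=2)


if __name__ == "__main__":
    # Hypothetical arguments: check the server's published tool definitions
    # for the real input schema before relying on these names.
    payload = build_tool_call(
        "estimate_kv_cache_savings",
        {"requests_per_hour": 1000, "avg_prompt_tokens": 8000, "reuse_fraction": 0.5},
    )
    print(payload)
```

An MCP client would normally call `tools/list` first to discover each tool's input schema, then send a request like this over the transport the server supports.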

Pricing

Choose a cache rollout plan.

Starter

$24.50 /mo

One cache-readiness review, sizing estimate, and MCP policy export for a single inference service.

$294 due today. Covers one year and renews automatically until canceled.

Team

$74.50 /mo

Repeated KV cache sizing, rollout evidence, and MCP cache-policy exports for a platform team.

$894 due today. Covers one year and renews automatically until canceled.

Scale

$199.50 /mo

Portfolio cache governance, multi-cluster rollout planning, and trace review for production teams.

$2,394 due today. Covers one year and renews automatically until canceled.