KV cache layer for LLM inference

Plan an LMCache-compatible cache layer before GPU spend gets silly.

Estimate prompt-token reuse, choose cache tiers, and generate an MCP-ready cache policy for vLLM, SGLang, long-context inference, and agent workflows.


KV cache sizing
vLLM and SGLang
Long-context reuse
MCP policy export

Independent product workspace for LMCache-compatible deployments. Built for platform teams, inference operators, and agent builders. Hosted checkout activates after a payment URL is configured.

Isometric infrastructure illustration of a KV cache layer moving data between GPU, CPU, SSD, and remote storage.

Usable preview

KV cache savings estimator

Cached prompt tokens per request
Requests per day
Expected cache-hit rate (%)
Prefill time per token (ms)
GPU cost per hour (USD)

{
  "product": "LMCache Space",
  "status": "ready_for_input",
  "output": [
    "Estimate repeated prompt-token reuse and prefill time avoided.",
    "Plan cache tiers across GPU memory, CPU RAM, SSD, and remote storage.",
    "Generate an MCP-ready cache policy for agent workflows."
  ]
}
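The five estimator inputs above are enough for a back-of-the-envelope calculation. The sketch below is one plausible way to combine them, not the formula this site necessarily uses. It assumes prefill time scales linearly with token count and that every avoided prefill second frees billable GPU time. The names `EstimatorInput` and `estimate_savings` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EstimatorInput:
    cached_prompt_tokens_per_request: int
    requests_per_day: int
    cache_hit_rate_pct: float      # 0 to 100
    prefill_ms_per_token: float
    gpu_cost_per_hour_usd: float


def estimate_savings(inp: EstimatorInput) -> dict:
    """Rough daily savings, assuming prefill cost is linear in tokens."""
    if not 0.0 <= inp.cache_hit_rate_pct <= 100.0:
        raise ValueError("cache_hit_rate_pct must be between 0 and 100")

    hit_rate = inp.cache_hit_rate_pct / 100.0
    # Tokens whose prefill is skipped because their KV entries were cached.
    tokens_reused = (
        inp.cached_prompt_tokens_per_request * inp.requests_per_day * hit_rate
    )
    prefill_seconds = tokens_reused * inp.prefill_ms_per_token / 1000.0
    gpu_hours = prefill_seconds / 3600.0
    return {
        "tokens_reused_per_day": tokens_reused,
        "prefill_seconds_avoided_per_day": prefill_seconds,
        "gpu_hours_avoided_per_day": gpu_hours,
        "usd_saved_per_day": gpu_hours * inp.gpu_cost_per_hour_usd,
    }


# Illustrative inputs only; substitute measured values from your own service.
example = estimate_savings(
    EstimatorInput(
        cached_prompt_tokens_per_request=4000,
        requests_per_day=10_000,
        cache_hit_rate_pct=50.0,
        prefill_ms_per_token=0.1,
        gpu_cost_per_hour_usd=3.60,
    )
)
print(example)
```

With these illustrative inputs, 20 million tokens are reused per day, which avoids 2,000 seconds of prefill, about 0.56 GPU-hours, or roughly $2.00 per day. Real savings depend on batching, cache lookup overhead, and whether the freed GPU time is actually reclaimed.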


What changes

Cache planning becomes a workflow, not a guess.

The site is built around four outputs: a practical estimate, a trace review, a rollout checklist, and an agent-readable policy. That gives users something concrete to do on the first screen instead of a parked-domain sales block.

Estimate

Quantify repeated prompt tokens, avoided prefill work, and rough GPU-hour savings.

Stage

Choose cache tiers, retention policy, invalidation rules, and observability checks.

Connect

Expose a paid MCP endpoint and cache policy that agents can reference safely.

Review

Inspect hit-rate, long-context pressure, and miss patterns before production routing.
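For the Review step, a hit-rate figure can be approximated offline from a trace of tokenized prompts before any traffic is routed. The sketch below replays prompts in order and counts how many fixed-size token blocks share an exact prefix with an earlier prompt. The function name `prefix_hit_rate`, the block size of 16, and the unbounded cache are assumptions for illustration; a real cache evicts entries and typically stores chained hashes rather than full prefixes.

```python
def prefix_hit_rate(prompts, block_size=16):
    """Fraction of full token blocks whose entire prefix was seen earlier.

    prompts: iterable of token-id lists, in arrival order.
    Assumes an unbounded cache (no eviction), so the result is an upper bound.
    """
    seen_prefixes = set()
    total_blocks = 0
    hit_blocks = 0

    for tokens in prompts:
        full_blocks = len(tokens) // block_size
        for i in range(full_blocks):
            # Key on the whole prefix up to this block, so a block only
            # counts as a hit if everything before it matched as well.
            key = tuple(tokens[: (i + 1) * block_size])
            total_blocks += 1
            if key in seen_prefixes:
                hit_blocks += 1
            else:
                seen_prefixes.add(key)

    return hit_blocks / total_blocks if total_blocks else 0.0


# Three toy prompts: the second repeats the first, the third shares one block.
trace = [
    list(range(32)),
    list(range(32)),
    list(range(16)) + [99] * 16,
]
print(prefix_hit_rate(trace))
```

Blocks that miss in this replay are the miss patterns worth inspecting: a prompt that diverges early (for example, a timestamp at the top of a system prompt) loses every block after the divergence point.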

KV cache layer

Map cache tiers, retention windows, and hit-rate targets for prompt reuse and long-context workloads.
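Mapping tiers starts with knowing how large the KV cache is per token. For a standard transformer with multi-head or grouped-query attention, each token stores one key and one value vector per layer per KV head, so the footprint is 2 × layers × KV heads × head dimension × bytes per element. The sketch below applies that formula and then fills tiers from fastest to slowest. The model shape, the tier capacities, and the greedy fill are illustrative assumptions; a production cache layer places entries by recency and eviction policy rather than a one-shot fill.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """KV cache bytes for one token; the factor 2 covers keys and values."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def plan_tiers(total_tokens, bytes_per_token, tier_capacity_bytes):
    """Greedily place tokens into tiers, in the order the dict lists them."""
    remaining = total_tokens
    plan = {}
    for tier, capacity in tier_capacity_bytes.items():
        fits = min(remaining, capacity // bytes_per_token)
        plan[tier] = fits
        remaining -= fits
    plan["unplaced"] = remaining
    return plan


GIB = 2**30

# Hypothetical model shape: 32 layers, 8 KV heads, head dim 128, fp16.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)

# Hypothetical capacity budgets reserved for KV cache in each tier.
tiers = {
    "gpu_memory": 8 * GIB,
    "cpu_ram": 64 * GIB,
    "ssd": 1024 * GIB,
    "remote_storage": 10 * 1024 * GIB,
}

print(per_token)                                # 131072 bytes, i.e. 128 KiB
print(plan_tiers(1_000_000, per_token, tiers))
```

For this hypothetical shape, one million cached tokens need about 122 GiB, which overflows the GPU and CPU budgets and lands 410,176 tokens on SSD. That is the kind of result that should drive retention windows: entries that only fit on slower tiers need a high enough reuse rate to justify the load latency.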

vLLM KV cache

Prepare an LMCache-compatible rollout checklist for vLLM services and inference routers.

Long-context inference

Protect long prompts, RAG context, and agent memory from avoidable repeated prefill cost.

MCP-ready

Agent-readable cache policy and diagnostics.

LMCache Space exposes a server-card, a JSON-RPC-style MCP endpoint, and structured tool definitions so that a paid agent workflow can request cache estimates or rollout checks.


MCP tools

estimate_kv_cache_savings for prompt-token reuse and GPU spend estimates.
plan_lmcache_rollout for staging, cache tiers, and observability checks.
generate_mcp_cache_policy for agent-readable cache rules.
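An agent would reach these tools through an MCP `tools/call` request, which is a JSON-RPC 2.0 message carrying the tool name and an arguments object. The sketch below only builds and prints such a request; it does not send it, because the endpoint URL and authentication scheme are not stated here. The argument names are hypothetical and mirror the estimator fields; the real input schema is whatever the tool definitions declare.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a caller-chosen id


def build_tool_call(tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are assumptions, not the published tool schema.
request = build_tool_call(
    "estimate_kv_cache_savings",
    {
        "cached_prompt_tokens_per_request": 4000,
        "requests_per_day": 10000,
        "cache_hit_rate_pct": 50,
        "prefill_ms_per_token": 0.1,
        "gpu_cost_per_hour_usd": 3.60,
    },
)
print(json.dumps(request, indent=2))
```

Before calling a tool, a client would normally issue `tools/list` to fetch the declared input schema and validate its arguments against it.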

Pricing

Choose a cache rollout plan.

Starter

$24.50 /mo

One cache-readiness review, sizing estimate, and MCP policy export for a single inference service.

$294 due today (12 × $24.50). Covers one year and renews automatically until canceled.

Team

$74.50 /mo

Repeated KV cache sizing, rollout evidence, and MCP cache-policy exports for a platform team.

$894 due today (12 × $74.50). Covers one year and renews automatically until canceled.

Scale

$199.50 /mo

Portfolio cache governance, multi-cluster rollout planning, and trace review for production teams.

$2,394 due today (12 × $199.50). Covers one year and renews automatically until canceled.

Related AI workflow reference

LMCache Space readers comparing workflow plans with launch and market assumptions can also review MiroFish AI Simulator, a companion reference for simulation-style product reasoning.

For cache and inference planning, the Kimi K3 resource hub helps frame long-context assumptions before teams size inputs, KV cache layers, and model routes.