What is Prompt Caching? — LLM Cost Glossary | GateCtr

Prompt caching stores the output of an LLM call and returns the cached response when the same (or a semantically similar) prompt is submitted again. For applications with repetitive queries, such as FAQ bots, document Q&A, and code assistants, caching can eliminate 30–70% of API calls entirely, depending on how often prompts repeat.
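As a rough illustration of the store-and-return behaviour for identical prompts, here is a minimal Python sketch. It is not GateCtr's implementation: `ExactPromptCache`, `call_llm`, and `fake_llm` are hypothetical names introduced for this example, and the in-memory dictionary stands in for whatever storage a real deployment would use.

```python
import hashlib
from typing import Callable, Dict


class ExactPromptCache:
    """Illustrative cache: an identical (model, prompt) pair reuses the stored response."""

    def __init__(self, call_llm: Callable[[str, str], str]) -> None:
        # call_llm is a hypothetical function that performs the real (billed) API call.
        self._call_llm = call_llm
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so different models never share an entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            # Cache hit: no API call is made.
            self.hits += 1
            return self._store[key]
        # Cache miss: call the model once and remember the answer.
        self.misses += 1
        response = self._call_llm(model, prompt)
        self._store[key] = response
        return response


def fake_llm(model: str, prompt: str) -> str:
    # Stand-in for a provider call, used only so the sketch runs offline.
    return f"[{model}] answer to: {prompt}"
```

With this sketch, calling `complete` twice with the same model and prompt triggers one underlying call and one hit. A prompt that differs by even a single character is treated as a new entry, which is the limitation that similarity-based matching is meant to address.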

There are two types: exact caching (same prompt → same response) and semantic caching (similar prompts → reuse the response if similarity exceeds a threshold). Semantic caching requires embedding the prompt and comparing it against the embeddings of earlier prompts held in a vector store.
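The semantic variant can be sketched in Python as follows, under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, the linear scan over a list stands in for a vector store, and the `0.8` threshold is an arbitrary illustrative value rather than a recommended setting. All names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Illustrative cache: reuse a response when a stored prompt is similar enough."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value, tune per application
        # A plain list scanned linearly stands in for a vector store.
        self._entries: List[Tuple[Counter, str]] = []

    def store(self, prompt: str, response: str) -> None:
        self._entries.append((toy_embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Reuse the closest stored response only if it clears the threshold.
        return best_response if best_score >= self.threshold else None
```

The threshold is the main design choice: set it too low and users receive answers to questions they did not ask, set it too high and the cache behaves like exact matching.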

GateCtr's LLM Cache Layer (coming Q1 2027) will implement semantic caching transparently. Until then, every request is effectively a cache miss, and GateCtr's token compression reduces the cost of each of those calls.
