What Is Semantic Caching?

A caching strategy that reuses LLM responses for prompts that are semantically similar, not just identical.

Semantic caching extends traditional exact-match caching by using vector embeddings to identify prompts that are semantically equivalent even if worded differently. "What is the capital of France?" and "Tell me the capital city of France" would both hit the same cache entry.

The process: embed the incoming prompt → search a vector store for similar cached prompts → if similarity exceeds a threshold (e.g., 0.95 cosine similarity), return the cached response without calling the LLM.
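The lookup flow above can be sketched in a few lines. This is a minimal, self-contained illustration, not GateCtr's implementation: the `embed` function here is a toy stand-in (character-trigram counts) for a real embedding model, and the in-memory list stands in for a vector store.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: character-trigram counts.
    # Production systems use dense vectors from an embedding model/API.
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, prompt: str):
        # Embed the incoming prompt and search for the most similar entry.
        q = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: return without calling the LLM
        return None        # cache miss: caller invokes the LLM, then put()

    def put(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))
```

With a real embedding model, paraphrases like the two France examples map to nearby vectors and hit the same entry; the trigram stand-in only catches near-identical wording, which is why the threshold matters so much in practice.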

Semantic caching is particularly effective for customer-facing applications where users ask similar questions repeatedly. It can reduce LLM API calls by 40–70% for high-traffic use cases.

How GateCtr Handles Semantic Caching

GateCtr handles semantic caching automatically on every API call, with no configuration required. Results are visible in real time in the GateCtr dashboard, with per-request details on tokens, cost, and savings.

See GateCtr in action, for free

No credit card required. Up and running in 5 minutes.

Start for free