LLM Caching Library
8.0k 2026-04-26
zilliztech/GPTCache
A semantic caching library for LLM queries, designed to drastically cut API costs and accelerate response times.
Core Features
Semantic caching for Large Language Model (LLM) responses.
Up to 10x reduction in LLM API costs.
Up to 100x acceleration in LLM query response speed.
Full integration with popular LLM frameworks like LangChain and LlamaIndex.
Docker server image available for language-agnostic usage.
Quick Start
pip install gptcache
Detailed Introduction
GPTCache is an open-source library that addresses the high cost and slow response times of Large Language Model (LLM) API calls. It maintains a semantic cache: responses are stored and later retrieved for queries that are semantically similar to ones already answered, not only for exact repeats, which reduces redundant API requests. Applications built on LLMs can therefore serve repeated or near-duplicate questions faster and at lower cost.
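To make the idea of a semantic cache concrete, here is a minimal, self-contained sketch in plain Python. It is not GPTCache's implementation or API: the names `embed`, `SemanticCache`, `cached_call`, and `fake_llm` are hypothetical, the bag-of-words "embedding" stands in for a real embedding model, and the 0.8 similarity threshold is an arbitrary assumption chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase token counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Stores (embedding, response) pairs and answers by nearest-neighbour similarity."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff; real systems tune this
        self.entries = []

    def lookup(self, query):
        """Return the response of the most similar stored query, or None on a miss."""
        query_vec = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, query, response):
        self.entries.append((embed(query), response))


def cached_call(cache, query, llm):
    """Answer from the cache when a similar query exists; otherwise call the LLM."""
    hit = cache.lookup(query)
    if hit is not None:
        return hit
    response = llm(query)
    cache.store(query, response)
    return response


# Demo with a fake LLM that records how often it is actually invoked.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = SemanticCache(threshold=0.8)
first = cached_call(cache, "What is the capital of France?", fake_llm)
second = cached_call(cache, "what is the capital of france", fake_llm)  # near-duplicate
third = cached_call(cache, "How do I bake bread?", fake_llm)  # unrelated query
```

In this sketch the second query differs from the first only in casing and punctuation, so it is served from the cache without invoking `fake_llm`, while the unrelated third query triggers a new call. A production cache would typically replace the linear scan with a vector index and the token counts with learned embeddings.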