Save up to 90% on your LLM API costs with automatic prompt caching optimization.
Anthropic, OpenAI, and other LLM providers now offer prompt caching - you pay 90% less for tokens that are cached. But most developers:
- ❌ Don't know how to structure prompts for optimal caching
- ❌ Manually optimize each prompt (time-consuming and error-prone)
- ❌ Have no visibility into cache performance
- ❌ Miss opportunities to reduce costs
Result: You're paying 5-10x more than necessary for the same API calls.
Prompt Cache Optimizer automatically:
✅ Analyzes your prompts to find caching opportunities
✅ Optimizes prompt structure for maximum cache efficiency
✅ Monitors cache performance in real-time
✅ Reports actual dollar savings
pip install cachex# Analyze a directory
cachex analyze ./your_project
# Output:
# 💰 Potential savings: $2,847/month (87% reduction)
# 📊 Cacheable tokens: 45,231 / 52,000
# 🎯 Top opportunities:
# 1. System prompt (15K tokens) - used 1,240x - save $1,200/mo
# 2. Few-shot examples (8K tokens) - used 890x - save $680/mofrom cachex import optimize
# Before: inefficient prompt structure
prompt = f"{system_prompt}\n{user_query}\n{examples}"
# After: automatically optimized
optimized = optimize(prompt)
# Restructured to: {system_prompt}\n{examples}\n{user_query}
# Result: 90% of tokens cached, massive savings# LangChain integration
from cachex.langchain import OptimizedChatAnthropic
llm = OptimizedChatAnthropic() # Zero code changes, automatic optimization
chain = llm | prompt | parser- Static code analysis to detect prompt patterns
- Runtime analysis of actual API calls
- Identify what changes vs what stays static
- Calculate potential savings in dollars
- Automatic prompt restructuring
- Multiple optimization strategies
- Validates optimized prompts work correctly
- Provider-specific optimizations (Anthropic, OpenAI)
- Real-time cache hit rate tracking
- Historical cost analytics
- Alerts when efficiency drops
- Beautiful CLI and web dashboards
- Anthropic Claude (full support)
- OpenAI GPT-4 (full support)
- LangChain (drop-in adapters)
- LlamaIndex (drop-in adapters)
- Raw API calls (works with any provider)
"Reduced our monthly Claude bill from $4,200 to $580 with zero code changes"
— AI Startup founder
"The analysis alone paid for itself - we found $12K/year in wasted spending"
— ML Engineer at Series B company
The optimizer analyzes your prompts to identify:
- Static content (system prompts, examples) → cache these
- Semi-static content (RAG context with reuse) → cache intelligently
- Dynamic content (user queries) → never cache
# ❌ Before: Poor caching (most tokens not cached)
prompt = f"""
{user_query} # Changes every time (dynamic)
{system_prompt} # Static - should be first!
{few_shot_examples} # Static - should be cached!
"""
# ✅ After: Optimal caching (90% tokens cached)
prompt = f"""
{system_prompt} # Static first → cached
{few_shot_examples} # Static next → cached
{user_query} # Dynamic last → only this changes
"""cachex monitor
# Live dashboard shows:
# Cache hit rate: 87.3% ⬆️ +5.2%
# Tokens cached: 45,231 / 52,000
# Cost today: $12.40 (saved $85.60)
# Cost this month: $580 (saved $3,620)- 🤖 Chatbots: Cache system prompts and conversation history
- 📚 RAG Systems: Cache document context that's frequently reused
- 🔧 Agents: Cache tool descriptions and examples
- 📝 Content Generation: Cache templates and style guides
- 💬 Customer Support: Cache company knowledge base
- Core analysis engine
- CLI tool
- Anthropic integration
- OpenAI integration (Week 8)
- LangChain adapters (Week 7)
- Web dashboard (Week 12)
- Team collaboration (Month 4)
- Enterprise features (Month 6)
We welcome contributions! See CONTRIBUTING.md for guidelines.
Good first issues: Look for the good-first-issue label
- 📖 Documentation
- 💬 GitHub Discussions
- 🐛 Issue Tracker
- 📧 Email: [email protected]
MIT License - see LICENSE for details
Built with:
- Anthropic Claude - for the amazing API and prompt caching feature
- OpenAI - for GPT models and caching support
- tiktoken - for token counting
⭐ If this tool saves you money, please star the repo!
Made with ❤️ by developers tired of high API bills