Context Optimization
Reduce token usage while maintaining quality through intelligent context optimization.
Overview
Context Optimization automatically reduces token count while preserving critical information. It does so by:
- Removing redundant information
- Compressing repetitive patterns
- Optimizing prompt structure
Optimization Techniques
1. Redundancy Removal
Duplicate content is identified and removed:
# Before (3000 tokens)
"""
User asked: "How do I implement OAuth?"
I provided: Full OAuth implementation guide
User asked: "Can you show me the code?"
I provided: Complete code example
User asked: "How do I test it?"
I provided: Testing procedures
User asked AGAIN: "How do I implement OAuth?"
I provided: Reference to previous answer
"""
# After (800 tokens)
"""
OAuth implementation covered (messages 1-12).
Code example: auth.py (lines 1-150)
Testing: test_auth.py
Latest: User asked same question again → Referenced previous answer
"""
2. Pattern Compression
Repetitive patterns are compressed:
# Before (1500 tokens)
[
{"role": "user", "content": "Fix this bug"},
{"role": "assistant", "content": "Here's the fix..."},
{"role": "user", "content": "Thanks, now fix this other bug"},
{"role": "assistant", "content": "Here's the fix..."},
# ... repeated 20 times
]
# After (300 tokens)
[
{"role": "system", "content": "Debugging session: 20 bugs fixed"},
{"role": "user", "content": "Fix this new bug"},
]
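The before/after pair above can be approximated by collapsing older turns into a single summary message and keeping only the most recent ones. The sketch below assumes a hypothetical `compress_history` helper and a simple count-based summary; the real compression logic is not documented here and may work differently.

```python
from typing import Dict, List

Message = Dict[str, str]


def compress_history(
    messages: List[Message],
    keep_latest: int = 1,
    label: str = "Debugging session",
) -> List[Message]:
    """Replace all but the last `keep_latest` messages with one summary message."""
    if keep_latest < 0:
        raise ValueError("keep_latest must be non-negative")
    if len(messages) <= keep_latest:
        return list(messages)  # Nothing old enough to compress.
    split = len(messages) - keep_latest
    older, latest = messages[:split], messages[split:]
    # Count assistant replies as a rough proxy for completed exchanges.
    completed = sum(1 for m in older if m["role"] == "assistant")
    summary = {
        "role": "system",
        "content": f"{label}: {completed} earlier exchanges summarized",
    }
    return [summary] + latest


history = [
    {"role": "user", "content": "Fix this bug"},
    {"role": "assistant", "content": "Here's the fix..."},
    {"role": "user", "content": "Thanks, now fix this other bug"},
    {"role": "assistant", "content": "Here's the fix..."},
    {"role": "user", "content": "Fix this new bug"},
]
for message in compress_history(history):
    print(message)
```

Keeping the newest message verbatim matches the rule that latest messages are always preserved.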
3. Structure Optimization
Prompt structure is optimized:
# Before: Verbose system prompt
"""
You are an expert Python developer with 10 years of experience.
You have worked on large-scale systems and know best practices.
Please provide clear, concise answers with code examples.
When suggesting solutions, consider:
- Performance implications
- Security concerns
- Maintainability
- Scalability
"""
# After: Optimized
"""
Python expert. Prioritize: performance, security, maintainability, scalability.
Provide code examples.
"""
What Gets Optimized
Safe to Optimize
| Content Type | Safe to Compress |
|---|---|
| Verbose instructions | ✅ Yes |
| Repetitive patterns | ✅ Yes |
| Redundant context | ✅ Yes |
| Long examples (if similar) | ✅ Yes |
Never Optimized
| Content Type | Always Preserved |
|---|---|
| User code | ✅ Preserved |
| API keys/secrets | ✅ Preserved |
| Latest messages | ✅ Preserved |
| Critical decisions | ✅ Preserved |
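As an illustration of the preservation rules in the table, the following sketch marks a message as untouchable when it is among the latest messages, contains a fenced code block, or looks like it carries a credential. Everything here is an assumption for illustration: `must_preserve` and `SECRET_PATTERN` are hypothetical names, the regular expression is a crude heuristic, and "critical decisions" are not detected at all.

```python
import re
from typing import Dict, List

FENCE = "`" * 3  # Marker that opens or closes a fenced code block.
SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|token)\s*[:=]", re.IGNORECASE)


def must_preserve(
    messages: List[Dict[str, str]],
    index: int,
    preserve_latest_n: int = 5,
) -> bool:
    """Return True if the message at `index` should be left uncompressed."""
    # Latest messages are always kept verbatim.
    if index >= len(messages) - preserve_latest_n:
        return True
    content = messages[index]["content"]
    # User code: anything containing a fenced code block.
    if FENCE in content:
        return True
    # Credentials: rough pattern match, not a real secret scanner.
    return bool(SECRET_PATTERN.search(content))


conversation = [
    {"role": "user", "content": "Thanks, that worked."},
    {"role": "user", "content": "My config has API_KEY=abc123"},
    {"role": "assistant", "content": "Noted."},
    {"role": "user", "content": "What next?"},
]
flags = [
    must_preserve(conversation, i, preserve_latest_n=1)
    for i in range(len(conversation))
]
print(flags)  # [False, True, False, True]
```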
Optimization Levels
Conservative (20-30% reduction)
- Remove obvious redundancy
- Compress verbose instructions
- Preserve most context
Balanced (40-60% reduction)
- Aggressive redundancy removal
- Pattern compression
- Structure optimization
Aggressive (70-80% reduction)
- Maximum compression
- Risk: Some context loss
- Best for: Well-defined tasks
Configuration
Via Savings Slider
The savings slider controls optimization level:
- Quality Focus → Conservative optimization
- Balanced → Balanced optimization
- Maximum Savings → Aggressive optimization
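To make the relationship between slider position, optimization level, and expected reduction concrete, the mapping can be written out as plain data. The dictionaries below restate the ranges from Optimization Levels; the helper names `expected_reduction` and `expected_token_range` are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, Tuple

# Slider position -> optimization level, as listed above.
SLIDER_TO_LEVEL: Dict[str, str] = {
    "Quality Focus": "conservative",
    "Balanced": "balanced",
    "Maximum Savings": "aggressive",
}

# Optimization level -> (min %, max %) token reduction.
REDUCTION_RANGES: Dict[str, Tuple[int, int]] = {
    "conservative": (20, 30),
    "balanced": (40, 60),
    "aggressive": (70, 80),
}


def expected_reduction(slider_position: str) -> Tuple[int, int]:
    """Return the documented reduction range for a slider position."""
    return REDUCTION_RANGES[SLIDER_TO_LEVEL[slider_position]]


def expected_token_range(original_tokens: int, slider_position: str) -> Tuple[int, int]:
    """Return the (smallest, largest) context size expected after optimization."""
    low, high = expected_reduction(slider_position)
    # A higher reduction percentage leaves fewer tokens.
    smallest = round(original_tokens * (100 - high) / 100)
    largest = round(original_tokens * (100 - low) / 100)
    return smallest, largest


print(expected_reduction("Balanced"))          # (40, 60)
print(expected_token_range(8000, "Balanced"))  # (3200, 4800)
```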
Via API (Coming Soon)
client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[...],
    korad_settings={
        "optimization": {
            "level": "balanced",  # conservative, balanced, aggressive
            "preserve_code": True,
            "preserve_latest_n": 5,
        }
    },
)
Examples
Long Documentation
# Before: 5000 tokens
"""
User: Here's my API documentation (50 pages)
Assistant: I'll analyze it...
[Full documentation included in context]
"""
# After: 800 tokens
"""
User: API documentation provided (50 pages)
Key endpoints:
- GET /users - List users
- POST /users - Create user
- PUT /users/:id - Update user
- DELETE /users/:id - Delete user
Auth: JWT required
Rate limit: 100 req/min
"""
Code Review Session
# Before: 8000 tokens (multiple files)
[
{"role": "user", "content": "Review this file..."},
{"role": "assistant", "content": "Here's my review..."},
# ... repeated for 20 files
]
# After: 1500 tokens
[
{
"role": "system",
"content": "Code review session: 20 files reviewed. Issues found: 15 critical, 30 minor."
},
{"role": "user", "content": "Review this new file..."},
]
Performance Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| Avg request size | 8000 tokens | 3200 tokens | 60% reduction |
| Avg response time | 2.5s | 1.8s | 28% faster |
| Cost per request | $0.048 | $0.019 | 60% savings |
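The Improvement column follows from the Before and After values as a relative change. A short check of that arithmetic, using a hypothetical `percent_change` helper:

```python
def percent_change(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


print(round(percent_change(8000, 3200)))    # 60 (request size)
print(round(percent_change(2.5, 1.8)))      # 28 (response time)
print(round(percent_change(0.048, 0.019)))  # 60 (cost per request)
```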
Monitoring
Track optimization effectiveness:
response = client.messages.create(...)
if hasattr(response, 'korad_optimization'):
    print(f"Original size: {response.korad_optimization.original_tokens}")
    print(f"Optimized size: {response.korad_optimization.optimized_tokens}")
    print(f"Reduction: {response.korad_optimization.reduction_percent}%")
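For logging, the same fields can be wrapped in a small helper that tolerates responses without optimization metadata. This is a sketch: `describe_optimization` is a hypothetical name, and the stand-in response below is built with `SimpleNamespace` purely so the example runs without a client; it assumes only the three attributes shown above.

```python
from types import SimpleNamespace
from typing import Any, Optional


def describe_optimization(response: Any) -> Optional[str]:
    """Return a one-line summary, or None if no optimization data is attached."""
    stats = getattr(response, "korad_optimization", None)
    if stats is None:
        return None
    saved = stats.original_tokens - stats.optimized_tokens
    return (
        f"{stats.original_tokens} -> {stats.optimized_tokens} tokens "
        f"({saved} saved, {stats.reduction_percent}% reduction)"
    )


# Stand-in for a real response object, for demonstration only.
fake_response = SimpleNamespace(
    korad_optimization=SimpleNamespace(
        original_tokens=8000,
        optimized_tokens=3200,
        reduction_percent=60,
    )
)
print(describe_optimization(fake_response))
# 8000 -> 3200 tokens (4800 saved, 60% reduction)
```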
Best Practices
1. Start with Balanced
# Good: start at the Balanced level and let Korad.AI optimize automatically
savings_level = "medium"
2. Monitor Quality
# Check if optimization affects your use case
# Run A/B tests with different levels
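A simple A/B setup might assign each request to a savings level deterministically and then compare quality scores per level. The sketch below is one possible harness, not a Korad.AI feature: `assign_level` and `mean_score_by_level` are hypothetical helpers, and the 1-5 quality ratings are made-up sample data that you would replace with your own evaluation results.

```python
import hashlib
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

LEVELS = ("low", "medium", "high")


def assign_level(request_id: str) -> str:
    """Pick a savings level from a stable hash so reruns get the same bucket."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(LEVELS)
    return LEVELS[bucket]


def mean_score_by_level(results: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the quality scores recorded for each savings level."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for level, score in results:
        scores[level].append(score)
    return {level: mean(values) for level, values in scores.items()}


# Made-up ratings on a 1-5 scale, for illustration only.
results = [("low", 5), ("low", 4), ("high", 3), ("high", 4)]
print(mean_score_by_level(results))  # {'low': 4.5, 'high': 3.5}
```

If the average score drops noticeably at a higher savings level, that level is probably too aggressive for the workload.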
3. Preserve Critical Context
# For important decisions, use Quality Focus (conservative optimization)
savings_level = "low"
4. Test Before Committing
# Try aggressive optimization (Maximum Savings) on non-critical tasks first
savings_level = "high"