
Dynamic Savings Slider

Trade off cost savings against output quality with an intuitive slider.

Overview

The Savings Slider lets you choose how much Korad.AI optimizes your requests:

  • Maximum Savings (90%) — Aggressive compression, minimal quality loss
  • High Savings (70%) — Strong compression, smart model selection
  • Balanced (50%) — Smart optimization, recommended default
  • Quality Focus (30%) — Minimal compression, maximum fidelity

How It Works

```mermaid
graph LR
    A[Your Prompt] --> B{Savings Level}
    B -->|High| C[Aggressive Summarization]
    B -->|Medium| D[Smart Context Compression]
    B -->|Low| E[Minimal Optimization]
    C --> F[Cheaper Model Routing]
    D --> F
    E --> F
    F --> G[Reduced Token Usage]
```
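
The routing above can be sketched as a simple dispatch on the savings level. This is an illustrative sketch; the function and strategy names are hypothetical, not Korad.AI's internal API:

```python
# Illustrative only: maps a savings level to the optimization stage
# shown in the diagram (names are hypothetical).
def optimization_strategy(savings_level: str) -> str:
    strategies = {
        "high": "aggressive_summarization",
        "medium": "smart_context_compression",
        "low": "minimal_optimization",
    }
    if savings_level not in strategies:
        raise ValueError(f"unknown savings level: {savings_level}")
    return strategies[savings_level]
```

Whichever stage runs, its output feeds into cheaper model routing and, ultimately, reduced token usage.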

Setting Your Savings Level

Via Dashboard

  1. Go to korad.ai/dashboard
  2. Adjust the savings slider
  3. Your preference is saved automatically

Via API (Coming Soon)

```python
client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[...],
    korad_settings={
        "savings_level": "high"  # "low", "medium", "high", or "maximum"
    }
)
```

Savings Levels

Maximum Savings

90% cost reduction

  • Aggressive context summarization
  • Maximum model downgrading
  • Best for: Large document analysis, background jobs

```python
korad_settings = {"savings_level": "maximum"}
```

High Savings

70% cost reduction

  • Strong context compression
  • Smart model selection
  • Best for: Code generation, data processing

```python
korad_settings = {"savings_level": "high"}
```

Balanced

50% cost reduction (Recommended)

  • Smart optimization
  • Quality-preserving compression
  • Best for: General use, chat applications

```python
korad_settings = {"savings_level": "medium"}
```

Quality Focus

30% cost reduction

  • Minimal optimization
  • Maximum output fidelity
  • Best for: Creative writing, nuanced responses

```python
korad_settings = {"savings_level": "low"}
```

What Gets Optimized?

1. Context Compression

Long conversations are intelligently summarized:

Original: 15,000 tokens
Compressed: 3,000 tokens (80% savings)
Quality loss: < 5%
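
The compression figures above follow directly from the token counts:

```python
# Verify the compression arithmetic: 15,000 tokens down to 3,000.
original_tokens, compressed_tokens = 15_000, 3_000
savings = (original_tokens - compressed_tokens) / original_tokens
print(f"{savings:.0%}")  # 80%
```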

2. Model Routing

Tasks are routed to cost-effective models:

| Task | Original Model | Optimized Model | Savings |
|------|----------------|-----------------|---------|
| Simple questions | Claude Sonnet | Claude Haiku | 70% |
| Code completion | Claude Sonnet | Optimized Sonnet | 40% |
| Complex reasoning | Claude Opus | Claude Sonnet | 50% |
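
The table above can be read as a lookup: route to a cheaper model when one is known to handle the task, otherwise keep the original. A hypothetical sketch (the real classifier is internal to Korad.AI):

```python
# Hypothetical routing table mirroring the savings table above.
ROUTES = {
    ("simple_question", "claude-sonnet"): "claude-haiku",
    ("complex_reasoning", "claude-opus"): "claude-sonnet",
}

def route_model(task_type: str, original_model: str) -> str:
    # Fall back to the original model when no cheaper route applies.
    return ROUTES.get((task_type, original_model), original_model)
```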

3. Token Reduction

Redundant tokens are removed:

```python
# Before (user prompt)
"""
Please explain what quantum computing is, including:
- The basic principles
- How quantum bits work
- Why it's faster than classical computing
- Current applications
"""

# After (optimized)
"""
Explain quantum computing: principles, qubits, advantages vs classical,
applications.
"""
```

Real-World Examples

Document Analysis

```python
# Analyzing a 100-page document
# Original: 50,000 tokens → $0.30 (Sonnet)
# With Maximum Savings: 8,000 tokens → $0.03 (Haiku)
# Savings: 90%
```

Code Generation

```python
# Generating a React component
# Original: 2,000 tokens → $0.012
# With Balanced: 1,200 tokens → $0.005
# Savings: 58%, identical output
```

Chat Application

```python
# 10-turn conversation about coding
# Original: 8,000 tokens cumulative → $0.048
# With High Savings: 2,500 tokens cumulative → $0.012
# Savings: 75%, responses stay coherent
```
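
Each savings figure above is just the relative cost reduction, which you can verify directly:

```python
# (original cost, optimized cost) pairs from the examples above.
examples = {
    "document_analysis": (0.30, 0.03),
    "code_generation": (0.012, 0.005),
    "chat_application": (0.048, 0.012),
}
for name, (original, optimized) in examples.items():
    print(f"{name}: {1 - optimized / original:.0%} saved")
# document_analysis: 90% saved
# code_generation: 58% saved
# chat_application: 75% saved
```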

Monitoring Your Savings

Every API response includes savings metrics:

```python
response = client.messages.create(...)

print(f"Savings: {response.korad_metrics.savings_percent}%")
print(f"Original cost: ${response.korad_metrics.original_cost:.4f}")
print(f"Your cost: ${response.korad_metrics.your_cost:.4f}")
```
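
If you want cumulative savings across many calls, you can tally the per-response metrics. A minimal sketch, using a stand-in dataclass for the `korad_metrics` object (field names taken from the example above):

```python
from dataclasses import dataclass

# Stand-in for response.korad_metrics; fields mirror the example above.
@dataclass
class KoradMetrics:
    savings_percent: float
    original_cost: float
    your_cost: float

def total_saved(metrics: list[KoradMetrics]) -> float:
    """Total dollars saved across a batch of responses."""
    return sum(m.original_cost - m.your_cost for m in metrics)
```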

Best Practices

  1. Start with Balanced — Recommended for most use cases
  2. Test different levels — Run A/B tests for your specific use case
  3. Monitor quality — Verify that compression doesn't degrade your outputs
  4. Use Maximum Savings — For background jobs, document analysis
  5. Use Quality Focus — For creative work, nuanced responses

Pro Tip

You can set different savings levels for different API keys! Use one key for production (Balanced) and another for batch jobs (Maximum).

Technical Details

Summarization Algorithm

Korad.AI uses a hybrid approach:

  1. Extract key entities — Names, dates, technical terms
  2. Preserve code blocks — Untouched for accuracy
  3. Summarize prose — Context-aware compression
  4. Maintain thread structure — Conversation flow preserved
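
The code-preserving step can be sketched with a regex that splits out fenced blocks; the summarizer here is a crude truncation placeholder, not Korad.AI's actual model:

```python
import re

# Matches fenced code blocks so they can pass through untouched.
CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)

def compress(text: str, summarize=lambda s: s[:100]) -> str:
    """Summarize prose segments while preserving code blocks verbatim."""
    parts, last = [], 0
    for match in CODE_BLOCK.finditer(text):
        parts.append(summarize(text[last:match.start()]))  # prose: summarized
        parts.append(match.group())                        # code: preserved
        last = match.end()
    parts.append(summarize(text[last:]))
    return "".join(parts)
```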

Quality Metrics

We measure quality impact across dimensions:

| Dimension | Impact at High Savings |
|-----------|------------------------|
| Factual accuracy | < 2% degradation |
| Code correctness | No change |
| Conversation coherence | < 5% degradation |
| Response relevance | No change |
