Available Models
Korad.AI gives you access to 20+ cutting-edge AI models through a single API. All models support tool use, streaming, and a consistent Anthropic-compatible request/response format.
Model Catalog
Claude 4.5 Series (Anthropic)
Native Anthropic models — First-class citizens on Korad.AI
Claude 4.5 Sonnet
model="claude-sonnet-4-5"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Primary reasoning, coding, analysis, complex tasks
- Speed: Medium
- Cost Tier: Premium
Claude 4.5 Haiku
model="claude-haiku-4-5"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Fast responses, summarization, simple tasks
- Speed: Fast
- Cost Tier: Economy
Gemini 2.5 Series (Google)
Google's flagship models — excellent for creative coding
Gemini 2.5 Pro
model="gemini-2.5-pro"
- Context Window: 128K tokens
- Max Output: 8,192 tokens
- Best For: Creative coding, "vibe coding", complex problem-solving
- Speed: Medium
- Cost Tier: Mid-range
Gemini 2.5 Flash
model="gemini-2.5-flash"
- Context Window: 128K tokens
- Max Output: 8,192 tokens
- Best For: Ultra-fast iterative development, quick responses
- Speed: Very Fast (~100ms latency)
- Cost Tier: Economy
Grok Series (xAI)
xAI's models with massive context windows
Grok-3
model="grok-3"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Extended context reasoning, long-form analysis
- Speed: Medium
- Cost Tier: Mid-range
Grok-2 Fast
model="grok-2-fast"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Low-latency queries with large context
- Speed: Fast
- Cost Tier: Economy
DeepSeek Series
Cost-effective models for budget-conscious developers
DeepSeek V3 Chat
model="deepseek-v3-chat"
- Context Window: 128K tokens
- Max Output: 8,192 tokens
- Best For: Cost-effective default, general-purpose
- Speed: Fast
- Cost Tier: Budget
GLM Series (Zhipu AI)
Strong Chinese models with excellent coding capabilities
GLM-4.7
model="glm-4.7"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Coding, reasoning, Chinese/English bilingual
- Speed: Medium
- Cost Tier: Mid-range
GLM-4.7 Flash
model="glm-4.7-flash"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Fast coding assistance, quick responses
- Speed: Fast
- Cost Tier: Economy
Kimi Series (Moonshot AI)
Multimodal models built for agentic workflows
Kimi K2.5
model="kimi-k2.5"
- Context Window: 262,144 tokens (256K)
- Max Output: 8,192 tokens
- Best For: Multimodal tasks, agent workflows
- Speed: Medium
- Cost Tier: Mid-range
Qwen Series (Alibaba)
Enterprise-focused models
Qwen Max
model="qwen-max"
- Context Window: 30,720 tokens
- Max Output: 8,192 tokens
- Best For: Enterprise tasks, business applications
- Speed: Medium
- Cost Tier: Mid-range
Qwen Flash
model="qwen-flash"
- Context Window: 30,720 tokens
- Max Output: 8,192 tokens
- Best For: Quick enterprise queries
- Speed: Fast
- Cost Tier: Economy
MiniMax M2 Series
Polyglot coding models
MiniMax M2.1
model="minimax-m2.1"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Polyglot code mastery, multi-language projects
- Speed: Medium
- Cost Tier: Mid-range
MiniMax M2.1 Lightning
model="minimax-m2.1-lightning"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Fast code completion, suggestions
- Speed: Very Fast (~100 tokens/sec)
- Cost Tier: Economy
Model Comparison
| Model | Context | Speed | Cost | Best For |
|---|---|---|---|---|
| Claude 4.5 Haiku | 200K | Fast | $ | Fast responses, summarization |
| Gemini 2.5 Flash | 128K | Very Fast | $ | Ultra-fast iterative dev |
| DeepSeek V3 | 128K | Fast | $ | Cost-effective default |
| Grok-2 Fast | 1M | Fast | $ | Low-latency, large context |
| GLM-4.7 Flash | 1M | Fast | $ | Fast coding, bilingual |
| Gemini 2.5 Pro | 128K | Medium | $$ | Creative coding |
| Grok-3 | 1M | Medium | $$ | Extended context reasoning |
| GLM-4.7 | 1M | Medium | $$ | Complex coding, 1M context |
| MiniMax M2.1 | 200K | Medium | $$ | Polyglot coding |
| Claude 4.5 Sonnet | 200K | Medium | $$$ | Primary reasoning, coding |
Model Selection Guide
Choose Claude 4.5 Haiku when:
- You need fast responses
- Tasks are simple (Q&A, classification, summarization)
- Cost is a priority
- You want Anthropic quality at lower cost
Choose Gemini 2.5 Flash when:
- You need ultra-fast iteration
- Building in a "vibe coding" flow
- Quick prototyping and testing
Choose DeepSeek V3 when:
- You want the best value
- General-purpose tasks
- Budget-conscious development
Choose Claude 4.5 Sonnet when:
- You need the best reasoning
- Complex coding tasks
- Production applications
- Cost is secondary to quality
Choose Grok-3 when:
- Your input needs massive context (up to 1M tokens)
- Long-form analysis
- Extended conversations
Choose Gemini 2.5 Pro when:
- Creative coding tasks
- Complex problem-solving
- You want a different reasoning style
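If you route requests programmatically, the guidance above can be encoded in a small helper. A minimal sketch; the task categories and the 150K-token threshold are illustrative choices, not part of the Korad.AI API:

def pick_model(task: str, context_tokens: int = 0) -> str:
    # Anything near the 200K windows needs a 1M-context model
    if context_tokens > 150_000:
        return "grok-3"
    routes = {
        "summarize": "claude-haiku-4-5",   # fast, cheap
        "prototype": "gemini-2.5-flash",   # ultra-fast iteration
        "general":   "deepseek-v3-chat",   # best value
        "coding":    "claude-sonnet-4-5",  # best reasoning
        "creative":  "gemini-2.5-pro",     # different reasoning style
    }
    return routes.get(task, "deepseek-v3-chat")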
Usage Examples
Basic Request
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.korad.ai/v1",
    api_key="sk-korad-YOUR-KEY"
)

# Use any model through the same client
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

print(response.content[0].text)
Switching Models
# Start fast and cheap
response = client.messages.create(
    model="gemini-2.5-flash",
    max_tokens=512,
    messages=[{"role": "user", "content": "Quick summary"}]
)

# Upgrade for quality if needed
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Detailed analysis"}]
)
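A common extension of this pattern is an automatic fallback chain: try the cheaper model first and escalate when a call fails. This is an application-level sketch, not a built-in Korad.AI feature; the exception classes are the standard ones from the anthropic SDK:

import anthropic

FALLBACK_CHAIN = ["gemini-2.5-flash", "deepseek-v3-chat", "claude-sonnet-4-5"]

def create_with_fallback(client, messages, max_tokens=1024):
    # Walk the chain cheapest-first; re-raise if every model fails
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return client.messages.create(
                model=model, max_tokens=max_tokens, messages=messages
            )
        except (anthropic.APIStatusError, anthropic.APIConnectionError) as err:
            last_error = err  # move on to the next model
    raise last_error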
With Tools
# Define a custom tool in Anthropic's tool-use format
tools = [{
    "name": "web_search",
    "description": "Search the web for current information",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"}
        },
        "required": ["query"]
    }
}]

response = client.messages.create(
    model="gemini-2.5-pro",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the latest AI news?"}]
)
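Defining the tool is only half the loop. When the model decides to call it, the response's stop_reason is "tool_use" and your code must run the tool and reply with a tool_result block. In the sketch below, run_web_search() is a hypothetical stand-in for your own search implementation:

if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")
    result = run_web_search(tool_use.input["query"])  # hypothetical helper

    # Send the tool output back so the model can finish its answer
    follow_up = client.messages.create(
        model="gemini-2.5-pro",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the latest AI news?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result,
            }]},
        ],
    )
    print(follow_up.content[0].text)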
Model Capabilities
Supported by All Models
- ✅ Text generation
- ✅ Tool/function calling
- ✅ Streaming responses (see the sketch below)
- ✅ System prompts
- ✅ Conversation history
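Streaming is listed above but not demonstrated elsewhere on this page. A minimal sketch using the anthropic SDK's streaming helper, which should work the same way through Korad.AI's Anthropic-compatible endpoint:

# Print tokens as they arrive instead of waiting for the full reply
with client.messages.stream(
    model="claude-haiku-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)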
Vision Support
Coming soon to selected models (Claude 4.5, Gemini 2.5 Pro)
Long Context
Models with 1M+ context windows:
- Grok-3, Grok-2 Fast (1M tokens)
- GLM-4.7, GLM-4.7 Flash (1M tokens)
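Using a 1M-token window is no different from any other request; you simply include the large input in the message content. A sketch, with an illustrative file path; keep the input under the model's limit:

# Feed a large document to a 1M-context model (path is illustrative)
with open("large_codebase_dump.txt") as f:
    big_doc = f.read()

response = client.messages.create(
    model="grok-3",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"Summarize the key modules in this codebase:\n\n{big_doc}",
    }],
)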
Model Limits
| Limit | Value |
|---|---|
| Max input tokens | Up to 1M (model-dependent) |
| Max output tokens | 8,192 (all models) |
| Requests/minute | 500 (global) |
| Concurrent requests | 100 |
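At 500 requests/minute, bursty workloads can hit 429 errors. The anthropic SDK retries transient failures automatically via its max_retries setting; the manual backoff loop below is an alternative sketch if you want explicit control:

import time
import anthropic

# Option 1: raise the SDK's built-in retry cap (default is 2)
client = anthropic.Anthropic(
    base_url="https://api.korad.ai/v1",
    api_key="sk-korad-YOUR-KEY",
    max_retries=5,
)

# Option 2: handle 429s yourself with exponential backoff
def create_with_backoff(client, **kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
    raise RuntimeError("still rate-limited after 5 attempts")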
Cost Estimation
# Example: estimating cost for 1,000 input + 1,000 output tokens
input_tokens = 1000
output_tokens = 1000

# Approximate USD rates per 1K tokens (varies by provider)
claude_sonnet_cost = (input_tokens * 0.003 + output_tokens * 0.015) / 1000   # ~$0.018
gemini_pro_cost = (input_tokens * 0.001 + output_tokens * 0.004) / 1000      # ~$0.005
deepseek_cost = (input_tokens * 0.0001 + output_tokens * 0.0001) / 1000      # ~$0.0002
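For real requests you don't need to estimate: every Anthropic-format response carries a usage block with exact token counts. A sketch reusing the approximate Claude Sonnet rates above:

# Price an actual request from its reported token usage
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
usage = response.usage  # exact input_tokens / output_tokens
cost = (usage.input_tokens * 0.003 + usage.output_tokens * 0.015) / 1000
print(f"{usage.input_tokens} in + {usage.output_tokens} out -> ${cost:.4f}")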
List Available Models
curl https://api.korad.ai/v1/models \
-H "Authorization: Bearer sk-korad-YOUR-KEY"
Response:
{
  "object": "list",
  "data": [
    {"id": "claude-sonnet-4-5", "object": "model", ...},
    {"id": "gemini-2.5-pro", "object": "model", ...},
    {"id": "grok-3", "object": "model", ...},
    ...
  ]
}
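The same endpoint from Python, for example to discover model IDs at startup. This sketch uses the requests library and assumes the response shape shown above:

import requests

# Fetch the model list and print the available IDs
resp = requests.get(
    "https://api.korad.ai/v1/models",
    headers={"Authorization": "Bearer sk-korad-YOUR-KEY"},
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])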