Available Models
Korad.AI gives you access to 20+ cutting-edge AI models through a single API. All models support tool use, streaming, and a consistent Anthropic-compatible request/response format.
Model Catalog
Claude 4.5 Series (Anthropic)
Native Anthropic models — First-class citizens on Korad.AI
Claude 4.5 Sonnet
model="claude-sonnet-4-5"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Primary reasoning, coding, analysis, complex tasks
- Speed: Medium
- Cost Tier: Premium
Claude 4.5 Haiku
model="claude-haiku-4-5"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Fast responses, summarization, simple tasks
- Speed: Fast
- Cost Tier: Economy
Gemini 2.5 Series (Google)
Google's flagship models — excellent for creative coding
Gemini 2.5 Pro
model="gemini-2.5-pro"
- Context Window: 128K tokens
- Max Output: 8,192 tokens
- Best For: Creative coding, "vibe coding", complex problem-solving
- Speed: Medium
- Cost Tier: Mid-range
Gemini 2.5 Flash
model="gemini-2.5-flash"
- Context Window: 128K tokens
- Max Output: 8,192 tokens
- Best For: Ultra-fast iterative development, quick responses
- Speed: Very Fast (~100ms latency)
- Cost Tier: Economy
Grok Series (xAI)
xAI's models with massive context windows
Grok-3
model="grok-3"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Extended context reasoning, long-form analysis
- Speed: Medium
- Cost Tier: Mid-range
Grok-2 Fast
model="grok-2-fast"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Low-latency queries with large context
- Speed: Fast
- Cost Tier: Economy
DeepSeek Series
Cost-effective models for budget-conscious developers
DeepSeek V3 Chat
model="deepseek-v3-chat"
- Context Window: 128K tokens
- Max Output: 8,192 tokens
- Best For: Cost-effective default, general-purpose
- Speed: Fast
- Cost Tier: Budget
GLM Series (Zhipu AI)
Strong Chinese models with excellent coding capabilities
GLM-4.7
model="glm-4.7"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Coding, reasoning, Chinese/English bilingual
- Speed: Medium
- Cost Tier: Mid-range
GLM-4.7 Flash
model="glm-4.7-flash"
- Context Window: 1M tokens
- Max Output: 8,192 tokens
- Best For: Fast coding assistance, quick responses
- Speed: Fast
- Cost Tier: Economy
Kimi Series (Moonshot AI)
Multimodal models built for agentic workflows
Kimi K2.5
model="kimi-k2.5"
- Context Window: 262,144 tokens (256K)
- Max Output: 8,192 tokens
- Best For: Multimodal tasks, agent workflows
- Speed: Medium
- Cost Tier: Mid-range
Qwen Series (Alibaba)
Enterprise-focused models
Qwen Max
model="qwen-max"
- Context Window: 30,720 tokens
- Max Output: 8,192 tokens
- Best For: Enterprise tasks, business applications
- Speed: Medium
- Cost Tier: Mid-range
Qwen Flash
model="qwen-flash"
- Context Window: 30,720 tokens
- Max Output: 8,192 tokens
- Best For: Quick enterprise queries
- Speed: Fast
- Cost Tier: Economy
MiniMax M2 Series
Polyglot coding models
MiniMax M2.1
model="minimax-m2.1"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Polyglot code mastery, multi-language projects
- Speed: Medium
- Cost Tier: Mid-range
MiniMax M2.1 Lightning
model="minimax-m2.1-lightning"
- Context Window: 200K tokens
- Max Output: 8,192 tokens
- Best For: Fast code completion, suggestions
- Speed: Very Fast (~100 tokens/sec)
- Cost Tier: Economy
Model Comparison
| Model | Context | Speed | Cost | Best For |
|---|---|---|---|---|
| Claude 4.5 Haiku | 200K | Fast | $ | Fast responses, summarization |
| Gemini 2.5 Flash | 128K | Very Fast | $ | Ultra-fast iterative dev |
| DeepSeek V3 | 128K | Fast | $ | Cost-effective default |
| Grok-2 Fast | 1M | Fast | $ | Low-latency, large context |
| GLM-4.7 Flash | 1M | Fast | $ | Fast coding, bilingual |
| Gemini 2.5 Pro | 128K | Medium | $$ | Creative coding |
| Grok-3 | 1M | Medium | $$ | Extended context reasoning |
| GLM-4.7 | 1M | Medium | $$ | Complex coding, 1M context |
| MiniMax M2.1 | 200K | Medium | $$ | Polyglot coding |
| Claude 4.5 Sonnet | 200K | Medium | $$$ | Primary reasoning, coding |
Model Selection Guide
Choose Claude 4.5 Haiku when:
- You need fast responses
- Tasks are simple (Q&A, classification, summarization)
- Cost is a priority
- You want Anthropic quality at lower cost
Choose Gemini 2.5 Flash when:
- You need ultra-fast iteration
- Building in a "vibe coding" flow
- Quick prototyping and testing
Choose DeepSeek V3 when:
- You want the best value
- General-purpose tasks
- Budget-conscious development
Choose Claude 4.5 Sonnet when:
- You need the best reasoning
- Complex coding tasks
- Production applications
- Cost is secondary to quality
Choose Grok-3 when:
- Your input needs massive context (up to 1M tokens)
- Long-form analysis
- Extended conversations
Choose Gemini 2.5 Pro when:
- Creative coding tasks
- Complex problem-solving
- You want a different reasoning style
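If you route requests programmatically, the guidance above can be encoded in a small helper. A minimal sketch; the task categories and the 150K-token threshold are illustrative choices, not part of the Korad.AI API:

def pick_model(task: str, context_tokens: int = 0) -> str:
    # Anything near the 200K windows needs a 1M-context model
    if context_tokens > 150_000:
        return "grok-3"
    routes = {
        "summarize": "claude-haiku-4-5",   # fast, cheap
        "prototype": "gemini-2.5-flash",   # ultra-fast iteration
        "general":   "deepseek-v3-chat",   # best value
        "coding":    "claude-sonnet-4-5",  # best reasoning
        "creative":  "gemini-2.5-pro",     # different reasoning style
    }
    return routes.get(task, "deepseek-v3-chat")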
Usage Examples
Basic Request
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.korad.ai/v1",
    api_key="sk-korad-YOUR-KEY"
)

# Use any model through the same client
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

print(response.content[0].text)
Switching Models
# Start fast and cheap
response = client.messages.create(
    model="gemini-2.5-flash",
    max_tokens=512,
    messages=[{"role": "user", "content": "Quick summary"}]
)

# Upgrade for quality if needed
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Detailed analysis"}]
)
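A common extension of this pattern is an automatic fallback chain: try the cheaper model first and escalate when a call fails. This is an application-level sketch, not a built-in Korad.AI feature; the exception classes are the standard ones from the anthropic SDK:

import anthropic

FALLBACK_CHAIN = ["gemini-2.5-flash", "deepseek-v3-chat", "claude-sonnet-4-5"]

def create_with_fallback(client, messages, max_tokens=1024):
    # Walk the chain cheapest-first; re-raise if every model fails
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return client.messages.create(
                model=model, max_tokens=max_tokens, messages=messages
            )
        except (anthropic.APIStatusError, anthropic.APIConnectionError) as err:
            last_error = err  # move on to the next model
    raise last_error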
With Tools
# Define a custom tool in Anthropic's tool-use format
tools = [{
    "name": "web_search",
    "description": "Search the web for current information",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"}
        },
        "required": ["query"]
    }
}]

response = client.messages.create(
    model="gemini-2.5-pro",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the latest AI news?"}]
)
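Defining the tool is only half the loop. When the model decides to call it, the response's stop_reason is "tool_use" and your code must run the tool and reply with a tool_result block. In the sketch below, run_web_search() is a hypothetical stand-in for your own search implementation:

if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")
    result = run_web_search(tool_use.input["query"])  # hypothetical helper

    # Send the tool output back so the model can finish its answer
    follow_up = client.messages.create(
        model="gemini-2.5-pro",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the latest AI news?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": result,
            }]},
        ],
    )
    print(follow_up.content[0].text)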
Model Capabilities
Supported by All Models
- ✅ Text generation
- ✅ Tool/function calling
- ✅ Streaming responses (see the sketch below)
- ✅ System prompts
- ✅ Conversation history
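Streaming is listed above but not demonstrated elsewhere on this page. A minimal sketch using the anthropic SDK's streaming helper, which should work the same way through Korad.AI's Anthropic-compatible endpoint:

# Print tokens as they arrive instead of waiting for the full reply
with client.messages.stream(
    model="claude-haiku-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)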
Vision Support
Coming soon to selected models (Claude 4.5, Gemini 2.5 Pro)
Long Context
Models with 1M+ context windows:
- Grok-3, Grok-2 Fast (1M tokens)
- GLM-4.7, GLM-4.7 Flash (1M tokens)
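Using a 1M-token window is no different from any other request; you simply include the large input in the message content. A sketch, with an illustrative file path; keep the input under the model's limit:

# Feed a large document to a 1M-context model (path is illustrative)
with open("large_codebase_dump.txt") as f:
    big_doc = f.read()

response = client.messages.create(
    model="grok-3",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"Summarize the key modules in this codebase:\n\n{big_doc}",
    }],
)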
Model Limits
| Limit | Value |
|---|---|
| Max input tokens | Up to 1M (model-dependent) |
| Max output tokens | 8,192 (all models) |
| Requests/minute | 500 (global) |
| Concurrent requests | 100 |
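At 500 requests/minute, bursty workloads can hit 429 errors. The anthropic SDK retries transient failures automatically via its max_retries setting; the manual backoff loop below is an alternative sketch if you want explicit control:

import time
import anthropic

# Option 1: raise the SDK's built-in retry cap (default is 2)
client = anthropic.Anthropic(
    base_url="https://api.korad.ai/v1",
    api_key="sk-korad-YOUR-KEY",
    max_retries=5,
)

# Option 2: handle 429s yourself with exponential backoff
def create_with_backoff(client, **kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
    raise RuntimeError("still rate-limited after 5 attempts")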
Cost Estimation
# Example: estimating cost for 1,000 input + 1,000 output tokens
input_tokens = 1000
output_tokens = 1000

# Approximate USD rates per 1K tokens (varies by provider)
claude_sonnet_cost = (input_tokens * 0.003 + output_tokens * 0.015) / 1000   # ~$0.018
gemini_pro_cost = (input_tokens * 0.001 + output_tokens * 0.004) / 1000      # ~$0.005
deepseek_cost = (input_tokens * 0.0001 + output_tokens * 0.0001) / 1000      # ~$0.0002
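For real requests you don't need to estimate: every Anthropic-format response carries a usage block with exact token counts. A sketch reusing the approximate Claude Sonnet rates above:

# Price an actual request from its reported token usage
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
usage = response.usage  # exact input_tokens / output_tokens
cost = (usage.input_tokens * 0.003 + usage.output_tokens * 0.015) / 1000
print(f"{usage.input_tokens} in + {usage.output_tokens} out -> ${cost:.4f}")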
List Available Models
curl https://api.korad.ai/v1/models \
-H "Authorization: Bearer sk-korad-YOUR-KEY"
Response:
{
  "object": "list",
  "data": [
    {"id": "claude-sonnet-4-5", "object": "model", ...},
    {"id": "gemini-2.5-pro", "object": "model", ...},
    {"id": "grok-3", "object": "model", ...},
    ...
  ]
}
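The same endpoint from Python, for example to discover model IDs at startup. This sketch uses the requests library and assumes the response shape shown above:

import requests

# Fetch the model list and print the available IDs
resp = requests.get(
    "https://api.korad.ai/v1/models",
    headers={"Authorization": "Bearer sk-korad-YOUR-KEY"},
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])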