Available Models

Korad.AI gives you access to 20+ cutting-edge AI models through a single API. All models support tool calling, streaming, and a consistent, Anthropic-compatible request/response format.

Model Catalog

Claude 4.5 Series (Anthropic)

Native Anthropic models — first-class citizens on Korad.AI

Claude 4.5 Sonnet

model="claude-sonnet-4-5"
  • Context Window: 200K tokens
  • Max Output: 8,192 tokens
  • Best For: Primary reasoning, coding, analysis, complex tasks
  • Speed: Medium
  • Cost Tier: Premium

Claude 4.5 Haiku

model="claude-haiku-4-5"
  • Context Window: 200K tokens
  • Max Output: 8,192 tokens
  • Best For: Fast responses, summarization, simple tasks
  • Speed: Fast
  • Cost Tier: Economy

Gemini 2.5 Series (Google)

Google's flagship models — excellent for creative coding

Gemini 2.5 Pro

model="gemini-2.5-pro"
  • Context Window: 128K tokens
  • Max Output: 8,192 tokens
  • Best For: Creative coding, "vibe coding", complex problem-solving
  • Speed: Medium
  • Cost Tier: Mid-range

Gemini 2.5 Flash

model="gemini-2.5-flash"
  • Context Window: 128K tokens
  • Max Output: 8,192 tokens
  • Best For: Ultra-fast iterative development, quick responses
  • Speed: Very Fast (~100ms latency)
  • Cost Tier: Economy

Grok Series (xAI)

xAI's models with massive context windows

Grok-3

model="grok-3"
  • Context Window: 1,000,000 tokens (1M)
  • Max Output: 8,192 tokens
  • Best For: Extended context reasoning, long-form analysis
  • Speed: Medium
  • Cost Tier: Mid-range

Grok-2 Fast

model="grok-2-fast"
  • Context Window: 1,000,000 tokens (1M)
  • Max Output: 8,192 tokens
  • Best For: Low-latency queries with large context
  • Speed: Fast
  • Cost Tier: Economy

DeepSeek Series

Cost-effective models for budget-conscious developers

DeepSeek V3 Chat

model="deepseek-v3-chat"
  • Context Window: 128K tokens
  • Max Output: 8,192 tokens
  • Best For: Cost-effective default, general-purpose
  • Speed: Fast
  • Cost Tier: Budget

GLM Series (Zhipu AI)

Strong Chinese models with excellent coding capabilities

GLM-4.7

model="glm-4.7"
  • Context Window: 1,000,000 tokens (1M)
  • Max Output: 8,192 tokens
  • Best For: Coding, reasoning, Chinese/English bilingual
  • Speed: Medium
  • Cost Tier: Mid-range

GLM-4.7 Flash

model="glm-4.7-flash"
  • Context Window: 1,000,000 tokens (1M)
  • Max Output: 8,192 tokens
  • Best For: Fast coding assistance, quick responses
  • Speed: Fast
  • Cost Tier: Economy

Kimi Series (Moonshot AI)

Multimodal agent swarm models

Kimi K2.5

model="kimi-k2.5"
  • Context Window: 262,144 tokens (256K)
  • Max Output: 8,192 tokens
  • Best For: Multimodal tasks, agent workflows
  • Speed: Medium
  • Cost Tier: Mid-range

Qwen Series (Alibaba)

Enterprise-focused models

Qwen Max

model="qwen-max"
  • Context Window: 30,720 tokens
  • Max Output: 8,192 tokens
  • Best For: Enterprise tasks, business applications
  • Speed: Medium
  • Cost Tier: Mid-range

Qwen Flash

model="qwen-flash"
  • Context Window: 30,720 tokens
  • Max Output: 8,192 tokens
  • Best For: Quick enterprise queries
  • Speed: Fast
  • Cost Tier: Economy

MiniMax M2 Series

Polyglot coding models

MiniMax M2.1

model="minimax-m2.1"
  • Context Window: 200K tokens
  • Max Output: 8,192 tokens
  • Best For: Polyglot code mastery, multi-language projects
  • Speed: Medium
  • Cost Tier: Mid-range

MiniMax M2.1 Lightning

model="minimax-m2.1-lightning"
  • Context Window: 200K tokens
  • Max Output: 8,192 tokens
  • Best For: Fast code completion, suggestions
  • Speed: Very Fast (~100 tps)
  • Cost Tier: Economy

Model Comparison

| Model | Context | Speed | Cost | Best For |
| --- | --- | --- | --- | --- |
| Claude 4.5 Haiku | 200K | Fast | $ | Fast responses, summarization |
| Gemini 2.5 Flash | 128K | Very Fast | $ | Ultra-fast iterative dev |
| DeepSeek V3 | 128K | Fast | $ | Cost-effective default |
| Grok-2 Fast | 1M | Fast | $$ | Low-latency, large context |
| Gemini 2.5 Pro | 128K | Medium | $$ | Creative coding |
| Grok-3 | 1M | Medium | $$ | Extended context reasoning |
| GLM-4.7 Flash | 1M | Fast | $$ | Fast coding, bilingual |
| Claude 4.5 Sonnet | 200K | Medium | $$$ | Primary reasoning, coding |
| GLM-4.7 | 1M | Medium | $$$ | Complex coding, 1M context |
| MiniMax M2.1 | 200K | Medium | $$$ | Polyglot coding |

Model Selection Guide

Choose Claude 4.5 Haiku when:

  • You need fast responses
  • Tasks are simple (Q&A, classification, summarization)
  • Cost is a priority
  • You want Anthropic quality at lower cost

Choose Gemini 2.5 Flash when:

  • You need ultra-fast iteration
  • Building in a "vibe coding" flow
  • Quick prototyping and testing

Choose DeepSeek V3 when:

  • You want the best value
  • General-purpose tasks
  • Budget-conscious development

Choose Claude 4.5 Sonnet when:

  • You need the best reasoning
  • Complex coding tasks
  • Production applications
  • Cost is secondary to quality

Choose Grok-3 when:

  • You have massive context (up to 1M tokens!)
  • Long-form analysis
  • Extended conversations

Choose Gemini 2.5 Pro when:

  • Creative coding tasks
  • Complex problem-solving
  • You want a different reasoning style
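
The guidance above can be collapsed into a small routing helper. The sketch below is illustrative, not a Korad.AI feature; the task categories and the 200K-token cutoff are assumptions drawn from the selection guide and the catalog in this section.

```python
# Illustrative task-to-model routing table, derived from the selection
# guide above; the category names are assumptions, not an official API.
ROUTES = {
    "summarization": "claude-haiku-4-5",    # fast, economy
    "prototyping": "gemini-2.5-flash",      # ultra-fast iteration
    "general": "deepseek-v3-chat",          # best-value default
    "reasoning": "claude-sonnet-4-5",       # best quality
    "long_context": "grok-3",               # up to 1M tokens
    "creative_coding": "gemini-2.5-pro",
}

def pick_model(task: str, context_tokens: int = 0) -> str:
    """Return a model ID for a task, switching to a 1M-context model
    when the prompt exceeds a 200K-token window."""
    if context_tokens > 200_000:
        return ROUTES["long_context"]
    return ROUTES.get(task, ROUTES["general"])
```

Unknown tasks fall back to the budget default, so `pick_model("summarization")` returns the Haiku ID while anything unrecognized routes to DeepSeek V3.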

Usage Examples

Basic Request

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.korad.ai/v1",
    api_key="sk-korad-YOUR-KEY",
)

# Use any model
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)

print(response.content[0].text)

Switching Models

# Start fast and cheap
response = client.messages.create(
    model="gemini-2.5-flash",
    max_tokens=512,
    messages=[{"role": "user", "content": "Quick summary"}],
)

# Upgrade for quality if needed
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Detailed analysis"}],
)
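
The switch can also be automated as an escalation ladder: try the cheap model first and fall through to a stronger one on failure. This helper is an illustrative sketch, not part of any SDK; `call` is any function that takes a model ID and either returns a reply or raises.

```python
# Sketch of a cost-aware escalation ladder (illustrative, not an SDK
# feature): attempt each model in order, returning the first success.
def with_escalation(call, models=("gemini-2.5-flash", "claude-sonnet-4-5")):
    last_error = None
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # e.g. overload, timeout, quality check
            last_error = exc
    raise last_error
```

In practice you would pass something like `lambda m: client.messages.create(model=m, ...)` as `call`.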

With Tools

response = client.messages.create(
    model="gemini-2.5-pro",
    max_tokens=1024,
    tools=[{
        "name": "web_search",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"]
        }
    }],
    messages=[{"role": "user", "content": "What's the latest AI news?"}]
)
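
When the model decides to call a tool, the response content contains `tool_use` blocks alongside any text. A sketch of pulling those calls out; the real SDK returns typed content blocks with `.type` / `.name` / `.input` attributes, but dicts with the same fields are used here so the helper stays easy to test:

```python
# Sketch: extract tool calls from response content. Dicts stand in for
# the SDK's typed content blocks, which expose the same fields.
def extract_tool_calls(content_blocks):
    return [
        {"name": block["name"], "input": block["input"]}
        for block in content_blocks
        if block.get("type") == "tool_use"
    ]
```

Your application would run each extracted call and send the results back as `tool_result` content in a follow-up message.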

Model Capabilities

Supported by All Models

  • ✅ Text generation
  • ✅ Tool/function calling
  • ✅ Streaming responses
  • ✅ System prompts
  • ✅ Conversation history
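
Since every model streams, a single helper covers the whole catalog. A minimal sketch using the Anthropic SDK's streaming context manager; `stream_reply` is an illustrative name, not part of the SDK:

```python
def stream_reply(client, model: str, prompt: str) -> str:
    """Print tokens as they arrive and return the assembled reply.

    `client` is an anthropic.Anthropic instance pointed at the Korad.AI
    gateway; `model` is any catalog model ID.
    """
    parts = []
    with client.messages.stream(
        model=model,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:  # yields text deltas as generated
            print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)
```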

Vision Support

Coming soon to selected models (Claude 4.5, Gemini 2.5 Pro)

Long Context

Models with 1M+ context windows:

  • Grok-3, Grok-2 Fast (1M tokens)
  • GLM-4.7, GLM-4.7 Flash (1M tokens)

Model Limits

| Limit | Value |
| --- | --- |
| Max input tokens | Up to 1M (model-dependent) |
| Max output tokens | 8,192 (all models) |
| Requests/minute | 500 (global) |
| Concurrent requests | 100 |
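
Requests beyond these limits are typically rejected with an HTTP 429, so production code should back off and retry. A sketch with exponential backoff and jitter; the retry counts and delays are illustrative, not official recommendations:

```python
import random
import time

def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0):
    """Yield sleep durations: base * 2^attempt, capped, with jitter."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay * random.uniform(0.5, 1.0)

def call_with_retry(call, retries: int = 5):
    """Retry `call` (e.g. a client.messages.create wrapper) on failure."""
    for delay in backoff_delays(retries):
        try:
            return call()
        except Exception:  # in practice, catch the SDK's RateLimitError
            time.sleep(delay)
    return call()  # final attempt propagates any remaining error
```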

Cost Estimation

# Example: 1,000 tokens with different models
input_tokens = 1000
output_tokens = 1000

# Approximate prices in USD per 1K tokens (illustrative; varies by provider)
claude_sonnet_cost = (input_tokens * 0.003 + output_tokens * 0.015) / 1000 # ~$0.018
gemini_pro_cost = (input_tokens * 0.001 + output_tokens * 0.004) / 1000 # ~$0.005
deepseek_cost = (input_tokens * 0.0001 + output_tokens * 0.0001) / 1000 # ~$0.0002
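
The same arithmetic generalizes to a pricing-table helper. The per-1K rates below simply restate the illustrative numbers above; check current provider pricing before relying on them.

```python
# Illustrative per-1K-token rates in USD (input, output), restating the
# example numbers above; real prices vary and change over time.
PRICING = {
    "claude-sonnet-4-5": (0.003, 0.015),
    "gemini-2.5-pro": (0.001, 0.004),
    "deepseek-v3-chat": (0.0001, 0.0001),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from per-1K-token rates."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1000
```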

List Available Models

curl https://api.korad.ai/v1/models \
  -H "Authorization: Bearer sk-korad-YOUR-KEY"

Response:

{
  "object": "list",
  "data": [
    {"id": "claude-sonnet-4-5", "object": "model", ...},
    {"id": "gemini-2.5-pro", "object": "model", ...},
    {"id": "grok-3", "object": "model", ...},
    ...
  ]
}
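
The same endpoint can be queried from Python with the standard library. The parsing helper below assumes the response shape shown above; `list_models` and `model_ids` are illustrative names, not SDK functions.

```python
import json
import urllib.request

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a /v1/models listing payload."""
    return [entry["id"] for entry in payload.get("data", [])]

def list_models(api_key: str) -> list[str]:
    """Fetch /v1/models from the gateway and return the model IDs."""
    req = urllib.request.Request(
        "https://api.korad.ai/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))
```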
