Chat Completions API
Create message completions using Claude models through Korad.AI's optimized gateway.
Endpoint
POST https://api.korad.ai/v1/messages
Request Body
{
  model: string;
  max_tokens: number;
  messages: Message[];
  system?: string;
  temperature?: number;
  top_p?: number;
  top_k?: number;
  stream?: boolean;
  stop_sequences?: string[];
  tools?: Tool[];
}
Parameters
Required Parameters
model (string)
The model to use for completion.
Available Models:
- claude-sonnet-4-20250514 — Best for most tasks
- claude-sonnet-4-20250514-optim — Cost-optimized version
- claude-haiku-4-20250514 — Fast, for simple tasks
- claude-opus-4-20250514 — Most capable, for complex tasks
max_tokens (integer)
Maximum tokens to generate. Must be ≥ 1.
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "messages": [{"role": "user", "content": "Hello"}]
}
messages (array)
Array of message objects representing the conversation.
type Message = {
  role: "user" | "assistant";
  content: string | ContentBlock[];
};
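content accepts either a plain string or a list of content blocks; the two forms below are equivalent (block shapes follow the upstream Anthropic format, which the gateway is assumed to pass through unchanged):

```python
# A plain-string message.
simple = {"role": "user", "content": "Explain recursion"}

# The same message written with explicit content blocks. Other block
# types (e.g. "image") follow the upstream Anthropic format.
blocks = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Explain recursion"},
    ],
}
```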
Optional Parameters
system (string)
System prompt to guide behavior.
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a helpful coding assistant.",
  "messages": [{"role": "user", "content": "Write a function"}]
}
temperature (number, default: 0.7)
Sampling temperature. Lower = more focused, higher = more creative.
top_p (number, default: 0.9)
Nucleus sampling threshold.
top_k (number)
Top-k sampling parameter.
stream (boolean, default: false)
Enable streaming responses.
stop_sequences (array[string])
Sequences that will stop generation.
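For example, to halt generation at a custom delimiter (values here are illustrative):

```python
request = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    # Generation stops as soon as either sequence appears; the matched
    # sequence is reported back in the response's stop_sequence field.
    "stop_sequences": ["\n\nHuman:", "END"],
    "messages": [{"role": "user", "content": "List three fruits, then write END"}],
}
```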
tools (array)
Function calling tools.
Example with Tools
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  ],
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"}
  ]
}
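If the model chooses to call a tool, the response carries a tool_use content block (following the upstream Anthropic convention, assumed to pass through the gateway); you run the tool yourself and return the result in a follow-up user message. A minimal sketch of building that follow-up turn, where the id and weather values are hypothetical:

```python
def build_tool_result_turn(assistant_content, results):
    """Build the two messages that continue a tool-use conversation:
    the assistant's tool_use turn, then a user turn carrying
    tool_result blocks keyed by the tool_use id."""
    tool_use_blocks = [b for b in assistant_content if b["type"] == "tool_use"]
    tool_results = [
        {"type": "tool_result", "tool_use_id": block["id"], "content": result}
        for block, result in zip(tool_use_blocks, results)
    ]
    return [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": tool_results},
    ]

# Example: the model asked for the weather in Tokyo (hypothetical values).
assistant_content = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"location": "Tokyo"}}
]
turns = build_tool_result_turn(assistant_content, ["22°C, clear"])
```

The two returned messages are appended to the original messages array before calling the API again.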
Response
Non-Streaming Response
{
  "id": "msg_1a2b3c4d",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 10,
    "output_tokens": 25,
    "total_tokens": 35
  },
  "korad_metrics": {
    "savings_percent": 45,
    "original_cost": 0.0003,
    "your_cost": 0.000165
  }
}
Streaming Response
With "stream": true, responses are sent as Server-Sent Events.
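The event format is assumed to follow the upstream Anthropic stream, passed through by the gateway; a typical sequence looks like:

```
event: message_start
data: {"type": "message_start", "message": {...}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}}

event: message_stop
data: {"type": "message_stop"}
```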
Usage Tracking
Every response includes detailed usage metrics:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[...]
)

print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Savings: {response.korad_metrics.savings_percent}%")
print(f"Your cost: ${response.korad_metrics.your_cost}")
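The savings figure can be cross-checked against the two cost fields; a quick sanity check using the sample response above:

```python
# Cost fields from the sample korad_metrics above.
original_cost = 0.0003
your_cost = 0.000165

# Savings are the percentage discount relative to the original cost.
savings_percent = round((1 - your_cost / original_cost) * 100)
print(savings_percent)  # 45, matching savings_percent in the sample
```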
Examples
Basic Chat
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.korad.ai/v1",
    api_key="sk-korad-YOUR-KEY"
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain recursion"}
    ]
)

print(message.content[0].text)
Streaming
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Count to 10"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
With System Prompt
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are an expert Python developer. Provide concise, working code.",
    messages=[
        {"role": "user", "content": "Write a binary search function"}
    ]
)
Multi-turn Conversation
conversation = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 5+5?"}
]

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=conversation
)
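To continue for another turn, append the assistant's reply to the history before the next user message; roles must alternate user/assistant. A sketch, where reply_text is a stand-in for message.content[0].text:

```python
conversation = [
    {"role": "user", "content": "What's 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
    {"role": "user", "content": "What about 5+5?"},
]

# Stand-in for the reply text returned by the previous API call.
reply_text = "5+5 equals 10."

# Echo the assistant's reply back as history, then add the next user turn.
conversation.append({"role": "assistant", "content": reply_text})
conversation.append({"role": "user", "content": "And 7+8?"})
```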
Error Handling
from anthropic import APIStatusError, APITimeoutError

try:
    response = client.messages.create(...)
except APIStatusError as e:
    print(f"API Error: {e.status_code} - {e.message}")
except APITimeoutError:
    print("Request timed out")
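For transient failures such as timeouts or rate limits, a simple exponential-backoff wrapper is often enough. A sketch (the flaky stand-in simulates one transient failure; in practice call would wrap client.messages.create, and the except clause would be narrowed to APITimeoutError / RateLimitError):

```python
import random
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff plus
    jitter, re-raising after the final attempt fails."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow to specific SDK errors in real code
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

# Demo with a stand-in that fails once, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise TimeoutError("transient")
    return "ok"

print(with_retries(flaky))  # ok
```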
Best Practices
- Set appropriate max_tokens — prevents unexpected costs
- Use temperature carefully — 0.7 for creative, 0.3 for focused
- Include system prompts — guide behavior without using message tokens
- Monitor korad_metrics — track your savings in real-time
- Use streaming for UX — users see responses faster