
API Documentation

aiapi.cheap is fully compatible with the Anthropic Messages API. Just change the base URL and API key.

Quick Start

  1. Create an account at aiapi.cheap/dashboard
  2. Top up your balance with crypto (USDT, BTC, ETH)
  3. Generate an API key from the dashboard
  4. Replace your Anthropic base URL and API key
  5. Start making requests — that's it!

Authentication

All API requests require a valid API key in the Authorization header. Keys start with sk-cc-.

```http
Authorization: Bearer sk-cc-your-api-key-here
```
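For raw HTTP clients, here is a small sketch of assembling the headers, with a sanity check on the key prefix (the build_headers helper is illustrative, not part of any SDK):

```python
# Illustrative helper: build request headers for aiapi.cheap.
def build_headers(api_key: str) -> dict:
    """Return the HTTP headers an aiapi.cheap request needs."""
    if not api_key.startswith("sk-cc-"):
        raise ValueError("aiapi.cheap keys start with 'sk-cc-'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_headers("sk-cc-example-key")
```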

Base URL

We expose two compatible APIs. Pick the one that matches the SDK or tool you're using.

For Anthropic SDKs & Claude Code

```text
https://aiapi.cheap/api/proxy
```

Set as base_url / baseURL / ANTHROPIC_BASE_URL. The SDK appends /v1/messages automatically — do not add /v1 yourself.

Works with: official anthropic Python & Node SDKs, Claude Code CLI, Cursor (Anthropic provider), Cline (Anthropic provider).

For OpenAI-compatible tools

```text
https://aiapi.cheap/api/proxy/v1
```

Set as base_url / baseURL. The OpenAI client appends /chat/completions automatically — the /v1 suffix IS required here.

Works with: official openai Python & Node SDKs, Kilo Code, Cline (OpenAI Compatible), Cursor (OpenAI Compatible), Cherry Studio, LobeChat, LibreChat, Continue, Roo Code, SwiftRouter.

Quick reference

| Your tool | Base URL to enter |
|---|---|
| Anthropic SDK / Claude Code | https://aiapi.cheap/api/proxy |
| Kilo Code, Continue, Cherry Studio, etc. | https://aiapi.cheap/api/proxy/v1 |
| Raw curl — Anthropic format | https://aiapi.cheap/api/proxy/v1/messages |
| Raw curl — OpenAI format | https://aiapi.cheap/api/proxy/v1/chat/completions |

Available Models

Prices are USD per million tokens, shown as input / output.

| Model ID | Official | Basic (70% off) | Pro (80% off) | Context |
|---|---|---|---|---|
| claude-opus-4-7 | $5 / $25 | $1.50 / $7.50 | $1.00 / $5.00 | 200K |
| claude-sonnet-4-6 | $3 / $15 | $0.90 / $4.50 | $0.60 / $3.00 | 200K |
| claude-haiku-4-5 | $1 / $5 | $0.30 / $1.50 | $0.20 / $1.00 | 200K |
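As a rough sanity check on the table, here is a sketch of estimating a request's cost, assuming the prices are USD per million tokens shown as input / output (the helper and price table below are illustrative, not part of any SDK):

```python
# Illustrative cost estimate on the Pro plan, assuming table prices
# are USD per 1M tokens (input, output). Not an official billing formula.
PRO_PRICES = {
    "claude-opus-4-7": (1.00, 5.00),
    "claude-sonnet-4-6": (0.60, 3.00),
    "claude-haiku-4-5": (0.20, 1.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request on the Pro plan."""
    inp, out = PRO_PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. 10K input + 2K output tokens on Sonnet:
cost = estimate_cost("claude-sonnet-4-6", 10_000, 2_000)
```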

Messages API

POST /v1/messages — Create a message (Anthropic Messages format).

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| messages | array | Yes | Array of message objects |
| max_tokens | integer | No | Max output tokens (default: 4096) |
| system | string \| array | No | System prompt (string or array with cache_control) |
| temperature | float | No | 0.0 to 1.0 |
| stream | boolean | No | Enable SSE streaming |
| thinking | object | No | Enable extended thinking (see below) |
| tools | array | No | Tool definitions for function calling |
| tool_choice | object | No | Control tool selection behavior |

Response

response.json

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 15
  }
}
```

OpenAI-Compatible API

POST /v1/chat/completions — Same Claude models, OpenAI ChatCompletions wire format.

This endpoint exists so any tool that speaks OpenAI's ChatCompletions API can talk to Claude through us without modification — drop us in as a custom OpenAI provider.

When to use this

Use it whenever your tool asks for an "OpenAI Compatible" / "OpenAI-style" provider. We handle the translation in both directions internally — system prompts, tool calls, streaming, thinking blocks, and prompt caching all map cleanly.

Configuration

| Field | Value |
|---|---|
| Base URL | https://aiapi.cheap/api/proxy/v1 |
| API Key | sk-cc-… (same key as the Anthropic side) |
| Model | claude-opus-4-7 · claude-sonnet-4-6 · claude-haiku-4-5 |

Python (OpenAI SDK)

openai_example.py

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi.cheap/api/proxy/v1",
    api_key="sk-cc-your-api-key",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(resp.choices[0].message.content)
```

Node.js (OpenAI SDK)

openai_example.ts

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://aiapi.cheap/api/proxy/v1",
  apiKey: "sk-cc-your-api-key",
});

const resp = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
});
console.log(resp.choices[0].message.content);
```

cURL

terminal

```bash
curl -X POST https://aiapi.cheap/api/proxy/v1/chat/completions \
  -H "Authorization: Bearer sk-cc-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Editor extension setup (Kilo Code, Cline, Continue, Cursor "OpenAI Compat")

  1. In the extension settings, choose OpenAI Compatible provider.
  2. Set Base URL to https://aiapi.cheap/api/proxy/v1. The /v1 suffix is required — without it the extension hits a 404.
  3. Paste your sk-cc-… key into the API key field.
  4. Add models manually if the extension does not auto-discover: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5.

Streaming

Set "stream": true to receive Server-Sent Events (SSE). Streaming works with every model and is compatible with extended thinking and prompt caching.

streaming.py

```python
# `client` is the Anthropic SDK client (see the Python SDK section below).
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about Vietnam"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
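On the wire, the stream is a sequence of SSE events. For tools that do not use an SDK, here is a minimal sketch of pulling text out of raw "data:" lines, assuming Anthropic-style content_block_delta / text_delta event shapes (the extract_text helper and sample lines are illustrative):

```python
import json

def extract_text(sse_lines):
    """Yield text fragments from content_block_delta SSE data lines."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip event-name lines, comments, keep-alives
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")

# Simplified sample of what the wire format looks like:
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}',
]
text = "".join(extract_text(sample))
```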

Prompt Caching

Cache long system prompts to save up to 90% on repeated input tokens. Works with all models and streaming.

How it works

  1. First request — system prompt is cached. You pay 1.25× input price (cache write).
  2. Next requests — same system prompt is read from cache. You pay only 0.1× input price (90% off).
  3. Cache lasts 5 minutes (refreshed on each use).

Minimum cacheable size: 1,024 tokens for Sonnet 4.x and Opus 4.x, 2,048 tokens for Haiku 4.5. Below the threshold the request still works but nothing is cached.
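The cache arithmetic in the steps above can be sketched as follows (the 1.25x write and 0.1x read multipliers come from the list above; the helper itself is purely illustrative):

```python
# Illustrative comparison: total input cost for `calls` requests that
# reuse the same cached prompt vs. sending it uncached every time.
# Cache write costs 1.25x the normal input price, cache reads 0.1x.
def cached_vs_uncached(prompt_tokens: int, calls: int, price_per_token: float):
    uncached = calls * prompt_tokens * price_per_token
    cached = prompt_tokens * price_per_token * (1.25 + 0.1 * (calls - 1))
    return uncached, cached

# e.g. a 10K-token system prompt reused across 5 calls:
uncached, cached = cached_vs_uncached(10_000, calls=5, price_per_token=1e-6)
```

Caching pays off from the second call onward, since 1.25 + 0.1 is already cheaper than paying full price twice.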

Pass system as an array with cache_control:

caching.py

```python
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your long system prompt here...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Hello"}]
)

# Response usage will include:
# cache_creation_input_tokens (first call)
# cache_read_input_tokens (subsequent calls)
```

Extended Thinking

Enable extended thinking to let Claude reason through complex problems before answering. Works with streaming.

thinking.py

```python
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    messages=[{"role": "user", "content": "Solve this step by step..."}]
)

# Response includes thinking + text content blocks:
# content[0] = {"type": "thinking", "thinking": "..."}
# content[1] = {"type": "text", "text": "..."}
```

budget_tokens controls max thinking tokens. Thinking tokens are billed as output tokens. Works on Claude Sonnet 4.6 and Opus 4.7.
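Since the response interleaves thinking and text blocks, a small sketch of separating the reasoning from the visible answer (split_content and the sample blocks are illustrative, mirroring the shapes shown above):

```python
# Illustrative helper: split a response's content list into the
# model's reasoning and its visible answer.
def split_content(content_blocks):
    thinking = [b["thinking"] for b in content_blocks if b["type"] == "thinking"]
    text = [b["text"] for b in content_blocks if b["type"] == "text"]
    return "\n".join(thinking), "\n".join(text)

blocks = [
    {"type": "thinking", "thinking": "Let me work through this..."},
    {"type": "text", "text": "The answer is 42."},
]
reasoning, answer = split_content(blocks)
```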

Python SDK

example.py

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://aiapi.cheap/api/proxy",
    api_key="sk-cc-your-api-key"
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(message.content[0].text)
```

Node.js SDK

example.ts

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://aiapi.cheap/api/proxy",
  apiKey: "sk-cc-your-api-key",
});

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain quantum computing" }
  ],
});

console.log(message.content[0].text);
```

cURL

terminal

```bash
curl -X POST https://aiapi.cheap/api/proxy/v1/messages \
  -H "Authorization: Bearer sk-cc-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Error Handling

| Status | Error Type | Description |
|---|---|---|
| 401 | authentication_error | Invalid or missing API key |
| 400 | invalid_request_error | Invalid model or malformed request |
| 402 | insufficient_balance | Balance is $0. Top up required. |
| 429 | rate_limit_error | Too many requests |
| 500 | internal_error | Server error |
| 502 | api_error | Model temporarily unavailable |
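A sketch of a client-side retry policy over these statuses, assuming 429/500/502 are transient and everything else should fail fast (the helper names and backoff schedule are illustrative, not part of any SDK):

```python
# Illustrative retry policy for the statuses above: retry transient
# errors with exponential backoff; surface auth/billing errors at once.
RETRYABLE = {429, 500, 502}

def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """True if this status is transient and we have attempts left."""
    return status in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt: int) -> float:
    """Exponential backoff: 1s, 2s, 4s, ..."""
    return 2.0 ** attempt
```

A 402 should never be retried: the request will keep failing until the balance is topped up.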

Rate Limits

| Plan | Price | Requests/min | Tokens/min | Discount |
|---|---|---|---|---|
| Basic | Free | 200 | 1,000,000 | 70% off |
| Pro | $19 lifetime | 500 | 2,000,000 | 80% off |
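To stay under the requests/min cap client-side, here is a sliding-window limiter sketch (RequestLimiter is illustrative, not part of any SDK; pass 200 for Basic or 500 for Pro):

```python
from collections import deque

class RequestLimiter:
    """Illustrative sliding-window limiter for the per-minute caps above."""

    def __init__(self, max_per_minute: int):
        self.max = max_per_minute
        self.times = deque()  # timestamps of requests in the last 60s

    def allow(self, now: float) -> bool:
        """Record a request at `now` (seconds) if it fits under the cap."""
        while self.times and now - self.times[0] >= 60:
            self.times.popleft()  # drop requests older than one minute
        if len(self.times) < self.max:
            self.times.append(now)
            return True
        return False

# Tiny demo with a cap of 2/min: third request is rejected, but a
# request a minute later goes through again.
limiter = RequestLimiter(max_per_minute=2)
results = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 61.0)]
```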

Claude Code Integration

Use aiapi.cheap directly with Claude Code:

~/.bashrc

```bash
export ANTHROPIC_API_KEY="sk-cc-your-api-key"
export ANTHROPIC_BASE_URL="https://aiapi.cheap/api/proxy"
```

Then run claude in your terminal as usual. All requests route through aiapi.cheap, at up to 80% off on the Pro plan.


Need Help?

Contact us at support@aiapi.cheap.