# AIgateway

> AIgateway is a universal AI API: one OpenAI-compatible endpoint, one key, every frontier and open-weight model, and every modality (text, image, audio, vision, embeddings, moderation, classification, translation, rerank). Pay upstream cost + 5%; cached requests get a 50% discount. Only successful runs are billed.

If you are a coding agent (Claude Code, Cursor, Cline, Windsurf, OpenClaw, etc.), you can autoconfigure a full multi-modal pipeline from this file alone.

- Base URL: `https://api.aigateway.sh/v1`
- Auth: `Authorization: Bearer sk-aig-...`
- SDK: drop-in for the OpenAI client in any language — just change `base_url`
- OpenAPI 3.1 spec: https://api.aigateway.sh/openapi.json
- Model catalog (live): https://api.aigateway.sh/v1/models
- How to call any model: `GET /v1/models/{id}` (or `/v1/models/{id}/schema`) returns the exact endpoint, request/response/streaming examples, model-specific quirks, and runnable curl/Python/TS snippets for that model — never guess a request shape
- Capability vocabulary: https://api.aigateway.sh/v1/capabilities
- Provider health (live): https://api.aigateway.sh/v1/health/providers
- Get a key: https://aigateway.sh/signin ($5 in free credit on signup, redeemable on a curated 7-model edge tier; the credit expires 7 days after signup)
- Full dynamic catalog: https://aigateway.sh/llms-full.txt (every model + price + capabilities, regenerated hourly)

## Quick answers

- Cheapest LLM with tool calling: `ibm-granite/granite-4.0-h-micro` at $0.02/M in, $0.11/M out (frontier-family alt: `openai/gpt-5.4-nano` at $0.20/M in, $1.25/M out).
- Cheapest model spendable on the $5 signup credit: `moonshot/kimi-k2.7-code` (chat, included in the curated 7-model signup-credit shortlist).
- Longest context window: `xai/grok-4.20-0309-reasoning` at 2M tokens (`openai/gpt-4.1`, `anthropic/claude-opus-4.8`, `openai/gpt-5.5`, and `google/gemini-3.1-pro` at 1M).
- Best coding / agentic model: `anthropic/claude-opus-4.8` — #1 on the Artificial Analysis Intelligence Index, 84% on Online-Mind2Web (computer use).
- Best cheap alternative to GPT-5.5: `openai/gpt-5.4`, `moonshot/kimi-k2.7-code`, or `openai/gpt-5.4-mini`.
- How to call Kimi K2.7 Code in Python: `OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...").chat.completions.create(model="moonshot/kimi-k2.7-code", ...)`
- Install: `pip install aigateway-py openai` (Python) or `pnpm add aigateway-js openai` (Node/TS).
- CLI: `npm i -g aigateway-cli` then `aig login` (browser device auth) → `aig call moonshot/kimi-k2.7-code "hi"`.

## Auto Router (`model:"auto"`)

Don't want to pick a model? Set `model:"auto"` on any request. The router reads the request, picks the cheapest model in a curated, eval-covered pool that still clears the quality floor, and bills you less than the premium model you'd otherwise have called, guaranteed. The request shape stays OpenAI-compatible; only the `model` value changes.

- `model:"auto"` — route within the modality the endpoint implies (chat → text, /v1/images → image, etc.).
- `model:"auto/<modality>"` — scope to a lane: `auto/text`, `auto/image`, `auto/video`, `auto/tts`, `auto/stt`, `auto/music`, `auto/embedding`.
- Omit `model` entirely — same as `model:"auto"`.
- `baseline_model` (request body, optional) — the model you'd otherwise call. The router only routes DOWN from it, so the baseline doubles as a hard cost ceiling; you never pay more than calling it directly. Defaults to the premium model for the modality.
- `x-routing: cost | speed | quality | auto` (header) — bias the pick. `cost` favors the cheapest model that clears the floor; `quality` favors the strongest model still at or under your baseline; `auto` balances both from the complexity read.

Every routed response returns transparency headers so you can audit any single call or chart savings over time:

- `X-Routing-Selected` — the model that actually ran
- `X-Routing-Reason` — why it was picked
- `X-Routing-Complexity` — the complexity read on the request
- `X-Routing-Quality` — the quality score of the selected model
- `X-Auto-Baseline-Model` — your premium baseline
- `X-Auto-Baseline-Cost-Cents` — what the baseline would have cost
- `X-Auto-Route-Fee-Cents` — the platform fee on this routed call
- `X-Auto-Savings-Cents` — the exact amount saved versus the baseline
- `X-Cached-Input-Units` — number of input tokens served from provider prompt cache (present when caching reduced cost)

On streaming responses the routing-decision headers arrive up front, before the first token. AIgateway's Auto Router is the only auto router that spans every generative modality, not just text. See: https://aigateway.sh/auto-router and https://aigateway.sh/docs/auto-router
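The routing controls and audit headers above can be exercised with a small sketch like the following. It uses only the Python standard library. `build_auto_request` and `routing_summary` are hypothetical helper names, and the sketch assumes the cents headers carry plain numeric strings, which the text above does not specify.

```python
import json
import urllib.request

BASE_URL = "https://api.aigateway.sh/v1"


def build_auto_request(api_key, prompt, baseline_model=None, routing="auto"):
    """Build (but do not send) a chat request that lets the Auto Router pick the model."""
    body = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    if baseline_model is not None:
        # Request-body field described above: the router only routes down from it.
        body["baseline_model"] = baseline_model
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "x-routing": routing,
        },
        method="POST",
    )


def routing_summary(headers):
    """Pull the audit headers out of a response; the lookup is case-insensitive."""
    lowered = {name.lower(): value for name, value in headers.items()}
    savings = lowered.get("x-auto-savings-cents")
    return {
        "selected": lowered.get("x-routing-selected"),
        "reason": lowered.get("x-routing-reason"),
        "baseline": lowered.get("x-auto-baseline-model"),
        # Assumption: the value is a plain numeric string.
        "savings_cents": float(savings) if savings is not None else None,
    }
```

To send the request, pass it to `urllib.request.urlopen` and hand the response's `headers` object to `routing_summary`.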

## Core endpoints

- `POST /v1/chat/completions` — OpenAI-compatible chat. Any text model, tool calling, streaming. See: https://aigateway.sh/reference#chat
- `POST /v1/embeddings` — embeddings across every embedding model in the catalog. See: https://aigateway.sh/reference#embeddings
- `POST /v1/images/generations` — image generation (Flux, Stable Diffusion XL, Lucid Origin, DreamShaper, Phoenix). See: https://aigateway.sh/reference#images
- `POST /v1/images/edits` — image-to-image edits with a prompt (Bria Fibo family, Bytedance Seedream edit, Flux dev). Multipart or JSON. Default model: `bria/fibo-edit/edit`. See: https://aigateway.sh/reference#images-edits
- `POST /v1/audio/transcriptions` — STT (Whisper variants, Deepgram Nova 3, Flux, Smart Turn). Sync by default; add `async:true` (then poll `GET /v1/jobs/:id`) or a `webhook_url` for batch transcription of long files, or stream live over a WebSocket (see Realtime streaming below). See: https://aigateway.sh/reference#stt
- `POST /v1/audio/speech` — TTS (Deepgram Aura 1/2, MeloTTS). See: https://aigateway.sh/reference#tts
- `POST /v1/moderations` — content moderation via Llama Guard 3. See: https://aigateway.sh/reference#moderations

## Utility endpoints

- `POST /v1/translations` — translate text between languages. Body: `{ text, source_lang, target_lang, model? }`.
- `POST /v1/classifications` — text classification with label + score. Body: `{ input, model? }`.
- `POST /v1/detections` — object detection in an image. Body: `{ image_url | image_b64, model? }`.
- `POST /v1/ocr` — text extraction from an image. Body: `{ image_url | image_b64, model? }`.
- `POST /v1/rerank` — RAG re-ranking. Body: `{ query, documents, model? }`.

## Async endpoints (video, music, 3D, transcription)

Long-running generations return a job record immediately. Poll `GET /v1/jobs/:id` or supply `webhook_url` for push notification. Binary results land at `GET /v1/files/jobs/:id/:filename`.

- `POST /v1/videos/generations` — text-to-video + image-to-video (Runway, Luma, CF video models). Body: `{ prompt, model?, duration, aspect_ratio, resolution, image_url?, webhook_url? }` → `202 { id, status }`.
- `POST /v1/audio/music` — text-to-music. Body: `{ prompt, model?, duration?, lyrics?, is_instrumental?, webhook_url? }` → `202 { id, status }`. `duration` (seconds) is honored where the model supports it (stable-audio, ace-step, mmaudio, elevenlabs) and clamped to its range; minimax music picks its own length.
- `POST /v1/3d/generations` — text-to-3D (GLB assets). Body: `{ prompt, model?, image_url?, webhook_url? }` → `202 { id, status }`.
- `POST /v1/audio/transcriptions` with `async:true` or `webhook_url` — batch speech-to-text (Deepgram Nova 3 / Flux). Body: `{ model, audio_url | file, async?: true, webhook_url? }` → `202 { id, status }`. Poll `GET /v1/jobs/:id`; the `result.transcript` arrives inline (no binary file). Best for long recordings.
- `GET /v1/jobs/:id` — poll job status. Response includes `status: queued | processing | completed | failed`, `result_url`, and `result` when terminal.
- `DELETE /v1/jobs/:id` — cancel a queued job.
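
The poll-until-terminal flow above can be sketched as a small loop. This is a minimal sketch, not the SDK's implementation: `wait_for_job` is a hypothetical name, the HTTP call is injected as `fetch_job` so the loop stays transport-agnostic, and the interval and timeout defaults are arbitrary choices.

```python
import time

# Terminal values taken from the status list above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job, job_id, interval_s=2.0, timeout_s=600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return the job record.

    fetch_job is any callable that performs GET /v1/jobs/:id and returns the
    decoded JSON as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

For long generations a `webhook_url` avoids polling altogether; the loop is mainly useful in scripts and notebooks.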

## Realtime streaming (WebSocket)

Live speech-to-text over a WebSocket — stream audio frames in, get interim + final transcripts back as the speaker talks. Deepgram Nova 3 / Flux.

- `wss://api.aigateway.sh/v1/realtime?model=deepgram/nova-3&encoding=linear16&sample_rate=16000&interim_results=true`
- Auth: `?api_key=sk-aig-...` on the URL (browsers can't set headers), or `Authorization: Bearer sk-aig-...` (servers).
- Send raw audio frames (e.g. linear16 PCM @ 16 kHz). Receive `{ type:"Results", channel:{ alternatives:[{ transcript }] }, is_final }` messages, then a final `{ type:"Metadata" }`. End the stream with `{ "type":"CloseStream" }`.
- Forwards Deepgram live params: `encoding`, `sample_rate`, `channels`, `language`, `interim_results`, `endpointing`, `vad_events`, `utterance_end_ms`, `diarize`, `smart_format`, `punctuate`. Billed per audio-minute on close at the realtime (websocket) rate, which is higher than batch. See: https://aigateway.sh/docs/audio
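
As a rough illustration of the connection URL and message shapes listed above, the helpers below build the WebSocket URL and pick final transcripts out of incoming messages. The function names are hypothetical, and the actual socket handling (for example with a WebSocket client library of your choice) is left out.

```python
import json
from urllib.parse import urlencode

REALTIME_URL = "wss://api.aigateway.sh/v1/realtime"

# Control message that ends the stream, as described above.
CLOSE_STREAM = json.dumps({"type": "CloseStream"})


def realtime_url(model="deepgram/nova-3", **params):
    """Build the WebSocket URL; extra keyword args become forwarded query params."""
    query = {
        "model": model,
        "encoding": "linear16",
        "sample_rate": 16000,
        "interim_results": "true",
    }
    query.update(params)
    # safe="/" keeps the provider/model slug readable in the query string.
    return f"{REALTIME_URL}?{urlencode(query, safe='/')}"


def final_transcript(raw_message):
    """Return the transcript from a final Results message, else None."""
    msg = json.loads(raw_message)
    if msg.get("type") != "Results" or not msg.get("is_final"):
        return None
    alternatives = msg.get("channel", {}).get("alternatives", [])
    return alternatives[0].get("transcript") if alternatives else None
```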

## MCP server (Model Context Protocol)

Every capability above is also exposed as an MCP tool. Point an MCP-enabled agent at the endpoint and it can auto-discover everything.

- **Streamable HTTP transport** (preferred, MCP 2025-03-26): `POST https://api.aigateway.sh/mcp`
- **Legacy SSE transport**: `GET https://api.aigateway.sh/mcp/sse` + `POST https://api.aigateway.sh/mcp/message?sessionId=...`
- **Auth**: same `Authorization: Bearer sk-aig-...` as the HTTP API.

Tools exposed: `chat`, `embed`, `generate_image`, `transcribe`, `speak`, `translate`, `classify`, `moderate`, `rerank`, `ocr`, `detect_objects`, `generate_video`, `generate_music`, `generate_3d`, `get_job`, `cancel_job`, `list_models`, `search_models`, `get_model`. Call `tools/list` on the server for current JSON Schemas.

Discovery flow for agents: `list_models`/`search_models` to shortlist (returns pricing, context window, capabilities, tier) → `get_model` to fetch the exact request/response schema and quirks for the chosen model → then call the matching tool. `get_model` returns the same contract as `GET /v1/models/{id}`, so an agent never has to be hand-fed a schema.

## Aggregator-native primitives

These are available only on AIgateway. Each one compares across models or looks at the full traffic picture, which a single-provider SDK cannot do.

- `POST /v1/sub-accounts` — mint a scoped API key for one of your end customers with its own spend cap, rate limit, default tag, and isolated analytics. See: https://aigateway.sh/reference#sub-accounts
- `POST /v1/evals` — run an eval across candidate models on your own dataset; use `eval:<run_id>` as a model alias to always route to the current winner. See: https://aigateway.sh/reference#evals
- `POST /v1/replays` — re-run any past request against a different model and diff cost, latency, and output. See: https://aigateway.sh/reference#replays
- `GET /v1/usage/by-tag` — per-tag cost attribution. Tag any request with `x-aig-tag: <string>`. See: https://aigateway.sh/reference#tags
- `GET /v1/usage/by-sub-account` — per-customer cost attribution.

## Official SDKs + CLI

- **Python**: `pip install aigateway-py` — `from aigateway import AIgateway, AsyncAIgateway, verify_webhook`. Sync + async clients. (Distribution name on PyPI is `aigateway-py`; the import path is `aigateway`.)
- **Node / TypeScript**: `pnpm add aigateway-js` (or `npm install aigateway-js`) — `import { AIgateway, verifyWebhook } from 'aigateway-js'`. ESM + CJS, zero runtime deps. Covers async jobs, sub-accounts, evals, replays, signed URLs, webhook verification.
- **CLI**: `npm i -g aigateway-cli` (or `npx aigateway-cli init`) — installs the `aig` binary. `aig init` walks you through key setup and scaffolds a starter file. Also ships `aig call`, `aig models`, `aig jobs`, `aig mcp`, `aig usage`, `aig eval`, `aig replay`, `aig sub-account`, `aig tail`.

For chat / embeddings / images / STT / TTS, just use the official `openai` package with `base_url='https://api.aigateway.sh/v1'`. Reach for the AIgateway SDKs when you need the aggregator-native surface (async jobs, sub-accounts, evals, replays, signed URLs, webhook verification) — endpoints OpenAI doesn't model.

## Webhook signatures

Every callback (async job results AND lifecycle events) carries:

- `x-aig-signature: t=<unix>,v1=<hex>` — HMAC-SHA256 over `${t}.${raw_body}` using the per-key signing secret
- `x-aig-event-type: <event>` — see event list below
- `x-aig-delivery-id: <uuid>` — stable across retries, use for idempotency
- `x-aig-attempt: <n>` — 1-indexed attempt counter

Fetch the signing secret at `GET /v1/webhook-secret` (rotate with `POST /v1/webhook-secret/rotate`). Failed deliveries (non-2xx, timeout) retry on a 6-attempt schedule: `0s, 30s, 2m, 10m, 1h, 6h`. Both official SDKs ship constant-time verifiers (`verify_webhook` in Python, `verifyWebhook` in Node).
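
The official SDK verifiers are the recommended path. For readers who want to see what verification involves, here is a minimal standard-library sketch of the scheme described above. It rests on several assumptions: `verify_signature` is a hypothetical name, the signing secret is used as UTF-8 bytes, the hex digest is compared case-insensitively, and the 5-minute timestamp tolerance is our own choice rather than documented behavior.

```python
import hashlib
import hmac
import time


def verify_signature(secret, raw_body, signature_header, tolerance_s=300, now=None):
    """Check an x-aig-signature header of the form t=<unix>,v1=<hex>.

    raw_body must be the exact request bytes, before any JSON parsing.
    """
    try:
        parts = dict(item.split("=", 1) for item in signature_header.split(","))
        t_raw = parts["t"]
        timestamp = int(t_raw)
        received = parts["v1"]
    except (KeyError, ValueError):
        return False

    # Assumption: reject stale timestamps to limit replay; the window is our choice.
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_s:
        return False

    # HMAC-SHA256 over "<t>.<raw_body>", as described above.
    signed_payload = t_raw.encode("utf-8") + b"." + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.lower().encode("utf-8"))
```

Pair this with `x-aig-delivery-id` for idempotency: a valid signature only shows the delivery is authentic, not that it has not been processed before.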

Event types currently emitted:

- `job.completed`, `job.failed` — async generations (video, music, 3D) and async transcription
- `balance.low`, `balance.exhausted` — wallet thresholds
- `usage.threshold.exceeded`, `usage.daily.summary` — spend reports
- `subaccount.created`, `subaccount.spend.exceeded` — multi-tenant signals
- `model.added`, `model.deprecated` — catalog drift
- `key.rotated` — API-key lifecycle

## Hosts

- `api.aigateway.sh` — JSON API
- `media.aigateway.sh` — file downloads (job results, signed URLs). All `result_url` values in poll + webhook responses resolve to this host.
- `logs.aigateway.sh`, `store.aigateway.sh` — reserved for upcoming features.

## Signed file URLs

Share completed job results without handing out the gateway key:

- `GET https://api.aigateway.sh/v1/files/jobs/:jobId/:filename/signed?expires_in=3600` → `{ url, expires_at }`. The returned URL is on `media.aigateway.sh` and is publicly fetchable until `expires_at` (no `Authorization` header needed). Max expiry: 7 days.
- Storage is swept nightly; files older than 7 days are deleted.

## MCP inspector

Live HTML inspector at `https://api.aigateway.sh/mcp/inspect` — paste your key and try every MCP tool from the browser. Useful for eyeballing schemas before wiring up an agent.

## Request headers every agent should know

- `Authorization: Bearer sk-aig-...` (required)
- `X-Request-Id: <uuid>` (optional correlation id; echoed in response)
- `x-aig-tag: <string>` (attribute the request to a feature / tenant / user for cost reports)
- `x-cache: auto | force | skip` (override cache behavior)
- `x-routing: cost | speed | quality | auto` (bias the Auto Router when `model` is `auto`/`auto/<modality>` or omitted — see the Auto Router section)
- `baseline_model` (request body, not a header) sets your cost ceiling for Auto Router calls; the router only routes down from it
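
One way to keep these headers consistent across calls is a small builder that validates the enumerated values before a request goes out. This is a sketch under stated assumptions: `gateway_headers` is a hypothetical helper, and generating an `X-Request-Id` when none is supplied is our choice, not a gateway requirement.

```python
import uuid

# Allowed values copied from the header list above.
ALLOWED_VALUES = {
    "x-cache": {"auto", "force", "skip"},
    "x-routing": {"cost", "speed", "quality", "auto"},
}


def gateway_headers(api_key, tag=None, cache=None, routing=None, request_id=None):
    """Return a header dict for one gateway request, rejecting unknown enum values."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Optional correlation id; a fresh UUID is generated if none is given.
        "X-Request-Id": request_id or str(uuid.uuid4()),
    }
    optional = {"x-aig-tag": tag, "x-cache": cache, "x-routing": routing}
    for name, value in optional.items():
        if value is None:
            continue
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED_VALUES[name])}, got {value!r}")
        headers[name] = value
    return headers
```

With the `openai` Python package, the per-request entries (everything except `Authorization`, which the client sets from `api_key`) can be passed through the `extra_headers` argument of a call, or set once via `default_headers` on the client.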

## Model naming

All model IDs use `<provider>/<model>` slugs. Examples:

- `anthropic/claude-opus-4.8`
- `openai/gpt-5.5`
- `google/gemini-3.1-pro`
- `moonshot/kimi-k2.7-code`
- `meta/llama-4-scout-17b-16e-instruct`
- `black-forest-labs/flux-1-schnell`
- `deepgram/aura-2-en`
- `openai/whisper-large-v3-turbo`
- `baai/bge-m3`

The full live list (with pricing, context window, capabilities, modality) is `GET /v1/models`. Filter with `?modality=text`, `?modality=image`, `?provider=anthropic`, etc. For any model in the list, `GET /v1/models/{id}` returns its exact invocation schema (request/response/streaming examples + quirks + SDK snippets).

## Errors (OpenAI-shaped, with remediation)

Every error has `type`, `code`, and a `message` that tells you what to do next.

| code | type | remediation |
|------|------|-------------|
| 400  | invalid_request_error | Fix the request body |
| 401  | authentication_error  | Check `Authorization: Bearer ...` |
| 402  | budget_exceeded       | Raise the sub-account spend cap or wait for the month rollover |
| 404  | model_not_found       | Use the exact `<provider>/<model>` slug from `/v1/models` |
| 429  | rate_limit_error      | Honor `Retry-After`; request a higher RPM |
| 502  | provider_error        | Upstream 5xx; automatic failover should have engaged — see `/v1/health/providers` |
| 504  | timeout_error         | Upstream timed out; retry with a smaller `max_tokens` |
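
A client can turn the table above into a retry policy. The sketch below is one reading of it, not gateway-defined behavior: it treats 429, 502, and 504 as retryable, assumes `Retry-After` is given in seconds, and falls back to capped exponential backoff with parameters chosen arbitrarily. `retry_delay` is a hypothetical name.

```python
# Statuses whose remediation in the table above is "retry" in some form.
RETRYABLE_STATUSES = {429, 502, 504}


def retry_delay(status, headers, attempt, base_s=1.0, cap_s=30.0):
    """Return seconds to wait before retrying, or None when a retry will not help.

    attempt is 0 for the first retry, 1 for the second, and so on.
    """
    if status not in RETRYABLE_STATUSES:
        return None  # 400/401/402/404 need a changed request, key, or budget
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Assumption: the header carries delta-seconds, not an HTTP date.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # unparseable value: fall through to backoff
    return min(cap_s, base_s * (2 ** attempt))
```

For 504 responses, the table suggests also lowering `max_tokens` on the retried request.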

## Quickstart — agent copy-paste

```python
from openai import OpenAI
client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")

# 1. text
r = client.chat.completions.create(
    model="anthropic/claude-opus-4.8",
    messages=[{"role": "user", "content": "hi"}],
)

# 2. image
img = client.images.generate(
    model="black-forest-labs/flux-1-schnell",
    prompt="a cozy reading corner, golden hour",
    size="1024x1024",
)

# 3. embeddings
e = client.embeddings.create(model="baai/bge-m3", input=["hello", "world"])
```

## Pricing in one line

Every model bills at upstream cost + a flat 5% platform fee. Provider-cached input tokens bill at 50% of the input rate. Gateway-cached full responses get a 50% discount. Failed requests don't bill. No monthly minimum; top up in any amount from $5. Pay in USD, or in INR via UPI / RuPay / cards at local checkout.
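
As a worked illustration of that arithmetic, the sketch below estimates the bill for a token-priced call. It makes two assumptions the text does not confirm: the rates passed in are upstream (pre-fee) prices per million tokens, and the 5% fee is applied after the cached-input discount. `estimate_cost_usd` is a hypothetical helper; the `usage.cost` field on the response remains the authoritative figure.

```python
from decimal import Decimal

PLATFORM_FEE = Decimal("0.05")        # flat 5% platform fee
CACHED_INPUT_FACTOR = Decimal("0.5")  # provider-cached input bills at 50%
PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(in_tokens, out_tokens, in_rate, out_rate, cached_in_tokens=0):
    """Estimate billed USD; rates are assumed upstream USD per million tokens."""
    if not 0 <= cached_in_tokens <= in_tokens:
        raise ValueError("cached_in_tokens must be between 0 and in_tokens")
    in_rate, out_rate = Decimal(str(in_rate)), Decimal(str(out_rate))
    fresh_in_tokens = in_tokens - cached_in_tokens
    upstream = (
        fresh_in_tokens * in_rate
        + cached_in_tokens * in_rate * CACHED_INPUT_FACTOR
        + out_tokens * out_rate
    ) / PER_MILLION
    # Assumption: the fee is applied to the discounted upstream total.
    return upstream * (1 + PLATFORM_FEE)
```

Under these assumptions, one million input and one million output tokens at illustrative rates of $0.02 and $0.11 per million come to $0.13 upstream and $0.1365 billed; if half the input is served from the provider cache, the total falls to $0.13125.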

## Cost metadata on every response

- `usage.cost` (USD) in every JSON response body
- `X-Cost-Cents` header on every response
- `X-Cached-Input-Units` header when provider prompt caching reduced the cost
- Streaming: `usage.cost` on the trailing usage chunk before `[DONE]`

## Free tier (signup credit)

Every new account gets $5 in free credit. The credit can ONLY be spent on the following curated CF-hosted models (one per modality):

- `moonshot/kimi-k2.7-code` — chat / text
- `baai/bge-m3` — embedding (multilingual)
- `black-forest-labs/flux-2-klein-9b` — text-to-image
- `google/gemma-4-26b-a4b-it` — vision + chat
- `deepgram/aura-2-en` — text-to-speech
- `openai/whisper-large-v3-turbo` — speech-to-text
- `meta/llama-guard-3-8b` — moderation

Calling any other model with a promo-only balance returns HTTP 402 with `error.type = "promo_credit_model_restricted"` and `error.free_trial_models = [...]` listing the slugs above.
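
An agent can detect this case and fall back to an allowed model. The helper below is a minimal sketch that assumes the error object sits under a top-level `error` key in the JSON body (the usual OpenAI-style shape); `allowed_trial_models` is a hypothetical name.

```python
def allowed_trial_models(status, error_body):
    """Return the free-trial slugs from a promo-restricted 402, else None."""
    error = (error_body or {}).get("error", {})
    if status == 402 and error.get("type") == "promo_credit_model_restricted":
        return list(error.get("free_trial_models", []))
    return None
```

A `None` result means the failure was something else (for example a sub-account budget cap, which also returns 402) and should be handled separately.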

The signup credit expires 7 days after signup. Top-up credits never expire. Once any top-up lands on the account, the allowlist is lifted and the full catalog (Claude Opus 4.8, GPT-5.5 family, Gemini 3.1 Pro, FLUX 2, Veo 3.1, Seedream, Hailuo, etc.) becomes available.

## Switching from another gateway

If you're coming from another aggregator, we credit-match your last invoice (up to $500). Per-competitor migration guides:

- OpenRouter → https://aigateway.sh/switch/openrouter
- Portkey → https://aigateway.sh/switch/portkey
- Helicone → https://aigateway.sh/switch/helicone
- LiteLLM → https://aigateway.sh/switch/litellm
- Together → https://aigateway.sh/switch/together
- Fireworks → https://aigateway.sh/switch/fireworks
- Requesty → https://aigateway.sh/switch/requesty
- Braintrust → https://aigateway.sh/switch/braintrust

## For humans

- Docs: https://aigateway.sh/docs
- API reference: https://aigateway.sh/reference
- Pricing: https://aigateway.sh/pricing
- Playground: https://aigateway.sh/playground
- Guides (short how-tos): https://aigateway.sh/guides
- Examples (runnable recipes — agent swarms, swaps, multi-modal, ops): https://aigateway.sh/examples
- Rankings (live model leaderboard): https://aigateway.sh/rankings
- Providers (every lab we route to): https://aigateway.sh/providers
- Compare any two models: https://aigateway.sh/compare
- Model catalog with per-model deep pages: https://aigateway.sh/models
- Enterprise (evals, guardrails, replay, prompt IDs, SSO, SLA): https://aigateway.sh/enterprise
- Security (posture, compliance, incident response): https://aigateway.sh/security
- Integrations (OpenAI SDK, ai-sdk, LangChain, LlamaIndex, Cursor, Claude Code, Continue, Cline, Aider, Zed, LobeChat, LibreChat, Open WebUI): https://aigateway.sh/integrations
- Support: https://aigateway.sh/support (reply under 24h from a real engineer)