# For AI Agents (/docs/agents)

This gateway is built to be driven by code and by AI agents, not just read by humans. If you are an agent (or building one), everything you need is machine-readable.

* **API contract:** [`/openapi.yaml`](/openapi.yaml) — complete OpenAPI 3.1 spec (endpoints, schemas, auth). Feed it to a tool/function generator.
* **Docs as context:** [`/llms.txt`](/llms.txt) (index) and [`/llms-full.txt`](/llms-full.txt) (every page as one markdown file). Drop either into a system prompt.
* **Base URL:** `https://agent-router.gaib.ai`

## The one thing to know [#the-one-thing-to-know]

The gateway is **OpenAI-compatible**. Any agent or framework that already speaks the OpenAI API works by changing **one base URL** — no new SDK, no custom client.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://agent-router.gaib.ai/v1",
    api_key="sk-your-key",  # see "Getting a key" below
)

resp = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",  # switch providers by changing this string
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

```ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://agent-router.gaib.ai/v1',
  apiKey: 'sk-your-key',
});

const resp = await client.chat.completions.create({
  model: 'kimi/kimi-k2.5',
  messages: [{ role: 'user', content: 'Hello' }],
});
console.log(resp.choices[0].message.content);
```

```bash
curl https://agent-router.gaib.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

## Choosing a model [#choosing-a-model]

Models are addressed as `provider/model`. Switch providers by changing the string — nothing else changes. The live list with pricing is at [`GET /v1/models`](/docs/api/list-models) (no auth required), so an agent can discover models at runtime.

```bash
curl https://agent-router.gaib.ai/v1/models
```

## Tool / function calling [#tool--function-calling]

Standard OpenAI `tools` and `tool_choice` are accepted and forwarded to the provider, so agentic tool-use loops work unchanged. Streaming is supported with `"stream": true` (SSE, terminated by `data: [DONE]`).

Requests containing `thinking` or `reasoning_effort` are rejected with HTTP 400. Pick a model rather than a reasoning flag.

## Getting a key (no human in the loop after setup) [#getting-a-key-no-human-in-the-loop-after-setup]

Authentication is your wallet, not an account. A key is minted by signing a [SIWE](/docs/authentication) message — see [API Keys](/docs/api/api-keys). Once an agent holds an `sk-…` key, every other call is a plain `Authorization: Bearer` request.

Balance is funded via [x402 top-up](/docs/api/topup); an agent can check its own credit any time:

```bash
curl https://agent-router.gaib.ai/v1/balance/0xYourWallet   # public, no auth
```

## Billing an agent can reason about [#billing-an-agent-can-reason-about]

* Balance is **reserved before** each call (based on `max_tokens`) and **reconciled** to actual usage — concurrent calls can never overspend.
* A `402` with `{"error":"INSUFFICIENT_BALANCE"}` means top up; a `429` means slow down. All error codes are enumerated in [Errors](/docs/api/errors) and the [OpenAPI spec](/openapi.yaml).
* Per-key spend and token counts: [`GET /v1/usage/:wallet`](/docs/api/usage).

## Use it inside a framework [#use-it-inside-a-framework]

# Authentication (/docs/authentication)

The gateway uses two authentication methods depending on the endpoint:

## API Key Authentication [#api-key-authentication]

Used for inference (`/v1/chat/completions`) and usage queries (`/v1/usage`).

```
Authorization: Bearer sk-<64 hex chars>
```

The gateway hashes your key (keys are stored hashed, never in plaintext) and matches it against active, non-revoked keys. If valid, the request proceeds and is billed to the associated wallet.

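Because keys have a fixed shape (`sk-` followed by 64 hex characters), a client can reject an obviously malformed key locally before spending a request on it. The following is a minimal sketch, not part of the gateway or any SDK: `isWellFormedKey` and `authHeader` are hypothetical helper names, and accepting upper-case hex is an assumption. A key that passes this check can still be unknown or revoked on the server.

```ts
// Hypothetical client-side helpers; the server remains the source of truth.
// Assumption: hex digits may be upper or lower case.
const KEY_PATTERN = /^sk-[0-9a-f]{64}$/i

// Returns true when the string has the documented sk-<64 hex chars> shape.
function isWellFormedKey(key: string): boolean {
  return KEY_PATTERN.test(key)
}

// Builds the Authorization header, failing fast on malformed keys.
function authHeader(key: string): Record<string, string> {
  if (!isWellFormedKey(key)) {
    throw new Error('API key must look like sk-<64 hex chars>')
  }
  return { Authorization: `Bearer ${key}` }
}
```

A caller would pass the result as the `headers` option of `fetch`, merged with `Content-Type` where a body is sent.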
### How it works [#how-it-works]

## SIWE Authentication [#siwe-authentication]

Used for API key management (`/v1/auth/keys`). Sign-In with Ethereum (SIWE) proves wallet ownership without sessions.

### Building a SIWE Message [#building-a-siwe-message]

```ts
import { SiweMessage } from 'siwe'

const siweMsg = new SiweMessage({
  domain: window.location.host,
  address: walletAddress, // checksummed EIP-55
  uri: window.location.origin,
  version: '1',
  chainId: 1,
  nonce: crypto.randomUUID().replace(/-/g, '').slice(0, 16),
  issuedAt: new Date().toISOString(),
  statement: 'Sign in to the AI Gateway',
})

const message = siweMsg.prepareMessage()
// Sign with the same address that appears in the SIWE message
const signature = await walletClient.signMessage({ account: walletAddress, message })
```

### Verification Rules [#verification-rules]

* SIWE signature must be valid
* `issuedAt` must be within the last **5 minutes**
* Address is lowercased for storage

Generate a fresh `issuedAt` timestamp before each call. The server rejects SIWE messages older than 5 minutes.

## x402 Payment Authentication [#x402-payment-authentication]

Used for top-up (`/v1/topup`). No wallet auth needed — the payment signature itself proves the payer.

# Introduction (/docs)

Token Kiosk is a **credit-based LLM inference gateway**. It sits between your app and LLM providers, giving you a single OpenAI-compatible API to access multiple model providers.

Jump to [**For AI Agents**](/docs/agents) — the gateway is OpenAI-compatible, so it's a one-line base-URL change. Machine-readable contracts: [`/openapi.yaml`](/openapi.yaml), [`/llms.txt`](/llms.txt), [`/llms-full.txt`](/llms-full.txt).

## How it works [#how-it-works]

### The 3-step flow [#the-3-step-flow]

| Step                | What                   | How                                              |
| ------------------- | ---------------------- | ------------------------------------------------ |
| **1. Top up**       | Fund your balance      | Send USDC or USDT via x402 — no accounts, no KYC |
| **2. Get API key**  | Prove wallet ownership | Sign a SIWE message, receive `sk-...` key        |
| **3. Call the API** | Use any model          | Standard OpenAI SDK with `Bearer sk-...`         |

## Why use this? [#why-use-this]

* **No vendor lock-in** — switch models by changing a string (`gemini/gemini-2.5-flash` → `kimi/kimi-k2.5`)
* **No accounts** — your Ethereum wallet is your identity
* **No KYC** — pay with USDC or USDT, get credit instantly
* **OpenAI-compatible** — use the `openai` SDK, the Vercel AI SDK, or raw `fetch`
* **Transparent pricing** — billed at downstream provider cost, all queryable via `/v1/usage`

## What's next? [#whats-next]

# Client onboarding (/docs/onboarding)

This guide walks through using the gateway as a client: add credit with USDC or USDT (x402), create an API key (SIWE), then call the chat completion API.

## What you need [#what-you-need]

* An Ethereum wallet for signing
* **USDC or USDT** on **Base** or **Arbitrum** (mainnet)
* The gateway **base URL** — `https://agent-router.gaib.ai`

## End-to-end flow [#end-to-end-flow]

| Step             | What happens |
| ---------------- | ------------ |
| **1. Top up**    | `POST /v1/topup` with `{ "amount": }`. The server responds with **402** and payment instructions. Sign and complete payment per x402, then retry. |
| **2. API key**   | `POST /v1/auth/keys` with a SIWE message and signature. Response includes a secret API key (store it safely). |
| **3. Inference** | `POST /v1/chat/completions` with `Authorization: Bearer ` and a chat-style JSON body. |

Model IDs are provider-qualified: `gemini/...`, `kimi/...`, `minimax/...`, or `bedrock/...`.

For copy-paste code covering all three steps, see the [Quickstart](/docs/quickstart).

## SIWE message alignment [#siwe-message-alignment]

The SIWE message must match what the server verifies:

* **domain** — hostname of the gateway (e.g. `agent-router.gaib.ai`)
* **uri** — the auth endpoint (e.g.
  `https://agent-router.gaib.ai/v1/auth/keys`)
* **chainId** — must match the gateway's chain

If key creation fails with a 4xx, confirm you're using the same public HTTPS host the gateway presents.

# Quickstart (/docs/quickstart)

## Prerequisites [#prerequisites]

| Requirement                      | Why                                     |
| -------------------------------- | --------------------------------------- |
| Ethereum wallet                  | Signs SIWE messages & pays USDC or USDT |
| USDC or USDT on Base or Arbitrum | Credits your gateway balance            |
| Node.js 18+                      | Runs the examples below                 |

The hosted gateway at `https://agent-router.gaib.ai` runs on **mainnet**. Top up with real USDC or USDT on **Base** or **Arbitrum** — the live 402 response lists the exact assets and networks it accepts.

## Step 1: Top up your balance [#step-1-top-up-your-balance]

```ts
import { withPaymentInterceptor } from '@x402/fetch'

const BASE_URL = 'https://agent-router.gaib.ai'
const fetchWithPayment = withPaymentInterceptor(fetch, walletClient)

const res = await fetchWithPayment(`${BASE_URL}/v1/topup`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ amount: 5 }), // $5 USD (minimum $1)
})

const { balance_usdc } = await res.json()
// balance_usdc: 5000000 (micro-USDC, divide by 1_000_000 for USD)
```

## Step 2: Create an API key [#step-2-create-an-api-key]

```ts
import { SiweMessage } from 'siwe'

const siweMsg = new SiweMessage({
  domain: 'agent-router.gaib.ai',
  address: walletAddress,
  uri: 'https://agent-router.gaib.ai/v1/auth/keys',
  version: '1',
  chainId: 8453, // Base mainnet
  nonce: crypto.randomUUID().replace(/-/g, '').slice(0, 16),
  issuedAt: new Date().toISOString(),
  statement: 'Sign in to the AI Gateway',
})

const message = siweMsg.prepareMessage()
// Sign with the same address that appears in the SIWE message
const signature = await walletClient.signMessage({ account: walletAddress, message })

// BASE_URL is the constant defined in Step 1
const res = await fetch(`${BASE_URL}/v1/auth/keys`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message, signature, label:
    'my-first-key' }),
})

const { key } = await res.json()
// key: "sk-a3f9..." ← save this, it's shown only once
```

## Step 3: Make an inference call [#step-3-make-an-inference-call]

```ts
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://agent-router.gaib.ai/v1',
  apiKey: 'sk-your-api-key',
})

const res = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'What is x402?' }],
  max_tokens: 256,
})

console.log(res.choices[0].message.content)
```

```ts
const res = await fetch('https://agent-router.gaib.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-your-api-key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gemini/gemini-2.5-flash',
    messages: [{ role: 'user', content: 'What is x402?' }],
    max_tokens: 256,
  }),
})

const data = await res.json()
console.log(data.choices[0].message.content)
```

```bash
curl -X POST https://agent-router.gaib.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is x402?"}],
    "max_tokens": 256
  }'
```

## Step 4: Check your balance [#step-4-check-your-balance]

```ts
const res = await fetch(
  'https://agent-router.gaib.ai/v1/usage/0xYourWalletAddress',
  { headers: { 'Authorization': 'Bearer sk-your-api-key' } },
)

const usage = await res.json()
console.log(`Balance: $${usage.available_usdc / 1_000_000}`)
console.log(`Requests: ${usage.keys[0].request_count}`)
```

```bash
curl https://agent-router.gaib.ai/v1/usage/0xYourWalletAddress \
  -H "Authorization: Bearer sk-your-api-key"
```

## Available models [#available-models]

Use `GET /v1/models` to list all models, or pick from these popular options:

| Model ID                  | Provider    | Prompt $/1M | Completion $/1M |
| ------------------------- | ----------- | ----------- | --------------- |
| `gemini/gemini-2.5-flash` | Google      | $0.15       | $0.60           |
| `gemini/gemini-2.5-pro`   | Google      | $1.25       | $10.00          |
| `kimi/kimi-k2.5`          | Moonshot    | $0.60       | $3.00           |
| `minimax/MiniMax-M2.5`    | MiniMax     | $0.118      | $0.95           |
| `bedrock/nova-pro`        | AWS Bedrock | $0.80       | $3.20           |
| `bedrock/qwen3-32b`       | AWS Bedrock | $0.155      | $0.618          |

These are the rates you're billed at. See the [live model catalog](/docs/models/catalog) for the full list.

## Streaming [#streaming]

Set `stream: true` for server-sent events:

```ts
const stream = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
```

## What's next? [#whats-next]

# API Keys (/docs/api/api-keys)

## Create API Key [#create-api-key]

`POST /v1/auth/keys`

Requires SIWE authentication in the request body.

```bash
curl -X POST https://agent-router.gaib.ai/v1/auth/keys \
  -H "Content-Type: application/json" \
  -d '{
    "message": "",
    "signature": "0x...",
    "label": "my-app"
  }'
```

```json
{
  "id": 1,
  "key": "sk-a3f9e2b1c4d5...",
  "label": "my-app",
  "created_at": "2026-04-07T12:00:00.000Z"
}
```

The `key` field is returned **only once**. Store it immediately.

| Field       | Type     | Required | Description                  |
| ----------- | -------- | -------- | ---------------------------- |
| `message`   | `string` | yes      | Prepared SIWE message        |
| `signature` | `string` | yes      | Hex-encoded wallet signature |
| `label`     | `string` | no       | Human-readable label         |

## List API Keys [#list-api-keys]

`GET /v1/auth/keys`

Pass SIWE auth as query parameters.

```bash
curl "https://agent-router.gaib.ai/v1/auth/keys?message=&signature="
```

```json
{
  "data": [
    {
      "id": 1,
      "label": "my-app",
      "created_at": "2026-04-07T12:00:00.000Z",
      "revoked_at": null
    }
  ]
}
```

## Revoke API Key [#revoke-api-key]

`DELETE /v1/auth/keys/:key_id`

Requires SIWE authentication in the request body.

```bash
curl -X DELETE https://agent-router.gaib.ai/v1/auth/keys/1 \
  -H "Content-Type: application/json" \
  -d '{"message": "", "signature": "0x..."}'
```

```json
{ "revoked": true }
```

# Balance (/docs/api/balance)

`GET /v1/balance/:wallet`

Return a wallet's available credit (balance minus any locked-for-in-flight amount). **Public** — no authentication required, since balances are not sensitive.

```bash
curl https://agent-router.gaib.ai/v1/balance/0xYourWallet
```

```json
{
  "wallet": "0xabc...",
  "available_usdc": 4950000
}
```

| Field            | Unit       | Conversion                                         |
| ---------------- | ---------- | -------------------------------------------------- |
| `available_usdc` | micro-USDC | `/ 1_000_000` = USD (`balance_usdc − locked_usdc`) |

Unknown wallets return `available_usdc: 0` rather than a 404. For the full per-key breakdown and raw `balance_usdc` / `locked_usdc`, use the [Usage endpoint](/docs/api/usage).

# Chat Completions (/docs/api/chat-completions)

`POST /v1/chat/completions`

**OpenAI-compatible.** Requires `Authorization: Bearer `.

```bash
curl -X POST https://agent-router.gaib.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 512
  }'
```

```ts
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://agent-router.gaib.ai/v1',
  apiKey: 'sk-your-key',
})

const res = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello!' }],
  max_tokens: 512,
})
```

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
```

## Request body [#request-body]

| Field         | Type                | Required | Default | Description                               |
| ------------- | ------------------- | -------- | ------- | ----------------------------------------- |
| `model`       | `string`            | yes      | —       | Model ID (e.g. `gemini/gemini-2.5-flash`) |
| `messages`    | `array`             | yes      | —       | Array of `{role, content}` objects        |
| `max_tokens`  | `number`            | no       | `1024`  | Max completion tokens                     |
| `temperature` | `number`            | no       | —       | Sampling temperature                      |
| `top_p`       | `number`            | no       | —       | Nucleus sampling                          |
| `stream`      | `boolean`           | no       | `false` | Enable SSE streaming                      |
| `tools`       | `array`             | no       | —       | OpenAI-style tool/function definitions    |
| `tool_choice` | `string` / `object` | no       | —       | Tool selection control                    |

Other standard OpenAI fields — `stop`, `presence_penalty`, `frequency_penalty`, `response_format`, `seed`, `n`, `logprobs` — are also accepted and forwarded to the provider.

Requests with `thinking` or `reasoning_effort` parameters return **HTTP 400**. Strip these fields before calling the gateway.

## Streaming [#streaming]

Set `"stream": true`.
The response is standard OpenAI SSE:

```
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"!"}}]}

data: [DONE]
```

# Error Codes (/docs/api/errors)

All errors follow a consistent shape:

```json
{
  "error": "ERROR_CODE",
  "message": "Human readable description"
}
```

## Error reference [#error-reference]

| HTTP  | Code                   | When                                                                                             |
| ----- | ---------------------- | ------------------------------------------------------------------------------------------------ |
| `400` | `VALIDATION_ERROR`     | Missing/invalid params, unknown model, amount too low, `thinking`/`reasoning_effort` not allowed |
| `401` | `UNAUTHORIZED`         | Missing/invalid API key or SIWE signature                                                        |
| `402` | `INSUFFICIENT_BALANCE` | Not enough credit for the estimated cost                                                         |
| `404` | `NOT_FOUND`            | Resource not found                                                                               |
| `429` | `RATE_LIMITED`         | Per-wallet rate limit exceeded                                                                   |
| `500` | `INTERNAL_ERROR`       | Server error                                                                                     |
| `502` | `UPSTREAM_ERROR`       | LLM provider failed or timed out                                                                 |

## Detailed examples [#detailed-examples]

### Insufficient Balance (402) [#insufficient-balance-402]

Returned when balance can't cover the reserved cost estimate.

```json
{
  "error": "INSUFFICIENT_BALANCE",
  "message": "Insufficient balance. Top up at POST /v1/topup"
}
```

### Rate Limited (429) [#rate-limited-429]

Returned when per-wallet sliding window limits are exceeded.

```json
{
  "error": "RATE_LIMITED",
  "message": "Rate limit exceeded"
}
```

### Validation Error (400) [#validation-error-400]

```json
{
  "error": "VALIDATION_ERROR",
  "message": "Thinking/reasoning models are not supported"
}
```

### Upstream Error (502) [#upstream-error-502]

Returned when the LLM provider fails. The balance lock is released (no charge).

```json
{
  "error": "UPSTREAM_ERROR",
  "message": "Provider returned an error"
}
```

## Error handling behavior [#error-handling-behavior]

| Scenario             | Behavior                                              |
| -------------------- | ----------------------------------------------------- |
| Insufficient balance | HTTP 402 with top-up instructions and current balance |
| Provider down        | Release lock, return 502, **no balance deducted**     |
| Invalid model        | Return 400, release lock, **no balance deducted**     |
| Rate limited         | Return 429, **no balance deducted**                   |
| Invalid API key      | Return 401                                            |
| Expired/invalid SIWE | Return 401                                            |
| Top-up below minimum | Return 400 before issuing 402                         |

# List Models (/docs/api/list-models)

`GET /v1/models`

No auth required. Returns all available models with pricing.

```bash
curl https://agent-router.gaib.ai/v1/models
```

```json
{
  "object": "list",
  "data": [
    {
      "id": "gemini/gemini-2.5-flash",
      "name": "gemini-2.5-flash",
      "provider": "gemini",
      "contextLength": 1048576,
      "promptPricePer1MTokens": 0.15,
      "completionPricePer1M": 0.60
    }
  ]
}
```

Prices are **USD per 1M tokens** (downstream provider cost — the rate you're billed at). The [Model catalog](/docs/models/catalog) renders this endpoint live.

# Overview (/docs/api/overview)

**Base URL:** `https://agent-router.gaib.ai`

All endpoints are under `/v1/` except `/health`.

The full API is published as an **[OpenAPI 3.1 spec](/openapi.yaml)** — point your agent, SDK generator, or API client at it. For LLM context, [`/llms.txt`](/llms.txt) indexes the docs and [`/llms-full.txt`](/llms-full.txt) is every page as one markdown file.

## Authentication [#authentication]

| Method                                            | Used for                 |
| ------------------------------------------------- | ------------------------ |
| **API key** — `Authorization: Bearer sk-<64 hex>` | Inference, usage queries |
| **SIWE** — signed message + signature             | API key management       |
| **x402** — payment signature                      | Top-up                   |

See [Authentication](/docs/authentication) for details.

## Endpoints [#endpoints]

| Endpoint                                             | Method   | Auth           | Description                    |
| ---------------------------------------------------- | -------- | -------------- | ------------------------------ |
| `/health`                                            | `GET`    | —              | Health check                   |
| [`/v1/models`](/docs/api/list-models)                | `GET`    | —              | List models with pricing       |
| [`/v1/topup`](/docs/api/topup)                       | `POST`   | x402           | Fund balance with USDC or USDT |
| [`/v1/topups/:wallet`](/docs/api/topups)             | `GET`    | API key / SIWE | List top-up history            |
| [`/v1/auth/keys`](/docs/api/api-keys)                | `POST`   | SIWE           | Create API key                 |
| [`/v1/auth/keys`](/docs/api/api-keys)                | `GET`    | SIWE           | List API keys                  |
| [`/v1/auth/keys/:id`](/docs/api/api-keys)            | `DELETE` | SIWE           | Revoke API key                 |
| [`/v1/chat/completions`](/docs/api/chat-completions) | `POST`   | API key        | Inference (OpenAI-compatible)  |
| [`/v1/usage/:wallet`](/docs/api/usage)               | `GET`    | API key / SIWE | Balance & usage stats          |
| `/v1/balance/:wallet`                                | `GET`    | —              | Available balance only         |

## Health [#health]

`GET /health`

```bash
curl https://agent-router.gaib.ai/health
```

```json
{ "status": "ok" }
```

# Top Up (/docs/api/topup)

`POST /v1/topup`

Fund wallet balance via x402 stablecoin payment. USDC (ERC-3009) and USDT (Permit2) are accepted on Base and Arbitrum. Two-step flow.

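Before sending the first request, a client can validate the amount locally and work out the micro-unit string it should expect to see in the payment requirements. The sketch below assumes the 1:1 mapping shown in this reference (`amount: 5` corresponds to `"5000000"`) and the documented `$1` minimum. `topupAmountToMicro` is a hypothetical helper name, and whether the gateway accepts fractional dollar amounts is not stated here, so treat the rounding as an assumption.

```ts
const MICRO_PER_USD = 1_000_000 // 6-decimal stablecoin units per dollar
const MIN_TOPUP_USD = 1         // documented minimum top-up

// Hypothetical helper: converts a USD top-up amount to the micro-unit
// string form used in payment requirements. Not part of any SDK.
function topupAmountToMicro(amountUsd: number): string {
  if (!Number.isFinite(amountUsd) || amountUsd < MIN_TOPUP_USD) {
    throw new Error(`Top-up amount must be at least $${MIN_TOPUP_USD}`)
  }
  // Round to avoid floating-point residue such as 2.5000000001e6
  return String(Math.round(amountUsd * MICRO_PER_USD))
}
```

Comparing this value with the amount in the live 402 response is a cheap sanity check before signing anything.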
## Step 1: Get payment requirements [#step-1-get-payment-requirements]

```bash
curl -X POST https://agent-router.gaib.ai/v1/topup \
  -H "Content-Type: application/json" \
  -d '{"amount": 5}'
```

```json
{
  "accepts": [
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "amount": "5000000",
      "maxAmountRequired": "5000000",
      "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
      "payTo": "0x...",
      "extra": { "assetTransferMethod": "eip3009", "name": "USD Coin", "version": "2" }
    },
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "amount": "5000000",
      "maxAmountRequired": "5000000",
      "asset": "0xdAC17F958D2ee523a2206206994597C13D831ec7",
      "payTo": "0x...",
      "extra": { "assetTransferMethod": "permit2" }
    }
  ]
}
```

Always use the `accepts` array from the live 402 response as the source of truth — the offered assets, networks, and transfer methods can vary by deployment.

## Step 2: Send with payment signature [#step-2-send-with-payment-signature]

```bash
curl -X POST https://agent-router.gaib.ai/v1/topup \
  -H "Content-Type: application/json" \
  -H "X-Payment: " \
  -d '{"amount": 5}'
```

```json
{
  "balance_usdc": 5000000,
  "credited_usdc": 5000000
}
```

| Field    | Type     | Description              |
| -------- | -------- | ------------------------ |
| `amount` | `number` | USD amount, minimum `$1` |

# Top-up History (/docs/api/topups)

`GET /v1/topups/:wallet`

List a wallet's top-up history, newest first — used to render receipts. Wallet-scoped: accepts API key **or** SIWE auth.

```bash
curl https://agent-router.gaib.ai/v1/topups/0xYourWallet \
  -H "Authorization: Bearer sk-your-key"
```

The API key may also be passed as an `api_key` query parameter.

```bash
curl "https://agent-router.gaib.ai/v1/topups/0xYourWallet?message=&signature="
```

```json
{
  "wallet": "0xabc...",
  "network": "Base (Mainnet)",
  "topups": [
    {
      "id": 12,
      "amount_usdc": 5000000,
      "tx_hash": "0x...",
      "created_at": "2026-06-09T10:23:00.000Z"
    }
  ]
}
```

| Field                  | Type / Unit | Description                                                                                  |
| ---------------------- | ----------- | -------------------------------------------------------------------------------------------- |
| `network`              | `string`    | Human-readable network the gateway settles on (`Base (Mainnet)` or `Base Sepolia (Testnet)`) |
| `topups[].amount_usdc` | micro-USDC  | `/ 1_000_000` = USD                                                                          |
| `topups[].tx_hash`     | `string`    | On-chain settlement transaction hash                                                         |
| `topups[].created_at`  | ISO 8601    | When the top-up was recorded                                                                 |

To create a top-up, see [Top Up](/docs/api/topup).

# Usage (/docs/api/usage)

`GET /v1/usage/:wallet`

Get wallet balance and per-key usage breakdown. Accepts API key **or** SIWE auth.

```bash
curl https://agent-router.gaib.ai/v1/usage/0xYourWallet \
  -H "Authorization: Bearer sk-your-key"
```

```bash
curl "https://agent-router.gaib.ai/v1/usage/0xYourWallet?message=&signature="
```

```json
{
  "wallet": "0xabc...",
  "balance_usdc": 4950000,
  "locked_usdc": 0,
  "available_usdc": 4950000,
  "keys": [
    {
      "api_key_id": 1,
      "label": "my-app",
      "request_count": 3,
      "total_prompt_tokens": 66,
      "total_completion_tokens": 174,
      "total_charged_usdc": 50000,
      "total_platform_revenue_usd": 0.008
    }
  ]
}
```

| Field                | Unit       | Conversion                   |
| -------------------- | ---------- | ---------------------------- |
| `balance_usdc`       | micro-USDC | `/ 1_000_000` = USD          |
| `available_usdc`     | micro-USDC | `balance_usdc − locked_usdc` |
| `total_charged_usdc` | micro-USDC | `/ 1_000_000` = USD          |

# Credits & Billing (/docs/concepts/billing)

The gateway is **credit-based**: you fund a balance, and each inference request is billed against it.

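Fields such as `balance_usdc`, `locked_usdc`, and `total_charged_usdc` are integer micro-USDC amounts, so an agent that reasons about its budget will usually want them in dollars. A minimal sketch of that arithmetic follows; the helper names (`microToUsd`, `formatUsd`, `availableMicro`) are hypothetical and exist only for illustration.

```ts
const MICRO_PER_USD = 1_000_000

// Converts an integer micro-USDC amount to a USD number.
function microToUsd(micro: number): number {
  return micro / MICRO_PER_USD
}

// Formats a micro-USDC amount as a dollar string with two decimals.
function formatUsd(micro: number): string {
  return `$${microToUsd(micro).toFixed(2)}`
}

// Available credit is the balance minus whatever is locked for in-flight calls.
function availableMicro(balanceMicro: number, lockedMicro: number): number {
  return balanceMicro - lockedMicro
}

console.log(formatUsd(availableMicro(4_950_000, 0))) // prints "$4.95"
```

Keeping amounts as integers until the final formatting step avoids accumulating floating-point error across many small charges.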
## Balance units (micro-USDC) [#balance-units-micro-usdc]

All balances are stored as **micro-USDC** — integer amounts with 6 implied decimal places (1 USD = 1,000,000 micro-USDC).

| Value                | Unit       | Convert to USD               |
| -------------------- | ---------- | ---------------------------- |
| `balance_usdc`       | micro-USDC | `/ 1_000_000`                |
| `available_usdc`     | micro-USDC | `balance_usdc − locked_usdc` |
| `total_charged_usdc` | micro-USDC | `/ 1_000_000`                |

So `5_000_000` micro-USDC = **$5.00**.

## Pricing [#pricing]

Prices reported by `GET /v1/models` are **downstream costs** — what the gateway pays the provider — and you're billed at those rates. See [Pricing](/docs/models/pricing).

## Overdraft protection [#overdraft-protection]

Balance is **reserved before** the upstream call starts. The reservation covers the maximum possible cost (based on `max_tokens`), and the actual charge is always ≤ that estimate. This guarantees concurrent requests can never overspend your credit.

If the balance can't cover the estimate, the request is rejected with **HTTP 402** before any provider call.

## Checking usage [#checking-usage]

Query `GET /v1/usage/:wallet` for balance plus per-key request counts, token totals, and charges. See the [Usage endpoint](/docs/api/usage).

# Choosing a model (/docs/concepts/choosing-a-model)

The gateway routes each request to a provider by reading the **prefix** of the model ID. There is no automatic fallback or cross-provider load balancing — you choose the exact model, and the gateway calls that provider.

## Model ID format [#model-id-format]

All model IDs follow `provider/model`:

```
gemini/gemini-2.5-flash
bedrock/nova-pro
bedrock/claude-haiku-4-5-20251001
kimi/kimi-k2.5
minimax/MiniMax-M2.5
```

The gateway splits on the first `/` to select the provider, then forwards the rest as the upstream model name.

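The split-on-first-slash rule can be mirrored on the client, for example to group models by provider or to catch a malformed ID before sending a request. This is a sketch of the documented rule only: `parseModelId` is a hypothetical helper, and the gateway's own validation (for instance, which provider prefixes it accepts) may be stricter.

```ts
interface ParsedModelId {
  provider: string // text before the first "/"
  model: string    // everything after the first "/", forwarded upstream
}

// Hypothetical helper that mirrors the documented routing rule.
function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf('/')
  // Reject IDs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Model ID must look like provider/model, got "${id}"`)
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) }
}

const parsed = parseModelId('bedrock/claude-haiku-4-5-20251001')
console.log(parsed.provider, parsed.model)
```

Only the first `/` is significant, so any later slashes stay part of the upstream model name.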
## Switching models [#switching-models]

Because routing is just a string, switching models or providers is a one-line change:

```ts
// Google Gemini …
model: 'gemini/gemini-2.5-flash'

// … or Moonshot Kimi — same request shape
model: 'kimi/kimi-k2.5'
```

## Picking the right one [#picking-the-right-one]

* **Cheapest general-purpose:** `gemini/gemini-2.5-flash`
* **Long context:** Gemini models (up to 1M+ tokens)
* **Anthropic Claude:** via `bedrock/claude-*` (see [Integrations → Anthropic](/docs/integrations/anthropic))

Browse everything with live pricing and context windows in the [Model catalog](/docs/models/catalog), and check [Provider notes](/docs/models/provider-notes) for per-model quirks.

# Rate limits (/docs/concepts/rate-limits)

The gateway enforces **per-wallet** rate limits using a sliding window — a maximum number of requests per minute and per day for each wallet.

## When you hit a limit [#when-you-hit-a-limit]

Exceeding a window returns **HTTP 429** with **no charge**:

```json
{
  "error": "RATE_LIMITED",
  "message": "Rate limit exceeded"
}
```

## Handling 429s [#handling-429s]

* Back off before retrying; for bursty workloads, use exponential backoff with jitter.
* Rate limits are tracked per wallet: each wallet has its own windows, and sending traffic from additional wallets does not raise the limit of any single wallet.

Rate-limited requests are rejected **before** any provider call, so they never deduct balance.

# Top up (x402) (/docs/concepts/topup)

Top-up uses the **x402** protocol. You pay a stablecoin on-chain, and the gateway credits your balance after verifying the payment. USDC (ERC-3009) and USDT (Permit2) are accepted on Base and Arbitrum.

It's a **two-step** flow: the first request returns payment requirements, the second carries the signed payment.

## Step 1 — Discover payment requirements [#step-1--discover-payment-requirements]

```bash
curl -X POST https://agent-router.gaib.ai/v1/topup \
  -H "Content-Type: application/json" \
  -d '{"amount": 5}'
```

The server responds with **HTTP 402** and an `accepts` array describing how to pay:

```json
{
  "accepts": [
    {
      "scheme": "exact",
      "network": "eip155:8453",
      "maxAmountRequired": "5000000",
      "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
      "payTo": "0x...",
      "extra": { "assetTransferMethod": "eip3009", "name": "USD Coin", "version": "2" }
    }
  ]
}
```

Always use the `accepts` array from the live 402 response as the source of truth — the offered assets, networks, and transfer methods can vary by deployment.

## Step 2 — Sign and resend [#step-2--sign-and-resend]

Sign the payment and resend with the `X-Payment` header. The easiest way is the `@x402/fetch` interceptor:

```ts
import { withPaymentInterceptor } from '@x402/fetch'

const fetchWithPayment = withPaymentInterceptor(fetch, walletClient)

const res = await fetchWithPayment('https://agent-router.gaib.ai/v1/topup', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ amount: 5 }),
})

const { balance_usdc, credited_usdc } = await res.json()
// balance_usdc: total balance in micro-USDC
```

| Field    | Type     | Description              |
| -------- | -------- | ------------------------ |
| `amount` | `number` | USD amount, minimum `$1` |

See the full [Top Up API reference](/docs/api/topup) for response fields.

# Anthropic (Claude) models (/docs/integrations/anthropic)

**Verified.** Claude models via `bedrock/claude-*` are exercised end-to-end (streaming + non-streaming) through the full gateway stack.

Claude models are available through **AWS Bedrock** using the standard OpenAI-compatible request shape — you do **not** use the Anthropic SDK. Just set the model ID to a `bedrock/claude-*` value.

```ts
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://agent-router.gaib.ai/v1',
  apiKey: 'sk-your-api-key',
})

const res = await client.chat.completions.create({
  model: 'bedrock/claude-haiku-4-5-20251001',
  messages: [{ role: 'user', content: 'Explain x402 in one sentence.' }],
  max_tokens: 512,
})
```

## Available Claude models [#available-claude-models]

Examples (see the [live catalog](/docs/models/catalog) for the authoritative list and pricing):

| Model ID                            | Notes             |
| ----------------------------------- | ----------------- |
| `bedrock/claude-haiku-4-5-20251001` | Cheapest, fastest |
| `bedrock/claude-sonnet-4-6`         | Balanced          |
| `bedrock/claude-opus-4-7`           | Most capable      |

Because requests use the OpenAI format, streaming and tool calling work exactly as documented in [OpenAI SDK](/docs/integrations/openai-sdk). `thinking` / `reasoning_effort` are not supported.

# Cursor & Claude Code (/docs/integrations/editors)

**Community / untested.** These tools accept an OpenAI-compatible base URL, so they should work with the gateway, but they are not part of the gateway's automated test suite.

## Cursor [#cursor]

In **Settings → Models → OpenAI API**, enable a custom base URL:

```
Base URL:  https://agent-router.gaib.ai/v1
API Key:   sk-your-api-key
Model:     gemini/gemini-2.5-flash
```

Add your gateway model IDs as custom models so Cursor sends them verbatim.

## Claude Code [#claude-code]

Claude Code can target an OpenAI-compatible endpoint via environment variables:

```bash
export OPENAI_BASE_URL="https://agent-router.gaib.ai/v1"
export OPENAI_API_KEY="sk-your-api-key"
```

Then select a `bedrock/claude-*` model. See [Anthropic (Claude) models](/docs/integrations/anthropic).

Tools that send `thinking` / `reasoning_effort` will get HTTP 400 from the gateway. Disable reasoning parameters if the client exposes them.

# Frontend integration (/docs/integrations/frontend)

The gateway is a credit-based LLM inference API.
Users fund a balance, authenticate with their wallet (SIWE), get an API key, and call the inference endpoint.

## User flow [#user-flow]

```
Connect wallet → Top up USDC → Sign in (SIWE) → Create API key → Use API key for completions
```

Top-up and sign-in are independent — both require a connected wallet but can happen in either order.

## Recommended libraries [#recommended-libraries]

| Purpose                   | Library                                   |
| ------------------------- | ----------------------------------------- |
| Wallet connection         | `wagmi`, `RainbowKit`, `ConnectKit`       |
| SIWE message construction | `siwe`                                    |
| Viem wallet client        | `viem`                                    |
| x402 payment              | `@x402/fetch` or `@x402/client`           |
| OpenAI-compatible client  | `openai` (point `baseURL` at the gateway) |

## Top up USDC balance [#top-up-usdc-balance]

Top-up uses x402 — the client pays USDC on Base before the balance is credited.

```ts
import { withPaymentInterceptor } from '@x402/fetch'

const fetchWithPayment = withPaymentInterceptor(fetch, walletClient)

const res = await fetchWithPayment(`${BASE_URL}/v1/topup`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ amount: 5 }),
})

const { balance_usdc, credited_usdc } = await res.json()
// $1 = 1_000_000 micro-USDC
```

See [Top up (x402)](/docs/concepts/topup) for the underlying two-step protocol.

## SIWE authentication [#siwe-authentication]

SIWE proves wallet ownership for API key management. There is no server-side session — every call needs a fresh signed message (valid 5 minutes).
```ts
import { SiweMessage } from 'siwe'
import { createWalletClient, custom } from 'viem'
import { mainnet } from 'viem/chains'

const walletClient = createWalletClient({
  chain: mainnet,
  transport: custom(window.ethereum),
})

const [address] = await walletClient.getAddresses()

const siweMsg = new SiweMessage({
  domain: window.location.host,
  address,
  uri: window.location.origin,
  version: '1',
  chainId: 8453, // Base mainnet
  nonce: crypto.randomUUID().replace(/-/g, '').slice(0, 16),
  issuedAt: new Date().toISOString(),
  statement: 'Sign in to the AI Gateway',
})

const message = siweMsg.prepareMessage()
const signature = await walletClient.signMessage({ account: address, message })
```

Generate a fresh `issuedAt` timestamp immediately before each API call. The server rejects SIWE messages older than 5 minutes.

## Chat completions in the browser [#chat-completions-in-the-browser]

```ts
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: `${BASE_URL}/v1`,
  apiKey: 'sk-a3f9...',
  dangerouslyAllowBrowser: true,
})

const response = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello!' }],
})
```

### Streaming (SSE) [#streaming-sse]

```ts
const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ model, messages, max_tokens: 512, stream: true }),
})

const reader = res.body!.getReader()
const decoder = new TextDecoder()
let buffer = '' // holds a partial SSE line split across network chunks
let finished = false

while (!finished) {
  const { done, value } = await reader.read()
  if (done) break
  buffer += decoder.decode(value, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() ?? '' // keep the trailing incomplete line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue
    const data = line.slice(6).trim()
    if (data === '[DONE]') {
      finished = true // stop reading entirely, not just this inner loop
      break
    }
    const delta = JSON.parse(data).choices[0]?.delta?.content ?? ''
    // append delta to your UI
  }
}
```

## Units & conversions [#units--conversions]

| Value                    | Unit                    | Display             |
| ------------------------ | ----------------------- | ------------------- |
| `balance_usdc`           | micro-USDC (6 decimals) | `/ 1_000_000` → USD |
| `amount` in topup        | USD (float)             | e.g. `5` = $5       |
| `promptPricePer1MTokens` | USD per 1M tokens       | —                   |
| `total_charged_usdc`     | micro-USDC              | `/ 1_000_000` → USD |

# LangChain (/docs/integrations/langchain)

**Community / untested.** Works via OpenAI compatibility, but is not part of the gateway's automated test suite.

Point LangChain's OpenAI chat model at the gateway base URL.

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://agent-router.gaib.ai/v1",
    api_key="sk-your-api-key",
    model="gemini/gemini-2.5-flash",
)

print(llm.invoke("Hello!").content)
```

```ts
import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  apiKey: 'sk-your-api-key',
  model: 'gemini/gemini-2.5-flash',
  configuration: { baseURL: 'https://agent-router.gaib.ai/v1' },
})

const res = await llm.invoke('Hello!')
console.log(res.content)
```

# LlamaIndex (/docs/integrations/llamaindex)

**Community / untested.** Works via OpenAI compatibility, but is not part of the gateway's automated test suite.

Configure the OpenAI-like LLM with the gateway base URL.

```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_base="https://agent-router.gaib.ai/v1",
    api_key="sk-your-api-key",
    model="gemini/gemini-2.5-flash",
    is_chat_model=True,
)

print(llm.complete("Hello!"))
```

# OpenAI SDK (/docs/integrations/openai-sdk)

**Verified.** The gateway is OpenAI-compatible and continuously tested for response shape, streaming chunks, tool calling, and `finish_reason`.

The gateway speaks the OpenAI Chat Completions API. The only changes are the **base URL** and your **API key**.
```ts
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://agent-router.gaib.ai/v1',
  apiKey: process.env.GATEWAY_API_KEY, // sk-...
})

const res = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello!' }],
  max_tokens: 512,
})

console.log(res.choices[0].message.content)
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://agent-router.gaib.ai/v1",
    api_key="sk-your-api-key",
)

res = client.chat.completions.create(
    model="gemini/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512,
)

print(res.choices[0].message.content)
```

## Streaming [#streaming]

```ts
const stream = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
```

## Tool calling [#tool-calling]

Tool calling is supported and returns standard OpenAI shapes (`finish_reason: "tool_calls"` and a `tool_calls` array).

```ts
const res = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
})
```

`thinking` and `reasoning_effort` parameters are **not supported** and return HTTP 400. Strip them before calling the gateway.

# Vercel AI SDK (/docs/integrations/vercel-ai-sdk)

**Community / untested.** The gateway is OpenAI-compatible, so this should work, but it is not part of the gateway's automated test suite.

Use the OpenAI-compatible provider and point it at the gateway base URL.
```ts
import { createOpenAICompatible } from '@ai-sdk/openai-compatible'
import { generateText } from 'ai'

const gateway = createOpenAICompatible({
  name: 'token-kiosk',
  baseURL: 'https://agent-router.gaib.ai/v1',
  apiKey: process.env.GATEWAY_API_KEY,
})

const { text } = await generateText({
  model: gateway('gemini/gemini-2.5-flash'),
  prompt: 'Hello!',
})
```

Streaming uses `streamText` as usual. `thinking` / `reasoning_effort` are not supported by the gateway.

# Model catalog (/docs/models/catalog)

All model IDs follow the pattern `provider/model` (e.g. `gemini/gemini-2.5-flash`). The gateway parses the prefix to route requests to the correct provider.

The table below is **live** — it fetches `GET /v1/models` from the gateway. Prices are USD per 1M tokens and are the rates you're billed at.

Notes shown under a model ID (e.g. "reasoning model — use max\_tokens ≥ 1000") are the only capability hints we surface, because they're documented and verified. The `/v1/models` API does not expose vision/JSON/tool flags, so this catalog does not claim them. See [Provider notes](/docs/models/provider-notes).

# Pricing (/docs/models/pricing)

## How prices work [#how-prices-work]

Prices reported by `GET /v1/models` (and shown in the [catalog](/docs/models/catalog)) are **downstream costs** — what the gateway pays the provider. They are quoted in **USD per 1M tokens**, separately for prompt and completion tokens. You're billed at those rates.

## What you're charged [#what-youre-charged]

* Charges are computed from **actual** token usage returned by the provider.
* Balance is reserved before the call based on `max_tokens`, then reconciled to the real cost. See [Credits & Billing](/docs/concepts/billing).
* On a provider error, the reservation is released and **no charge** is made.

# Provider notes (/docs/models/provider-notes)

These are documented behaviors that differ from the generic OpenAI request shape.

`kimi-k2.5` ignores `temperature`, `top_p`, and penalty parameters.
MiniMax models ignore `presence_penalty` and `frequency_penalty` parameters.

`bedrock/kimi-k2-thinking` is a reasoning model that uses an internal thinking budget. Use `max_tokens ≥ 1000` to ensure output is produced.

`bedrock/gpt-oss-20b` requires `max_tokens ≥ 500` to produce output.

## Not supported anywhere [#not-supported-anywhere]

`thinking` and `reasoning_effort` request parameters return **HTTP 400**. Strip them before calling the gateway, even for reasoning models — those manage their thinking budget internally.

# Data & logging (/docs/operations/data-logging)

The gateway is designed to bill accurately without retaining your prompts.

## What is stored [#what-is-stored]

Each request records one row of **metadata**:

| Field            | Description                         |
| ---------------- | ----------------------------------- |
| Wallet & API key | Which wallet/key made the request   |
| Model            | Model ID used                       |
| Token counts     | Prompt and completion token totals  |
| Cost breakdown   | Downstream + platform cost          |
| Billing amounts  | Amount charged and reserved         |
| Latency & mode   | Response time, streaming or not     |
| Error            | Error string, if the request failed |
| Timestamp        | When the request was made           |

## What is NOT stored [#what-is-not-stored]

**Prompt and completion content is not persisted.** The gateway does not write your messages or the model's responses to its database — only the token counts and billing metadata above.

## Upstream providers [#upstream-providers]

Requests are forwarded to the selected upstream provider (Gemini, Bedrock, Kimi, MiniMax, …). Those providers apply **their own** data-handling and retention policies to the content they receive. Choose models accordingly for sensitive workloads.
# Limits (/docs/operations/limits)

| Limit                | Value                      |
| -------------------- | -------------------------- |
| Requests per minute  | Per wallet, sliding window |
| Requests per day     | Per wallet, sliding window |
| Minimum top-up       | `$1`                       |
| Default `max_tokens` | `1024`                     |

## Rate limiting [#rate-limiting]

Limits are enforced per wallet with a sliding window. Exceeding them returns **HTTP 429** with no charge. See [Rate limits](/docs/concepts/rate-limits).

## Cost estimation [#cost-estimation]

Balance is reserved before each call based on `max_tokens` (default `1024`). Setting a realistic `max_tokens` avoids over-reserving your balance on concurrent requests. See [Credits & Billing](/docs/concepts/billing).

# llms.txt (/docs/resources/llms)

This documentation follows the [llms.txt](https://llmstxt.org/) convention to make it easy for AI agents and LLMs to consume.

## Available formats [#available-formats]

| URL                                | Description                                              |
| ---------------------------------- | -------------------------------------------------------- |
| [`/llms.txt`](/llms.txt)           | Concise index of all pages                               |
| [`/llms-full.txt`](/llms-full.txt) | Full documentation in one file                           |
| [`/openapi.yaml`](/openapi.yaml)   | Complete OpenAPI 3.1 API spec — endpoints, schemas, auth |

Every docs page is also available as Markdown by appending `.md` to its URL, or via the "Copy Markdown" button on the page.

Building an agent that calls the gateway? See [For AI Agents](/docs/agents) for the drop-in base URL and a tool-use walkthrough.

## Usage [#usage]

Point your LLM or agent at the plain-text file:

```ts
const docs = await fetch('https://gaib.ai/llms-full.txt').then(r => r.text())
```

```python
import httpx

docs = httpx.get("https://gaib.ai/llms-full.txt").text
```

```bash
curl https://gaib.ai/llms.txt
```

These files are designed to be included in system prompts or tool descriptions so AI agents can call the gateway API without human guidance.
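A minimal sketch of the system-prompt use described above, assuming the docs text has already been fetched (for example with one of the snippets in the Usage section). The `build_messages` helper, the `max_doc_chars` cap, and the instruction wording are hypothetical illustrations, not part of the gateway or any SDK.

```python
def build_messages(docs_text: str, user_prompt: str, max_doc_chars: int = 200_000) -> list[dict]:
    """Return OpenAI-style chat messages with the docs embedded in the system prompt.

    max_doc_chars is a hypothetical, crude character cap so the prompt stays
    within a model's context window; tune it for the model you use.
    """
    docs = docs_text[:max_doc_chars]
    system = (
        "You can call the gateway API. Reference documentation follows.\n\n"
        + docs
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


# Placeholder docs text; in practice this would be the contents of /llms-full.txt.
messages = build_messages("# Example docs\nGET /v1/models", "How do I check my balance?")
print(messages[0]["role"], len(messages))
```

The returned list can be passed as the `messages` argument of any OpenAI-compatible chat completion call.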
# Balance checker (/docs/tools/balance)

Enter wallet addresses to watch their available credit balances. Addresses are saved in your browser and auto-refreshed every 30 seconds.

Balances are read from `GET /v1/balance/:wallet` (no auth) and shown in USD (converted from micro-USDC):

```json
{
  "wallet": "0xabc…",
  "available_usdc": 4950000
}
```

`available_usdc` is `balance − locked`, in micro-USDC — divide by `1_000_000` for USD.
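The conversion can be sketched as follows. This is an illustrative example, not the balance checker's actual code: `micro_usdc_to_usd` and `format_usd` are hypothetical helper names, and the sample payload simply mirrors the response shape shown above. `Decimal` is used so the division stays exact instead of relying on binary floats.

```python
from decimal import Decimal

MICRO_PER_USD = 1_000_000  # micro-USDC per 1 USD (USDC has 6 decimals)


def micro_usdc_to_usd(micro: int) -> Decimal:
    """Convert an integer micro-USDC amount to USD as an exact Decimal."""
    return Decimal(micro) / Decimal(MICRO_PER_USD)


def format_usd(micro: int) -> str:
    """Render a micro-USDC amount as a dollar string with two decimal places."""
    return f"${micro_usdc_to_usd(micro):,.2f}"


# Sample payload in the shape returned by GET /v1/balance/:wallet.
sample = {"wallet": "0xabc…", "available_usdc": 4950000}
print(format_usd(sample["available_usdc"]))  # $4.95
```

The same helpers apply to any other micro-USDC field, such as `balance_usdc` or `total_charged_usdc`.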