Pricing

How prices work

Prices reported by GET /v1/models (and shown in the catalog) are upstream costs: what the gateway pays the provider. They are quoted in USD per 1M tokens, separately for prompt and completion tokens. You're billed at those rates.
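
As a rough illustration of per-1M-token pricing, the sketch below computes the cost of a single call from its token counts. The function name, parameter names, and the example prices are hypothetical and not part of the API; real prices come from GET /v1/models.

```python
from decimal import Decimal

# Prices are quoted per 1M tokens.
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def call_cost_usd(prompt_tokens, completion_tokens,
                  prompt_price_per_1m, completion_price_per_1m):
    """Return the USD cost of one call.

    Prices are passed as strings (e.g. "3.00") so Decimal keeps them exact.
    All names here are illustrative, not gateway identifiers.
    """
    prompt_cost = (Decimal(prompt_tokens) * Decimal(prompt_price_per_1m)
                   / TOKENS_PER_PRICE_UNIT)
    completion_cost = (Decimal(completion_tokens) * Decimal(completion_price_per_1m)
                       / TOKENS_PER_PRICE_UNIT)
    return prompt_cost + completion_cost


# Made-up example: 1,200 prompt tokens at $3.00 per 1M and
# 350 completion tokens at $15.00 per 1M.
cost = call_cost_usd(1200, 350, "3.00", "15.00")
print(cost)
```

Decimal is used instead of float so that very small per-call amounts do not pick up binary rounding error when summed over many calls.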

What you're charged

Charges are computed from actual token usage returned by the provider.
Balance is reserved before the call based on max_tokens, then reconciled to the real cost. See Credits & Billing.
On a provider error, the reservation is released and no charge is made.
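
The reserve-then-reconcile flow described above can be sketched as a small client-side model. This is not the gateway's implementation: the Ledger class, its method names, and the reservation formula (prompt tokens plus max_tokens at the quoted rates) are assumptions made for illustration. The page only states that the reservation is based on max_tokens.

```python
from decimal import Decimal

PER_1M = Decimal(1_000_000)


class InsufficientBalance(Exception):
    """Raised when a reservation exceeds the available balance."""


def max_reservation_usd(prompt_tokens, max_tokens, prompt_price, completion_price):
    # Assumed formula: worst case is the full prompt plus max_tokens of completion.
    return (Decimal(prompt_tokens) * Decimal(prompt_price)
            + Decimal(max_tokens) * Decimal(completion_price)) / PER_1M


class Ledger:
    """Hypothetical model of a credit balance with holds."""

    def __init__(self, balance):
        self.balance = Decimal(balance)
        self.reserved = Decimal(0)

    @property
    def available(self):
        return self.balance - self.reserved

    def reserve(self, amount):
        # Place a hold before the call is sent to the provider.
        amount = Decimal(amount)
        if amount > self.available:
            raise InsufficientBalance(f"need {amount}, have {self.available}")
        self.reserved += amount
        return amount

    def reconcile(self, hold, actual_cost):
        # Success: drop the hold and charge the cost computed from real usage.
        self.reserved -= hold
        self.balance -= Decimal(actual_cost)

    def release(self, hold):
        # Provider error: drop the hold and charge nothing.
        self.reserved -= hold


ledger = Ledger("10.00")
hold = ledger.reserve(max_reservation_usd(1200, 4000, "3.00", "15.00"))
ledger.reconcile(hold, "0.00885")  # actual usage was far below max_tokens
print(ledger.balance)
```

One practical consequence of this model: a large max_tokens can make a call fail the balance check even when the actual cost would have been small, so setting max_tokens close to what you expect to need keeps reservations low.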
