Overview
Cosine Inference is a /responses-compatible inference gateway that gives you a
single endpoint for model access across tools and clients.
At a high level, the gateway is designed to:
- expose an OpenAI Responses-compatible interface
- expose a model catalog via /models
- let clients authenticate once and route through Cosine infrastructure
- keep billing and usage reporting tied to Cosine account usage rather than a patchwork of per-client integrations
Endpoints
The production gateway exposes:
- https://api.cosine.sh/responses
- https://api.cosine.sh/models
The gateway is intended for clients that already understand the OpenAI Responses-style API shape or can be configured to point at an OpenAI-compatible provider.
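To make the endpoint shape concrete, the following sketch builds (but does not send) an authenticated request to the model catalog using only the Python standard library. The token value is a placeholder; a real client would read it from the environment.

```python
import urllib.request

# Placeholder bearer token; substitute a real Cosine API key.
api_key = "COSINE_API_KEY"

# Build an authenticated GET request against the model catalog endpoint.
req = urllib.request.Request(
    "https://api.cosine.sh/models",
    headers={"Authorization": f"Bearer {api_key}"},
)

print(req.full_url)                     # https://api.cosine.sh/models
print(req.get_header("Authorization"))  # Bearer COSINE_API_KEY
```

Sending the request (for example with `urllib.request.urlopen(req)`) would return the catalog as JSON, assuming a valid key.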
API compatibility
Cosine Inference is designed around the Responses API style rather than older chat-completions-only integrations.
In practice, that means:
- request and response payloads are compatible with /responses-style clients
- model discovery is available from /models
- streaming clients can continue to use the same OpenAI-compatible mental model
If a client supports custom OpenAI-compatible providers, it can usually be pointed at Cosine Inference with:
- a custom base URL
- a bearer token
- the provider wire API set to responses
For a concrete example, see Generic client example.
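As a rough illustration of what those three settings amount to on the wire, the sketch below assembles (without sending) a Responses-style POST request against the custom base URL with a bearer token. The model name, input text, and token are placeholders, and the payload fields shown are the minimal Responses-style ones (model, input), not a complete request schema.

```python
import json
import urllib.request

BASE_URL = "https://api.cosine.sh"  # custom base URL
API_KEY = "COSINE_API_KEY"          # placeholder bearer token

# Minimal Responses-style payload; the model name is a placeholder.
payload = {
    "model": "example-model",
    "input": "Hello from a Responses-compatible client.",
}

req = urllib.request.Request(
    f"{BASE_URL}/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(req.get_method())  # POST
print(req.full_url)      # https://api.cosine.sh/responses
```

A client that supports custom OpenAI-compatible providers does exactly this under the hood once the base URL, token, and wire API are configured.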
Billing and usage
Cosine Inference usage is billed through Cosine rather than through a separate provider account set up for each client.
At the platform level, usage is tracked against the billable request and token dimensions that matter operationally, including:
- input tokens
- output tokens
- cache tokens when applicable
- model and upstream provider attribution
- request origin metadata when available
This is the same general usage model that powers Cosine’s reporting and billing views elsewhere in the product.
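The token dimensions above can be tallied from a Responses-style usage object. The sketch below assumes the field names follow the OpenAI Responses usage shape (input_tokens, output_tokens, and cached-token details); treat that as an assumption about the gateway's payloads, not a guarantee.

```python
# Tally billable token dimensions from a Responses-style usage object.
# Field names are assumed to follow the OpenAI Responses usage shape.
def summarize_usage(usage: dict) -> dict:
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cached_tokens": cached,
        "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
    }

# Example usage object with made-up numbers.
example = {
    "input_tokens": 120,
    "input_tokens_details": {"cached_tokens": 40},
    "output_tokens": 35,
}
print(summarize_usage(example))
# {'input_tokens': 120, 'output_tokens': 35, 'cached_tokens': 40, 'total_tokens': 155}
```

Model and provider attribution would come from elsewhere in the response (for example, the model field), not from the usage object itself.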
If you are looking for how model pricing is represented in the product, see Models and Pricing.
Client integrations
You can use Cosine Inference from:
- custom OpenAI-compatible clients
- tools that support custom Responses-compatible providers
- Codex CLI with a custom provider configuration
- OpenClaw through the dedicated Cosine inference plugin
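For the Codex CLI case, a custom provider entry might look like the following sketch. The provider id, display name, and environment variable name are placeholders, and the exact keys should be checked against the Codex CLI configuration documentation.

```toml
# Hypothetical entry in the Codex CLI config file (~/.codex/config.toml)
[model_providers.cosine]
name = "Cosine Inference"
base_url = "https://api.cosine.sh"
env_key = "COSINE_API_KEY"   # environment variable holding the bearer token
wire_api = "responses"       # use the Responses-style wire API
```

This mirrors the three settings described under API compatibility: a custom base URL, a bearer token, and the wire API set to responses.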
See: