Overview

Cosine Inference is an OpenAI Responses-compatible inference gateway that gives you a single endpoint for model access across tools and clients.

At a high level, the gateway is designed to:

  • expose an OpenAI Responses-compatible interface
  • expose a model catalog via /models
  • let clients authenticate once and route through Cosine infrastructure
  • keep billing and usage reporting tied to Cosine account usage rather than a patchwork of per-client integrations

The production gateway exposes:

  • https://api.cosine.sh/responses
  • https://api.cosine.sh/models
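As a minimal sketch, a Responses-style request to the gateway looks like the following. The model ID, token value, and exact payload fields here are illustrative assumptions; consult your Cosine account for real model IDs and credentials.

```python
import json

# Assumed base URL from the endpoints listed above.
COSINE_BASE_URL = "https://api.cosine.sh"


def build_responses_request(api_key: str, model: str, prompt: str):
    """Return (url, headers, body) for a POST to the /responses endpoint.

    Sketch only: builds the request without sending it, so the shape is
    easy to inspect. Send it with any HTTP client of your choice.
    """
    url = f"{COSINE_BASE_URL}/responses"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        # The Responses API takes `input` rather than a `messages` array.
        "input": prompt,
    }
    return url, headers, json.dumps(body)
```

Pointing an existing OpenAI-compatible SDK at `COSINE_BASE_URL` with your bearer token achieves the same thing without hand-building requests.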

The gateway is intended for clients that already understand the OpenAI Responses-style API shape or can be configured to point at an OpenAI-compatible provider.

Cosine Inference is designed around the Responses API style rather than older chat-completions-only integrations.

In practice, that means:

  • request and response payloads are compatible with /responses-style clients
  • model discovery is available from /models
  • streaming clients can continue to use the same OpenAI-compatible mental model
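To illustrate the streaming mental model, here is a minimal sketch of consuming a server-sent-event stream of the kind Responses-style APIs emit. The event names and payload fields in the sample are assumptions for illustration; real streams carry typed `event:` lines alongside the `data:` payloads.

```python
import json


def parse_sse_data(lines):
    """Collect JSON payloads from the `data:` lines of an SSE stream.

    Minimal sketch: ignores `event:` lines and the terminal sentinel,
    and assumes each data payload is a JSON object.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events


# Illustrative sample of a streamed text response.
sample_stream = [
    "event: response.output_text.delta",
    'data: {"delta": "Hel"}',
    "event: response.output_text.delta",
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
text = "".join(e["delta"] for e in parse_sse_data(sample_stream))
```

Existing OpenAI-compatible streaming clients already do this parsing internally, which is why they can be pointed at the gateway unchanged.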

If a client supports custom OpenAI-compatible providers, it can usually be pointed at Cosine Inference with:

  • a custom base URL
  • a bearer token
  • the provider wire API set to responses
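For example, in a client that takes TOML-based custom provider configuration (modeled here on the Codex CLI style mentioned below), those three settings might look like this. The table name, key names, and environment variable are assumptions; check your client's documentation for its exact schema.

```toml
# Hypothetical custom-provider entry: key names vary by client.
[model_providers.cosine]
name = "Cosine Inference"
base_url = "https://api.cosine.sh"   # custom base URL
env_key = "COSINE_API_KEY"           # env var holding the bearer token
wire_api = "responses"               # provider wire API set to responses
```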

For a concrete example, see Generic client example.

Cosine Inference usage is billed through Cosine rather than through a separate provider account setup for each client.

At the platform level, usage is tracked against the billable request and token dimensions that matter operationally, including:

  • input tokens
  • output tokens
  • cache tokens when applicable
  • model and upstream provider attribution
  • request origin metadata when available

This is the same general usage model that powers Cosine’s reporting and billing views elsewhere in the product.

If you are looking for how model pricing is represented in the product, see Models and Pricing.

You can use Cosine Inference from:

  • custom OpenAI-compatible clients
  • tools that support custom Responses-compatible providers
  • Codex CLI with a custom provider configuration
  • OpenClaw through the dedicated Cosine inference plugin

See: