Overview

Cosine Inference is an OpenAI Responses-compatible inference gateway that gives you a single endpoint for model access across tools and clients.

At a high level, the gateway is designed to:

  • expose an OpenAI Responses-compatible interface
  • expose a model catalog via /models
  • let clients authenticate once and route through Cosine infrastructure
  • keep billing and usage reporting tied to Cosine account usage rather than a patchwork of per-client integrations

The production gateway exposes:

  • https://api.cosine.sh/responses
  • https://api.cosine.sh/models
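As a minimal sketch, a Responses-style request to the gateway looks like the following. The model ID, token value, and exact payload fields here are illustrative assumptions; consult your Cosine account for real model IDs and credentials.

```python
import json

# Assumed base URL from the endpoints listed above.
COSINE_BASE_URL = "https://api.cosine.sh"


def build_responses_request(api_key: str, model: str, prompt: str):
    """Return (url, headers, body) for a POST to the /responses endpoint.

    Sketch only: builds the request without sending it, so the shape is
    easy to inspect. Send it with any HTTP client of your choice.
    """
    url = f"{COSINE_BASE_URL}/responses"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        # The Responses API takes `input` rather than a `messages` array.
        "input": prompt,
    }
    return url, headers, json.dumps(body)
```

Pointing an existing OpenAI-compatible SDK at `COSINE_BASE_URL` with your bearer token achieves the same thing without hand-building requests.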

The gateway is intended for clients that already understand the OpenAI Responses-style API shape or can be configured to point at an OpenAI-compatible provider.

Cosine Inference is designed around the Responses API style rather than older chat-completions-only integrations.

In practice, that means:

  • request and response payloads are compatible with /responses-style clients
  • model discovery is available from /models
  • streaming clients can continue to use the same OpenAI-compatible mental model
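To illustrate the streaming mental model, here is a minimal sketch of consuming a server-sent-event stream of the kind Responses-style APIs emit. The event names and payload fields in the sample are assumptions for illustration; real streams carry typed `event:` lines alongside the `data:` payloads.

```python
import json


def parse_sse_data(lines):
    """Collect JSON payloads from the `data:` lines of an SSE stream.

    Minimal sketch: ignores `event:` lines and the terminal sentinel,
    and assumes each data payload is a JSON object.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events


# Illustrative sample of a streamed text response.
sample_stream = [
    "event: response.output_text.delta",
    'data: {"delta": "Hel"}',
    "event: response.output_text.delta",
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
text = "".join(e["delta"] for e in parse_sse_data(sample_stream))
```

Existing OpenAI-compatible streaming clients already do this parsing internally, which is why they can be pointed at the gateway unchanged.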

If a client supports custom OpenAI-compatible providers, it can usually be pointed at Cosine Inference with:

  • a custom base URL
  • a bearer token
  • the provider wire API set to responses
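For example, in a client that takes TOML-based custom provider configuration (modeled here on the Codex CLI style mentioned below), those three settings might look like this. The table name, key names, and environment variable are assumptions; check your client's documentation for its exact schema.

```toml
# Hypothetical custom-provider entry: key names vary by client.
[model_providers.cosine]
name = "Cosine Inference"
base_url = "https://api.cosine.sh"   # custom base URL
env_key = "COSINE_API_KEY"           # env var holding the bearer token
wire_api = "responses"               # provider wire API set to responses
```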

For a concrete example, see Generic client example.

Cosine Inference usage is billed through Cosine rather than through a separate provider account setup for each client.

At the platform level, usage is tracked against the billable request and token dimensions that matter operationally, including:

  • input tokens
  • output tokens
  • cache tokens when applicable
  • model and upstream provider attribution
  • request origin metadata when available

This is the same general usage model that powers Cosine’s reporting and billing views elsewhere in the product.

If you are looking for how model pricing is represented in the product, see Models and Pricing.

You can use Cosine Inference from:

  • custom OpenAI-compatible clients
  • tools that support custom Responses-compatible providers
  • Codex CLI with a custom provider configuration
  • OpenClaw through the dedicated Cosine inference plugin

See: