Model pricing

Cosine lets you choose which model runs a task. Models have different strengths, speeds, context windows, and costs.

The model picker shows an approximate multiplier so models are easy to compare at a glance. The billing usage view breaks this down further into the credits used for input, cached input, and output tokens.

When you use a model, usage is measured across different token types:

  • Input tokens — Context sent to the model, such as your prompt, instructions, files, memory, and tool results
  • Cached input tokens — Reused input context that the provider can bill at a lower rate
  • Output tokens — Text, tool calls, code, and other responses produced by the model

Different models price these token types differently. A model can be relatively cheap for input but more expensive for output, or vice versa.
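
As a rough sketch, the credits for one request can be estimated by multiplying each token count by the model's rate for that token type and summing the three products. The function below treats rates as credits per token, as the pricing table states. It also assumes that cached input tokens are counted separately from fresh input tokens rather than included in the input count. That split is an assumption about how the billing view reports usage, not a documented formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Credits charged per token for each token type (illustrative)."""
    input: float
    cached_input: float
    output: float


def estimate_credits(rates: Rates, input_tokens: int,
                     cached_input_tokens: int, output_tokens: int) -> float:
    # Assumption: cached_input_tokens are billed only at the cached rate
    # and are not also counted in input_tokens.
    return (input_tokens * rates.input
            + cached_input_tokens * rates.cached_input
            + output_tokens * rates.output)


# Rates for Sonnet 4.6 as listed in the pricing table.
sonnet = Rates(input=3, cached_input=0.30, output=15)
print(estimate_credits(sonnet, input_tokens=1_000,
                       cached_input_tokens=4_000, output_tokens=500))
```

Because the cached rate is a tenth of the input rate for this model, moving context from fresh input to cached input lowers the estimate noticeably even when the total context size stays the same.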

The table below mirrors the pricing breakdown shown in the billing UI. Values are credits used per token for each token type.

| Model | Provider | Approx. multiplier | Input | Cached input | Output | Long-context* |
|---|---|---|---|---|---|---|
| Lumen Outpost | Cosine | ~0.25x | 0.60 | 0.10 | 3 | |
| Lumen Scout | Cosine | ~0.1x | 0.10 | 0.05 | 0.30 | |
| GPT 5.5 | OpenAI | ~2.75x | 5 | 0.50 | 30 | True |
| GPT 5.4 | OpenAI | ~1.5x | 2.50 | 0.25 | 15 | True |
| GPT 5.4 1M | OpenAI | ~1.5x | 2.50 | 0.25 | 15 | True |
| GPT 5.4 mini | OpenAI | ~0.5x | 0.75 | 0.075 | 4.50 | |
| Codex 5.3 | OpenAI | ~1x | 1.75 | 0.175 | 14 | |
| Opus 4.7 | Anthropic | ~2.75x | 5 | 0.50 | 25 | |
| Opus 4.6 | Anthropic | ~2.75x | 5 | 0.50 | 25 | |
| Sonnet 4.6 | Anthropic | ~1.75x | 3 | 0.30 | 15 | |
| Gemini 3.1 Pro | Google | ~1x | 2 | 0.20 | 12 | True |
| Gemini 3 Flash | Google | ~0.25x | 0.50 | 0.05 | 3 | |
| Kimi K2.6 | Moonshot | ~0.5x | 0.95 | 0.16 | 4 | |
| GLM 5.1 | Z AI | ~0.75x | 1.40 | 0.26 | 4.40 | |
| MiniMax M2.7 | MiniMax | ~0.25x | 0.30 | 0.06 | 1.20 | |
| Qwen 3.6 Plus | Qwen | ~0.25x | 0.50 | 0.10 | 3 | |
| DeepSeek v4 Pro | DeepSeek | ~1x | 1.74 | 0.14 | 3.48 | |
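
To compare models for a specific workload rather than by multiplier alone, the table rows can be treated as a lookup table. The sketch below copies six rows from the table, prices one hypothetical workload (20,000 input, 80,000 cached input, and 5,000 output tokens), and sorts the models from cheapest to most expensive. It ignores long-context rates and assumes cached tokens are counted separately from fresh input tokens.

```python
# Rates copied from six rows of the pricing table:
# (input, cached input, output), in credits per token as the table states.
RATES = {
    "Lumen Scout": (0.10, 0.05, 0.30),
    "Gemini 3 Flash": (0.50, 0.05, 3),
    "GPT 5.4 mini": (0.75, 0.075, 4.50),
    "DeepSeek v4 Pro": (1.74, 0.14, 3.48),
    "Sonnet 4.6": (3, 0.30, 15),
    "Opus 4.7": (5, 0.50, 25),
}


def workload_credits(rates, input_tokens, cached_tokens, output_tokens):
    """Price one workload: each token count times its per-token rate."""
    input_rate, cached_rate, output_rate = rates
    return (input_tokens * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate)


def rank_models(input_tokens, cached_tokens, output_tokens):
    """Return (model, credits) pairs sorted from cheapest to most expensive."""
    costs = {
        name: workload_credits(r, input_tokens, cached_tokens, output_tokens)
        for name, r in RATES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical workload: 20,000 input, 80,000 cached input, 5,000 output tokens.
for name, credits in rank_models(20_000, 80_000, 5_000):
    print(f"{name:16} {credits:>10,.0f}")
```

For this token mix the order runs Lumen Scout, Gemini 3 Flash, GPT 5.4 mini, DeepSeek v4 Pro, Sonnet 4.6, Opus 4.7, which follows the multipliers. A different mix of token types can reorder models whose rates sit close together.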

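The approximate multiplier shown in the model picker is a simplified comparison against the standard Cosine-managed rate. It is useful for choosing quickly, but it is not the full billing formula: actual usage is billed separately for input, cached input, and output tokens at each model's own rates.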

For example:

  • ~0.25x means the model is usually one of the lower-cost options
  • ~1x means the model is around the standard managed rate
  • ~2.75x means the model is usually more expensive to run

The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model costs more, not that it is always better for every task.
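
To see why the multiplier is not the full billing formula, the sketch below compares three models that the table lists at ~1x on two hypothetical workloads. It uses the per-token rates from the table and ignores long-context rates. The token counts are made up purely for illustration.

```python
# Three models the table lists at ~1x: (input, cached input, output) rates.
ONE_X_MODELS = {
    "Codex 5.3": (1.75, 0.175, 14),
    "Gemini 3.1 Pro": (2, 0.20, 12),
    "DeepSeek v4 Pro": (1.74, 0.14, 3.48),
}

# Hypothetical workloads: (input, cached input, output) token counts.
WORKLOADS = {
    "input-heavy": (50_000, 0, 1_000),
    "output-heavy": (5_000, 0, 20_000),
}


def credits(rates, tokens):
    # Sum of token count times per-token rate over the three token types.
    return sum(t * r for t, r in zip(tokens, rates))


results = {
    (workload, model): credits(rates, tokens)
    for workload, tokens in WORKLOADS.items()
    for model, rates in ONE_X_MODELS.items()
}

for (workload, model), value in results.items():
    print(f"{workload:13} {model:16} {value:>9,.0f}")
```

With these assumed token counts, the three ~1x models stay within about 25% of each other on the input-heavy workload. On the output-heavy workload, Gemini 3.1 Pro and Codex 5.3 cost more than three times as much as DeepSeek v4 Pro, because their output rates are much higher.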

Models marked True in the Long-context* column can use different rates once the input context grows past a long-context threshold.
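
The page does not publish the threshold or the long-context rates, so the sketch below uses placeholder numbers for both. It illustrates only the shape of tiered pricing: the rate set is chosen from the size of the input context before each token type is priced. The threshold, both rate tables, and the rule that the whole request switches tiers (rather than only the tokens above the threshold) are all hypothetical.

```python
# Hypothetical sketch: the threshold and both rate sets are placeholders,
# not Cosine's published values.
LONG_CONTEXT_THRESHOLD = 200_000  # hypothetical input-token threshold

STANDARD_RATES = {"input": 2.0, "cached_input": 0.20, "output": 12.0}
LONG_CONTEXT_RATES = {"input": 4.0, "cached_input": 0.40, "output": 18.0}


def request_credits(input_tokens, cached_input_tokens, output_tokens):
    """Pick a rate set from the total input context, then price each token type.

    Assumption: once the input context (fresh plus cached) passes the
    threshold, the whole request is billed at the long-context rates.
    """
    context = input_tokens + cached_input_tokens
    rates = LONG_CONTEXT_RATES if context > LONG_CONTEXT_THRESHOLD else STANDARD_RATES
    return (input_tokens * rates["input"]
            + cached_input_tokens * rates["cached_input"]
            + output_tokens * rates["output"])


print(request_credits(150_000, 0, 4_000))  # below the threshold: standard rates
print(request_credits(250_000, 0, 4_000))  # above the threshold: long-context rates
```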

Long-context pricing matters most for large repositories, long conversations, or tasks that require Cosine to carry a lot of file and tool context at once.

Auto lets Cosine choose the model for you instead of locking the session to one fixed model up front.

When you select Auto, Cosine treats model choice as a routing decision. It can start from a lower-cost baseline when a task looks simple and step up to stronger models when the task is ambiguous, long-horizon, or more complex.
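
The routing idea can be pictured with a toy sketch. Cosine does not document how Auto scores tasks, so the signals, the 10-file cut-off, and the three tier names below are all hypothetical. The sketch shows only the "start from a lower-cost baseline, step up when the task looks harder" behavior described above, not Auto's real logic.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    # Hypothetical signals a router might inspect; not Cosine's actual inputs.
    ambiguous: bool
    long_horizon: bool
    files_touched: int


# Hypothetical tiers, ordered from lower-cost baseline to strongest.
TIERS = ["baseline", "mid", "strong"]


def route(signals: TaskSignals) -> str:
    """Start at the baseline tier and step up one tier per complexity signal."""
    level = 0
    if signals.ambiguous:
        level += 1
    if signals.long_horizon:
        level += 1
    if signals.files_touched > 10:  # hypothetical cut-off for a "large" change
        level += 1
    # Cap at the strongest tier when several signals fire at once.
    return TIERS[min(level, len(TIERS) - 1)]


print(route(TaskSignals(ambiguous=False, long_horizon=False, files_touched=1)))
print(route(TaskSignals(ambiguous=True, long_horizon=True, files_touched=25)))
```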

Auto is still part of the Cosine-managed catalog. It is not a separate provider and does not require a third-party model account.

Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Usage in those flows is billed by the external provider or through your subscription, not at the Cosine-hosted catalog rates above.

Related pages:

  • Models — Conceptual overview of model selection
  • Reasoning — How reasoning changes speed, depth, and cost
  • Swarm mode — Multi-agent execution for larger tasks