Models and Pricing

Cosine exposes a single model picker, but the underlying models do not all cost the same to run. To keep pricing predictable, we convert token consumption into a simple multiplier.

When you use a model, the underlying provider charges for some combination of:

  • input tokens
  • output tokens
  • reasoning tokens
  • context-window premiums on long-context variants

Instead of exposing all of those rates directly in the product, Cosine maps each model onto a single multiplier such as 0.5x, 1x, or 3x.

Reasoning matters here because reasoning tokens count toward total token consumption. In practice, higher reasoning settings usually mean the model spends more tokens thinking before it answers, so the total cost of running that model goes up.

A multiplier is a simple pricing weight attached to each model. Cosine uses it to turn underlying token consumption into something easier to compare in the picker.

In practice:

  • GPT 5.4: 1x
  • GPT 5.4 mini: 0.5x
  • GPT 5.4 nano: 0.25x
  • Sonnet 4.6: 2x
  • Opus 4.6: 3x
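Because the multiplier is a single linear weight, comparing costs across models is simple arithmetic. The sketch below is illustrative only: the function name, the idea of a per-run "baseline credit" unit, and the linear-scaling assumption are ours, not a description of Cosine's internal accounting.

```python
# Illustrative sketch: assumes credit consumption scales linearly
# with a model's multiplier. Multipliers taken from the list above.
MULTIPLIERS = {
    "GPT 5.4": 1.0,
    "GPT 5.4 mini": 0.5,
    "GPT 5.4 nano": 0.25,
    "Sonnet 4.6": 2.0,
    "Opus 4.6": 3.0,
}

def credits_for_run(baseline_credits: float, model: str) -> float:
    """Weight a run's baseline (1x-rate) credit usage by the model's multiplier."""
    return baseline_credits * MULTIPLIERS[model]

# A run that would cost 10 credits on a 1x model costs 30 on Opus 4.6.
print(credits_for_run(10, "Opus 4.6"))   # 30.0
print(credits_for_run(10, "GPT 5.4 nano"))  # 2.5
```

The same baseline run is simply scaled up or down, which is what makes the picker values directly comparable.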

For variants with larger context windows, such as 1M-context models, we preserve the relative premium within that model family and round the result to the nearest 0.25x step.
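Snapping a raw cost ratio to 0.25x steps can be sketched as follows; the helper name and the example ratio are ours, used only to show the rounding behavior.

```python
def round_to_quarter(ratio: float) -> float:
    """Round a raw cost ratio to the nearest 0.25x step."""
    return round(ratio * 4) / 4

# e.g. a hypothetical raw long-context premium of 1.4x would display as 1.5x
print(round_to_quarter(1.4))  # 1.5
```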

  • 0.25x means the model is one of the cheapest options in the catalog
  • 1x means the model is charged at the standard rate
  • 2x means the model consumes credits at twice the standard rate
  • 4.5x means the model consumes credits at four and a half times the standard rate

The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model is more expensive to run, not that it is always better for every task.

For each model, the catalog lists the provider, the current multiplier, and the kinds of tasks that model is best suited for.

Model | Provider | Multiplier | Best for
GPT 5.4 | OpenAI | 1x | General-purpose OpenAI option at the standard rate.
GPT 5.4 1M | OpenAI | 1.5x | Long-context OpenAI variant for larger working sets.
GPT 5.4 mini | OpenAI | 0.5x | Lower-cost GPT-5.4 family model for coding, subagents, and higher-volume work.
GPT 5.4 nano | OpenAI | 0.25x | Cheapest GPT-5.4 family option for lightweight classification, extraction, and subagent tasks.
Codex 5.3 | OpenAI | 1x | Balanced coding model for everyday work.
Lumen Scout | Cosine | 0.1x | Lightweight Cosine-hosted model for cheaper runs.
Kimi K2.5 | Moonshot | 0.5x | Lower-cost model for faster iteration and first passes.
GLM 5 | Z AI | 0.25x | Lower-cost general-purpose option.
MiniMax M2.5 | MiniMax | 0.5x | Lower-cost option suited to routine coding tasks.
Sonnet 4.6 | Anthropic | 2x | Stronger reasoning model for harder implementation work.
Sonnet 4.6 1M | Anthropic | 4x | Long-context Sonnet tier for large repositories and documents.
Opus 4.6 | Anthropic | 3x | Premium model for the most demanding coding tasks.
Opus 4.6 1M | Anthropic | 4.5x | Highest-cost long-context Anthropic option.
Haiku 4.5 | Anthropic | 0.25x | Low-cost Anthropic option for lighter work.
Gemini 3.1 Flash Lite | Google | 0.25x | Lowest-cost Gemini tier for lightweight requests.
Gemini 3.1 Pro | Google | 1x | General-purpose Gemini tier.
Gemini 3 Flash | Google | 0.25x | Lower-cost Gemini tier for quick answers.
Nemotron 3 Super | NVIDIA | 0.5x | Lower-cost reasoning model.

Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Those flows are billed by the external provider or subscription rather than the default Cosine-hosted catalog above.