Models and Pricing
Cosine exposes a single model picker, but the underlying models do not all cost the same to run. To keep pricing predictable, we map each model's underlying token costs onto a simple multiplier.
How token pricing works
When you use a model, the underlying provider charges for some combination of:
- input tokens
- output tokens
- reasoning tokens
- context-window premiums on long-context variants
Instead of exposing all of those rates directly in the product, Cosine maps each model onto a single multiplier such as 0.5x, 1x, or 3x.
Reasoning matters here because reasoning tokens count toward total token consumption. In practice, higher reasoning settings usually mean the model spends more tokens thinking before it answers, so the total cost of running that model goes up.
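The relationship above can be sketched as a small cost calculation. The rates and token counts below are made up for illustration (most providers bill reasoning tokens at the output-token rate); they are not Cosine's or any provider's actual prices.

```python
# Hypothetical illustration: how provider token charges combine, and why
# higher reasoning settings raise the total cost of a run.

def provider_cost_usd(input_tokens: int, output_tokens: int,
                      reasoning_tokens: int,
                      input_rate: float, output_rate: float) -> float:
    """Rates are USD per million tokens; reasoning tokens are assumed
    to be billed like output tokens."""
    billable_output = output_tokens + reasoning_tokens
    return (input_tokens * input_rate + billable_output * output_rate) / 1_000_000

# Same prompt at two reasoning settings: only the reasoning-token count differs.
low_reasoning  = provider_cost_usd(10_000, 2_000, 1_000, input_rate=1.0, output_rate=4.0)
high_reasoning = provider_cost_usd(10_000, 2_000, 8_000, input_rate=1.0, output_rate=4.0)
assert high_reasoning > low_reasoning
```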
What the multiplier means
A multiplier is a simple pricing weight attached to each model. Cosine uses it to turn underlying token consumption into something easier to compare in the picker.
In practice:
- GPT 5.4: 1x
- GPT 5.4 mini: 0.5x
- GPT 5.4 nano: 0.25x
- Sonnet 4.6: 2x
- Opus 4.6: 3x
For variants with larger context windows, such as 1M context models, we keep the relative premium within that model family and still round to 0.25x steps.
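The rounding rule can be sketched as follows. The 1.5x relative premium used here is an assumed value chosen so the example reproduces the catalog's Opus tiers (3x standard, 4.5x for 1M context); actual premiums vary by family.

```python
# Sketch of the rounding rule: a long-context variant keeps its family's
# relative premium, and the result snaps to the nearest 0.25x step.

def snap_to_quarter(multiplier: float) -> float:
    """Round a raw multiplier to the nearest 0.25x step."""
    return round(multiplier * 4) / 4

base_multiplier = 3.0   # e.g. the standard tier of a premium family
context_premium = 1.5   # assumed relative premium for the 1M-context variant

long_context_multiplier = snap_to_quarter(base_multiplier * context_premium)
assert long_context_multiplier == 4.5
```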
How to read the multiplier
- 0.25x means the model is one of the cheapest options in the catalog
- 1x means the model is charged at the standard rate
- 2x means the model consumes credits at twice the standard rate
- 4.5x means the model consumes credits at four and a half times the standard rate
The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model is more expensive to run, not that it is always better for every task.
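In other words, credit consumption scales linearly with the multiplier. A quick comparison, using multipliers from the catalog below and an illustrative baseline of 100 credits (not an actual Cosine figure):

```python
# Multipliers taken from the Cosine-hosted catalog; the 100-credit baseline
# is illustrative only.
MULTIPLIERS = {
    "GPT 5.4 nano": 0.25,
    "GPT 5.4": 1.0,
    "Sonnet 4.6": 2.0,
    "Opus 4.6 1M": 4.5,
}

def credits_used(base_credits: float, model: str) -> float:
    """Credits consumed for work that would cost base_credits at the 1x rate."""
    return base_credits * MULTIPLIERS[model]

for model, mult in MULTIPLIERS.items():
    print(f"{model}: {credits_used(100, model)} credits ({mult}x)")
```

The same task costs 25 credits on GPT 5.4 nano but 450 on Opus 4.6 1M, which is why cheaper tiers are recommended for high-volume work.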
Current Cosine-hosted model catalog
Each entry shows the model provider, the current multiplier, and the kinds of tasks the model is best suited for.
| Model | Provider | Multiplier | Best for |
|---|---|---|---|
| GPT 5.4 | OpenAI | 1x | General-purpose OpenAI option at the standard rate. |
| GPT 5.4 1M | OpenAI | 1.5x | Long-context OpenAI variant for larger working sets. |
| GPT 5.4 mini | OpenAI | 0.5x | Lower-cost GPT-5.4 family model for coding, subagents, and higher-volume work. |
| GPT 5.4 nano | OpenAI | 0.25x | Cheapest GPT-5.4 family option for lightweight classification, extraction, and subagent tasks. |
| Codex 5.3 | OpenAI | 1x | Balanced coding model for everyday work. |
| Lumen Scout | Cosine | 0.1x | Lightweight Cosine-hosted model for cheaper runs. |
| Kimi K2.5 | Moonshot AI | 0.5x | Lower-cost model for faster iteration and first passes. |
| GLM 5 | Zhipu AI | 0.25x | Lower-cost general-purpose option. |
| MiniMax M2.5 | MiniMax | 0.5x | Lower-cost option suited to routine coding tasks. |
| Sonnet 4.6 | Anthropic | 2x | Stronger reasoning model for harder implementation work. |
| Sonnet 4.6 1M | Anthropic | 4x | Long-context Sonnet tier for large repositories and documents. |
| Opus 4.6 | Anthropic | 3x | Premium model for the most demanding coding tasks. |
| Opus 4.6 1M | Anthropic | 4.5x | Highest-cost long-context Anthropic option. |
| Haiku 4.5 | Anthropic | 0.25x | Low-cost Anthropic option for lighter work. |
| Gemini 3.1 Flash Lite | Google | 0.25x | Lowest-cost Gemini tier for lightweight requests. |
| Gemini 3.1 Pro | Google | 1x | General-purpose Gemini tier. |
| Gemini 3 Flash | Google | 0.25x | Lower-cost Gemini tier for quick answers. |
| Nemotron 3 Super | NVIDIA | 0.5x | Lower-cost reasoning model. |
Models billed outside Cosine
Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Those flows are billed by the external provider or subscription rather than through the Cosine-hosted catalog above.