Model pricing
Cosine lets you choose which model runs a task. Models have different strengths, speeds, context windows, and costs.
The model picker shows an approximate multiplier so models are easy to compare at a glance. The billing usage view breaks this down further into the credits used for input, cached input, and output tokens.
How model pricing works
When you use a model, usage is measured across different token types:
- Input tokens — Context sent to the model, such as your prompt, instructions, files, memory, and tool results
- Cached input tokens — Reused input context that the provider can bill at a lower rate
- Output tokens — Text, tool calls, code, and other responses produced by the model
Different models price these token types differently. A model can be relatively cheap for input but more expensive for output, or vice versa.
The table below mirrors the pricing breakdown shown in the billing UI. Values are credits used per token for each token type.
| Model | Provider | Approx. multiplier | Input | Cached input | Output | Long-context* |
|---|---|---|---|---|---|---|
| Lumen Outpost | Cosine | ~0.25x | 0.60 | 0.10 | 3 | — |
| Lumen Scout | Cosine | ~0.1x | 0.10 | 0.05 | 0.30 | — |
| GPT 5.5 | OpenAI | ~2.75x | 5 | 0.50 | 30 | True |
| GPT 5.4 | OpenAI | ~1.5x | 2.50 | 0.25 | 15 | True |
| GPT 5.4 1M | OpenAI | ~1.5x | 2.50 | 0.25 | 15 | True |
| GPT 5.4 mini | OpenAI | ~0.5x | 0.75 | 0.075 | 4.50 | — |
| Codex 5.3 | OpenAI | ~1x | 1.75 | 0.175 | 14 | — |
| Opus 4.7 | Anthropic | ~2.75x | 5 | 0.50 | 25 | — |
| Opus 4.6 | Anthropic | ~2.75x | 5 | 0.50 | 25 | — |
| Sonnet 4.6 | Anthropic | ~1.75x | 3 | 0.30 | 15 | — |
| Gemini 3.1 Pro | Google | ~1x | 2 | 0.20 | 12 | True |
| Gemini 3 Flash | Google | ~0.25x | 0.50 | 0.05 | 3 | — |
| Kimi K2.6 | Moonshot | ~0.5x | 0.95 | 0.16 | 4 | — |
| GLM 5.1 | Z AI | ~0.75x | 1.40 | 0.26 | 4.40 | — |
| MiniMax M2.7 | MiniMax | ~0.25x | 0.30 | 0.06 | 1.20 | — |
| Qwen 3.6 Plus | Qwen | ~0.25x | 0.50 | 0.10 | 3 | — |
| DeepSeek v4 Pro | DeepSeek | ~1x | 1.74 | 0.14 | 3.48 | — |
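To make the breakdown concrete, here is a minimal Python sketch that estimates credits for one request. It copies a few rows from the table and takes the stated unit (credits per token) at face value. The `Rates` class and the `estimate_credits` function are hypothetical names used for illustration. The sketch also assumes that cached input tokens are counted separately from regular input tokens and that the three amounts are simply added together. The billing usage view remains the authoritative figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Credits per token for each token type, as listed in the pricing table."""
    input: float
    cached_input: float
    output: float


# A few rows copied from the table above (illustrative subset, not the full catalog).
RATES = {
    "Lumen Scout": Rates(input=0.10, cached_input=0.05, output=0.30),
    "Sonnet 4.6": Rates(input=3, cached_input=0.30, output=15),
    "GPT 5.5": Rates(input=5, cached_input=0.50, output=30),
}


def estimate_credits(model: str, input_tokens: int,
                     cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate credits for one request.

    Assumption: the three token types are billed independently and summed.
    This is a sketch, not Cosine's actual billing formula.
    """
    rates = RATES[model]
    return (input_tokens * rates.input
            + cached_input_tokens * rates.cached_input
            + output_tokens * rates.output)


if __name__ == "__main__":
    # Same token counts on two models, to show how the rate mix changes the total.
    for name in ("Lumen Scout", "Sonnet 4.6"):
        print(name, estimate_credits(name, 1000, 4000, 500))
```

Running the same token counts against several models shows why a model with cheap input can still cost more overall on output-heavy tasks.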
Approximate multipliers
The approximate multiplier shown in the model picker is a simplified comparison against the standard Cosine-managed rate. It is useful for choosing quickly, but it is not the full billing formula.
For example:
- ~0.25x means the model is usually one of the lower-cost options
- ~1x means the model is around the standard managed rate
- ~2.75x means the model is usually more expensive to run
The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model costs more, not that it is always better for every task.
Long-context pricing
Models marked True in the Long-context* column can use different rates once the input context grows past a long-context threshold.
Long-context pricing matters most for large repositories, long conversations, or tasks that require Cosine to carry a lot of file and tool context at once.
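The threshold behaviour can be sketched as a simple rate switch. This page does not state the threshold or the long-context rates, so the values below are placeholders chosen only to make the example run. The function name `select_rates` is hypothetical. The sketch further assumes that the whole request moves to the long-context rates once the input passes the threshold. The real rule may differ, for example by charging the higher rate only on tokens beyond the threshold.

```python
from typing import Dict, Optional

Rates = Dict[str, float]


def select_rates(input_tokens: int, standard: Rates,
                 long_context: Optional[Rates], threshold: int) -> Rates:
    """Return the rate set assumed to apply to a request.

    Assumption: the whole request switches to long-context rates once the
    input context exceeds the threshold. Models without long-context pricing
    pass long_context=None and always use their standard rates.
    """
    if long_context is not None and input_tokens > threshold:
        return long_context
    return standard


# Standard rates copied from the GPT 5.4 row of the pricing table.
STANDARD = {"input": 2.50, "cached_input": 0.25, "output": 15}

# PLACEHOLDERS: neither value comes from this page.
HYPOTHETICAL_LONG_CONTEXT = {"input": 5.00, "cached_input": 0.50, "output": 22.50}
HYPOTHETICAL_THRESHOLD = 200_000

if __name__ == "__main__":
    for tokens in (50_000, 250_000):
        rates = select_rates(tokens, STANDARD,
                             HYPOTHETICAL_LONG_CONTEXT, HYPOTHETICAL_THRESHOLD)
        print(tokens, rates["input"])
```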
Auto model mode
Auto lets Cosine choose the model for you instead of locking the session to one fixed model up front.
When you select Auto, Cosine treats model choice as a routing decision. It can start from a lower-cost baseline when a task looks simple and step up to stronger models when the task is ambiguous, long-horizon, or more complex.
Auto is still part of the Cosine-managed catalog. It is not a separate provider and does not require a third-party model account.
Models billed outside Cosine
Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Those models are billed by the external provider or subscription, not at the Cosine-hosted catalog rates listed above.
Related pages
- Models — Conceptual overview of model selection
- Reasoning — How reasoning changes speed, depth, and cost
- Swarm mode — Multi-agent execution for larger tasks