Models and Pricing
Cosine exposes a single model picker, but the underlying models do not all cost the same to run. To keep pricing predictable, we convert token consumption into a simple multiplier.
Cosine also lets you choose an agent mode. Modes and models are separate controls:
- a mode changes how the agent approaches the work
- a model changes which underlying model runs the task
Agent modes
These are the main modes you will see when creating a task in Cosine:
Manual
Manual is the default single-agent workflow. Use it when you want Cosine to work through the task carefully while keeping you in control of higher-risk actions.
Plan is for research and planning before execution. Use it when you want Cosine to inspect the codebase, build context, and propose an approach without making changes yet.
Swarm is for multi-agent execution. Use it when the task is large enough to split across multiple agents working in parallel.
Auto lets Cosine work with less interruption and adapt its approach as the task develops.
In practice:
- Manual is a good default for focused work
- Plan is useful for ambiguous or higher-risk tasks
- Swarm is useful for larger tasks with parallel workstreams
- Auto is useful when you want a more hands-off workflow
If you are using the CLI, see CLI Modes for switching shortcuts and other interface-specific details.
Auto model mode
Auto lets Cosine choose the model for you instead of locking the session to one specific model up front.
In practice, Auto mode is designed to:
- start from a low-cost baseline when the task looks simple
- step up to stronger models when the task is ambiguous, long-horizon, or more complex
- keep the picker simple when you do not want to micro-manage model selection on every run
Auto is still part of the Cosine-managed catalog. It is not a separate provider and it does not require a third-party account such as ChatGPT or GitHub Copilot.
What Auto does
When you select Auto, Cosine treats model choice as a routing decision rather than a fixed override. That means the system can pick an appropriate managed model based on the shape of the work instead of forcing one specific model alias for every prompt.
This is useful when:
- you are not sure which model is best for the task yet
- the task may evolve from a quick question into a larger implementation
- you want a sensible default without manually switching between cheap and premium models
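The routing idea above can be sketched in code. This is a hypothetical illustration, not Cosine's actual routing logic: the function name, the signals it inspects, and the tier labels are all made up to show how a router might map task shape to a model tier.

```python
# Hypothetical sketch of Auto-style routing. None of these names are
# Cosine APIs; they only illustrate picking a model tier from rough
# signals about the task.

def route_model(prompt: str, files_touched: int) -> str:
    """Pick a managed model tier based on rough task signals."""
    looks_simple = len(prompt) < 200 and files_touched <= 1
    looks_complex = files_touched > 5 or "refactor" in prompt.lower()

    if looks_simple:
        return "low-cost baseline"   # e.g. a 0.25x model
    if looks_complex:
        return "premium tier"        # e.g. a 2x model
    return "standard tier"           # e.g. a 1x model

print(route_model("Rename this variable", files_touched=1))
# a short prompt touching one file routes to the low-cost baseline
```

The real system presumably weighs far richer signals, but the shape is the same: the model is an output of the task, not an input you must fix in advance.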
How Auto is billed
Auto is shown with a 0.25-2x multiplier range rather than a single fixed multiplier.
That range reflects the fact that Auto can route across multiple Cosine-managed models with different costs. The exact usage depends on which underlying model is selected for the work.
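Concretely, the range means identical token usage can differ by up to a factor of 8 in credit consumption depending on where Auto routes. The token count below is illustrative:

```python
# Illustrative: the same 10,000-token run billed at the two ends of
# Auto's 0.25x-2x range differs by a factor of 8 in credits.
tokens = 10_000
cheapest = tokens * 0.25
priciest = tokens * 2.0
print(priciest / cheapest)  # 8.0
```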
How token pricing works
When you use a model, the underlying provider charges for some combination of:
- input tokens
- output tokens
- reasoning tokens
- context-window premiums on long-context variants
Instead of exposing all of those rates directly in the product, Cosine maps each model onto a single multiplier such as 0.5x, 1x, or 3x.
Reasoning matters here because reasoning tokens count toward total token consumption. In practice, higher reasoning settings usually mean the model spends more tokens thinking before it answers, so the total cost of running that model goes up.
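The relationship between raw token consumption and credit cost can be sketched as follows. The base rate constant and token counts are made up for illustration; they are not Cosine's actual internals.

```python
# Illustrative only: the constant and token counts are invented to show
# how reasoning tokens and a multiplier both raise total credit cost.

CREDITS_PER_TOKEN = 0.001  # hypothetical base rate at a 1x multiplier

def credit_cost(input_tokens: int, output_tokens: int,
                reasoning_tokens: int, multiplier: float) -> float:
    """Reasoning tokens count toward consumption just like output tokens."""
    total_tokens = input_tokens + output_tokens + reasoning_tokens
    return total_tokens * CREDITS_PER_TOKEN * multiplier

# Same request, with and without a high reasoning setting:
low = credit_cost(2_000, 500, 0, multiplier=1.0)
high = credit_cost(2_000, 500, 3_000, multiplier=1.0)
print(low, high)  # the extra reasoning tokens more than double the cost
```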
What the multiplier means
A multiplier is a simple pricing weight attached to each model. Cosine uses it to turn underlying token consumption into something easier to compare in the picker.
In practice:
- Auto: 0.25-2x
- GPT 5.4: 1x
- GPT 5.4 1M: 1.5x
- GPT 5.4 mini: 0.5x
- GPT 5.4 nano: 0.25x
- Codex 5.3: 1x
- Kimi K2.5: 0.5x
- Lumen Outpost: 0.5x
- GLM 5.1: 0.25x
- MiniMax M2.7: 0.5x
- Qwen 3.6 Plus: 0.5x
- Lumen Scout: 0.1x
- Sonnet 4.6 1M: 2x
- Opus 4.6 1M: 3x
- Haiku 4.5: 0.25x
- Gemini 3.1 Flash Lite: 0.25x
- Gemini 3.1 Pro: 1x
- Gemini 3 Flash: 0.25x
- Nemotron 3 Super: 0.5x
For variants with larger context windows, such as 1M context models, we keep the relative premium within that model family and still round to 0.25x steps.
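That rounding rule can be sketched as snapping a raw premium to the nearest 0.25x step. This is a sketch of the stated policy, not Cosine's actual pricing code, and the 1.4x input is an invented example:

```python
def snap_multiplier(raw: float, step: float = 0.25) -> float:
    """Round a raw cost premium to the nearest 0.25x step."""
    return round(raw / step) * step

# e.g. a long-context variant priced at roughly 1.4x its base model
print(snap_multiplier(1.4))  # 1.5
```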
How to read the multiplier
- 0.25x means the model is one of the cheapest options in the catalog
- 1x means the model is charged at the standard rate
- 1.5x means the model consumes credits at one and a half times the standard rate
- 2x means the model consumes credits at twice the standard rate
The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model is more expensive to run, not that it is always better for every task.
Current Cosine-hosted model catalog
Each entry shows the model provider, the current multiplier, and the kinds of tasks that model is best suited for. The current Fireworks-backed options in the managed catalog are GLM 5.1 at 0.25x, MiniMax M2.7 at 0.5x, and Qwen 3.6 Plus at 0.5x.
| Model | Provider | Multiplier | Best for |
|---|---|---|---|
| Auto | | 0.25-2x | Lets Cosine choose an appropriate managed model for the task instead of locking you to one fixed model. |
| GPT 5.4 | | 1x | General-purpose OpenAI option at the standard rate. |
| GPT 5.4 1M | | 1.5x | Long-context OpenAI variant for larger working sets. |
| GPT 5.4 mini | | 0.5x | Lower-cost GPT-5.4 family model for coding, subagents, and higher-volume work. |
| GPT 5.4 nano | | 0.25x | Cheapest GPT-5.4 family option for lightweight classification, extraction, and subagent tasks. |
| Codex 5.3 | | 1x | Balanced coding model for everyday work. |
| Lumen Outpost | | 0.5x | Quality Cosine-hosted model for general-purpose coding and day-to-day implementation work. |
| Lumen Scout | | 0.1x | Lightweight Cosine-hosted model for the cheapest managed runs. |
| Kimi K2.5 | | 0.5x | Lower-cost model for faster iteration and first passes. |
| GLM 5.1 | | 0.25x | Lower-cost Fireworks-backed general-purpose option. |
| MiniMax M2.7 | | 0.5x | Fireworks-backed option suited to routine coding tasks. |
| Qwen 3.6 Plus | | 0.5x | Balanced Fireworks-backed option for coding and agentic tasks. |
| Sonnet 4.6 1M | | 2x | Capable model for harder implementation work and larger context windows. |
| Opus 4.6 1M | | 3x | High-capability model for demanding coding tasks with a larger context window. |
| Haiku 4.5 | | 0.25x | Low-cost Anthropic option for lighter work. |
| Gemini 3.1 Flash Lite | | 0.25x | Lowest-cost Gemini tier for lightweight requests. |
| Gemini 3.1 Pro | | 1x | General-purpose Gemini tier. |
| Gemini 3 Flash | | 0.25x | Lower-cost Gemini tier for quick answers. |
| Nemotron 3 Super | | 0.5x | Lower-cost reasoning model. |
Models billed outside Cosine
Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Those flows are billed by the external provider or subscription rather than the default Cosine-hosted catalog above.