Models and Pricing

Cosine exposes a single model picker, but the underlying models do not all cost the same to run. To keep pricing predictable, we convert token consumption into a simple multiplier.

Cosine also lets you choose an agent mode. Modes and models are separate controls:

  • a mode changes how the agent approaches the work
  • a model changes which underlying model runs the task

These are the main modes you will see when creating a task in Cosine:

Manual is the default single-agent workflow. Use it when you want Cosine to work through the task carefully while keeping you in control of higher-risk actions.

Plan is for research and planning before execution. Use it when you want Cosine to inspect the codebase, build context, and propose an approach without making changes yet.

Swarm is for multi-agent execution. Use it when the task is large enough to split across multiple agents working in parallel.

Auto lets Cosine work with less interruption and adapt its approach as the task develops.

In practice:

  • Manual is a good default for focused work
  • Plan is useful for ambiguous or higher-risk tasks
  • Swarm is useful for larger tasks with parallel workstreams
  • Auto is useful when you want a more hands-off workflow

If you are using the CLI, see CLI Modes for switching shortcuts and other interface-specific details.

In the model picker, Auto lets Cosine choose the model for you instead of locking the session to one specific model up front.

In practice, Auto mode is designed to:

  • start from a low-cost baseline when the task looks simple
  • step up to stronger models when the task is ambiguous, long-horizon, or more complex
  • keep the picker simple when you do not want to micro-manage model selection on every run

Auto is still part of the Cosine-managed catalog. It is not a separate provider and it does not require a third-party account such as ChatGPT or GitHub Copilot.

When you select Auto, Cosine treats model choice as a routing decision rather than a fixed override. That means the system can pick an appropriate managed model based on the shape of the work instead of forcing one specific model alias for every prompt.

This is useful when:

  • you are not sure which model is best for the task yet
  • the task may evolve from a quick question into a larger implementation
  • you want a sensible default without manually switching between cheap and premium models

Auto is shown with a 0.25-2x multiplier range rather than a single fixed multiplier.

That range reflects the fact that Auto can route across multiple Cosine-managed models with different costs. The exact usage depends on which underlying model is selected for the work.
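Cosine's actual routing logic is internal, but the idea can be illustrated with a simplified sketch. Everything below is hypothetical: the complexity heuristic, the thresholds, and the specific models chosen are invented for illustration, not Cosine's real routing behavior.

```python
# Hypothetical sketch of multiplier-aware routing.
# The heuristic and thresholds are invented; Cosine's real logic is internal.
def route_model(prompt: str, files_touched: int) -> tuple[str, float]:
    """Pick a managed model (name, multiplier) from a rough complexity score."""
    # Crude complexity signal: longer prompts and wider changes score higher.
    score = len(prompt) / 500 + files_touched
    if score < 2:
        return ("GPT 5.4 nano", 0.25)   # cheap baseline for simple work
    elif score < 5:
        return ("GPT 5.4", 1.0)         # standard rate for typical tasks
    else:
        return ("Sonnet 4.6 1M", 2.0)   # stronger model for complex work

model, multiplier = route_model("Fix the typo in README", files_touched=1)
```

The point of the sketch is only that routing bounds the multiplier to a range (here 0.25x to 2x) rather than fixing it per session, which is why Auto displays a range in the picker.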

When you use a model, the underlying provider charges for some combination of:

  • input tokens
  • output tokens
  • reasoning tokens
  • context-window premiums on long-context variants

Instead of exposing all of those rates directly in the product, Cosine maps each model onto a single multiplier such as 0.5x, 1x, or 3x.

Reasoning matters here because reasoning tokens count toward total token consumption. In practice, higher reasoning settings usually mean the model spends more tokens thinking before it answers, so the total cost of running that model goes up.
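As a quick illustrative calculation (the token counts below are made up, not real provider rates), the same request at a higher reasoning setting consumes more total tokens:

```python
# Hypothetical token counts for one request at two reasoning settings.
def total_tokens(input_toks: int, output_toks: int, reasoning_toks: int) -> int:
    # Reasoning tokens count toward total consumption, like input and output.
    return input_toks + output_toks + reasoning_toks

low = total_tokens(input_toks=1_000, output_toks=400, reasoning_toks=200)
high = total_tokens(input_toks=1_000, output_toks=400, reasoning_toks=3_000)

assert high > low  # higher reasoning settings raise total consumption
```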

A multiplier is a simple pricing weight attached to each model. Cosine uses it to turn underlying token consumption into something easier to compare in the picker.

In practice:

  • Auto: 0.25-2x
  • GPT 5.4: 1x
  • GPT 5.4 1M: 1.5x
  • GPT 5.4 mini: 0.5x
  • GPT 5.4 nano: 0.25x
  • Codex 5.3: 1x
  • Kimi K2.5: 0.5x
  • Lumen Outpost: 0.5x
  • GLM 5.1: 0.25x
  • MiniMax M2.7: 0.5x
  • Qwen 3.6 Plus: 0.5x
  • Lumen Scout: 0.1x
  • Sonnet 4.6 1M: 2x
  • Opus 4.6 1M: 3x
  • Haiku 4.5: 0.25x
  • Gemini 3.1 Flash Lite: 0.25x
  • Gemini 3.1 Pro: 1x
  • Gemini 3 Flash: 0.25x
  • Nemotron 3 Super: 0.5x

For variants with larger context windows, such as 1M context models, we keep the relative premium within that model family and still round to 0.25x steps.
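As a rough illustration of that rounding (the raw cost ratio here is hypothetical), snapping a long-context premium to 0.25x steps might look like:

```python
# Hypothetical: snap a raw provider cost ratio to the nearest 0.25x step.
def snap_multiplier(raw_ratio: float) -> float:
    return round(raw_ratio * 4) / 4

# e.g. a raw ~1.4x long-context premium would be shown as 1.5x
snapped = snap_multiplier(1.4)
```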

  • 0.25x means the model is one of the cheapest options in the catalog
  • 1x means the model is charged at the standard rate
  • 1.5x means the model consumes credits at one and a half times the standard rate
  • 2x means the model consumes credits at twice the standard rate
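To make the weights concrete, here is a small sketch of how a multiplier turns the same underlying usage into different credit consumption. The credit unit and the token-to-credit base rate are invented for illustration; only the multipliers come from the catalog above.

```python
# Hypothetical base rate: 1 credit per 1,000 tokens at the 1x standard rate.
BASE_CREDITS_PER_1K_TOKENS = 1.0

def credits_used(total_tokens: int, multiplier: float) -> float:
    """Convert raw token consumption into credits via the model's multiplier."""
    return (total_tokens / 1_000) * BASE_CREDITS_PER_1K_TOKENS * multiplier

# The same 10,000-token task at three catalog multipliers:
cheap = credits_used(10_000, 0.25)     # e.g. GPT 5.4 nano
standard = credits_used(10_000, 1.0)   # e.g. GPT 5.4
premium = credits_used(10_000, 3.0)    # e.g. Opus 4.6 1M
```

Under these assumptions the premium run costs twelve times the cheap one for identical token consumption, which is exactly the comparison the multiplier is meant to make legible.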

The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model is more expensive to run, not that it is always better for every task.

Each card shows the model provider, the current multiplier, and the kinds of tasks that model is best suited for. The current Fireworks-backed options in the managed catalog are GLM 5.1 at 0.25x, MiniMax M2.7 at 0.5x, and Qwen 3.6 Plus at 0.5x.

| Model | Provider | Multiplier | Best for |
| --- | --- | --- | --- |
| Auto | Cosine | 0.25-2x | Lets Cosine choose an appropriate managed model for the task instead of locking you to one fixed model. |
| GPT 5.4 | OpenAI | 1x | General-purpose OpenAI option at the standard rate. |
| GPT 5.4 1M | OpenAI | 1.5x | Long-context OpenAI variant for larger working sets. |
| GPT 5.4 mini | OpenAI | 0.5x | Lower-cost GPT-5.4 family model for coding, subagents, and higher-volume work. |
| GPT 5.4 nano | OpenAI | 0.25x | Cheapest GPT-5.4 family option for lightweight classification, extraction, and subagent tasks. |
| Codex 5.3 | OpenAI | 1x | Balanced coding model for everyday work. |
| Lumen Outpost | Cosine | 0.5x | Quality Cosine-hosted model for general-purpose coding and day-to-day implementation work. |
| Lumen Scout | Cosine | 0.1x | Lightweight Cosine-hosted model for the cheapest managed runs. |
| Kimi K2.5 | Moonshot | 0.5x | Lower-cost model for faster iteration and first passes. |
| GLM 5.1 | Z AI | 0.25x | Lower-cost Fireworks-backed general-purpose option. |
| MiniMax M2.7 | MiniMax | 0.5x | Fireworks-backed option suited to routine coding tasks. |
| Qwen 3.6 Plus | Qwen | 0.5x | Balanced Fireworks-backed option for coding and agentic tasks. |
| Sonnet 4.6 1M | Anthropic | 2x | Capable model for harder implementation work and larger context windows. |
| Opus 4.6 1M | Anthropic | 3x | High-capability model for demanding coding tasks with a larger context window. |
| Haiku 4.5 | Anthropic | 0.25x | Low-cost Anthropic option for lighter work. |
| Gemini 3.1 Flash Lite | Google | 0.25x | Lowest-cost Gemini tier for lightweight requests. |
| Gemini 3.1 Pro | Google | 1x | General-purpose Gemini tier. |
| Gemini 3 Flash | Google | 0.25x | Lower-cost Gemini tier for quick answers. |
| Nemotron 3 Super | NVIDIA | 0.5x | Lower-cost reasoning model. |

Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Those flows are billed by the external provider or subscription rather than the default Cosine-hosted catalog above.