Model pricing

Cosine lets you choose which model runs a task. Models have different strengths, speeds, context windows, and costs.

The model picker shows an approximate multiplier so models are easy to compare at a glance. The billing usage view breaks this down further into the credits used for input, cached input, and output tokens.

How model pricing works

When you use a model, usage is measured separately for three token types:

Input tokens — Context sent to the model, such as your prompt, instructions, files, memory, and tool results
Cached input tokens — Reused input context that the provider can bill at a lower rate
Output tokens — Text, tool calls, code, and other responses produced by the model

Different models price these token types differently. A model can be relatively cheap for input but more expensive for output, or vice versa.

The table below mirrors the pricing breakdown shown in the billing UI. Values are credits used per token for each token type.

Model	Provider	Approx. multiplier	Input	Cached input	Output	Long-context*
Lumen Outpost	Cosine	~0.25x	0.60	0.10	3	—
Lumen Scout	Cosine	~0.1x	0.10	0.05	0.30	—
GPT 5.5	OpenAI	~2.75x	5	0.50	30	True
GPT 5.4	OpenAI	~1.5x	2.50	0.25	15	True
GPT 5.4 1M	OpenAI	~1.5x	2.50	0.25	15	True
GPT 5.4 mini	OpenAI	~0.5x	0.75	0.075	4.50	—
Codex 5.3	OpenAI	~1x	1.75	0.175	14	—
Opus 4.7	Anthropic	~2.75x	5	0.50	25	—
Opus 4.6	Anthropic	~2.75x	5	0.50	25	—
Sonnet 4.6	Anthropic	~1.75x	3	0.30	15	—
Gemini 3.1 Pro	Google	~1x	2	0.20	12	True
Gemini 3 Flash	Google	~0.25x	0.50	0.05	3	—
Kimi K2.6	Moonshot	~0.5x	0.95	0.16	4	—
GLM 5.1	Z AI	~0.75x	1.40	0.26	4.40	—
MiniMax M2.7	MiniMax	~0.25x	0.30	0.06	1.20	—
Qwen 3.6 Plus	Qwen	~0.25x	0.50	0.10	3	—
DeepSeek v4 Pro	DeepSeek	~1x	1.74	0.14	3.48	—
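As a rough illustration of how the three rates combine, the sketch below multiplies each token count by its rate and sums the results. The names `Rates`, `RATES`, `estimate_credits`, and `tokens_per_unit` are hypothetical and not part of any Cosine API. It assumes that `input_tokens` counts only uncached input, and it takes the "per token" wording at face value. If the billing UI quotes rates per larger block of tokens, set `tokens_per_unit` accordingly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Credit rates for one model, copied from the table above."""
    input: float
    cached_input: float
    output: float


# A few rows from the table; add others the same way.
RATES = {
    "Lumen Scout": Rates(0.10, 0.05, 0.30),
    "GPT 5.4": Rates(2.50, 0.25, 15),
    "Sonnet 4.6": Rates(3, 0.30, 15),
}


def estimate_credits(model, input_tokens, cached_input_tokens, output_tokens,
                     tokens_per_unit=1):
    """Sum each token count multiplied by its rate.

    tokens_per_unit is an assumption: 1 follows the "per token" wording
    literally; change it if rates are quoted per larger block of tokens.
    """
    r = RATES[model]
    total = (input_tokens * r.input
             + cached_input_tokens * r.cached_input
             + output_tokens * r.output)
    return total / tokens_per_unit


# 1,000 uncached input + 4,000 cached input + 500 output tokens on GPT 5.4:
# 1000 * 2.50 + 4000 * 0.25 + 500 * 15
example = estimate_credits("GPT 5.4", 1_000, 4_000, 500)
```

In this example the 500 output tokens account for most of the total, even though there are ten times as many input and cached input tokens combined.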

Approximate multipliers

The approximate multiplier shown in the model picker is a simplified comparison against the standard Cosine-managed rate. It is useful for choosing quickly, but it is not the full billing formula.

For example:

~0.25x means the model is usually one of the lower-cost options
~1x means the model is around the standard managed rate
~2.75x means the model is usually more expensive to run

The multiplier is a pricing weight, not a quality score. A higher multiplier usually means a model costs more to run, not that it is the better choice for every task.
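To see why the multiplier is not the full billing formula, consider two models that the table lists at about ~1x: Codex 5.3 and DeepSeek v4 Pro. Their input rates are nearly identical, but their output rates differ. The sketch below compares them on a hypothetical output-heavy workload. The `credits` helper and the workload numbers are illustrative assumptions, and the rates are applied per token as the table states.

```python
# Rates copied from the table: (input, cached input, output).
CODEX_5_3 = (1.75, 0.175, 14)
DEEPSEEK_V4_PRO = (1.74, 0.14, 3.48)


def credits(rates, input_tokens, cached_tokens, output_tokens):
    """Weighted sum of token counts by their per-type rates."""
    rate_in, rate_cached, rate_out = rates
    return (input_tokens * rate_in
            + cached_tokens * rate_cached
            + output_tokens * rate_out)


# Hypothetical output-heavy task: little input, a lot of generated output.
workload = dict(input_tokens=2_000, cached_tokens=0, output_tokens=10_000)

codex = credits(CODEX_5_3, **workload)
deepseek = credits(DEEPSEEK_V4_PRO, **workload)
ratio = codex / deepseek  # how many times more Codex 5.3 costs here
```

For this workload the two totals differ by a factor of between three and four, despite the shared ~1x label. For an input-heavy workload the gap would be much smaller, which is why the per-token-type breakdown is the better guide for cost-sensitive tasks.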

Long-context pricing

Models marked True in the Long-context* column can use different rates once the input context grows past a long-context threshold.

Long-context pricing matters most for large repositories, long conversations, or tasks that require Cosine to carry a lot of file and tool context at once.
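The source does not state the threshold or the long-context rates, so the following is only a sketch of the general shape such pricing could take. Every number and name in it is a hypothetical placeholder, and it assumes the higher rate applies to the whole request once the threshold is crossed, which may not match how Cosine bills.

```python
# All values below are hypothetical placeholders, not published figures.
HYPOTHETICAL_THRESHOLD = 200_000     # input tokens
HYPOTHETICAL_BASE_RATE = 2.0         # same as one input rate in the table
HYPOTHETICAL_LONG_RATE = 4.0         # invented for illustration


def input_credits(input_tokens,
                  base_rate=HYPOTHETICAL_BASE_RATE,
                  long_rate=HYPOTHETICAL_LONG_RATE,
                  threshold=HYPOTHETICAL_THRESHOLD):
    """Bill all input at long_rate once the context exceeds the threshold.

    Assumption: the long-context rate covers the entire request, not only
    the tokens past the threshold.
    """
    rate = long_rate if input_tokens > threshold else base_rate
    return input_tokens * rate


short_task = input_credits(100_000)   # below the threshold, base rate
long_task = input_credits(300_000)    # above the threshold, long rate
```

Under these assumptions, tripling the context increases the input cost sixfold, which shows why trimming context can matter more on long-context models than the base rates suggest.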

Auto model mode

Auto lets Cosine choose the model for you instead of locking the session to one fixed model up front.

When you select Auto, Cosine treats model choice as a routing decision. It can start from a lower-cost baseline when a task looks simple and step up to stronger models when the task is ambiguous, long-horizon, or more complex.
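The routing idea can be pictured with a toy example. This is not Cosine's routing logic, which the source does not describe. The signals, the tier list, and the one-step-per-signal rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Hypothetical signals a router might consider; not Cosine's inputs."""
    ambiguous: bool = False
    long_horizon: bool = False
    complex: bool = False


# Hypothetical tiers, ordered from lowest to highest multiplier in the table.
TIERS = ["Lumen Scout", "Sonnet 4.6", "Opus 4.7"]


def route(signals):
    """Start at the lowest-cost tier and step up once per raised signal."""
    steps = sum([signals.ambiguous, signals.long_horizon, signals.complex])
    return TIERS[min(steps, len(TIERS) - 1)]


simple_choice = route(TaskSignals())
harder_choice = route(TaskSignals(ambiguous=True, complex=True))
```

A task with no raised signals stays on the lowest-cost tier, while a task that is both ambiguous and complex is routed to the top tier.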

Auto is still part of the Cosine-managed catalog. It is not a separate provider and does not require a third-party model account.

Models billed outside Cosine

Some model options are accessed through your own provider account, such as ChatGPT-authenticated or GitHub Copilot-authenticated models. Usage through those options is billed by the external provider or subscription, not at the Cosine-hosted catalog rates listed above.

Related pages

Models — Conceptual overview of model selection
Reasoning — How reasoning changes speed, depth, and cost
Swarm mode — Multi-agent execution for larger tasks