
Choosing the Right Model

The Cosine CLI gives you access to a range of AI models, each with different trade-offs in speed, cost, reasoning depth, and output style. Over time you'll develop your own preferences, but here's a practical starting point.

In the current managed catalog, the Fireworks-backed options shown in the picker are GLM 5.1, MiniMax M2.7, and Qwen 3.6 Plus. Older internal names such as glm-5 and minimax-m2.5 still exist as compatibility aliases where needed, but user-facing docs and picker labels now use the refreshed names.

| Model | Speed | Cost | Best for |
| --- | --- | --- | --- |
| Codex 5.3 | Fast | Medium (1x) | General coding, everyday tasks |
| GLM 5.1 | Very fast | Low (0.25x) | Cheap quick checks, lightweight edits, short exploratory runs |
| MiniMax M2.7 | Fast | Low (0.5x) | Rapid iteration on routine coding tasks |
| Qwen 3.6 Plus | Medium | Low (0.5x) | Balanced Fireworks-backed coding and agentic work |
| Sonnet 4.6 | Medium | Medium (2x) | Nuanced writing, structured reasoning, polished output |
| Opus 4.6 | Slow | High (3x) | Deep reasoning, complex multi-step tasks |
| Gemini 3.1 Pro | Slow | Medium (1x) | Alternative perspective and large-context tasks |
| Kimi K2.5 | Very fast | Low (0.5x) | Quick first drafts and exploratory work |
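The cost column is easiest to read as a multiplier against the 1x baseline. As a rough sketch (using only the multipliers from the table above; the "unit" here is whatever your baseline 1x spend is, not an official billing formula), you can compare what the same task would cost on different models:

```python
# Cost multipliers from the table above, relative to the 1x baseline.
MULTIPLIERS = {
    "Codex 5.3": 1.0,
    "GLM 5.1": 0.25,
    "MiniMax M2.7": 0.5,
    "Qwen 3.6 Plus": 0.5,
    "Sonnet 4.6": 2.0,
    "Opus 4.6": 3.0,
    "Gemini 3.1 Pro": 1.0,
    "Kimi K2.5": 0.5,
}

def relative_cost(model: str, baseline_units: float = 1.0) -> float:
    """Estimated cost of a task that would cost `baseline_units` on a 1x model."""
    return MULTIPLIERS[model] * baseline_units

# A task costing 1 unit on Codex 5.3 costs 0.25 units on GLM 5.1
# and 3 units on Opus 4.6 — a 12x spread between cheapest and priciest.
print(relative_cost("GLM 5.1"))   # 0.25
print(relative_cost("Opus 4.6"))  # 3.0
```

In other words, you can run a dozen GLM 5.1 passes for the price of one Opus 4.6 run, which is why the cheap tier suits exploratory work and the premium tier suits tasks that need to be right the first time.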

If you’re not sure which model to use, these are solid defaults:

  • Codex on High reasoning — great all-rounder for tasks involving code and structured thinking.
  • Sonnet on Medium reasoning — better for writing tasks, prose, and anything where tone and nuance matter.
  • Qwen 3.6 Plus — a good lower-cost managed default when you want a balanced Fireworks-backed option for coding and agentic tasks.
  • MiniMax M2.7 — a strong pick for quick iteration when you want something inexpensive but less bare-bones than the very cheapest tier.

You don’t need to optimise aggressively across models. The long-term goal is that model selection becomes increasingly automatic. For now, defaulting to Codex or Sonnet for most tasks is a reliable approach.

If you’re in the middle of a task and need a quick answer or small edit without interrupting your flow, switch to a faster low-cost model like GLM 5.1, MiniMax M2.7, or Kimi K2.5. Fast models are great for:

  • Sanity-checking a quick idea
  • Small, well-defined code changes
  • Getting a first draft to react to

If cost matters as much as speed, GLM 5.1 is the cheapest Fireworks-backed option in the current picker at 0.25x.

For complex multi-step tasks running in the background — especially in Swarm Mode — the extra time a larger model takes is worth it. Use Opus 4.6 or Codex on High when:

  • You’re producing something that needs to be good the first time
  • The task involves deep reasoning or multiple interdependent decisions
  • You’re running it in the background anyway and won’t be waiting

If you want a middle ground between the cheapest tiers and the premium models, Qwen 3.6 Plus is a sensible step up before you reach for Opus.

Different models have genuinely different characteristics — not just in capability, but in style and “feel.” Some users find Claude models (Sonnet, Opus) produce more visually pleasing HTML and more natural-sounding prose. Codex models tend to be more precise and structured. Among the lower-cost managed options, MiniMax M2.7 and Qwen 3.6 Plus are both worth trying if you want a different style without moving to a premium tier.

Running the same task with two different models in parallel (one as the main agent, one as a fresh session with no context) is a useful technique for getting varied perspectives on the same problem.
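The pattern above can be sketched as a small driver that fans the same prompt out to two models at once. Note the `run_task` function here is a hypothetical placeholder — the docs don't specify a programmatic API for the Cosine CLI, so in practice you would swap in however your setup invokes a model — but the concurrency shape is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(model: str, prompt: str) -> str:
    # Hypothetical stand-in: replace with your actual call into the CLI
    # or API for the chosen model. Each call starts with no shared context.
    return f"[{model}] response to: {prompt}"

def compare_models(prompt: str, models=("Codex 5.3", "Sonnet 4.6")) -> dict:
    """Run the same prompt against each model concurrently, fresh session each."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: run_task(m, prompt), models))
    return dict(zip(models, results))
```

Because the second session carries no context from the first, differences in the two answers reflect the models' own styles rather than anchoring on a shared conversation history.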

In short:

  • Codex on High or Sonnet on Medium are good everyday defaults.
  • Use fast low-cost models such as GLM 5.1, MiniMax M2.7, or Kimi K2.5 for quick in-flow tasks.
  • Use Qwen 3.6 Plus when you want a balanced Fireworks-backed option at 0.5x.
  • Use larger models (Opus 4.6, Codex on High) for complex background tasks.
  • You’ll build your own preferences over time — but don’t over-optimise early.

Next: What are MCPs and Why Do They Matter?