Choosing the Right Model
The Cosine CLI gives you access to a range of AI models. Each has different characteristics in terms of speed, cost, reasoning depth, and output style. Over time, you’ll develop your own preferences — but here’s a practical starting point.
In the current managed catalog, the Fireworks-backed options shown in the picker are GLM 5.1, MiniMax M2.7, and Qwen 3.6 Plus. Older internal names such as glm-5 and minimax-m2.5 still exist as compatibility aliases where needed, but user-facing docs and picker labels now use the refreshed names.
Model Characteristics at a Glance
| Model | Speed | Cost | Best for |
|---|---|---|---|
| Codex 5.3 | Fast | Medium (1x) | General coding, everyday tasks |
| GLM 5.1 | Very fast | Low (0.25x) | Cheap quick checks, lightweight edits, short exploratory runs |
| MiniMax M2.7 | Fast | Low (0.5x) | Rapid iteration on routine coding tasks |
| Qwen 3.6 Plus | Medium | Low (0.5x) | Balanced Fireworks-backed coding and agentic work |
| Sonnet 4.6 | Medium | Medium (2x) | Nuanced writing, structured reasoning, polished output |
| Opus 4.6 | Slow | High (3x) | Deep reasoning, complex multi-step tasks |
| Gemini 3.1 Pro | Slow | Medium (1x) | Alternative perspective and large-context tasks |
| Kimi K2.5 | Very fast | Low (0.5x) | Quick first drafts and exploratory work |
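
To make the cost column easier to compare, the multipliers can be treated as plain ratios. The sketch below is illustrative only: `COST_MULTIPLIER` and `relative_cost` are hypothetical names, not part of the Cosine CLI, and it assumes the multipliers in the table scale linearly against a common 1x baseline.

```python
# Cost multipliers copied from the table above.
# Assumption: they are linear ratios against the same 1x baseline.
COST_MULTIPLIER = {
    "Codex 5.3": 1.0,
    "GLM 5.1": 0.25,
    "MiniMax M2.7": 0.5,
    "Qwen 3.6 Plus": 0.5,
    "Sonnet 4.6": 2.0,
    "Opus 4.6": 3.0,
    "Gemini 3.1 Pro": 1.0,
    "Kimi K2.5": 0.5,
}


def relative_cost(model: str, baseline: str = "Codex 5.3") -> float:
    """Return how many times more expensive `model` is than `baseline`."""
    return COST_MULTIPLIER[model] / COST_MULTIPLIER[baseline]


if __name__ == "__main__":
    # Compare the most expensive model in the table with the cheapest one.
    print(relative_cost("Opus 4.6", baseline="GLM 5.1"))
```

Under that assumption, the same task costs roughly twelve times as much on Opus 4.6 as on GLM 5.1, which is why the cheaper tiers suit quick checks.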
Practical Defaults
If you’re not sure which model to use, these are solid defaults:
- Codex on High reasoning — great all-rounder for tasks involving code and structured thinking.
- Sonnet on Medium reasoning — better for writing tasks, prose, and anything where tone and nuance matter.
- Qwen 3.6 Plus — a good lower-cost managed default when you want a balanced Fireworks-backed option for coding and agentic tasks.
- MiniMax M2.7 — a strong pick for quick iteration when you want something inexpensive but less bare-bones than the very cheapest tier.
You don’t need to optimise aggressively across models. The long-term goal is that model selection becomes increasingly automatic. For now, defaulting to Codex or Sonnet for most tasks is a reliable approach.
When Speed Matters
If you’re in the middle of a task and need a quick answer or small edit without interrupting your flow, switch to a faster low-cost model like GLM 5.1, MiniMax M2.7, or Kimi K2.5. Fast models are great for:
- Sanity-checking a quick idea
- Small, well-defined code changes
- Getting a first draft to react to
If cost matters as much as speed, GLM 5.1 is the cheapest Fireworks-backed option in the current picker at 0.25x.
When Quality Matters More Than Speed
For complex multi-step tasks running in the background, especially in Swarm Mode, the extra time a larger model takes is worth it. Use Opus 4.6 or Codex on High when:
- You’re producing something that needs to be good the first time
- The task involves deep reasoning or multiple interdependent decisions
- You’re running it in the background anyway and won’t be waiting
If you want a middle ground between the cheapest tiers and the premium models, Qwen 3.6 Plus is a sensible step up before you reach for Opus.
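
The guidance on speed versus quality can be summarised as a small decision function. This is a rough sketch of the heuristics in this guide, not something the Cosine CLI provides: `suggest_model` and its flags are hypothetical, and the choice of MiniMax M2.7 over Kimi K2.5 for in-flow work is arbitrary, since the guide treats both as reasonable fast options.

```python
def suggest_model(
    *,
    in_flow: bool = False,
    deep_reasoning: bool = False,
    prose: bool = False,
    cost_sensitive: bool = False,
) -> str:
    """Map a rough description of the task to a model named in this guide."""
    if deep_reasoning:
        # Background or multi-step work: quality outweighs speed.
        # Qwen 3.6 Plus is the middle ground before reaching for Opus.
        return "Qwen 3.6 Plus" if cost_sensitive else "Opus 4.6"
    if in_flow:
        # Quick checks and small edits: favour speed, then cost.
        return "GLM 5.1" if cost_sensitive else "MiniMax M2.7"
    if prose:
        # Writing tasks where tone and nuance matter.
        return "Sonnet (Medium reasoning)"
    # Everyday coding default.
    return "Codex (High reasoning)"


if __name__ == "__main__":
    print(suggest_model(in_flow=True, cost_sensitive=True))
    print(suggest_model(deep_reasoning=True))
```

The order of the checks is a design choice: deep reasoning wins over everything else because a wrong answer on a complex task usually costs more than the extra wait.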
Models and Personas
Different models have genuinely different characteristics, not just in capability but in style and “feel.” Some users find Claude models (Sonnet, Opus) produce more visually pleasing HTML and more natural-sounding prose. Codex models tend to be more precise and structured. Among the lower-cost managed options, MiniMax M2.7 and Qwen 3.6 Plus are both worth trying if you want a different style without moving to a premium tier.
Running the same task with two different models in parallel (one as the main agent, one as a fresh session with no context) is a useful technique for getting varied perspectives on the same problem.
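
If you script this kind of comparison yourself, the pattern is a simple fan-out. The sketch below uses only Python's standard library and assumes you supply your own `run_task(model, prompt)` callable that starts a session and returns its output; that callable and `run_side_by_side` are hypothetical and do not correspond to any documented Cosine CLI interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def run_side_by_side(
    run_task: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str],
) -> Dict[str, str]:
    """Run the same prompt against several models concurrently.

    `run_task` is a caller-supplied placeholder that takes a model name
    and a prompt and returns that model's output as text.
    """
    if not models:
        return {}
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Submit one job per model so the slower model does not block the faster one.
        futures = {model: pool.submit(run_task, model, prompt) for model in models}
        # Collect results keyed by model name for easy comparison.
        return {model: future.result() for model, future in futures.items()}


if __name__ == "__main__":
    # Stand-in runner for demonstration; replace with a real session launcher.
    def fake_runner(model: str, prompt: str) -> str:
        return f"[{model}] response to: {prompt}"

    results = run_side_by_side(fake_runner, "Review this design", ["Opus 4.6", "Codex 5.3"])
    for model, output in results.items():
        print(model, "->", output)
```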
Key Takeaways
- Codex on High or Sonnet on Medium are good everyday defaults.
- Use fast low-cost models such as GLM 5.1, MiniMax M2.7, or Kimi K2.5 for quick in-flow tasks.
- Use Qwen 3.6 Plus when you want a balanced Fireworks-backed option at 0.5x.
- Use larger models (Opus 4.6, Codex on High) for complex background tasks.
- You’ll build your own preferences over time — but don’t over-optimise early.