
Choosing the Right Model

The Cosine CLI gives you access to a range of AI models, each with different trade-offs in speed, cost, reasoning depth, and output style. Over time you'll develop your own preferences, but here's a practical starting point.

In the current managed catalog, the Fireworks-backed options shown in the picker are GLM 5.1, MiniMax M2.7, and Qwen 3.6 Plus. Older internal names such as glm-5 and minimax-m2.5 still exist as compatibility aliases where needed, but user-facing docs and picker labels now use the refreshed names.

| Model | Speed | Cost | Best for |
| --- | --- | --- | --- |
| Codex 5.3 | Fast | Medium (1x) | General coding, everyday tasks |
| GLM 5.1 | Very fast | Low (0.25x) | Cheap quick checks, lightweight edits, short exploratory runs |
| MiniMax M2.7 | Fast | Low (0.5x) | Rapid iteration on routine coding tasks |
| Qwen 3.6 Plus | Medium | Low (0.5x) | Balanced Fireworks-backed coding and agentic work |
| Sonnet 4.6 | Medium | Medium (2x) | Nuanced writing, structured reasoning, polished output |
| Opus 4.6 | Slow | High (3x) | Deep reasoning, complex multi-step tasks |
| Gemini 3.1 Pro | Slow | Medium (1x) | Alternative perspective and large-context tasks |
| Kimi K2.5 | Very fast | Low (0.5x) | Quick first drafts and exploratory work |
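The cost column is easiest to read as a multiplier against the 1x baseline. As a rough sketch (using only the multipliers from the table above; the "unit" here is whatever your baseline 1x spend is, not an official billing formula), you can compare what the same task would cost on different models:

```python
# Cost multipliers from the table above, relative to the 1x baseline.
MULTIPLIERS = {
    "Codex 5.3": 1.0,
    "GLM 5.1": 0.25,
    "MiniMax M2.7": 0.5,
    "Qwen 3.6 Plus": 0.5,
    "Sonnet 4.6": 2.0,
    "Opus 4.6": 3.0,
    "Gemini 3.1 Pro": 1.0,
    "Kimi K2.5": 0.5,
}

def relative_cost(model: str, baseline_units: float = 1.0) -> float:
    """Estimated cost of a task that would cost `baseline_units` on a 1x model."""
    return MULTIPLIERS[model] * baseline_units

# A task costing 1 unit on Codex 5.3 costs 0.25 units on GLM 5.1
# and 3 units on Opus 4.6 — a 12x spread between cheapest and priciest.
print(relative_cost("GLM 5.1"))   # 0.25
print(relative_cost("Opus 4.6"))  # 3.0
```

In other words, you can run a dozen GLM 5.1 passes for the price of one Opus 4.6 run, which is why the cheap tier suits exploratory work and the premium tier suits tasks that need to be right the first time.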

If you’re not sure which model to use, these are solid defaults:

  • Codex on High reasoning — great all-rounder for tasks involving code and structured thinking.
  • Sonnet on Medium reasoning — better for writing tasks, prose, and anything where tone and nuance matter.
  • Qwen 3.6 Plus — a good lower-cost managed default when you want a balanced Fireworks-backed option for coding and agentic tasks.
  • MiniMax M2.7 — a strong pick for quick iteration when you want something inexpensive but less bare-bones than the very cheapest tier.

You don’t need to optimise aggressively across models. The long-term goal is that model selection becomes increasingly automatic. For now, defaulting to Codex or Sonnet for most tasks is a reliable approach.

If you’re in the middle of a task and need a quick answer or small edit without interrupting your flow, switch to a faster low-cost model like GLM 5.1, MiniMax M2.7, or Kimi K2.5. Fast models are great for:

  • Sanity-checking a quick idea
  • Small, well-defined code changes
  • Getting a first draft to react to

If cost matters as much as speed, GLM 5.1 is the cheapest Fireworks-backed option in the current picker at 0.25x.

For complex multi-step tasks running in the background — especially in Swarm Mode — the extra time a larger model takes is worth it. Use Opus 4.6 or Codex on High when:

  • You’re producing something that needs to be good the first time
  • The task involves deep reasoning or multiple interdependent decisions
  • You’re running it in the background anyway and won’t be waiting

If you want a middle ground between the cheapest tiers and the premium models, Qwen 3.6 Plus is a sensible step up before you reach for Opus.

Different models have genuinely different characteristics — not just in capability, but in style and “feel.” Some users find Claude models (Sonnet, Opus) produce more visually pleasing HTML and more natural-sounding prose. Codex models tend to be more precise and structured. Among the lower-cost managed options, MiniMax M2.7 and Qwen 3.6 Plus are both worth trying if you want a different style without moving to a premium tier.

Running the same task with two different models in parallel (one as the main agent, one as a fresh session with no context) is a useful technique for getting varied perspectives on the same problem.
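The pattern above can be sketched as a small driver that fans the same prompt out to two models at once. Note the `run_task` function here is a hypothetical placeholder — the docs don't specify a programmatic API for the Cosine CLI, so in practice you would swap in however your setup invokes a model — but the concurrency shape is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(model: str, prompt: str) -> str:
    # Hypothetical stand-in: replace with your actual call into the CLI
    # or API for the chosen model. Each call starts with no shared context.
    return f"[{model}] response to: {prompt}"

def compare_models(prompt: str, models=("Codex 5.3", "Sonnet 4.6")) -> dict:
    """Run the same prompt against each model concurrently, fresh session each."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: run_task(m, prompt), models))
    return dict(zip(models, results))
```

Because the second session carries no context from the first, differences in the two answers reflect the models' own styles rather than anchoring on a shared conversation history.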

In short:

  • Codex on High or Sonnet on Medium are good everyday defaults.
  • Use fast low-cost models such as GLM 5.1, MiniMax M2.7, or Kimi K2.5 for quick in-flow tasks.
  • Use Qwen 3.6 Plus when you want a balanced Fireworks-backed option at 0.5x.
  • Use larger models (Opus 4.6, Codex on High) for complex background tasks.
  • You’ll build your own preferences over time — but don’t over-optimise early.

Next: What are MCPs and Why Do They Matter?