# Token Windows and Reasoning Effort
Two settings in the Cosine CLI have a significant impact on how well your agent performs: the token window (context) and the reasoning effort level. Understanding them helps you get more out of every session.
## The Token Window

Every AI model has a context window: the amount of information it can hold in “working memory” at once. The Cosine CLI shows a percentage in the footer indicating how full this window currently is.
| Window % | What it means |
|---|---|
| 0% | Blank slate — no accumulated context. The agent acts as a base model. |
| ~10–30% | Some context built up — the agent has read relevant files and history. |
| ~50–80% | Rich context — the agent has absorbed significant domain knowledge for this task. |
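The footer percentage is simply how much of the model's window is occupied. A minimal illustrative sketch (not the CLI's actual implementation; the token counts and window size below are hypothetical):

```python
def window_fill_percent(used_tokens: int, window_size: int) -> float:
    """Fraction of the context window currently occupied, as a percentage."""
    return 100.0 * used_tokens / window_size

# Hypothetical session: 52,000 tokens of files, history, and instructions
# accumulated in a 200,000-token window.
print(f"{window_fill_percent(52_000, 200_000):.0f}%")  # prints "26%"
```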
More context generally means better, more directed output.
When the window is fuller, the model’s behaviour is shaped toward your specific task, folder, and preferences. This is why:
- Reusing the same session folder over time is more effective than starting fresh every time.
- Plan Mode (which builds context before acting) produces better results than jumping straight to execution.
- Returning to an existing session is more powerful than opening a new one.
## Context Compacting

When your context window gets very full, the CLI automatically compacts it, summarising older parts of the conversation to free up space for new content. This is normal and expected in long sessions.
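Conceptually, compaction folds the oldest turns into a summary once usage crosses a threshold. A rough sketch of the idea only; the placeholder summarisation step and the 90% threshold are invented here, not the CLI's documented behaviour:

```python
def compact(messages: list[str], used_tokens: int, window_size: int,
            threshold: float = 0.9) -> list[str]:
    """If the window is nearly full, fold the older half of the
    conversation into a single summary message (placeholder logic)."""
    if used_tokens / window_size < threshold:
        return messages  # plenty of room; nothing to do
    half = len(messages) // 2
    # Placeholder "summary": a real system would ask the model to summarise.
    summary = "Summary of earlier conversation: " + " ".join(messages[:half])[:200]
    return [summary] + messages[half:]
```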
## Reasoning Effort

The Reasoning Effort setting controls how much internal “thinking” the model does before producing a response. You can set it to Low, Medium, or High in Configuration.
At higher reasoning effort settings:
- The model generates more internal tokens (an internal monologue, essentially).
- It thinks through the problem more carefully before acting.
- The output is generally higher quality.
- The response takes longer.
- Token usage increases.
At lower reasoning effort settings:
- Responses are faster.
- Token usage is lower.
- Suitable for quick, simple tasks where deep reasoning isn’t needed.
## Practical Guidance

For most substantive tasks, Medium or High reasoning effort produces the best results. For fast, small edits or quick lookups, Low is fine.
A good general default: use Codex on High or Sonnet on Medium for solid everyday performance.
## The Connection Between the Two

More tokens in the context window combined with higher reasoning effort means the model is doing more total thinking. This is why complex, high-quality tasks cost more tokens: the agent is doing more work. Token cost is handled at the account level; as a user, focus on quality over cost optimisation.
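As a back-of-the-envelope illustration of why cost scales with both factors, the sketch below charges for re-reading the context plus effort-scaled thinking tokens. The multipliers are invented for illustration and are not Cosine's actual accounting:

```python
# Hypothetical per-response token accounting: the model re-reads the
# context, then spends extra "thinking" tokens scaled by effort level.
REASONING_MULTIPLIER = {"low": 0.1, "medium": 0.5, "high": 1.5}  # invented values

def estimated_tokens(context_tokens: int, output_tokens: int, effort: str) -> int:
    thinking = int(output_tokens * REASONING_MULTIPLIER[effort])
    return context_tokens + thinking + output_tokens

# Same task, different effort: deeper thinking costs more overall.
print(estimated_tokens(50_000, 2_000, "low"))   # prints 52200
print(estimated_tokens(50_000, 2_000, "high"))  # prints 55000
```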
## Key Takeaways

- The token window percentage shows how much context the agent has built up; more is generally better.
- Reusing existing sessions and folders fills up the context window with relevant knowledge.
- Reasoning Effort controls the depth of the model’s internal thinking before acting.
- High reasoning effort = better quality, slower responses, more token usage.
- A good general default: Codex on High or Sonnet on Medium.
Next: Choosing the Right Model