Multi-agent Orchestration for Enterprise Coding Agents: What we shared at re:Invent 2025

At AWS re:Invent 2025, we joined AWS’s Generative AI Innovation Centre team to discuss how to fine-tune LLMs for multi-agent orchestration, and why that matters when you’re building software agents for large enterprises.

This post focuses on what our CEO and Cofounder, Alistair Pullen, covered in the talk: how Cosine trains and runs specialised orchestrator and worker agents to make smaller, deployment-friendly models behave like capable long-horizon software engineering systems.

Watch the full talk here:

Why multi-agent systems show up in real enterprise deployments

In the talk, AWS framed an agent as an autonomous software system built on top of an LLM, equipped with tools and memory and designed to plan and take actions. This matters for enterprises because real workflows (codebases, policies, CI, tickets, approvals) demand systems that can decompose tasks, call tools, and iterate.

A common architecture pattern today is:

  • Orchestrator agent – a higher-level planner that breaks down the goal and delegates subtasks

  • Worker agents – specialised executors (code search/edits/tests), operating with a narrower context

This orchestrator-and-workers pattern is now widely discussed in agent frameworks (including parallelisable multi-agent designs).
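To make the division of labour concrete, here's a minimal sketch of the pattern in Python. The `Subtask` and `WorkerResult` types, the stubbed `plan` function, and the `WORKERS` registry are illustrative stand-ins for model calls, not any particular framework's API.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    kind: str         # e.g. "code_search", "edit", "run_tests"
    instruction: str  # narrow, self-contained brief for one worker

@dataclass
class WorkerResult:
    subtask: Subtask
    output: str       # diff, search hits, test log, etc.
    success: bool

class Worker:
    """Specialised executor; in practice this wraps a fine-tuned model."""
    def __init__(self, kind: str):
        self.kind = kind

    def execute(self, task: Subtask) -> WorkerResult:
        # Placeholder for a model + tool call operating on narrow context.
        return WorkerResult(task, f"[{self.kind}] handled: {task.instruction}", True)

def plan(goal: str) -> list[Subtask]:
    # Placeholder for the orchestrator model's decomposition step.
    return [
        Subtask("code_search", f"locate code relevant to: {goal}"),
        Subtask("edit", f"apply the change for: {goal}"),
        Subtask("run_tests", "run the affected test suite"),
    ]

WORKERS = {k: Worker(k) for k in ("code_search", "edit", "run_tests")}

def orchestrate(goal: str) -> list[WorkerResult]:
    """Higher-level planner: decompose the goal, delegate each subtask."""
    return [WORKERS[t.kind].execute(t) for t in plan(goal)]
```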

The AWS team also highlighted why this trend accelerated: better reasoning models, more model-size options, stronger tool integrations, and emerging protocols for connecting systems and tools.

Cosine: long-horizon coding on constrained infrastructure

Cosine builds coding agents for large enterprises, often in regulated environments, with deployment modes that can include single-tenant VPCs and fully air-gapped on-prem. Under those constraints, running frontier models isn’t always feasible (governance, provenance, cost, GPU footprint).

Alistair’s core point:

‘Multi-agent orchestration is a practical way to close the capability gap of smaller models, especially for long-horizon software engineering tasks.’

Smaller models tend to struggle with long trajectories for several reasons: gaps in their training distribution, weaker tool use, or even architectural choices that limit effective long-context behaviour. In those cases, an orchestrator can keep the system on track by:

  • Decomposing work into manageable subtasks

  • Prompting iteration and self-checks

  • Preventing drift across long sequences
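As a hedged illustration of those three behaviours, here's a bounded control loop in Python; `run_worker` and `check` are hypothetical stand-ins for a worker call and a verification step (e.g. running tests).

```python
MAX_ITERATIONS = 5  # hard cap: prevents drift across long sequences

def run_worker(instruction: str, feedback: str | None) -> str:
    # Stub for a worker-model call on one narrow subtask.
    return f"attempt: {instruction} (feedback: {feedback})"

def check(output: str) -> tuple[bool, str]:
    # Stub for a self-check, e.g. apply the diff and run the tests.
    return True, ""

def supervise_subtask(instruction: str) -> str | None:
    feedback: str | None = None
    for _ in range(MAX_ITERATIONS):
        output = run_worker(instruction, feedback)  # one manageable subtask
        ok, feedback = check(output)                # prompt iteration and self-checks
        if ok:
            return output                           # subtask done; move on
    return None                                     # escalate rather than drift
```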

How Cosine trains orchestrator and worker systems

A key theme in our section was that you don’t get reliable multi-agent systems by wiring tools together and hoping for the best. You need post-training, distillation, and tight coupling between training-time and runtime scaffolding.

1) Start with a strong teacher model and generate trajectories

Cosine often begins with a highly capable model to produce correct solution trajectories for software engineering tasks. Those trajectories become training signals.

This matches a general model distillation pattern: use a teacher to produce high-quality outputs, then train a smaller student to imitate it, typically via supervised fine-tuning. 
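A sketch of that collection step, assuming a JSONL output format; `teacher_generate` and `passes_tests` are hypothetical placeholders for the teacher call and the verifier.

```python
import json

def teacher_generate(task_prompt: str) -> list[dict]:
    # Placeholder for a frontier "teacher" model producing a full
    # trajectory: plan, tool calls, and a final patch.
    return [
        {"role": "user", "content": task_prompt},
        {"role": "assistant", "content": "<plan, tool calls, final diff>"},
    ]

def passes_tests(trajectory: list[dict]) -> bool:
    # Placeholder verifier: apply the patch and run the test suite.
    return True

def collect_trajectories(tasks: list[str], out_path: str) -> None:
    """Keep only verified-successful teacher runs as training signal."""
    with open(out_path, "w") as f:
        for task in tasks:
            traj = teacher_generate(task)
            if passes_tests(traj):  # only correct solutions become labels
                f.write(json.dumps({"messages": traj}) + "\n")
```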

2) Distil into smaller deployable models via supervised fine-tuning (SFT)

The practical challenge is always data. Alistair gave a concrete example: early on, Cosine curated trajectories manually, which is slow and expensive. Distillation changes the economics by converting the teacher's successful runs into labelled training pairs.

This aligns with the AWS framing: SFT is best when the agent needs to learn domain patterns, structured outputs, or reliable behaviours that foundation models don’t consistently provide out of the box.
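One common way to turn those runs into labelled pairs is to split each trajectory at every assistant turn, so the student learns to reproduce the teacher's next action given everything before it. A minimal sketch, assuming the JSONL format from the collection step above:

```python
import json

def to_sft_pairs(jsonl_path: str) -> list[dict]:
    """Split each trajectory into (context, target) pairs per assistant turn."""
    pairs = []
    with open(jsonl_path) as f:
        for line in f:
            messages = json.loads(line)["messages"]
            for i, msg in enumerate(messages):
                if msg["role"] == "assistant":
                    pairs.append({
                        "prompt": messages[:i],  # everything the student sees
                        "completion": msg,       # the teacher action to imitate
                    })
    return pairs
```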

3) Use reinforcement fine-tuning (RFT) with verifiable rewards for coding

SFT gets a model into the right frame of mind, but reinforcement fine-tuning pushes it toward robust tool use and long-horizon decision-making, especially when you can define verifiable outcomes (e.g. tests passing).

In the talk, Sharlina described GRPO as a commonly used approach in practice; one widely cited reference introducing GRPO is the DeepSeekMath paper.
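A hedged sketch of the two ingredients: a verifiable reward (tests pass or they don't) and GRPO's group-relative advantage, which normalises each rollout's reward against its own sampling group instead of a learned value model. `run_tests_on` is a hypothetical verifier.

```python
from statistics import mean, stdev

def run_tests_on(candidate_patch: str) -> bool:
    # Placeholder: apply the patch in a sandbox and run the task's tests.
    return True

def reward(candidate_patch: str) -> float:
    return 1.0 if run_tests_on(candidate_patch) else 0.0  # verifiable outcome

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style: score each rollout relative to its own sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + 1e-6) for r in rewards]
```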

4) Train the orchestrator separately from the worker

Alistair emphasised that training an orchestrator is a different discipline from training a worker. The orchestrator must learn when to:

  • Call a worker

  • Ask for follow-up iterations

  • Stop and mark “complete”

  • Manage context across sub-agents

Operationally, Cosine treats worker outputs like tool responses (diffs, structured artefacts, execution results), and the orchestrator evaluates whether they satisfy the user's intent.
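Put together, the orchestrator's action space and the worker's tool-style response might look like the sketch below; the field names are illustrative, not Cosine's actual schema.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

@dataclass
class OrchestratorAction:
    kind: Literal["call_worker", "request_iteration", "complete"]
    worker: Optional[str] = None       # which specialised worker to invoke
    instruction: Optional[str] = None  # brief for the worker, or follow-up feedback

@dataclass
class ToolResponse:
    """A worker's output, handled like any other tool result."""
    diff: Optional[str] = None                     # proposed code change
    artefacts: dict = field(default_factory=dict)  # structured outputs
    execution: Optional[str] = None                # test/CI logs

def satisfies_intent(response: ToolResponse) -> bool:
    # Placeholder for the orchestrator's judgement call: does this
    # tool response actually meet the user's intent?
    return response.execution is not None and "FAILED" not in response.execution
```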

5) Use multi-LoRA for tight GPU budgets

In the most constrained deployments, Cosine uses a multi-LoRA approach: swap adapters to change the personality (orchestrator vs worker) on the same base model. LoRA is a well-established parameter-efficient fine-tuning method that adds trainable low-rank matrices while keeping base weights frozen.
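As a rough sketch of how adapter swapping can look with the Hugging Face peft library (the model ID and adapter paths here are hypothetical):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One frozen base model in GPU memory...
base = AutoModelForCausalLM.from_pretrained("your-base-model")

# ...with both personalities attached as named LoRA adapters.
model = PeftModel.from_pretrained(base, "adapters/orchestrator",
                                  adapter_name="orchestrator")
model.load_adapter("adapters/worker", adapter_name="worker")

model.set_adapter("orchestrator")  # plan, delegate, evaluate
# ... generate an orchestration step ...
model.set_adapter("worker")        # execute a subtask on the same GPU footprint
# ... generate the worker's diff ...
```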

Watch the talk for the full details

This write-up captures some of our core ideas, but the walkthrough shows the full picture, especially the training flow and the deployment constraints that drive design decisions. If you’re building agents for enterprise software engineering (or you’re trying to make smaller models behave like reliable operators), the session is worth watching in full.

Watch now.

Author
Robert Gibson, Product Marketing (@RobGibson20)
February 13, 2026 · 4 min read