Engineering

Multi-Agent Orchestration for Enterprise Coding Agents: What we shared at re:Invent 2025

Learn how Cosine trains orchestrator and worker agents to make smaller, deployment-friendly models handle long-horizon enterprise coding. Shared at AWS re:Invent 2025.

Feb 13, 2026

At AWS re:Invent 2025, we joined AWS’s Generative AI Innovation Center team to discuss how to fine-tune LLMs for multi-agent orchestration, and why that matters when you’re building software agents for large enterprises.

This article focuses primarily on what our CEO and Co-Founder, Alistair Pullen, covered in the talk. Specifically, how Cosine trains and runs specialized orchestrator and worker agents to make smaller, deployment-friendly models behave like capable long-horizon software engineering systems.

Watch the full talk here:

Why multi-agent systems show up in real enterprise deployments

In the talk, AWS framed an agent as an autonomous software system built on top of an LLM, with tools and memory, designed to plan and take actions. The significance for enterprises arises when you step into real workflows (codebases, policies, CI, tickets, approvals), because in this instance, you need systems that can decompose tasks, call tools, and iterate.

A common architecture pattern today is:

  • Orchestrator agent: a higher-level planner that breaks down the goal and delegates subtasks
  • Worker agents: specialized executors (code search/edits/tests), operating with a narrower context

This orchestrator and workers pattern is now widely discussed in agent frameworks (including parallelizable multi-agent designs).

The AWS team also highlighted why this trend accelerated: better reasoning models, more model-size options, stronger tool integrations, and emerging protocols for connecting systems and tools.

Cosine: long-horizon coding on constrained infrastructure

Cosine builds coding agents for large enterprises, often in regulated environments, with deployment options including single-tenant VPCs and fully air-gapped on-prem. Under those constraints, running frontier models isn’t always feasible (due to governance, provenance, cost, and GPU footprint).

Alistair’s core point:

"Multi-agent orchestration is a practical way to close the capability gap of smaller models, especially for long-horizon software engineering tasks."

Smaller models tend to struggle with long trajectories for several reasons (training distributions, tool use, or even architectural choices that limit effective long-context behavior). In those cases, an orchestrator can keep the system on track by:

  • Decomposing work into manageable subtasks
  • Prompting iteration and self-checks
  • Preventing drift across long sequences

How Cosine trains orchestrator and worker systems

A key theme in our section was that you don’t get reliable multi-agent systems by wiring tools together and hoping for the best. You need post-training, distillation, and tight coupling between training-time and runtime scaffolding.

1) Start with a strong teacher model and generate trajectories

Cosine often begins with a highly capable model to produce correct solution trajectories for software engineering tasks. Those trajectories become training signals.

This matches a general model distillation pattern: use a teacher to produce high-quality outputs, then train a smaller student to imitate it, typically via supervised fine-tuning.

2) Distill into smaller deployable models via supervised fine-tuning (SFT)

The practical challenge is always data. Alistair gave a concrete example: early on, manually curating trajectories is slow and expensive. Distillation changes the economics by converting the teacher’s successful runs into labeled training pairs.

This aligns with the AWS framing: SFT is best when the agent needs to learn domain patterns, structured outputs, or reliable behaviors that foundation models don’t consistently provide out of the box.

3) Use reinforcement fine-tuning (RFT) with verifiable rewards for coding

SFT gets a model in the right frame of mind. Still, reinforcement fine-tuning pushes it toward robust tool use and long-horizon decision-making – especially when you can define verifiable outcomes (e.g. tests passing).

In the talk, Sharlina described GRPO as a commonly used approach in practice; a widely cited reference introducing GRPO appears in DeepSeekMath.

4) Train the orchestrator separately from the worker

Alistair emphasized that training an orchestrator is a different discipline from training a worker. The orchestrator must learn when to:

  • Call a worker
  • Ask for follow-up iterations
  • Stop and mark “complete”
  • Manage context across sub-agents

Operationally, Cosine treats worker outputs like tool responses: diffs, structured artifacts, execution results, and the orchestrator evaluates whether they satisfy the user’s intent.

5) Use multi-LoRA for tight GPU budgets

In the most constrained deployments, Cosine uses a multi-LoRA approach: swap adapters to change the personality (orchestrator vs worker) on the same base model. LoRA is a well-established parameter-efficient fine-tuning method that adds trainable low-rank matrices while keeping base weights frozen.

Watch the talk for the full details

This write-up captures some of our core ideas, but the walkthrough shows the full picture, especially the training flow and the deployment constraints that drive design decisions. If you’re building agents for enterprise software engineering (or you’re trying to make smaller models behave like reliable operators), the session is worth watching in full.

Watch now.