Alistair Pullen • Co-founder & CEO

May 6, 2025 • 9 mins to read

We started post-training human reasoning traces into LLMs early last year to bring a new level of performance to our AI software engineer. The result is a highly capable agent which can be left to complete tasks asynchronously, or worked with collaboratively – Genie is in the driver's seat and you're the copilot.

Genie 2 represents the culmination of all of the product and model testing we've been doing since our announcement last summer. In short, Genie 2 is a fully autonomous software engineering agent which is trained to retrieve data from large codebases, plan solutions, make changes and then iteratively test those changes before returning to the user for more instructions.

The model now exhibits state-of-the-art performance on the SWE-Lancer agentic coding benchmark. We chose this benchmark because in our view it best mimics the real-world flow of a SWE, particularly in the types of tasks included and the developer's workflow in solving them.

Our SWE-Lancer run earned: $88,250

Outcome	Count	Share
SWE-Lancer issues	237	
Ran	237	100%
Success/Passed	116	~49%
Success/Ungraded	4	~1.5%
Success/Failed	9	~4%
Error	4	~1.5%
Failure	67	~28%
No result	37	~16%
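
For readers who want to check the shares, here is a minimal sketch that recomputes them from the counts in the table above. The variable and function names are ours and purely illustrative; only the counts come from the run.

```python
# Outcome counts copied from the table above.
counts = {
    "Success/Passed": 116,
    "Success/Ungraded": 4,
    "Success/Failed": 9,
    "Error": 4,
    "Failure": 67,
    "No result": 37,
}

# The outcome rows should add up to the number of issues that were run.
total = sum(counts.values())


def share(n: int, total: int) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {name: share(n, total) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:<18}{counts[name]:>4}  {pct:>5}%")
```

The six outcome rows sum to the 237 issues that were run, so every issue is accounted for exactly once.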

An AI-first agent environment

One of the biggest advances in Genie 2 lies under the hood: we gave it a native execution environment to test and run code, instead of the CI-based feedback loop we relied on before. With Genie 1, the agent had to rely on external Continuous Integration pipelines to find out if its code worked. That meant committing code, waiting for tests to run, and parsing the CI results, which was a slow and indirect process. We accepted this as a stop-gap because building a code-execution environment is an enormous undertaking.

Genie 2 changes all of that. It now executes code natively within our platform. The moment Genie writes code, it can compile it, run unit tests, or launch the app in a sandboxed environment to validate its work. Feedback is instant and tightly integrated into Genie’s reasoning loop. This native execution not only speeds up development, but also enables Genie 2 to catch issues on the fly and iteratively refine its solutions just like a human would when running code locally.
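
To make the idea of a write, run, refine loop concrete, here is a minimal sketch in Python. It is not Genie's implementation: `run_check`, `refine`, and the `revise` callback are hypothetical names invented for illustration, the "model" is a hard-coded stand-in, and a plain subprocess is not a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_check(source: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run `source` in a fresh Python subprocess; return (passed, combined output)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout, cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
        return proc.returncode == 0, proc.stdout + proc.stderr


def refine(
    source: str,
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Alternate between running the candidate and revising it from the feedback."""
    for _ in range(max_rounds):
        passed, output = run_check(source)
        if passed:
            return source, True
        # Hand the failure output straight back to the reviser (the agent's role).
        source = revise(source, output)
    # Check the last revision once more before giving up.
    passed, _ = run_check(source)
    return source, passed


# Toy usage: the candidate carries its own assertion as the "test".
buggy = "def add(a, b):\n    return a - b\n\nassert add(2, 3) == 5\n"


def toy_revise(source: str, feedback: str) -> str:
    # Stand-in for a model call: applies one hard-coded fix and ignores the feedback.
    return source.replace("a - b", "a + b")


fixed, ok = refine(buggy, toy_revise)
```

The point of the sketch is the shape of the loop: the failure output goes straight back into the next revision, with no commit or CI round trip in between.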

Equally important, we’ve made this advanced execution completely invisible to the user. Unlike some solutions, we don’t require you to provision persistent machines or pay for “compute credits” to keep Genie awake and running. There are no ACUs (Agentic Compute Units) or cloud VMs for you to manage with Genie 2. The heavy lifting happens behind the scenes on Cosine’s infrastructure. From your perspective, Genie just works – it runs all necessary builds and tests as part of its service, then spins down resources automatically. This means you get a clean, seamless coding experience without having the feeling that you're on the clock as your ACUs deplete over time. Genie 2 delivers the power of cloud execution with the convenience of a SaaS product: simply ask it to perform a task, and it will handle everything needed to verify and ship that code.

Real-World Engineering Workflow

When building the product around the model, we had two core tenets: meet developers where they work and support both autonomy and collaboration. Software engineering doesn’t happen in a vacuum, and Genie 2 is designed to fit directly into your team’s workflow – whether you want to work asynchronously or collaborate in real time.

In asynchronous mode, Genie 2 behaves like an autonomous teammate working in the background. You can assign it tasks or user stories, then go about your day while Genie crafts a solution. For example, you might create a Jira ticket or Linear issue for a new feature – and assign it to Genie. Genie 2 will pick up the task, branch the code, implement the changes, and even open a pull request when it’s done. It communicates progress through integrations: our Slack app will post updates or ask for clarifications in your team channel just as a human engineer would. You can head to lunch or go for a coffee and return to find that Genie has completed the task and posted the results for review.

In collaborative mode, Genie 2 can also pair-program with you in real time. The key difference from existing copilot-like products is that Genie is still in charge and making the changes. You can, of course, lean over Genie's shoulder and make changes directly, but in our experience this is rare. This mode is perfect for more complex or high-stakes changes where you want a tighter feedback loop. Using Cosine’s development interface, you and Genie share the same context and work together on the code. Genie might draft a function or suggest a refactor, and you can watch it happen live in the editor. If Genie goes a bit off-track, you can step in – edit the code or give a quick pointer – and Genie will immediately take that into account and continue. It’s a fluid back-and-forth, very much like working with a human colleague sitting beside you. Because Cosine's platform is integrated with GitHub, any changes you and Genie make can be committed and pushed with a click.

We’ve also built first-class integrations with Slack, Jira, and Linear to bring Genie into the tools you already use. For example, you can chat with Genie in Slack to kick off a task whilst on the go. Imagine getting a Slack alert about an error in production and @mentioning Genie in the thread so that it opens a PR fixing the issue. If your team lives in Jira or Linear, Genie can update tickets automatically as it progresses on tasks, adding links to code diffs or comments on what it’s done. Genie 2 isn’t just an editor plugin – it’s an AI team member that communicates and works across the same collaboration hubs your human team does.

The post-IDE World

A deliberate choice we made with Genie 2 was not to shoehorn it into a traditional IDE like VS Code, or vice versa. While many AI coding tools run as extensions inside existing editors, we took a different path: we built a custom platform around Genie from the ground up. Why? Because legacy IDEs were never designed for autonomous agents.

Traditional IDEs assume a human at the keyboard – clicking, typing, and manually driving every step. In contrast, Genie 2 needs the freedom to explore a codebase, make multi-file edits, run tests, and iterate on its own. Bolting an “AI assistant” onto VS Code might give you autocompletions or a chat window, but the environment would still expect a person to be in control. It's analogous to how the first wave of autonomous cars look like regular cars and still have steering wheels and pedals. That is a quick and easy way of getting the technology out into the world, but no one believes it is the final form of the driverless car. Genie makes the leap from an existing foundation designed for yesterday's development to a purpose-built one. We foresaw that to truly unlock an AI engineer, we’d need to shed those old constraints and create an AI-native development environment.

By building our own lightweight IDE specifically for Genie, we unlocked several key advantages:

Rapid & Lightweight – Our Genie environment runs in the browser with minimal bloat. It loads nearly instantly thanks to our highly optimised workspace management layer written in Go. This lean core means Genie 2 can work faster, too. There’s no legacy layer slowing down analysis or code generation. Every bit of performance goes into serving your code.
Rapid Iteration – Because we control the entire stack, we can improve Genie’s environment at breakneck speed. We’re not beholden to upstream IDE releases or plugin APIs. When we invent a new feature for AI-assisted coding, we can ship it immediately. Our clean-slate approach lets us focus 100% on Genie’s evolution and our users’ needs.
Autonomy-Oriented Workflow – Most importantly, our platform was built around autonomous workflows from day one. Genie 2 can execute long-running tasks, handle project-wide refactors, or orchestrate complex tool usage seamlessly in our environment. In a vanilla IDE, an agent would be fighting against the grain to do these things. Our system welcomes it – for example, Genie can open multiple files concurrently, run tests in the background, generate commit diffs, all without a human clicking buttons. The result is an AI that truly acts on your behalf, rather than one that just whispers suggestions while you still do all the busywork.

In short, we didn’t embed Genie 2 into an IDE – we built the IDE around Genie. This “no VS Code inside” strategy might seem unconventional, but it gives us a nimble, purpose-built platform for AI-first software development.

What’s Next for Genie

Genie 2 is a big step forward for Cosine, but it’s only the beginning of what we have planned. We’re already hard at work on the next generation, Genie 3, and it’s going to be even more transformative. While we can’t share too many details yet, here’s a small taste of what’s coming:

Our next-gen model is currently being trained using reinforcement learning in a full execution environment. In other words, we’re teaching Genie 3 by letting it loose in the same kind of rich sandbox where it will eventually operate. It will have access to all the tools a human engineer has – compilers, debuggers, documentation, internet resources – and learn through trial and error with feedback. This training approach will further align the AI’s behaviour with how a human engineer thinks and adapts. We expect Genie 3 to be not just an extremely capable SWE, but a highly general problem-solving agent that can reason and act in complex scenarios with even greater autonomy.

Genie 3’s roadmap includes tackling some challenges that Genie 2 can’t yet solve, with an eye toward finally exceeding human-level performance on the toughest tasks. Imagine an AI engineer that not only writes and tests code, but also optimises architecture, makes intelligent design decisions, and can handle abstract planning all on its own. That’s the future we’re building toward. We’re incredibly excited about the progress so far – early experiments are promising – but we’ll save the full reveal for when Genie 3 is ready for prime time. Stay tuned, because the future of AI-assisted development is about to accelerate even more.

Genie 2 is available right now for developers to use. We’ve dropped the waitlists and invite codes – if you want to experience this new way of building software, just head to cosine.sh and sign up. Getting started is as simple as importing a repo or creating a new project in Cosine’s web platform, and Genie will be ready to take the lead.

Happy coding with Genie 2!

Alistair Pullen • Co-founder & CEO

@AlistairPullen
