Our most productive software engineer

From a record-setting benchmark to your next AI teammate: how Genie became our most productive engineer.

Genie can fix bugs, build features, refactor code, and everything in between, either fully autonomously or paired with you. It works like a colleague, not just a copilot.


June 2025

Bringing a supervisor model from train time to product

Introducing AutoPM.

A monumental leap in complex feature and iterative coding performance

With AutoPM enabled, Genie is closer to human-level performance than ever before, outshining every other agent currently on the market.

May 2025

Genie 2: State of The Art on SWE-Lancer

Breaking through the 50% barrier

Our latest Genie 2.1 iteration manages to solve just over half the published SWE-Lancer benchmark tasks.

This represents the equivalent of $107k of human work, out of an available $250k.

December 2024

Genie 2 early internal MVP

Building Genie 2

We took everything we learned from the dogfooding process and applied it to building Genie 2. We focused on making the model more reliable, and the product more intuitive. We also incorporated user feedback to enhance the overall experience.

Built for the post-IDE era

In this post-IDE era, Genie lets you assign any ticket — or even your entire backlog — and works fully asynchronously. You can return later to review, make small tweaks, and merge, all without opening your IDE. It writes unit tests, runs your CI, and when you want to collaborate in real time, Genie takes the lead while you copilot.

Collaborate live or offload your work

Genie is completely headless, but even the best code needs quick, on-the-spot adjustments. In the past, when a Genie-generated pull request was 99% right, you could still feel stuck. That is why you now have access to the same editor Genie uses, so you can lean over its shoulder and make a change yourself instead of having to prompt it. It’s the closest experience we’ve had to working with a truly asynchronous teammate. We’re not claiming Genie can do everything for everyone. But it has accelerated our own team’s velocity, and we know it can do the same for yours.

September 2024

Select customers onboarded to Genie 1

Vibe coding doesn't scale

When we started using Genie 1 in real workflows, it highlighted that several of our assumptions about how the model and product were built were off. Our benchmark scores were strong, but they didn't reflect many of the challenges that show up in real-world use.

We learned the hard way that vibe coding doesn't scale, especially when working autonomously across large, existing codebases. So our approach had to become more nuanced.

Genie 1 commits in our GitHub repository

By this time, Genie 1 had already become our top contributing developer internally.

August 2024

New state-of-the-art SWE-Bench score with Genie 1

The start of something big

Last summer, we made real progress.

We achieved the biggest score jump in SWE-Bench history and figured out how to generate billions of tokens of synthetic data that mimicked human reasoning, long before reasoning models were making headlines.

We were onto something big, but we celebrated too early.

January 2024