Our most productive software engineer
From a record-setting benchmark to your next AI teammate - how Lumen
became our most productive engineer.
Lumen can fix bugs, build features, refactor code, and everything
in between, either fully autonomously or paired with you — like working
with a colleague, not just a copilot.
June 2025
Bringing a supervisor model from training time to product
Introducing AutoPM.
A monumental leap in complex feature and iterative coding performance
With AutoPM enabled, Genie is closer to human-level performance than ever before, outperforming every other agent currently on the market.
May 2025
Genie 2: State-of-the-art on SWE-Lancer
Breaking through the 50% barrier
Our latest Genie 2.1 iteration solves just over half of the published SWE-Lancer benchmark tasks.
That represents an equivalent value of $107k of human work out of a maximum of $250k.
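SWE-Lancer prices each task at its real freelance payout, so the dollar figures quoted above can be read as a value share as well as a task count. A minimal sanity check of that arithmetic, using only the numbers stated in this post:

```python
# Sanity check of the published SWE-Lancer figures quoted above.
solved_value_usd = 107_000   # value of tasks solved, per the post
total_value_usd = 250_000    # maximum payout across the published task set

value_share = solved_value_usd / total_value_usd
print(f"Value share: {value_share:.1%}")
```

Note that the value share (roughly 43%) can differ from the task-count share ("just over half"), since tasks carry different payouts.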
December 2024
Genie 2 early internal MVP
Building Genie 2
We took everything we learned from the dogfooding process and applied it to building Genie 2. We focused on making the model more reliable, and the product more intuitive. We also incorporated user feedback to enhance the overall experience.
Built for the post-IDE era
In this post-IDE era, Genie lets you assign any ticket — or even your entire backlog — and works fully asynchronously. You can return later to review, make small tweaks, and merge, all without opening your IDE. It writes unit tests, runs your CI, and when you want to collaborate in real time, Genie takes the lead while you copilot.
Collaborate live or offload your work
Genie is completely headless, but even the best code needs quick, on-the-spot adjustments. In the past, when a Genie-generated pull request was 99% right, you could still feel stuck. That's why you now have access to the same editor Genie does: you can lean over its shoulder and make the change yourself instead of having to prompt for it. It's the closest experience we've had to working with a truly asynchronous teammate. We're not claiming Genie can do everything for everyone, but it has accelerated our own team's velocity, and we know it can do the same for yours.
September 2024
Select customers onboarded to Genie 1
Vibe coding doesn't scale
When we started using Genie 1 in real workflows, it exposed several flawed assumptions in how we had built both the model and the product. Our benchmark scores were strong, but they didn't reflect many of the challenges that show up in real-world use.
We learned the hard way that vibe coding doesn't scale, especially when working autonomously across large, existing codebases, so our approach had to become more nuanced.
By this time, Genie 1 had already become our most prolific contributor internally.
August 2024
New state-of-the-art SWE-bench score with Genie 1
The start of something big
Last summer, we made real progress.
We achieved the biggest score jump in SWE-bench history and figured out how to generate billions of tokens of synthetic data that mimicked human reasoning, long before reasoning models were making headlines.
We were onto something big, but we celebrated too early.
January 2024