The coding model post-trained for production.
Specialized beats general. Generalists spread thin across the entire internet. Lumen Outpost goes narrow on what production engineers actually ship — legacy languages, maintainable diffs, lower cost per task.
Lumen on the language you actually ship.
13 languages tested with the same pass@3 evaluation.
We also post-train on COBOL, C++, C#, and JavaScript.
Enterprise workhorse. Lumen edges GPT-5.5 by 4.5 points and clears every other model by 10+.
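For readers unfamiliar with pass@3, here is a minimal sketch of the commonly used unbiased pass@k estimator. It is an assumption that the evaluation here follows this formula; the page does not say how many samples are drawn per task, and the function name `pass_at_k` is illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int = 3) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for a task (hypothetical; not stated on this page)
    c: how many of those n samples passed the task's tests
    k: attempts allowed, 3 for pass@3
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random k-subset of the n samples contains no pass,
    # subtracted from 1. comb() returns 0 when fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With exactly three samples per task, this reduces to "did any of the three attempts pass": `pass_at_k(3, 1)` is 1.0 and `pass_at_k(3, 0)` is 0.0. A benchmark score would then be the mean of this value across tasks.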
Every model has a training budget.
Generalists spend theirs on the entire internet. We spent ours on the code your company actually runs.
One narrow job, done very well.
Lumen is post-trained on ABAP, Fortran, COBOL, Verilog, Rust — and the production-engineering habits that make code maintainable on long-lived systems. Generalists treat these as a long-tail afterthought.
Optimised for everything. Excellent at nothing in particular.
- Attention split across web trivia, every mainstream language and every edge case the internet has ever discussed.
- Long-tail languages like ABAP, Verilog and Scheme get scraps of the training budget.
- Plateaus at the same numbers on the languages production engineers actually maintain.
One job. Production code in the languages you ship.
- 8-step data pipeline that turns real production code into verifiable training trajectories.
- Grounded in deterministic execution, not vibes — every trajectory has to actually run.
- +11.6 points over the Kimi K2.6 base on Niche-Bench. Wins or ties on 9 of 13 languages tested.
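As a rough illustration of "every trajectory has to actually run", the sketch below keeps only candidate snippets that execute and exit cleanly. This is not the actual 8-step pipeline, which is not described on this page; `runs_cleanly` and `filter_trajectories` are hypothetical names, and Python stands in for whatever language toolchains the real pipeline drives.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_cleanly(source: str, timeout_s: float = 10.0) -> bool:
    """Return True if the snippet runs to completion with exit code 0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A run that hangs is treated the same as a run that fails.
            return False
        return result.returncode == 0


def filter_trajectories(candidates: list[str]) -> list[str]:
    """Keep only the candidates whose code actually executes successfully."""
    return [source for source in candidates if runs_cleanly(source)]
```

In this sketch a snippet that embeds its own assertions doubles as its test: a failing assertion produces a non-zero exit code and the candidate is dropped.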
How Lumen Outpost compares.
AI assistants love to add 'just in case' helpers, defensive checks for impossible states and config flags for hypothetical futures. Three months later your team is reading paragraphs of comments to explain a one-line change. Slop-Bench measures the difference between code that passes the test and code your engineers will still want to maintain in a year.
Slop-Bench
Penalises duplication, dead code and unnecessary complexity. Rewards minimal diffs that match repo style.
- Lumen Outpost: 25.4%
- GPT-5.5: 25.0%
- Kimi K2.6: 19.8%
- GPT-5.4: 18.9%
- Gemini 3.1 Pro: 16.3%
Lumen leads on Slop-Bench, while every other model that scores well on code-quality benchmarks bloats its diffs to get there. Lumen's diffs are shorter, more focused, and aligned with your repo style.
Run Lumen on our CLI
One command. Mid-session model switching included.