New: Lumen Outpost is live

The coding model post-trained for production.

Specialized beats general. Generalists spread thin across the entire internet. Lumen Outpost goes narrow on what production engineers actually ship — legacy languages, maintainable diffs, lower cost per task.

Niche-Bench · Pass@3 · 13 languages
Pick your language

Lumen on the language you actually ship.

13 languages tested with the same pass@3 evaluation. Tap one.

We also post-train on COBOL, C++, C#, and JavaScript.
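For readers unfamiliar with the metric: pass@3 is usually read as "the task counts as solved if at least one of 3 sampled solutions passes the tests". This page does not say how Niche-Bench computes it, so the sketch below assumes the standard unbiased pass@k estimator; the function names and the `(n_samples, n_passed)` input format are illustrative, not Lumen's harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples drawn, c of them passed, budget k."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no passing one.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def benchmark_score(results: list[tuple[int, int]], k: int = 3) -> float:
    """Mean pass@k over tasks, as a percentage.

    Each item is (n_samples, n_passed) for one task (assumed input format).
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With exactly 3 samples per task, this reduces to "did any of the 3 pass"; drawing more samples per task only tightens the estimate.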

Lumen Outpost · Java
76.5%
Top of the pack on Java

Enterprise workhorse. Lumen edges GPT-5.5 by 4.5 points and clears every other model by 9.8 points or more.

  • GPT-5.4
    57.9%
  • Kimi K2.6
    63.6%
  • Gemini 3.1 Pro
    66.7%
  • GPT-5.5
    72.0%
  • Lumen Outpost
    76.5%
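The margins quoted for Java can be checked directly from the scores listed above. A minimal sketch, using only the numbers on this page; the helper name `margins_over_field` is illustrative.

```python
# Java pass@3 scores as listed on this page (percent).
java_scores = {
    "GPT-5.4": 57.9,
    "Kimi K2.6": 63.6,
    "Gemini 3.1 Pro": 66.7,
    "GPT-5.5": 72.0,
    "Lumen Outpost": 76.5,
}


def margins_over_field(scores: dict[str, float], leader: str) -> dict[str, float]:
    """Point gap between `leader` and every other model, rounded to 0.1."""
    top = scores[leader]
    return {name: round(top - s, 1) for name, s in scores.items() if name != leader}


margins = margins_over_field(java_scores, "Lumen Outpost")
# Print the closest competitor first.
for name, gap in sorted(margins.items(), key=lambda kv: kv[1]):
    print(f"{name}: +{gap} points")
```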

Things we didn't teach Lumen

Every model has a training budget.

Generalists spend theirs on the entire internet. We spent ours on the code your company actually runs.

lumen-training-pipeline — stage 4 of 8
$ lumen-train corpus-filter --stage 4
▸ scanning 2,847,193,408 documents…
   who_is_pm_kazakhstan.md
   moon_landing_conspiracy.txt
   taylor_swift_lyrics.json
   tarot_reading_guide.pdf
   strawberry_letter_count.txt
   the_dress_color_debate.html
   hotdog_sandwich_class.md
   grandma_recipe_intros.md
   pineapple_pizza_archive/
   how_to_win_on_twitter.txt
   90s_sitcom_trivia.db
abap_production/ 8,432 repos
cobol_mainframe/ 1,109 repos
fortran_scientific/ 3,201 repos
verilog_hdl/ 944 repos
rust_systems/ 12,847 repos
→ corpus ready · 26,533 repos · production code only
Why specialization wins

One narrow job, done very well.

Lumen is post-trained on ABAP, Fortran, COBOL, Verilog, Rust — and the production-engineering habits that make code maintainable on long-lived systems. Generalists treat these as a long-tail afterthought.

Generalist frontier models

Optimized for everything. Excellent at nothing in particular.

  • Attention split across web trivia, every mainstream language and every edge case the internet has ever discussed.
  • Long-tail languages like ABAP, Verilog and Scheme get scraps of the training budget.
  • Scores plateau at similar levels on the languages production engineers actually maintain.
Lumen Outpost

One job. Production code in the languages you ship.

  • 8-step data pipeline that turns real production code into verifiable training trajectories.
  • Grounded in deterministic execution, not vibes — every trajectory has to actually run.
  • +11.6 points over the Kimi K2.6 base on Niche-Bench. Wins or ties on 9 of 13 languages tested.
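To make "every trajectory has to actually run" concrete, here is a rough sketch of an execution-grounded filter. It is not Lumen's pipeline: it uses Python snippets as a stand-in for the real toolchains (ABAP, COBOL, Verilog and so on), and `runs_cleanly` and `keep_verified` are hypothetical names. The idea is simply that a candidate is kept only if it executes and exits with status 0.

```python
import subprocess
import sys


def runs_cleanly(source: str, timeout_s: float = 5.0) -> bool:
    """Return True if `source` executes as a Python script and exits with code 0."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A candidate that hangs is treated the same as one that fails.
        return False
    return proc.returncode == 0


def keep_verified(candidates: list[str]) -> list[str]:
    """Keep only the candidates that actually run (hypothetical filter step)."""
    return [src for src in candidates if runs_cleanly(src)]
```

A production version would presumably run each candidate in a sandbox against the repository's own tests rather than a bare interpreter call.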
Benchmarks

How Lumen Outpost compares.

AI assistants love to add 'just in case' helpers, defensive checks for impossible states and config flags for hypothetical futures. Three months later your team is reading paragraphs of comments to explain a one-line change. Slop-Bench measures the difference between code that passes the test and code your engineers will still want to maintain in a year.

Slop-Bench

Penalizes duplication, dead code and unnecessary complexity. Rewards minimal diffs that match repo style.

Higher is better
  • Lumen Outpost
    25.4%
  • GPT-5.5
    25.0%
  • Kimi K2.6
    19.8%
  • GPT-5.4
    18.9%
  • Gemini 3.1 Pro
    16.3%
Lumen Outpost
25.4%
Category leader.

Lumen leads Slop-Bench, while every other model that scores well on code-quality benchmarks bloats its diffs to get there. Lumen's diffs are shorter, more focused and aligned with your repo style.
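Slop-Bench's scoring method is not described on this page. As one simple proxy for "shorter diffs", the sketch below counts added plus removed lines between two versions of a file using Python's standard `difflib`; the function name `changed_line_count` and the example snippets are illustrative assumptions, not part of the benchmark.

```python
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Count added plus removed lines between two versions of a file."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # "? " hint lines and unchanged lines are ignored.
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


before = "def total(xs):\n    return sum(xs)\n"
minimal = "def total(xs):\n    return sum(xs, 0.0)\n"
bloated = (
    "def total(xs):\n"
    "    if xs is None:\n"
    "        return 0.0\n"
    "    return sum(xs, 0.0)\n"
)

minimal_size = changed_line_count(before, minimal)
bloated_size = changed_line_count(before, bloated)
```

Both edits make the same functional change, but the version with the extra "just in case" check touches more lines, which is the kind of difference a diff-size measure picks up.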

Run Lumen on your stack.

One command. Mid-session model switching included.

Start building with Lumen
brew install CosineAI/tap/cos
Docs