$ lumen-train corpus-filter --stage 4
▸ scanning 2,847,193,408 documents…
✘ who_is_pm_kazakhstan.md
✘ moon_landing_conspiracy.txt
✘ taylor_swift_lyrics.json
✘ tarot_reading_guide.pdf
✘ strawberry_letter_count.txt
✘ the_dress_color_debate.html
✘ hotdog_sandwich_class.md
✘ grandma_recipe_intros.md
✘ pineapple_pizza_archive/
✘ how_to_win_on_twitter.txt
✘ 90s_sitcom_trivia.db
✓ abap_production/     8,432 repos
✓ cobol_mainframe/     1,109 repos
✓ fortran_scientific/  3,201 repos
✓ verilog_hdl/           944 repos
✓ rust_systems/       12,847 repos
→ corpus ready · 26,533 repos · production code only