LFM2.5 Collection Collection of post-trained and base LFM2.5 models. • 14 items • Updated 3 days ago • 166
Qari-OCR: A High-Accuracy Model for Arabic Optical Character Collection Built on the powerful Qwen2 VL 2B and fine-tuned on an Arabic OCR dataset, Qari v0.1 ... • 8 items • Updated Mar 2 • 18
Arabic Voice Collection - مكتبة الصوت العربي Collection This collection will have all the Arabic voice datasets in different dialects • 8 items • Updated 7 days ago • 8
OCR Collection Collection A diverse set of OCR models for extracting text from images and documents in multiple languages. • 7 items • Updated 26 days ago • 3
Article Agentic Resource Discovery: Let agents search burtenshaw, evalstate • 12 days ago • 15
Article There is no such thing as a tokenizer-free lunch catherinearnett • Sep 25, 2025 • 100
Article How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs nielsr • Apr 7 • 62
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published Mar 17 • 101
V_1: Unifying Generation and Self-Verification for Parallel Reasoners Paper • 2603.04304 • Published Mar 4 • 14
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 229
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper Paper • 2511.04583 • Published Nov 6, 2025 • 5
DREAM: Deep Research Evaluation with Agentic Metrics Paper • 2602.18940 • Published Feb 21 • 14
dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model Paper • 2512.02498 • Published Dec 2, 2025 • 4
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery Paper • 2602.08990 • Published Feb 9 • 79
DFlash: Block Diffusion for Flash Speculative Decoding Paper • 2602.06036 • Published Feb 5 • 88
Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch Paper • 2602.03183 • Published Feb 3 • 12
LightOnOCR-2 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 5 days ago • 25