Hynek Kydlicek

hynky

AI & ML interests

Data-processing

Recent Activity

updated a dataset about 5 hours ago

macrodata/egocentric-vggt-debug-jsonl

published a dataset about 5 hours ago

macrodata/egocentric-vggt-debug-jsonl

updated a dataset about 9 hours ago

macrodata/egocentric-vggt-debug

upvoted a collection 5 months ago

📄 FinePDFs

82 items • Updated Jan 9 • 29

upvoted an article 6 months ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 311

upvoted 5 articles 7 months ago

Parquet Content-Defined Chunking

Article • kszucs • Jul 25, 2025 • 75

Why Did MiniMax M2 End Up as a Full Attention Model?

Article • MiniMax-AI • Oct 30, 2025 • 80

What makes good reasoning data

Article • MiniMax-AI • Oct 30, 2025 • 44

Aligning to What? Rethinking Agent Generalization in MiniMax M2

Article • MiniMax-AI • Oct 30, 2025 • 43

Supercharge your OCR Pipelines with Open Models

Article • merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 313

upvoted an article 8 months ago

Gaia2 and ARE: Empowering the community to study agents

Article • clefourrier, gregmialz, mlcu, mortimerp9, XciD, tfrere, evijit, RomainFroger, dheeraj7596, CarolinePascal, upiter • Sep 22, 2025 • 134

upvoted 2 articles 11 months ago

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

Article • thomwolf, matthieu-lapeyre • Jul 9, 2025 • 800

SmolLM3: smol, multilingual, long-context reasoner

Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 777

upvoted a paper 11 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78

upvoted a collection about 1 year ago

Built with Distill blog ❤️

Collection of all interactive blogs built on top of Distill template. To create your own check: https://huggingface.co/spaces/lvwerra/distill-blog-tem • 6 items • Updated Mar 14, 2025 • 2

upvoted an article about 1 year ago

Open R1: Update #3

Article • open-r1 • Mar 11, 2025 • 297

upvoted 2 articles over 1 year ago

Fixing Open LLM Leaderboard with Math-Verify

Article • hynky, alozowski, SaylorTwift, clefourrier • Feb 14, 2025 • 31

Open R1: Update #2

Article • open-r1 • Feb 10, 2025 • 218

upvoted a paper over 1 year ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 258

upvoted an article over 1 year ago

FineWeb2-C: Help Build Better Language Models in Your Language

Article • davanstrien • Dec 23, 2024 • 21

upvoted 2 collections over 1 year ago

🥂 FineWeb2

3 items • Updated Jun 27, 2025 • 24

IrokoBench

a human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU and MGSM • 6 items • Updated May 31, 2024 • 21

upvoted an article over 1 year ago

Scaling AI-based Data Processing with Hugging Face + Dask

Article • scj13, jrbourbeau, lhoestq, davanstrien • Oct 9, 2024 • 33