🏗️ Building on HF

Sergio Paniego PRO

sergiopaniego

huggingface

https://sergiopaniego.github.io/

Recent Activity

updated a dataset about 21 hours ago

agents-course/final-certificates

updated a dataset about 21 hours ago

agents-course/course-certificates-of-excellence

updated a dataset 1 day ago

huggingface-projects/Deep-RL-Course-Certification

Buckets 58

sergiopaniego/sdpo-hints-demo-bucket

sergiopaniego/sdpo-math-qwen35-static-0f10e6-bucket

sergiopaniego/sdpo-math-qwen35-static-9a9964-bucket

sergiopaniego/sdpo-math-qwen35-static-698c79-bucket

sergiopaniego/sdpo-math-smoke-static-0e42b5-bucket

sergiopaniego/sdpo-abc-math15b-static-b13970-bucket

Posts 100

Post

7631

Frontier models use distillation as a step in their post-training pipelines.

In 2026 it has three jobs: compress a big model into a small one, merge RL experts into a single model, and let a model teach itself.

I wrote up which frontier models use each one and how: https://huggingface.co/blog/sergiopaniego/distillation-2026

It pairs with Class 2 of the Training an Agent series Ben and I are doing, where we teach these techniques hands-on with TRL!

Articles 26

Article

31

Profiling in PyTorch (Part 3): Attention is all you profile

Collections 10

Spaces 164

VLM Object Understanding

Explore object detection, visual grounding, keypoint Detecti

Qwen2-VL-7B

Ask questions about charts in images

SmolVLM-trl-dpo-rlaif-v

Generate text from an image and question

SmolVLM-trl-sft-ChartQA

Ask questions about charts in images

Sdpo Vllm Smoke Static 64b0e4

View and monitor key metrics with the Trackio dashboard

Sdpo Hints Eval Static Ae196e

View and explore your tracking data in a dashboard

Models 140

sergiopaniego/Qwen3.5-4B-sdpo-math-hints

Updated 9 days ago • 1

sergiopaniego/Qwen3.5-4B-sdpo-math-gold

Updated 9 days ago

sergiopaniego/Qwen3.5-4B-sdpo-math-baseline

Updated 9 days ago

sergiopaniego/sdpo-hints

Updated 9 days ago

sergiopaniego/pi-mono-youtube-livestream-2-scripts

Updated 13 days ago • 2

sergiopaniego/gemma-4-E2B-offpolicy-kd-lr1e4

Updated 13 days ago

sergiopaniego/gemma-4-E2B-offpolicy-kd-lr2e4

Updated 13 days ago

sergiopaniego/gemma-4-E2B-offpolicy-kd-lr5e5

Updated 13 days ago

sergiopaniego/qwen3-0.6b-pimono-gkd-lr2e5

Text Generation • 0.6B • Updated 17 days ago • 494

sergiopaniego/qwen3-0.6b-pimono-gkd-lr1e5

Text Generation • 0.6B • Updated 17 days ago • 487

Datasets 14

sergiopaniego/math-sdpo-hints-plain

Viewer • Updated 9 days ago • 600 • 32

sergiopaniego/math-sdpo-hints

Viewer • Updated 9 days ago • 600 • 39

sergiopaniego/gsm8k-sdpo-plain

Viewer • Updated 10 days ago • 700 • 36

sergiopaniego/gsm8k-sdpo-hints

Viewer • Updated 10 days ago • 700 • 39

sergiopaniego/pi-mono-chat

Viewer • Updated 17 days ago • 886 • 103

sergiopaniego/requests-pr-diff

Viewer • Updated May 19 • 1 • 13

sergiopaniego/trl-r2e-test

Viewer • Updated May 18 • 1 • 34

sergiopaniego/chain-sum-rollouts

Viewer • Updated May 4 • 50 • 17

sergiopaniego/ttt-scripted-smoke

Viewer • Updated Apr 17 • 20 • 17

sergiopaniego/sample_videos

Viewer • Updated Jun 30, 2025 • 2 • 25
