Sergio Paniego
Posts 85
And… it's already supported in TRL, built by Kashif Rasul. You can really feel the pace of development in the team.
Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple.
How it works: the model generates its own completions at a training-time temperature (T_train) with top_k/top_p truncation, and is then fine-tuned on them with plain cross-entropy. No labels or verifier are needed.
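For a single token position, the recipe can be sketched in plain Python. This is a toy illustration under stated assumptions, not TRL's implementation: the helper names (`sample_token`, `cross_entropy`), the hard-coded logits, and the truncation order (temperature, then top-k, then top-p) are hypothetical choices made for clarity. In a real run the model samples whole completions and the loss is averaged over their tokens.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Hypothetical helper: temperature scaling, then top-k, then top-p truncation."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    probs = softmax([z / temperature for z in logits])

    # Candidate token ids, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose renormalised mass reaches top_p.
    if top_p is not None:
        mass = sum(probs[i] for i in order)
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i] / mass
            if cum >= top_p:
                break
        order = kept

    # Sample among the surviving tokens in proportion to their probabilities.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


def cross_entropy(logits, target):
    """Plain next-token cross-entropy: -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


# Toy model output for one position. The self-sampled token becomes the label,
# so no ground-truth answer or verifier is involved.
rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]
label = sample_token(logits, temperature=1.5, top_k=3, top_p=0.9, rng=rng)
loss = cross_entropy(logits, label)
print(label, round(loss, 4))
```

Because the training target is whatever the model itself sampled, the only knobs in this sketch are the sampling settings (T_train, top_k, top_p). The loss is ordinary supervised cross-entropy.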
You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py
One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help.
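The composition can be checked numerically under an idealised assumption: the fine-tuned model's new logits exactly equal the log-probabilities of the T_train-scaled distribution, and top_k/top_p truncation is ignored. Those log-probabilities are z / T_train minus a constant. Softmax ignores constant shifts, so decoding at T_eval gives softmax(z / (T_train × T_eval)). The logits and temperatures below are hypothetical, and this is a simplification rather than the paper's own derivation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs):
    """log(softmax(xs)), computed stably."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


# Hypothetical base-model logits and temperatures (not values from the paper).
z = [3.0, 1.5, 0.2, -0.7]
t_train, t_eval = 1.6, 0.6

# Idealised assumption: after self-distillation the logits equal log p_{T_train}.
distilled_logits = log_softmax([v / t_train for v in z])

# Decoding the distilled model at T_eval...
two_step = softmax([v / t_eval for v in distilled_logits])

# ...matches decoding the base model once at T_eff = T_train * T_eval.
t_eff = t_train * t_eval
one_step = softmax([v / t_eff for v in z])

for a, b in zip(two_step, one_step):
    assert math.isclose(a, b, rel_tol=1e-9)
print(f"T_eff = {t_eff:.2f}", [round(p, 4) for p in one_step])
```

In this idealisation, any (T_train, T_eval) pair with the same product yields the same decoding distribution. That is one way to read why a wide range of configurations behaves similarly.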
Want to dig deeper?
Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer
Articles 16
Welcome Gemma 4: Frontier multimodal intelligence on device
- CARLA Environment Server (Running on T4 · RL): Control a Carla driving simulation with custom actions
- CARLA Environment Server (Running on T4 · RL): Control a CARLA driving simulator with custom actions
- Carla Grpo Trolley (Sleeping): Visualize your program's I/O activity in real time
- sergiopaniego/Qwen3-0.6B-carla-trolley-escape: 0.8B • Updated • 16
- The Ultra-Scale Playbook (Running · 3.78k): The ultimate guide to training LLMs on large GPU clusters
- The Smol Training Playbook (Running on CPU Upgrade · Featured · 3.11k): The secrets to building world-class LLMs
- Evaluation Guidebook (Running · 301): Explore LLM benchmark trends over time
- FineVision: Open Data is All You Need (Running · 221): A new open-source dataset for training VLMs
Spaces 124
VLM Object Understanding
Explore object detection, visual grounding, keypoint Detecti
Qwen2-VL-7B
Ask questions about charts in images
SmolVLM-trl-dpo-rlaif-v
Generate text from an image and question
SmolVLM-trl-sft-ChartQA
Ask questions about charts in images
Browsergym Grpo Test
Display interactive tracking dashboards
Browsergym-grpo-Qwen-Qwen3-0.6B-2026-04-15 17-37-35
Show visual dashboards for your tracking data