Hugging Face
RDTvlokip (PRO)
👋 Open to Work
12.1 TFLOPS
8 followers · 6 following
https://rdtvlokip.fr
théo-charlet
AI & ML interests: None yet
Recent Activity
Posted an update · 1 day ago
I finally changed the architecture of my 15M French LLM, and it worked. Then I almost fooled myself about how much it worked, and catching that was the real win.

After showing last time that architecture is a threshold, not a lever, I got stubborn: could I change how the model learns instead? Four honest attempts (Lion, a sharper AdamW β2, multi-token prediction, LayerScale) produced four failures. The bottleneck wasn't the learning rule either.

So I changed the shape of the computation instead: loop the same transformer blocks 4× for deeper reasoning, with zero added parameters. It beat the baseline on perplexity, the first change in the whole project to move that number.

Then I added my own twist: let each token decide how deep to think, halting based on its own entropy. My first evaluation was spectacular: coherence up 65%, hallucinated names down 62%.

It was noise. Eight prompts, one seed. I re-ran on 50 prompts × 200 tokens and watched the gains shrink to "modest", and on out-of-domain prompts, recurrence actually made things worse. No universal winner.

None of it is new, either: it's Adaptive Computation Time (2016), the Universal Transformer (2018), and LoopViT (2026), recombined and measured honestly.

The real lessons:
A number from 8 prompts is a rumor.
The eval harness that kills your own best result is worth more than the result it kills.
Cite your lineage.
Stay preliminary until multiple seeds say otherwise.

The three models are live, and the write-up is honest about every caveat 👇
🔗 https://huggingface.co/blog/RDTvlokip/teaching-a-15m-french-llm-to-think-deeper
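To make the "loop the same blocks, let each token halt on its own entropy" idea concrete, here is a minimal pure-Python sketch. It is not the author's implementation. The names `looped_forward`, `block`, `readout`, and `halt_threshold` are hypothetical. It assumes a token halts once the entropy of its readout (next-token) distribution drops below a fixed threshold, and, for simplicity, it applies the shared block to each token independently, whereas a real transformer block would also mix tokens through attention.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def looped_forward(
    states: Sequence[Vector],
    block: Callable[[Vector], Vector],
    readout: Callable[[Vector], Vector],
    max_loops: int = 4,
    halt_threshold: float = 0.5,
) -> Tuple[List[Vector], List[int]]:
    """Apply one shared (hypothetical) block up to max_loops times per token.

    The same `block` is reused on every pass, so the parameter count does
    not grow with depth. A token stops (its state is frozen) once the
    entropy of its readout distribution falls below `halt_threshold`
    (an assumed halting rule). Tokens are processed independently here;
    a real transformer block would also attend across tokens.
    """
    states = [list(h) for h in states]  # copy so the caller's input is untouched
    active = [True] * len(states)
    depths = [0] * len(states)
    for _ in range(max_loops):
        if not any(active):
            break  # every token has halted
        for i in range(len(states)):
            if not active[i]:
                continue  # halted tokens keep their last state
            states[i] = block(states[i])
            depths[i] += 1
            if entropy(softmax(readout(states[i]))) < halt_threshold:
                active[i] = False  # confident enough: stop thinking
    return states, depths


if __name__ == "__main__":
    # Toy block that sharpens the state; toy readout uses the state as logits.
    double = lambda h: [2.0 * x for x in h]
    identity = lambda h: h
    _, depths = looped_forward([[0.1, 0.0, 0.0], [1.0, 0.0, 0.0]], double, identity)
    print(depths)
```

Running it prints `[4, 2]`: the token whose distribution becomes confident quickly halts after two passes, while the uncertain one uses all four. Adaptive Computation Time instead learns a halting probability at each step; the fixed entropy threshold here is a simpler heuristic chosen only for illustration.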
Upvoted an article · 1 day ago
🔁 Teaching a 15M French LLM to think deeper, and to know when to stop 🇫🇷
Published an article · 1 day ago
🔁 Teaching a 15M French LLM to think deeper, and to know when to stop 🇫🇷
RDTvlokip's Spaces (1)
📈 AG BPE (Running)
AG-BPE (Attention-Guided Byte-Pair Encoding)