Recent Activity
Replied to their post 1 day ago
I spent a week optimizing my 15M-parameter French LLM. Not one line of new architecture. And that was the whole point.
After building it from scratch (custom crawler, BPE, LLaMA-style arch, 3-phase trainer), the model wrote perfect French but hallucinated facts and drifted off-topic. So I went hunting for the bottleneck, convinced it was the architecture.
It wasn't. It never is.
The wins came from boring places: a data pipeline that cut documents mid-sentence, two special tokens silently sabotaging generation, and one decoding hyperparameter that doubled coherence (38 → 76 tokens before drift). The flashy research techniques (contrastive decoding, DoLa) gave the smallest gains. One of those results was even a false negative caused by my own buggy eval harness.
The real lesson isn't about French LLMs:
Architecture is a threshold, not a lever. Once you clear it, the bottleneck is everywhere except the architecture. Measure first. Read your own data. Verify your code before you trust your conclusion.
The model was never the problem.
Full write-up here 👇
🔗 https://huggingface.co/blog/RDTvlokip/what-i-learned-optimizing-a-15m-french
🔧 Architecture is a threshold, not a lever — what I learned optimizing a 15M-parameter French LLM 🇫🇷 (French version)
RDTvlokip
🔧 Architecture is a threshold, not a lever — what I learned optimizing a 15M French LLM 🇫🇷
🧠 I trained my own French LLM from scratch — alone, with a 1080 Ti, and the power went out ⚡🇫🇷 (published about 2 months ago)
🧠 I trained my own French LLM from scratch — alone, with a 1080 Ti, and the power went out ⚡🇫🇷 (French version, published about 2 months ago)
🧲 Embeddings — When AI turns words into GPS coordinates! 📍🧠
🧲 Embeddings — When AI turns words into GPS coordinates! 📍🧠 (French version)
🎯 PCA (Principal Component Analysis) — Compressing dimensions like a boss! 📊🔥 (French version)
🎯 PCA (Principal Component Analysis) — Compressing dimensions like a boss! 📊🔥
🎯 K-Means — When AI organizes chaos into neat boxes! 📦✨ (French version)
🎯 K-Means — When AI organizes chaos into neat boxes! 📦✨
🎯 F1-Score — When Accuracy lies to your face! 📊💥 (French version)
🎯 F1-Score — When Accuracy lies to your face! 📊💥
🎯 Precision & Recall — The twin metrics that never agree! ⚖️🔍 (French version)
🎯 Precision & Recall — The twin metrics that never agree! ⚖️🔍
🎆 AI 2026 — The 9 trends that will EXPLODE this year! 🚀💥
🎆 AI 2026 — The 9 trends that will explode this year! 🚀💥 (French version)
📊 Cross-Entropy — The loss function that KNOWS how to punish! 🎯🔥
📊 Cross-Entropy — The loss function that KNOWS how to punish! 🎯🔥 (French version)
🎯 Learning Rate — The gas pedal of your neural network! 🚗💨 (French version)
🎯 Learning Rate — The gas pedal of your neural network! 🚗💨