Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs Paper • 2506.17080 • Published Jun 20, 2025 • 7
view article Article KV Cache from scratch in nanoVLM +3 ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
view article Article Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers? Kseniase • Apr 4, 2025 • 16
view article Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) ariG23498 • Jan 19, 2025 • 50
view article Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? Kseniase • Mar 17, 2025 • 357
view article Article Open-source DeepResearch – Freeing our search agents +3 m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28, 2025 • 125
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 54