🔄 In a Training Loop

Stefan Schweter PRO

stefan-it

https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨, xLSTM

Recent Activity

liked a dataset about 14 hours ago

SupraLabs/reasoning-corpus-4K-5M-v1

reacted to FredyRivera-dev's post with 🚀 about 16 hours ago

We wrote a full technical guide on how to train a bilingual (ES/EN) LLM from scratch: TinyQwen. Covers:
- Hybrid architecture based on Qwen3.5
- Pre-training with 15B tokens
- Cost benchmark between H200 and B200
- Post-training with SFT + LoRA
- Full code and data, open source

With ~$11 of compute on an H200 we ran an initial training run, enough to validate the full architecture and pipeline.

Blog post: https://aquiles-ai.vercel.app/blog/tinyqwen-from-scratch

Technical feedback welcome, especially from anyone looking to replicate the pipeline with more compute.

commented on a paper about 17 hours ago

From Data to Device: ELMOD An Efficient German-First 2.7B Language Model for Mobile Inference

commented a paper about 17 hours ago

From Data to Device: ELMOD An Efficient German-First 2.7B Language Model for Mobile Inference

Paper • 2607.24585 • Published 2 days ago • 1 •

commented a paper 2 days ago

Building a European Multilingual Evaluation Dataset: The MMLU Localisation Project within the EMT Network

Paper • 2607.18432 • Published 9 days ago • 2 •

commented a paper 29 days ago

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model

Paper • 2606.29614 • Published Jun 28 • 1 •

commented 2 papers 30 days ago

A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts

Paper • 2606.27881 • Published Jun 26 • 1 •

A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts

Paper • 2606.27881 • Published Jun 26 • 1 •

New activity in SBB/ZEFYS2025 about 1 month ago

Dataset Splits

#1 opened about 1 month ago by

New activity in VAGOsolutions/SauerkrautLM-GLiNER about 1 month ago

Benchmark Release

#2 opened about 1 month ago by

commented 3 papers about 2 months ago

MÖVE: A Holistic LLM Benchmark for the German Public Sector

Paper • 2606.13111 • Published Jun 11 • 3 •

KletterMix: Climbing Toward High-Quality German Pretraining Data

Paper • 2606.03773 • Published Jun 2 • 21 •

KletterMix: Climbing Toward High-Quality German Pretraining Data

Paper • 2606.03773 • Published Jun 2 • 21 •

commented 2 papers 2 months ago

GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

Paper • 2605.30214 • Published May 28 • 1 •

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

Paper • 2605.30348 • Published May 28 • 1 •

New activity in openeurollm/Dolci-Instruct-SFT-translated 2 months ago

More information about translation process

#2 opened 2 months ago by

commented 5 papers 3 months ago

BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs

Paper • 2604.02045 • Published Apr 2 • 39 •

BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs

Paper • 2604.02045 • Published Apr 2 • 39 •

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

Paper • 2605.12438 • Published May 12 • 7 •

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

Paper • 1511.09249 • Published Nov 30, 2015 • 1 •

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

Paper • 2604.20447 • Published Apr 22 • 2 •

commented 2 papers 4 months ago

Effective Distillation to Hybrid xLSTM Architectures

Paper • 2603.15590 • Published Mar 16 • 34 •

TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

Paper • 2603.08182 • Published Mar 9 • 2 •