🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 1 day ago • 10
Quantized Qwen3.5 Collection Verified models. Compatible with Transformers v5.3 and vLLM v0.16.1rc1 (nightly). Under evaluation. • 10 items • Updated about 18 hours ago • 5
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 6 days ago • 82
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents Paper • 2602.16855 • Published 17 days ago • 46
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 1 day ago • 17
jina-embeddings-v5-text Collection Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification. • 29 items • Updated 5 days ago • 33
OriOn Collection VLMs for long visual documents, based on Mistral-Small-3.1-24B-Instruct-2503 and Qwen3-VL-32B-Instruct • 4 items • Updated 1 day ago • 4
LateOn-Code 💻 Collection State-of-the-art late interaction code retrieval models • 6 items • Updated 1 day ago • 14
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling • 20 days ago • 47
NeuTTS Nano Multilingual Collection NeuTTS Nano is a TTS model, 3x smaller than NeuTTS Air, that runs on CPU in real time - now in English, Spanish, French, and German versions! • 13 items • Updated 6 days ago • 16
Closing the Loop: Universal Repository Representation with RPG-Encoder Paper • 2602.02084 • Published 30 days ago • 82