DFlash • Collection • Block Diffusion for Flash Speculative Decoding • 22 items • Updated 12 days ago • 138
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights • Paper • 2509.22944 • Published Sep 26, 2025 • 80
Granite 4.0 Language Models • Collection • Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 11 items • Updated Apr 29 • 220
story writing favourites • Collection • Models I personally liked for generating stories in the past. Not a recommendation, most of these are outdated. • 19 items • Updated 13 days ago • 117
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens • Paper • 2508.05305 • Published Aug 7, 2025 • 48