TriAttention: Efficient Long Reasoning with Trigonometric KV Compression • Paper • 2604.04921 • Published 4 days ago • 90
CohereLabs/cohere-transcribe-03-2026 • Automatic Speech Recognition • Updated about 23 hours ago • 150k • 843
Mistral Small 4 • Collection • A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated 24 days ago • 63
nvidia/Qwen3-Nemotron-235B-A22B-GenRM-2603 • Text Generation • 235B • Updated about 1 month ago • 1.81k • 24
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 • Text Generation • 67B • Updated 3 days ago • 1.71M • 260