unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF Text Generation • 121B • Updated 15 days ago • 94.1k • 103
Article: GGML and llama.cpp join HF to ensure the long-term progress of Local AI • Feb 20 • 492
Post: We collaborated with Hugging Face to enable you to train MoE models 12× faster with 35% less VRAM via our new Triton kernels, with no accuracy loss. 🤗 Train gpt-oss locally on 12.8 GB VRAM with our free notebooks: https://unsloth.ai/docs/new/faster-moe • 1 reply • 🔥 29 • 🤗 5
TeichAI/Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill-GGUF 4B • Updated Dec 9, 2025 • 2.3k • 26
Reply: I got qwen3-coder-next up to 24 tps when I removed -nkvo and kept -kvu. I think I could push it further if I took the time to compile llama.cpp myself instead of using the Docker image.
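The reply above can be sketched as a launch command. This is a minimal sketch, assuming llama.cpp's `llama-server`; the model filename, GPU layer count, context size, and port are placeholders, not values from the reply. Dropping `-nkvo` (`--no-kv-offload`) lets the KV cache stay in VRAM instead of host RAM, while `-kvu` (`--kv-unified`) uses a single unified KV cache buffer across sequences.

```shell
# Hypothetical llama-server invocation reflecting the reply's flag choices.
# Filename, -ngl, -c, and --port are placeholders.
./llama-server \
  -m qwen3-coder-next-Q4_K_M.gguf \
  -ngl 99 \
  -kvu \
  -c 16384 \
  --port 8080
# Note: -nkvo is deliberately absent, so the KV cache is offloaded to the GPU.
```

Compiling llama.cpp natively (with the right CUDA architecture flags) rather than using a generic Docker image can indeed yield further throughput, since the prebuilt binaries target a broad set of GPUs.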
[models] GTX 1660 Super 6GB Collection: The best little card under 100 euros. Full precision vs. quants not yet benchmarked. This card is much better at running inference than you might realize. • 11 items • Updated Feb 22 • 2