RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo
31B
Rethinking Cross-Layer Information Routing in Diffusion Transformers
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps