GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding Paper • 2605.15250 • Published 16 days ago • 13
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference Paper • 2605.07363 • Published 22 days ago • 12
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention Paper • 2603.28458 • Published Mar 30 • 44