The models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
Yingfa Chen
chen-yingfa
AI & ML interests
Long-context modeling, continual learning, architectures
Recent Activity
authored a paper 2 days ago: MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
authored a paper 2 days ago: Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection
authored a paper 2 days ago: DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
Organizations
None yet