kyutai/Audio-NTREX-4L
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion
ARC-Encoder: learning compressed text representations for large language models