video understanding, vision language models, information retrieval
PEEK: Picking Essential frames via Efficient Knowledge distillation