video understanding, vision-language models, information retrieval
PEEK: Picking Essential frames via Efficient Knowledge distillation
Query-free frame selection and captioning for video