zhiliang

zzliang

pengzhiliang

AI & ML interests

multimodal

Recent Activity

upvoted a paper 10 days ago

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

upvoted a paper about 2 months ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

upvoted a paper 3 months ago

Online Experiential Learning for Language Models

Organizations

New activity in microsoft/VibeVoice-1.5B 9 months ago

Possibly helpful info' for Windows users wanting to run this locally.

#10 opened 9 months ago by

Random sounds and music

#5 opened 9 months ago by

VibeVoice's Singing and Music Capabilities

#8 opened 9 months ago by

commented a paper almost 3 years ago

Kosmos-2: Grounding Multimodal Large Language Models to the World

Paper • 2306.14824 • Published Jun 26, 2023 • 36