arxiv:2606.13432

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

Published on Jun 11 · Submitted by yawenluo on Jun 15
#1 Paper of the day

Abstract

A unified camera motion cloning framework that uses grid motion videos as its camera representation and builds on multimodal diffusion transformers for finer control over video generation.

Cloning camera motion from reference videos is an important task in video generation, because a reference video provides intuitive and precise control. Existing methods either use parametric camera representations directly, which cannot handle multi-shot generation, or synthesize cross-paired training data, which is scarce, resulting in poor performance when cloning complicated camera motion. To address these issues, we introduce a general camera motion representation that encodes cameras as grid motion videos. This camera grid represents the camera parameters visually and allows diverse trajectories to be combined for multi-shot video generation. Building on this representation, we propose OmniDirector, a unified framework trained on million-scale camera-grid/video pairs that coordinates characters, actions, and cameras to give multimodal diffusion transformers director-level control. We further design a hierarchical prompt expansion agent that integrates the different control signals by analyzing how they relate to each other and then systematically describing both the camera motion and the visual content. Extensive experiments demonstrate the superior performance and controllability of our framework. Project page: https://ymlinfeng.github.io/OmniDirector.github.io/
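The abstract does not say how the camera grid is rendered. As a hedged illustration only, one plausible reading of "encoding cameras as grid motion videos" is to project a fixed 3D reference grid through each frame's camera pose, so the camera motion becomes visible as 2D point motion, and to concatenate per-shot renders with hard cuts for multi-shot control. Everything below is an assumption, not the authors' method: the wall-shaped grid, the yaw-only pinhole camera, the intrinsics, and the names `make_grid`, `project`, `render_shot`, and `render_multi_shot` are all hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Pose = Tuple[float, Point3]  # (yaw in radians, camera centre in world coords); hypothetical parameterization


def make_grid(n: int = 5, half: float = 2.0, depth: float = 5.0) -> List[Point3]:
    """Hypothetical reference grid: n x n points on a wall at z = depth."""
    step = 2 * half / (n - 1)
    return [(-half + i * step, -half + j * step, depth) for j in range(n) for i in range(n)]


def project(point: Point3, pose: Pose, f: float = 256.0,
            cx: float = 256.0, cy: float = 256.0) -> Optional[Point2]:
    """Pinhole projection of a world point through a yaw-only camera (assumed intrinsics)."""
    yaw, (px, py, pz) = pose
    # Translate the point into a camera-centred frame.
    dx, dy, dz = point[0] - px, point[1] - py, point[2] - pz
    c, s = math.cos(yaw), math.sin(yaw)
    # World-to-camera rotation: the transpose of a rotation by `yaw` about the y axis.
    xc = c * dx - s * dz
    yc = dy
    zc = s * dx + c * dz
    if zc <= 1e-6:
        return None  # point is behind the camera, so it is not drawn in this frame
    return (f * xc / zc + cx, f * yc / zc + cy)


def render_shot(grid: List[Point3], poses: List[Pose]) -> List[List[Optional[Point2]]]:
    """One 'grid motion video' shot: projected grid positions for every frame."""
    return [[project(p, pose) for p in grid] for pose in poses]


def render_multi_shot(grid: List[Point3], shots: List[List[Pose]]):
    """Concatenate shots and record the frame index where each shot starts (a hard cut)."""
    frames: List[List[Optional[Point2]]] = []
    cuts: List[int] = []
    for poses in shots:
        cuts.append(len(frames))
        frames.extend(render_shot(grid, poses))
    return frames, cuts


grid = make_grid()
pan = [(math.radians(2 * t), (0.0, 0.0, 0.0)) for t in range(8)]  # shot 1: camera turns toward +x
dolly = [(0.0, (0.0, 0.0, 0.5 * t)) for t in range(8)]            # shot 2: camera moves toward the grid
frames, cuts = render_multi_shot(grid, [pan, dolly])
print(len(frames), cuts)  # 16 [0, 8]
```

In this sketch, a pan shows up as the whole grid sliding sideways, and a dolly-in shows up as the grid points spreading outward from the image centre. Representing cameras as rendered point motion, rather than as raw pose parameters, is what would let trajectories from different shots be stacked into a single control video.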

Community

Paper author Paper submitter

We propose OmniDirector, which clones diverse camera motions from multi-shot videos to animate source images by means of our camera grid representation. We also design a hierarchical prompt expansion agent that harmoniously integrates multimodal control signals.

Cool paper - I liked the way "OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data" frames the problem without making it feel too abstract.

Curious if you think this would still work once the setup gets messier in the wild?

I made a podcast on it with ResearchPod; it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/e0c15be4-22e6-46b6-a2f0-2fc74a30027f


Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 1