arxiv:2603.14704

Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning

Published on Mar 16 · Submitted by Ping Chen on Mar 18
Authors: Ping Chen, Xiang Liu, Xingpeng Zhang, Fei Shen, Xun Gong, Zhaoxiang Liu, Zezhou Chen, Huan Hu, Kai Wang, Shiguo Lian
AI-generated summary

The Chain-of-Trajectories framework enables deliberative planning for diffusion models by using Diffusion DNA to dynamically allocate computational resources based on denoising difficulty.

Abstract

Diffusion models operate in a reflexive System 1 mode, constrained by a fixed, content-agnostic sampling schedule. This rigidity arises from the curse of state dimensionality, where the combinatorial explosion of possible states in the high-dimensional noise manifold renders explicit trajectory planning intractable and leads to systematic computational misallocation. To address this, we introduce Chain-of-Trajectories (CoTj), a train-free framework enabling System 2 deliberative planning. Central to CoTj is Diffusion DNA, a low-dimensional signature that quantifies per-stage denoising difficulty and serves as a proxy for the high-dimensional state space, allowing us to reformulate sampling as graph planning on a directed acyclic graph. Through a Predict-Plan-Execute paradigm, CoTj dynamically allocates computational effort to the most challenging generative phases. Experiments across multiple generative models demonstrate that CoTj discovers context-aware trajectories, improving output quality and stability while reducing redundant computation. This work establishes a new foundation for resource-aware, planning-based diffusion modeling. The code is available at https://github.com/UnicomAI/CoTj.

Community


CoTj (Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning)

🧭 Description

CoTj (Chain-of-Trajectories) is a graph-theoretic trajectory planning framework for diffusion models.
It upgrades the standard fixed-step denoising schedule ("System 1") into condition-adaptive, optimally planned trajectories ("System 2"), enabling flexible, high-fidelity image generation under varying prompts and constraints.

CoTj establishes an offline graph for each condition, searches for optimal denoising paths, and supports both fixed-step optimal sequences and adaptive-length planning to reduce sampling steps without sacrificing output quality.
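
As a mental model, the fixed-step planning described above can be pictured as a shortest-path search over a DAG whose nodes are timesteps. The sketch below is purely illustrative: the function names are invented, and the toy quadratic jump cost merely stands in for the Diffusion-DNA difficulty the paper actually uses.

```python
import math

def plan_fixed_step_path(costs, num_steps):
    """DP shortest path on a DAG of timesteps using exactly `num_steps` jumps.

    costs[i][j] is the cost of jumping from timestep i to j (i < j); in the
    real framework this would come from the per-stage difficulty signature."""
    n = len(costs)
    # best[k][j] = min cost to reach node j using exactly k jumps
    best = [[math.inf] * n for _ in range(num_steps + 1)]
    parent = [[None] * n for _ in range(num_steps + 1)]
    best[0][0] = 0.0
    for k in range(1, num_steps + 1):
        for j in range(1, n):
            for i in range(j):
                if best[k - 1][i] + costs[i][j] < best[k][j]:
                    best[k][j] = best[k - 1][i] + costs[i][j]
                    parent[k][j] = i
    # Backtrack from the final node to recover the planned trajectory
    path, k = [n - 1], num_steps
    while k > 0:
        path.append(parent[k][path[-1]])
        k -= 1
    return list(reversed(path)), best[num_steps][n - 1]

# Toy difficulty: large jumps are quadratically expensive, so the optimal
# 5-jump plan over nodes 0..10 spaces the steps evenly.
T = 11
costs = [[float((j - i) ** 2) for j in range(T)] for i in range(T)]
path, cost = plan_fixed_step_path(costs, num_steps=5)  # path = [0, 2, 4, 6, 8, 10]
```

With a non-uniform cost matrix (harder stages cheaper to traverse slowly), the same DP concentrates jumps where denoising is easy and small steps where it is hard, which is the intuition behind the planned trajectories.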

The latest full paper PDF (CoTj_v20260305.pdf) is included in this repository; we recommend the repo version for the most up-to-date manuscript. The paper is also available on arXiv.


💡 Core Highlights & Breakthroughs

  • 🧠 "System 2" Global Planning: CoTj ends the "blind-box" generation of traditional diffusion models. By extracting a Diffusion DNA in just 0.073ms to quantify generation difficulty, it transforms high-dimensional generation into a graph-theoretic shortest path problem. It takes shortcuts for simple scenes and meticulously refines complex ones, enabling truly deliberate, planned generation.

  • Trajectory Reachability & Emergent Acceleration: Fewer steps don’t imply lower quality. Following geometrically optimal paths ensures high-fidelity latent endpoints remain reachable. A 10-step CoTj reconstruction can surpass multi-step baselines. This precise trajectory optimization naturally produces emergent inference acceleration and seamlessly integrates with cache-adaptive acceleration, reusing computation in high-information-density regions.

  • 🛣️ Trajectory Routing > Solvers: Choosing the right path matters more than stacking high-order solvers. Even under low computational budgets, CoTj demonstrates superior image quality and proves that optimal trajectory planning outweighs solver complexity.

  • 🎬 Robust Video Generation: Validated on Wan2.2, CoTj reveals the Generative Hierarchy principle: stabilize structure first, then animate. By prioritizing fidelity, it eliminates frame collapse and "pseudo-motion" seen in low-step baselines, producing smooth and coherent motion dynamics.

  • 🩺 Model "X-Ray" Diagnostics: Diffusion DNA also functions as a structural diagnostic tool, transparently revealing hidden issues like over-cooking and non-convergence in the late stages of certain distilled models.
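
To make the "difficulty signature drives allocation" idea above concrete, here is a toy sketch: a crude per-stage difficulty proxy (the size of successive latent updates) feeding a proportional step allocator. None of this is the paper's actual Diffusion-DNA extractor; the heuristic and names are invented for illustration only.

```python
import math
import random

def difficulty_signature(latents):
    """Toy per-stage 'difficulty': magnitude of the change between consecutive
    latents along a denoising trajectory, normalized to sum to 1.
    Illustrative stand-in -- not the paper's learned Diffusion DNA."""
    diffs = [math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))
             for a, b in zip(latents, latents[1:])]
    total = sum(diffs)
    return [d / total for d in diffs]

def allocate_steps(signature, budget):
    """Give each stage at least one step, the rest proportional to difficulty."""
    return [max(1, round(w * budget)) for w in signature]

random.seed(0)
# 5 latent snapshots along a trajectory -> 4 stages
latents = [[random.gauss(0, 1) for _ in range(16)] for _ in range(5)]
sig = difficulty_signature(latents)
steps = allocate_steps(sig, budget=10)
```

The point of the sketch is the shape of the interface: a cheap, low-dimensional summary of the trajectory is enough to decide where the step budget should go, without ever planning in the full latent space.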

📢 Highlights

🚀 Diffusion models officially enter the "System 2" global planning era!

The newly open-sourced, train-free CoTj framework from China Unicom AI Institute enables diffusion models to leave behind "blind-box" generation and gain human-like global planning capability. By extracting Diffusion DNA in just 0.073ms to quantify generation difficulty, high-dimensional generation is transformed into a graph-theoretic shortest path problem. Simple prompts take shortcuts, while complex descriptions are refined meticulously — achieving truly deliberate, planned generation.

1️⃣ Trajectory reachability & emergent acceleration: Fewer steps don’t mean lower quality. By following geometrically optimal paths, high-fidelity latent endpoints remain fully reachable. A 10-step CoTj reconstruction can surpass a baseline with dozens of steps. This precise trajectory optimization directly produces emergent inference acceleration, eliminating redundant computation. It also naturally supports cache-adaptive acceleration, targeting high-information-density regions for computation reuse.

2️⃣ Right path, exponential effect: Even at low computational budgets, image quality is dramatically improved. Data proves that finding the right trajectory outweighs merely stacking high-order solvers.

3️⃣ Robust video generation: Tested on Wan2.2, CoTj reveals the Generative Hierarchy principle: stabilize structure first, then animate. This approach eliminates frame collapse and "pseudo-motion" seen in low-step baselines, prioritizing fidelity to produce smooth dynamic content.

4️⃣ Model "X-Ray" diagnostics: Diffusion DNA can also serve as a structural diagnostic tool, exposing issues in certain distilled models such as over-cooking and late-stage non-convergence.


🚀 Quick Start

CoTj can be directly used with the Qwen-Image pipeline. Example usage:

from CoTj_pipeline_qwenimage import CoTjQwenImagePipeline
import os

# Expand '~' so the cache path resolves regardless of how the loader handles it
model_path = os.path.expanduser('~/.cache/modelscope/hub/models/Qwen/Qwen-Image/')
mlp_path = './prompt_models/qwenimage_mlp_models/'
device = 'cuda:0'

pipe = None  # no pre-built pipeline to reuse
cotj = CoTjQwenImagePipeline(model_path=model_path, mlp_path=mlp_path, pipe=pipe, device=device)

prompt = "A young female researcher in a dark-blue polo shirt, with a red 'Unicom' logo on the chest, smiles confidently at the camera; on the transparent glass wall of a high-tech data center, written clearly in black marker: 'CoTj takes generative AI from the fixed mode of blind men touching an elephant into the adaptive era of intelligent planning.'"

num_inference_steps = 10

# Baseline Euler sampling
pipe_image = cotj.get_pipe_image(prompt, 
                                 num_inference_steps=num_inference_steps, 
                                 width=1664, 
                                 height=928,
                                 seed=42)

# Fixed-Step Planning
prompt_cotj_image_fixed = cotj.get_prompt_cotj_image_fixed_step(prompt, 
                                                                num_inference_steps=num_inference_steps, 
                                                                width=1664, 
                                                                height=928,
                                                                seed=42)

# Adaptive-Length Planning
prompt_cotj_image_adaptive = cotj.get_prompt_cotj_image_adaptive_step(prompt, 
                                                                      inference_steps_max=50, 
                                                                      fidelity_target=0.99, 
                                                                      width=1664, 
                                                                      height=928,
                                                                      seed=42)

For a complete demo, see CoTj_qwenimage_demo.ipynb.

Note: This example uses Qwen-Image with the default Euler sampler.
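
Schematically, the adaptive-length call above amounts to a stop-when-good-enough loop: keep spending steps until an estimated fidelity crosses fidelity_target or the inference_steps_max budget runs out. The fidelity model below (each step closes half of the remaining gap) is invented purely to illustrate the control flow, not the framework's actual estimator.

```python
def adaptive_length_plan(stage_gain, inference_steps_max=50, fidelity_target=0.99):
    """Keep adding denoising steps until a (toy) fidelity estimate reaches the
    target or the step budget runs out. `stage_gain(k)` stands in for how much
    fidelity the k-th step is predicted to contribute."""
    fidelity, steps = 0.0, 0
    while fidelity < fidelity_target and steps < inference_steps_max:
        steps += 1
        fidelity += stage_gain(steps)
    return steps, min(fidelity, 1.0)

# Toy diminishing-returns model: each step closes half the remaining gap,
# so 7 steps suffice to cross a 0.99 fidelity target.
steps, fid = adaptive_length_plan(lambda k: 0.5 ** k)
```

An easy prompt (fast-saturating gains) terminates in a handful of steps, while a hard one keeps consuming budget up to the cap, which matches the "shortcuts for simple scenes, refinement for complex ones" behavior described above.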

🌟 Acknowledgements

This implementation is built upon the Hugging Face Diffusers library.


📖 Citation

If you find CoTj useful, please consider citing:

@article{chen2026cotj,
  title   = {Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning},
  author  = {Chen, Ping and Liu, Xiang and Zhang, Xingpeng and Shen, Fei and Gong, Xun and Liu, Zhaoxiang and Chen, Zezhou and Hu, Huan and Wang, Kai and Lian, Shiguo},
  journal = {arXiv preprint arXiv:2603.14704},
  year    = {2026}
}
