arXiv:2504.06629

Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization

Published on Apr 9, 2025

Abstract

Image Restoration Transformers suffer from layer normalization issues that cause feature magnitudes to diverge and channel-wise entropy to collapse; these issues are addressed with a tailored normalization that adapts to input-specific statistics.

AI-generated summary

This work analyzes the training dynamics of Image Restoration (IR) Transformers and uncovers a critical yet overlooked issue: conventional LayerNorm (LN) drives feature magnitudes to diverge toward the million scale and collapses channel-wise entropy. We interpret this behavior as the network attempting to bypass LN's constraints, which conflict with IR tasks. Accordingly, we identify two misalignments between LN and IR: 1) per-token normalization disrupts spatial correlations, and 2) input-independent scaling discards input-specific statistics. To resolve them, we propose the Image Restoration Transformer Tailored Layer Normalization (i-LN), a simple drop-in replacement that normalizes features holistically and adaptively rescales them per input. We provide theoretical insights and empirical evidence that this simple design improves training dynamics and thereby performance, as validated by extensive experiments.
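
The summary names two design changes relative to conventional LN: statistics computed holistically over the channel and spatial dimensions (rather than per token), and a rescaling that depends on the input's own statistics (rather than a fixed, input-independent gain). Below is a minimal PyTorch sketch of that idea. It is a hypothetical instantiation, not the authors' implementation: the class name, the (mean, std)-to-gain mapping to_gain, and the (1 + gain) modulation are illustrative assumptions; the exact formulation is given in the paper.

```python
import torch
import torch.nn as nn


class iLN(nn.Module):
    """Hypothetical sketch of the i-LN idea (not the authors' code).

    Two changes relative to conventional per-token LayerNorm:
      1) Holistic normalization: statistics are computed jointly over the
         channel AND spatial dimensions, preserving spatial correlations.
      2) Input-adaptive rescaling: the normalized features are rescaled by
         a gain derived from the input's own statistics, so input-specific
         magnitude information is not discarded.
    """

    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Learned affine parameters, as in standard LayerNorm.
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        # Assumption: a small learned map from the input's holistic
        # (mean, std) to a per-channel modulation of the scale.
        self.to_gain = nn.Linear(2, num_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W). Conventional LN would normalize each spatial
        # token independently over C; here statistics are taken over
        # C, H, and W together.
        mu = x.mean(dim=(1, 2, 3), keepdim=True)
        var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        sigma = torch.sqrt(var + self.eps)
        x_hat = (x - mu) / sigma

        # Input-adaptive rescaling: a per-input, per-channel gain derived
        # from the input's statistics, instead of a fixed global scale.
        stats = torch.cat([mu.flatten(1), sigma.flatten(1)], dim=1)  # (B, 2)
        gain = self.to_gain(stats).view(x.size(0), -1, 1, 1)         # (B, C, 1, 1)
        return self.gamma * (1.0 + gain) * x_hat + self.beta
```

As a drop-in replacement, an instance such as iLN(num_channels=64) would stand in for the LayerNorm of each Transformer block, operating on (B, C, H, W) feature maps.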
