DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
Abstract
DF3DV-1K is a large-scale real-world dataset introduced to address the lack of per-scene clean and cluttered image sets for distractor-free radiance field research. It contains 1,048 scenes with 89,924 images across 128 distractor types and 161 scene themes, along with a curated subset, DF3DV-41, for robustness evaluation. Fine-tuning a diffusion-based 2D enhancer on the dataset improves the renderings of radiance field methods.
Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, which limits progress in this area. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.
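The PSNR figure quoted in the abstract is a log-scale measure of mean squared error between a rendering and its ground-truth image. As a minimal sketch, assuming pixel values normalised to [0, 1] and flattened into plain Python sequences, the metric and an average per-scene gain could be computed as follows. The function names `psnr` and `mean_psnr_gain` are illustrative and are not taken from the paper or its code release; the paper's own evaluation protocol may differ.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumes both inputs are flattened images with values in [0, max_value].
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr_gain(baseline_db: Sequence[float],
                   enhanced_db: Sequence[float]) -> float:
    """Average per-scene PSNR difference (enhanced minus baseline), in dB."""
    if len(baseline_db) != len(enhanced_db) or len(baseline_db) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(e - b for b, e in zip(baseline_db, enhanced_db)) / len(baseline_db)
```

Because PSNR is logarithmic, a fixed gain in dB corresponds to a fixed multiplicative reduction in mean squared error, whatever the starting quality.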
Community
This paper introduces DF3DV-1K, a large-scale real-world dataset for distractor-free novel view synthesis comprising 1,000+ scenes with clean and cluttered images per scene, together with DI2FIX (Distractor-Free DIFIX), a diffusion-based enhancement module that improves radiance field renderings.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DeblurNVS: Geometric Latent Diffusion for Novel View Synthesis from Sparse Motion-Blurred Images (2026)
- Generalizable Sparse-View 3D Reconstruction from Unconstrained Images (2026)
- 3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects (2026)
- Wild3R: Feed-Forward 3D Gaussian Splatting from Unconstrained Sparse Photo Collection (2026)
- Diffusion-guided Generalizable Enhancer for Urban Scene Reconstruction (2026)
- Sparse-View 3D Gaussian Splatting in the Wild (2026)
- P2GS: Physical Prior-guided Gaussian Splatting for Photometrically Consistent Urban Reconstruction (2026)
Good recommendation! DF3DV may be particularly useful for addressing the multi-view inconsistency caused by transient objects in in-the-wild settings, as it specifically focuses on "casually captured image sequences acquired within a short time span", where transient objects may move locally across only a few frames.
In addition, DF3DV contains roughly 30 to 50 images per scene on average. This supports conventional view settings and also enables evaluation under sparse-view settings, making the dataset applicable to a wider range of novel view synthesis scenarios.
DF3DV may also serve as a dataset for image restoration, as we demonstrate in the paper its effectiveness as training data for the fixer model.
Neat paper. It feels like we have been waiting for a dataset that actually accounts for messy, real-world backgrounds instead of just clean, lab-controlled scenes. Having both clean and cluttered versions of the same scenes seems like a really smart way to push radiance field methods to be more robust for everyday use.
I am curious if you found any specific distractor types that were consistently impossible to filter out, or if the diffusion-based enhancer handles almost everything?
I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/0bfeafdb-cb14-4cb5-a871-46ab79f34c88
Datasets citing this paper 2
ChengYou305/DF3DV-1K-Fixer