arxiv:2604.13416

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Published on Jun 18

Abstract

DF3DV-1K is a large-scale real-world dataset that addresses the lack of paired clean and cluttered image sets for distractor-free radiance field research. It contains 1,048 scenes with 89,924 images across 128 distractor types and 161 scene themes, plus a curated subset, DF3DV-41, for robustness evaluation. Fine-tuning a diffusion-based 2D enhancer on the dataset improves the output of radiance field methods.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.
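To give a feel for the size of the reported 0.96 dB PSNR gain, the sketch below converts a PSNR difference into the corresponding change in mean squared error for a single image. The helper names `psnr` and `mse_ratio_for_gain` are illustrative and not from the paper, and the conversion assumes the standard PSNR definition with pixel values normalized to [0, 1]. Because the abstract reports an average over scenes, the per-image reading here is only an approximation.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """PSNR in dB for a mean squared error, with pixel values in [0, max_val]."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


# 0.96 dB is the average gain quoted in the abstract.
ratio = mse_ratio_for_gain(0.96)
print(f"MSE is multiplied by about {ratio:.3f}")

# Hypothetical baseline error, chosen only to check the relationship.
baseline_mse = 0.004
gain = psnr(baseline_mse * ratio) - psnr(baseline_mse)
print(f"Recovered gain: {gain:.2f} dB")
```

Under these assumptions, a 0.96 dB gain corresponds to roughly a 20% reduction in per-image MSE, independent of the baseline error.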


Community

ChengYou305

Paper submitter 1 day ago

DF3DV-1K, a large-scale real-world dataset for distractor-free novel view synthesis, comprising 1,000+ scenes with clean and cluttered images per scene, together with DI²FIX (Distractor-Free DIFIX), a diffusion-based enhancement module that improves radiance field renderings.

librarian-bot

about 18 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

ChengYou305

about 17 hours ago

Good recommendation! DF3DV may be particularly effective at reducing the multi-view inconsistencies that transient objects cause in in-the-wild settings, because it focuses specifically on "casually captured image sequences acquired within a short time span", where transient objects may move locally across only a few frames.

In addition, DF3DV contains roughly 30 to 50 images per scene on average. This supports conventional view settings and also enables evaluation under sparse-view settings, making the dataset applicable to a wider range of novel view synthesis scenarios.

DF3DV may also serve as a dataset for image restoration: in the paper, we demonstrate its effectiveness as a training dataset for the fixer model.
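As a quick sanity check on the per-scene image count mentioned above, the totals from the abstract can be divided out. This is a minimal sketch: the totals come from the abstract, but the even split between one clean and one cluttered set per scene (`SETS_PER_SCENE = 2`) is an assumption, since the page does not say how images are divided.

```python
# Totals as stated in the abstract.
TOTAL_IMAGES = 89_924
TOTAL_SCENES = 1_048

# Assumption: each scene has one clean and one cluttered set of equal size.
SETS_PER_SCENE = 2

images_per_scene = TOTAL_IMAGES / TOTAL_SCENES
images_per_set = images_per_scene / SETS_PER_SCENE

print(f"Images per scene (both sets): {images_per_scene:.1f}")
print(f"Images per set (assumed even split): {images_per_set:.1f}")
```

If the even split holds, the average per set falls inside the 30 to 50 range quoted in the comment, which suggests that figure may refer to a single image set rather than to both sets combined.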

noahml

about 2 hours ago

Neat paper. It feels like we have been waiting for a dataset that actually accounts for messy, real-world backgrounds instead of just clean, lab-controlled scenes. Having both clean and cluttered versions of the same scenes seems like a really smart way to push radiance field methods to be more robust for everyday use.

I am curious if you found any specific distractor types that were consistently impossible to filter out, or if the diffusion-based enhancer handles almost everything?

I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/0bfeafdb-cb14-4cb5-a871-46ab79f34c88