BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model


#BetterScene #3D Scene Synthesis #Novel View Synthesis #Stable Video Diffusion #3D Gaussian Splatting #Computer Vision #Diffusion Models

📌 Key Takeaways

  • BetterScene enhances novel view synthesis quality using extremely sparse photos
  • It introduces temporal equivariance regularization and vision foundation model-aligned representation
  • The approach integrates 3D Gaussian Splatting to render features for the SVD enhancer
  • It demonstrates superior performance on the DL3DV-10K dataset compared to existing methods

📖 Full Retelling

On February 26, 2026, researchers Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, and Alper Yilmaz posted BetterScene to the arXiv preprint server. BetterScene is an approach to 3D scene synthesis that enhances novel view synthesis quality from extremely sparse, unconstrained photos, addressing persistent challenges of view inconsistency and artifacts in conventional methods. The approach leverages the production-ready Stable Video Diffusion (SVD) model, pretrained on billions of frames, as a strong backbone. Investigating the latent space of the diffusion model, the authors introduce two components: temporal equivariance regularization and a vision foundation model-aligned representation, both applied to the variational autoencoder (VAE) module within the SVD pipeline. Conventional methods typically rely on off-the-shelf pretrained diffusion priors and fine-tune only the UNet module while keeping other components frozen, which leaves inconsistent details and artifacts even when geometry-aware regularizations such as depth or semantic conditions are added. BetterScene instead integrates a feed-forward 3D Gaussian Splatting (3DGS) model that renders features as inputs for the SVD enhancer, enabling the generation of continuous, artifact-free, consistent novel views.
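The paper's code and exact loss formulation are not reproduced here, but the idea behind temporal equivariance regularization can be sketched: encoding a video and then shifting it in time should give the same result as shifting first and encoding afterward. Below is a minimal NumPy illustration of that property; the toy frame-wise linear "encoder" and all tensor shapes are illustrative assumptions, not the paper's actual SVD VAE.

```python
import numpy as np

def temporal_shift(x, k):
    # Circularly shift a video tensor (T, H, W, C) along the time axis.
    return np.roll(x, shift=k, axis=0)

def toy_encoder(x, W):
    # Hypothetical stand-in for the SVD VAE encoder: a per-pixel linear
    # map applied frame by frame (the real encoder is a convolutional VAE).
    return x @ W

def temporal_equivariance_loss(x, encode, k=1):
    # Penalize the mismatch between "encode then shift" and
    # "shift then encode". A temporally equivariant encoder drives
    # this regularization term to zero.
    a = temporal_shift(encode(x), k)
    b = encode(temporal_shift(x, k))
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
video = rng.normal(size=(8, 4, 4, 3))   # (T, H, W, C) toy clip
W = rng.normal(size=(3, 2))             # toy latent projection
loss = temporal_equivariance_loss(video, lambda v: toy_encoder(v, W))
print(loss)  # 0.0: a frame-wise encoder is exactly equivariant
```

A purely frame-wise encoder satisfies the property trivially; the regularizer matters precisely when the encoder mixes information across frames, where it discourages temporally inconsistent latents.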

🏷️ Themes

3D Scene Synthesis, Computer Vision, Generative Models

📚 Related People & Topics

Computer vision

Computerized information extraction from images

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions.

Original Source
Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.22596 [cs.CV] (Submitted on 26 Feb 2026)

Title: BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

Authors: Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, Alper Yilmaz

Abstract: We present BetterScene, an approach to enhance novel view synthesis quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion model pretrained on billions of frames as a strong backbone, aiming to mitigate artifacts and recover view-consistent details at inference time. Conventional methods have developed similar diffusion-based solutions to address these challenges of novel view synthesis. Despite significant improvements, these methods typically rely on off-the-shelf pretrained diffusion priors and fine-tune only the UNet module while keeping other components frozen, which still leads to inconsistent details and artifacts even when incorporating geometry-aware regularizations like depth or semantic conditions. To address this, we investigate the latent space of the diffusion model and introduce two components: (1) temporal equivariance regularization and (2) vision foundation model-aligned representation, both applied to the variational autoencoder module within the SVD pipeline. BetterScene integrates a feed-forward 3D Gaussian Splatting (3DGS) model to render features as inputs for the SVD enhancer and generate continuous, artifact-free, consistent novel views. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.22596 [cs.CV]
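The abstract's second component, the vision foundation model-aligned representation, can be illustrated with a small sketch as well. One common way to align a generative model's latents with a frozen foundation model is to maximize cosine similarity between the two feature sets; whether BetterScene uses exactly this objective is an assumption, and all names and shapes below are illustrative.

```python
import numpy as np

def cosine_alignment_loss(z, f):
    # z: VAE latent features projected to the VFM feature dimension (N, D).
    # f: frozen vision foundation model features for the same frames (N, D).
    # Returns 1 - mean cosine similarity; minimizing it pulls the VAE's
    # latent representation toward the foundation model's.
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    f = f / np.linalg.norm(f, axis=-1, keepdims=True)
    return float(1.0 - np.mean(np.sum(z * f, axis=-1)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))            # toy per-patch features
print(cosine_alignment_loss(feats, feats))  # ~0.0: perfectly aligned
print(cosine_alignment_loss(feats, -feats)) # ~2.0: anti-aligned
```

In training, such a term would be added to the VAE's reconstruction objective, so the latent space remains decodable while inheriting the semantic structure of the foundation model's features.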

Source

arxiv.org
