WE-GS: An Efficient 3D Gaussian Representation for Unconstrained Photo Collections

Reconstructing Scenes with Varying Appearance and Transient Occluders

WE-GS reconstructs scenes from unconstrained photo collections in less than 2 hours (15× faster than NeRF-based methods) and achieves highly realistic novel view synthesis and novel appearance synthesis in real time, at 181 FPS (2000× faster than NeRF-based methods). Compared to vanilla 3DGS, WE-GS achieves an average 6.6 dB PSNR improvement in rendering quality on the PhotoTourism dataset, with over a 2× reduction in storage.

Overview

Novel View Synthesis (NVS) from unconstrained photo collections is a challenging problem in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical harmonic coefficients transfer module that adapts 3DGS to varying lighting conditions and photometric post-processing. This lightweight module can be pre-computed and ensures efficient gradient propagation from rendered images to 3D Gaussian attributes. Additionally, we observe that the appearance encoder and the transient mask predictor, the two most critical parts of NVS from unconstrained photo collections, can be mutually beneficial. We introduce a plug-and-play lightweight spatial attention module to simultaneously predict transient occluders and a latent appearance representation for each image. After training and preprocessing, our method aligns with the standard 3DGS format and rendering pipeline, facilitating seamless integration into various 3DGS applications. Extensive experiments on diverse datasets show that our approach outperforms existing approaches in rendering quality for novel view and appearance synthesis, with fast convergence and high rendering speed.
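To make the idea of a residual-based spherical harmonic (SH) coefficients transfer more concrete, the following is a minimal NumPy sketch, not the WE-GS implementation. It assumes a small two-layer MLP that takes a hypothetical per-Gaussian feature vector (`gauss_feat`) and a per-image latent appearance vector (`appearance`) and predicts a residual that is added to each Gaussian's base SH coefficients. The function names, layer sizes, input features, and the zero initialization of the last layer are all illustrative assumptions; the overview above does not specify the actual architecture.

```python
import numpy as np


def init_transfer_mlp(embed_dim, feat_dim, n_coeffs, hidden=32, seed=0):
    """Create weights for a hypothetical two-layer transfer MLP.

    The output layer is zero-initialized (an assumption of this sketch),
    so the predicted residual starts at zero and the transferred SH
    coefficients initially equal the base coefficients.
    """
    rng = np.random.default_rng(seed)
    in_dim = embed_dim + feat_dim
    out_dim = n_coeffs * 3  # one residual per SH coefficient and RGB channel
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": np.zeros((hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def transfer_sh(sh, gauss_feat, appearance, params):
    """Return base SH coefficients plus an appearance-conditioned residual.

    sh:         (N, K, 3) base SH coefficients of N Gaussians
    gauss_feat: (N, F) hypothetical per-Gaussian features
    appearance: (E,) latent appearance vector of one image
    """
    n, k, _ = sh.shape
    # Share the same appearance vector across all Gaussians.
    app = np.broadcast_to(appearance, (n, appearance.shape[0]))
    x = np.concatenate([gauss_feat, app], axis=1)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    residual = (h @ params["w2"] + params["b2"]).reshape(n, k, 3)
    return sh + residual
```

Because the output of such a module has the same shape as the input coefficients, it could in principle be evaluated once for a chosen appearance and the result stored as ordinary SH coefficients, which is one way to read the statement that the module can be pre-computed and the result remains compatible with the standard 3DGS format.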

Video

The interactive visual comparison below loads a large number of videos, which may take some time and resources. Alternatively, you can watch the supplementary video.