M-LRM: Multi-view Large Reconstruction Model

Mengfei Li1, Xiaoxiao Long1†, Yixun Liang3, Weiyu Li1, Yuan Liu2, Peng Li1, Xiaowei Chi1, Xingqun Qi1,
Wei Xue1, Wenhan Luo1, Qifeng Liu1†, Yike Guo1

1HKUST, 2HKU, 3HKUST(GZ)
†Corresponding authors.
[Teaser figure]

Abstract

Although the recent Large Reconstruction Model (LRM) has demonstrated impressive results, extending its input from a single image to multiple images exposes inefficiencies, subpar geometric and texture quality, and slower-than-expected convergence. This is because LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to efficiently reconstruct high-quality 3D shapes from multi-view images in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme that enables M-LRM to accurately query information from the input images. Moreover, we employ the 3D priors of the input multi-view images to initialize the tri-plane tokens. Compared to LRM, the proposed M-LRM produces a tri-plane NeRF at 128 × 128 resolution and generates 3D shapes of high fidelity. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence than LRM.
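The multi-view consistent cross-attention scheme can be pictured as tri-plane tokens attending to the encoded features of all input views. The following is a minimal, hypothetical PyTorch sketch of such a block; the class, variable names, and dimensions are our own illustration, not the authors' released implementation.

import torch
import torch.nn as nn

class MultiViewCrossAttention(nn.Module):
    """Illustrative only: tri-plane tokens query encoded multi-view image features."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, triplane_tokens, view_features):
        # triplane_tokens: (B, N_tokens, dim)      -- queries derived from the tri-plane
        # view_features:   (B, V, N_patches, dim)  -- features of V posed input views
        B, V, P, D = view_features.shape
        kv = view_features.reshape(B, V * P, D)           # pool all views as keys/values
        out, _ = self.attn(self.norm(triplane_tokens), kv, kv)
        return triplane_tokens + out                      # residual update of the tri-plane tokens

# Example: three 32x32 tri-plane token grids attending to four input views.
tokens = torch.randn(1, 3 * 32 * 32, 512)
views = torch.randn(1, 4, 16 * 16, 512)
print(MultiViewCrossAttention()(tokens, views).shape)     # torch.Size([1, 3072, 512])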



Pipeline. Overview of M-LRM. M-LRM is a fully differentiable transformer-based framework featuring a feature encoder, geometry-aware position encoding, and a multi-view cross-attention block. Given multi-view images with corresponding camera poses, M-LRM fuses 2D and 3D features to efficiently perform 3D-aware multi-view attention. The proposed geometry-aware position encoding enables more detailed and realistic 3D generation.
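The page does not give implementation details, but a geometry-aware position encoding of this kind typically relies on projecting 3D locations associated with tri-plane tokens into each posed input view. The sketch below shows only that projection step under standard pinhole-camera assumptions; the function name and tensor layouts are hypothetical.

import torch

def project_points(points, intrinsics, world_to_cam):
    # points:       (N, 3)    3D points sampled from the tri-plane volume
    # intrinsics:   (V, 3, 3) pinhole intrinsics for V input views
    # world_to_cam: (V, 4, 4) world-to-camera extrinsics (camera poses)
    # Returns pixel coordinates (V, N, 2) and per-view depths (V, N).
    N = points.shape[0]
    homo = torch.cat([points, torch.ones(N, 1)], dim=-1)            # (N, 4) homogeneous coords
    cam = torch.einsum('vij,nj->vni', world_to_cam, homo)[..., :3]  # (V, N, 3) camera space
    uvw = torch.einsum('vij,vnj->vni', intrinsics, cam)             # (V, N, 3)
    uv = uvw[..., :2] / uvw[..., 2:].clamp(min=1e-6)                # perspective divide
    return uv, cam[..., 2]

# The per-view pixel coordinates can then be encoded (e.g. with Fourier features) and
# added to the image tokens so that the multi-view cross-attention becomes geometry-aware.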

Sparse-view Reconstruction

Single-image to 3D

Acknowledgement

This work is mainly supported by the Hong Kong Generative AI Research & Development Center (HKGAI), led by Prof. Yike Guo. We are grateful to the Hong Kong University of Science and Technology (HKUST) for providing the necessary GPU resources.

BibTeX

@article{li2024m,   
  title={M-LRM: Multi-view Large Reconstruction Model},
  author={Li, Mengfei and Long, Xiaoxiao and Liang, Yixun and Li, Weiyu and Liu, Yuan and Li, Peng and Chi, Xiaowei and Qi, Xingqun and Xue, Wei and Luo, Wenhan and others},
  journal={arXiv preprint arXiv:2406.07648},
  year={2024}
}