M-LRM: Multi-view Large Reconstruction Model

Mengfei Li1, Xiaoxiao Long1†, Yixun Liang3, Weiyu Li1, Yuan Liu2, Peng Li1, Xiaowei Chi1, Xingqun Qi1,
Wei Xue1, Wenhan Luo1, Qifeng Liu1†, Yike Guo1

1HKUST, 2HKU, 3HKUST(GZ)
†Corresponding authors.
[Teaser figure]

Abstract

Although the recent Large Reconstruction Model (LRM) has demonstrated impressive results, extending its input from a single image to multiple images exposes inefficiencies, subpar geometric and texture quality, and slower-than-expected convergence. This is because LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to efficiently reconstruct high-quality 3D shapes from multi-view images in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme that enables M-LRM to accurately query information from the input images. Moreover, we employ the 3D priors of the input multi-view images to initialize the tri-plane tokens. Compared to LRM, the proposed M-LRM produces a tri-plane NeRF at 128 × 128 resolution and generates 3D shapes of high fidelity. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence than LRM.
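The multi-view consistent cross-attention scheme can be pictured as tri-plane tokens attending to the encoded features of all input views. The following is a minimal, hypothetical PyTorch sketch of such a block; the class, variable names, and dimensions are our own illustration, not the authors' released implementation.

import torch
import torch.nn as nn

class MultiViewCrossAttention(nn.Module):
    """Illustrative only: tri-plane tokens query encoded multi-view image features."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, triplane_tokens, view_features):
        # triplane_tokens: (B, N_tokens, dim)      -- queries derived from the tri-plane
        # view_features:   (B, V, N_patches, dim)  -- features of V posed input views
        B, V, P, D = view_features.shape
        kv = view_features.reshape(B, V * P, D)           # pool all views as keys/values
        out, _ = self.attn(self.norm(triplane_tokens), kv, kv)
        return triplane_tokens + out                      # residual update of the tri-plane tokens

# Example: three 32x32 tri-plane token grids attending to four input views.
tokens = torch.randn(1, 3 * 32 * 32, 512)
views = torch.randn(1, 4, 16 * 16, 512)
print(MultiViewCrossAttention()(tokens, views).shape)     # torch.Size([1, 3072, 512])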



Pipeline. Overview of M-LRM. M-LRM is a fully differentiable transformer-based framework featuring a feature encoder, geometry-aware position encoding, and a multi-view cross-attention block. Given multi-view images with corresponding camera poses, M-LRM fuses 2D and 3D features to efficiently perform 3D-aware multi-view attention. The proposed geometry-aware position encoding enables more detailed and realistic 3D generation.
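The page does not give implementation details, but a geometry-aware position encoding of this kind typically relies on projecting 3D locations associated with tri-plane tokens into each posed input view. The sketch below shows only that projection step under standard pinhole-camera assumptions; the function name and tensor layouts are hypothetical.

import torch

def project_points(points, intrinsics, world_to_cam):
    # points:       (N, 3)    3D points sampled from the tri-plane volume
    # intrinsics:   (V, 3, 3) pinhole intrinsics for V input views
    # world_to_cam: (V, 4, 4) world-to-camera extrinsics (camera poses)
    # Returns pixel coordinates (V, N, 2) and per-view depths (V, N).
    N = points.shape[0]
    homo = torch.cat([points, torch.ones(N, 1)], dim=-1)            # (N, 4) homogeneous coords
    cam = torch.einsum('vij,nj->vni', world_to_cam, homo)[..., :3]  # (V, N, 3) camera space
    uvw = torch.einsum('vij,vnj->vni', intrinsics, cam)             # (V, N, 3)
    uv = uvw[..., :2] / uvw[..., 2:].clamp(min=1e-6)                # perspective divide
    return uv, cam[..., 2]

# The per-view pixel coordinates can then be encoded (e.g. with Fourier features) and
# added to the image tokens so that the multi-view cross-attention becomes geometry-aware.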

Sparse-view Reconstruction

Single-image to 3D

Acknowledgement

This work is mainly supported by the Hong Kong Generative AI Research & Development Center (HKGAI), led by Prof. Yike Guo. We are grateful to the Hong Kong University of Science and Technology (HKUST) for providing the necessary GPU resources.

BibTeX

@article{li2024m,   
  title={M-LRM: Multi-view Large Reconstruction Model},
  author={Li, Mengfei and Long, Xiaoxiao and Liang, Yixun and Li, Weiyu and Liu, Yuan and Li, Peng and Chi, Xiaowei and Qi, Xingqun and Xue, Wei and Luo, Wenhan and others},
  journal={arXiv preprint arXiv:2406.07648},
  year={2024}
}