MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling



1University of Minnesota
2Homothereum

Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI, WACV 2025.


Abstract

Point cloud enhancement is the process of generating a high-quality point cloud from an incomplete input. This is done by filling in the missing details from a reference such as the ground truth, for example via regression. In addition to unimodal image and point cloud reconstruction, we focus on the task of view-guided point cloud completion, where we gather the missing information from an image that represents a view of the point cloud and use it to generate the output point cloud. With the recent research efforts surrounding state-space models, originally in natural language processing and now in 2D and 3D vision, Mamba has shown promising results as an efficient alternative to the self-attention mechanism. However, there is limited research on employing Mamba for cross-attention between the image and the input point cloud, which is crucial in multi-modal problems. In this paper, we introduce MambaTron, a Mamba-Transformer cell that serves as a building block for our network, which is capable of unimodal and cross-modal reconstruction, including view-guided point cloud completion. Through MambaTron, we explore the benefits of Mamba’s long-sequence efficiency coupled with the Transformer’s excellent analytical capabilities. This approach is one of the first attempts to implement a Mamba-based analogue of cross-attention, especially in computer vision. Our model demonstrates performance comparable to current state-of-the-art techniques while using a fraction of the computational resources.

More content will be added soon!

BibTeX

@InProceedings{Inaganti_2025_WACV,
    author    = {Inaganti, Sai Tarun and Petrenko, Gennady},
    title     = {MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2025},
    pages     = {217--227}
}