360DVO: Deep Visual Odometry for Monocular 360-Degree Camera

The Hong Kong University of Science and Technology
IEEE Robotics and Automation Letters (RA-L), 2026.

*Indicates Equal Contribution, †Indicates Corresponding Author

Abstract

In this paper, we present 360DVO, the first deep learning-based omnidirectional visual odometry (OVO) framework. Our approach introduces a distortion-aware spherical feature extractor (DAS-Feat) that adaptively learns distortion-resistant features from 360-degree images. Sparse patches drawn from these features are then used to establish constraints for effective pose estimation within a novel omnidirectional differentiable bundle adjustment (ODBA) module. To facilitate evaluation in realistic settings, we also contribute a new real-world OVO benchmark. Extensive experiments on this benchmark and on public synthetic datasets (TartanAir V2 and 360VO) demonstrate that 360DVO surpasses state-of-the-art baselines (including 360VO and OpenVSLAM), improving robustness by 50% and accuracy by 37.5%.
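To give a concrete sense of what "distortion-aware" feature extraction can mean on equirectangular (ERP) images, the sketch below shows one common recipe (SphereNet-style sampling): for every pixel, a regular kernel is placed on the tangent plane of the unit sphere at that pixel's viewing direction and projected back to ERP coordinates, so the receptive field keeps the same footprint on the sphere at every latitude. This is only a minimal sketch under our own assumptions; the function names (erp_kernel_grid, sphere_sample) are illustrative, and the paper's DAS-Feat / SphereResNet design may differ.

import math
import torch
import torch.nn.functional as F

def erp_kernel_grid(h, w, kernel_size=3, tap_fov=None, device="cpu"):
    """Normalized grid_sample coordinates, shape (h*w, k*k, 2), placing a k x k
    spherical kernel around every pixel of an h x w equirectangular image."""
    k = kernel_size
    if tap_fov is None:
        tap_fov = 2.0 * math.pi / w                     # one ERP column of angular spacing per tap

    # Viewing direction (longitude, latitude) of every ERP pixel centre.
    lon = (torch.arange(w, device=device) + 0.5) / w * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (torch.arange(h, device=device) + 0.5) / h * math.pi
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")  # each (h, w)

    # Regular kernel offsets on the tangent plane (gnomonic projection coordinates).
    r = (torch.arange(k, device=device, dtype=torch.float32) - (k - 1) / 2.0) * math.tan(tap_fov)
    dy, dx = torch.meshgrid(r, r, indexing="ij")
    dx, dy = dx.reshape(-1), dy.reshape(-1)             # each (k*k,)

    # Inverse gnomonic projection: tangent-plane offsets -> (lat, lon) around each pixel.
    rho = torch.sqrt(dx ** 2 + dy ** 2).clamp(min=1e-12)
    c = torch.atan(rho)
    lat0, lon0 = lat.reshape(-1, 1), lon.reshape(-1, 1) # each (h*w, 1)
    new_lat = torch.asin((torch.cos(c) * torch.sin(lat0)
                          + dy * torch.sin(c) * torch.cos(lat0) / rho).clamp(-1.0, 1.0))
    new_lon = lon0 + torch.atan2(dx * torch.sin(c),
                                 rho * torch.cos(lat0) * torch.cos(c)
                                 - dy * torch.sin(lat0) * torch.sin(c))

    # Back to normalized ERP coordinates in [-1, 1] for grid_sample (no wrap-around handling).
    return torch.stack((new_lon / math.pi, -new_lat / (math.pi / 2.0)), dim=-1)

def sphere_sample(feat, grid):
    """feat: (1, C, H, W) ERP feature map; grid: (H*W, k*k, 2) from erp_kernel_grid.
    Returns (1, C, H*W, k*k): the k*k spherical taps for every pixel."""
    return F.grid_sample(feat, grid.unsqueeze(0), align_corners=False)

A spherical convolution layer built this way would gather the taps with sphere_sample and apply shared kernel weights across them, so features near the poles are not stretched by the ERP distortion.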

Pipeline

Our method takes sequential 360-degree RGB frames as input and extracts matching and context features from each frame with our proposed DAS-Feat module. Within DAS-Feat, the key component, SphereResNet, extracts distortion-resistant features, allowing patches to be cropped without deformation. After patchifying the matching features around their gradient maxima, we compute the correlation between patch features and context features and estimate optical flow with a recurrent network. In the ODBA module, the pose and depth of the current frame are jointly optimized by minimizing the distance between the patch location predicted by the optical flow and the patch reprojected onto the adjacent frame.
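To make the ODBA objective concrete, below is a hedged sketch (not the released code) of the reprojection residual such a module could minimize for an equirectangular camera: a patch centre is back-projected to a unit ray, scaled by its estimated depth, moved into the adjacent frame with the relative pose (R, t), re-normalized onto the unit sphere, and mapped back to pixel coordinates; the residual is the gap to the flow-predicted location. Function names and conventions are our own assumptions.

import math
import torch

def pix_to_ray(px, h, w):
    """ERP pixel (u, v) -> unit ray (x, y, z); px has shape (..., 2)."""
    lon = (px[..., 0] + 0.5) / w * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (px[..., 1] + 0.5) / h * math.pi
    return torch.stack((torch.cos(lat) * torch.sin(lon),
                        torch.sin(lat),
                        torch.cos(lat) * torch.cos(lon)), dim=-1)

def ray_to_pix(ray, h, w):
    """Unit ray (x, y, z) -> ERP pixel (u, v)."""
    lon = torch.atan2(ray[..., 0], ray[..., 2])
    lat = torch.asin(ray[..., 1].clamp(-1.0, 1.0))
    u = (lon + math.pi) / (2.0 * math.pi) * w - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * h - 0.5
    return torch.stack((u, v), dim=-1)

def reprojection_residual(px, depth, R, t, flow, h, w):
    """px: (N, 2) patch centres in frame i; depth: (N,) depth along each ray;
    R: (3, 3), t: (3,) relative pose from frame i to frame j;
    flow: (N, 2) optical flow predicted by the recurrent network."""
    pts_i = pix_to_ray(px, h, w) * depth[:, None]        # 3D points in frame i
    pts_j = pts_i @ R.T + t                              # same points expressed in frame j
    rays_j = pts_j / pts_j.norm(dim=-1, keepdim=True)    # back onto the unit sphere
    reproj = ray_to_pix(rays_j, h, w)                    # reprojected patch centres in frame j
    return reproj - (px + flow)                          # residual driven toward zero by the BA step

In a differentiable bundle adjustment layer, a few Gauss-Newton-style updates of pose and depth on a residual of this form can be unrolled, keeping the whole pipeline trainable end to end.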

Experiments

Video Presentation

Demo Videos

BibTeX

@article{360dvo_ral2026,
  title={360DVO: Deep Visual Odometry for Monocular 360-Degree Camera},
  author={First Author and Second Author and Third Author},
  journal={IEEE Robotics and Automation Letters},
  year={2026}
}