SCIENCE CHINA Information Sciences, Volume 64 , Issue 1 : 112204(2021) https://doi.org/10.1007/s11432-019-2690-0

Homography-based camera pose estimation with known gravity direction for UAV navigation

  • Received: Apr 26, 2019
  • Accepted: Sep 27, 2019
  • Published: Dec 14, 2020


Relative pose estimation is a fundamental problem in visual simultaneous localization and mapping. This paper statistically optimizes the solution of the homography-based relative pose estimation problem. Assuming a known gravity direction and a dominant ground plane, the homography representation in the normalized image plane enables least-squares pose estimation between two views. Furthermore, an iterative camera-trajectory estimation method is developed for visual odometry. The accuracy and robustness of the proposed algorithm are tested experimentally on synthetic and real data in indoor and outdoor environments. Various metrics confirm the effectiveness of the proposed method in practical applications.


This work was partially supported by National Natural Science Foundation of China (Grant Nos. 61603303, 61803309, 61703343), Natural Science Foundation of Shaanxi Province (Grant No. 2018JQ6070), China Postdoctoral Science Foundation (Grant No. 2018M633574), and Fundamental Research Funds for the Central Universities (Grant Nos. 3102019ZDHKY02, 3102018JCC003).



  • Figure 1

    (Color online) Camera coordinate systems aligned with the ground plane (called the ground coordinate system). The detected 3D points lie on the ground plane.

  • Figure 2

    (Color online) Evaluation of the HLS, 2-point (2pt), and 5-point (5pt) methods during forward ((a) and (c)) and sideways ((b) and (d)) motions with varying image noise.

  • Figure 3

    (Color online) Evaluation of the HLS, HLS+GN, and 2-point methods during forward ((a)–(d)) and sideways ((e)–(h)) motions with IMU noise increasing from $0^{\circ}$ to $1^{\circ}$. The image noise is fixed at a standard deviation of 0.5 pixels.

  • Figure 4

    (Color online) Accuracy of our method in estimating the homography scale factor. (a) shows the effect of image noise increasing from 0 to 1 pixel; (b) and (c) report the effect of pitch/roll noise from $0^{\circ}$ to $1^{\circ}$ with the image noise fixed at a standard deviation of 0.5 pixels.

  • Figure 5

    (Color online) A few matched feature points in the images taken from the trajectory estimated by the HLS method. (a) The 1st frame; (b) the 400th frame; (c) the 800th frame; (d) the 1200th frame.

  • Figure 6

    (Color online) Trajectory drift, evaluated via the relative pose error (RPE), of the HLS method and the 2-point (2pt) method on the ETH dataset.

  • Figure 7

    (Color online) Relationship between the estimated and true trajectories in outdoor experiment settings.

  • Figure 8

    (Color online) Experimental platform (a) and its environment (b): (1) UWB receiver, (2) downward-looking camera, (3) flight controller with IMU, magnetometer, and other components, (4) four UWB anchors, (5) Intel drone, and (6) start point.

  • Figure 9

    (Color online) Top views of the estimated and the true flight trajectories.


    Algorithm 1 The proposed HLS method

    Require: General corresponding measurements $\bar{X}_{i}$ and $\bar{X}_{j}$, and the known ${\boldsymbol R}_{i'i}$ and ${\boldsymbol R}_{j'j}$ at times $i$ and $j$, respectively.

    Output: Rotation matrix ${\boldsymbol R}_{ji}$ and translation vector ${\boldsymbol t}_{ji}$.

    Pre-rotate each measurement according to (4), and then normalize them to obtain $\bar{X}_{i'}$ and $\bar{X}_{j'}$;

    Calculate the coefficient matrices ${\boldsymbol A}_{i'}$ and ${\boldsymbol b}_{i'}$ for all point correspondences according to (17);

    Obtain the closed-form solution ${\boldsymbol H}_{j'i'}$ according to (19);

    Recover the estimates ${\boldsymbol R}_{ji}$ and ${\boldsymbol t}_{ji}$ according to (15);

    Return ${\boldsymbol R}_{ji}$ and ${\boldsymbol t}_{ji}$.
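The paper's coefficient matrices from (17) and the closed-form solution (19) are not reproduced on this page, so the following is only a simplified sketch of the same idea: once both views have been pre-rotated into gravity-aligned frames, the ground-plane homography takes the form ${\boldsymbol H} = {\boldsymbol R}_z(\psi) + {\boldsymbol t}\,{\boldsymbol n}^{\rm T}/d$ with ${\boldsymbol n} = [0,0,1]^{\rm T}$, and can be estimated by linear least squares (a standard DLT) and decomposed directly. The helper name `estimate_planar_pose` and all sign/direction conventions here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def estimate_planar_pose(x_i, x_j, d):
    """Least-squares ground-plane pose sketch (hypothetical helper).

    x_i, x_j : (N, 2) normalized image coordinates in the two views,
               already pre-rotated into gravity-aligned frames.
    d        : distance from view i to the ground plane (normal n = [0, 0, 1]).
    """
    # DLT: each correspondence x_j ~ H x_i contributes two linear equations.
    A = []
    for (X, Y), (u, v) in zip(x_i, x_j):
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)          # homography up to scale and sign
    if H[2, 2] < 0:                   # fix the sign, assuming 1 + t_z/d > 0
        H = -H
    # Gravity alignment gives H = R_z(psi) + t n^T / d, so the top-left
    # 2x2 block is a pure planar rotation with unit determinant; this
    # fixes the unknown overall scale of the DLT solution.
    H = H / np.sqrt(np.linalg.det(H[:2, :2]))
    psi = np.arctan2(H[1, 0], H[0, 0])
    R = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                  [np.sin(psi),  np.cos(psi), 0.0],
                  [0.0,          0.0,         1.0]])
    t = (H - R)[:, 2] * d             # t n^T / d only affects the 3rd column
    return R, t
```

With noisy correspondences the SVD step is the least-squares solution over all points; the decomposition also shows why the yaw and the scaled translation ${\boldsymbol t}/d$ decouple once the gravity direction is known.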


    Algorithm 2 Iterative camera trajectory estimation

    Initialize $d_0$ and set ${\boldsymbol R}_{00}={\boldsymbol I},\,{\boldsymbol t}_{00}=\mathbf{0}$;



    Estimate ${\boldsymbol R}_{ji}$ and ${\boldsymbol t}_{ji}$ by Algorithm 1;

    Update the camera pose by ${\boldsymbol R}_{0j} \leftarrow {\boldsymbol R}_{0i} {\boldsymbol R}_{ji}^{\rm T}$ and ${\boldsymbol t}_{0j} \leftarrow {\boldsymbol t}_{0i} - {\boldsymbol R}_{0i}{\boldsymbol R}_{ji}^{\rm T}{\boldsymbol t}_{ji}$;

    Update $d_j$ according to (22);

    Return the camera trajectory $\left\{ \left[{\boldsymbol R}_{0j}\,|\,{\boldsymbol t}_{0j}\right]\right\}$;

    Iterate the time sequence $i \leftarrow i+1$;

    Repeat from step 3 to step 8.
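The pose-update step of Algorithm 2 inverts the relative pose delivered by Algorithm 1 and composes it with the current world pose. A minimal sketch, assuming the convention $X_j = {\boldsymbol R}_{ji} X_i + {\boldsymbol t}_{ji}$ for the relative pose and $X_0 = {\boldsymbol R}_{0i} X_i + {\boldsymbol t}_{0i}$ for the world pose (the helper name `chain_pose` is hypothetical):

```python
import numpy as np

def chain_pose(R_0i, t_0i, R_ji, t_ji):
    """One pose-update step of the trajectory iteration (illustrative).

    Given the world pose of frame i (X_0 = R_0i X_i + t_0i) and the
    relative pose from i to j (X_j = R_ji X_i + t_ji), return the world
    pose of frame j so that X_0 = R_0j X_j + t_0j.
    """
    R_0j = R_0i @ R_ji.T
    t_0j = t_0i - R_0i @ R_ji.T @ t_ji
    return R_0j, t_0j
```

In the loop, each new relative estimate from Algorithm 1 is folded in this way before the plane distance $d_j$ is updated by (22), which is not reproduced here.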