Abstract
Determining the relative position and orientation of objects in an environment is a fundamental building block for a wide range of robotics applications. To accomplish this task efficiently in practical settings, a method must be fast, use common sensors, and generalize easily to new objects and environments. We present MSL-RAPTOR, a two-stage algorithm for tracking a rigid body with a monocular camera. The image is first processed by an efficient neural network-based front-end to detect new objects and track 2D bounding boxes between frames. The class label and bounding box are passed to the back-end, which updates the object's pose using an unscented Kalman filter (UKF). The measurement posterior is fed back to the 2D tracker to improve robustness. Because the object's class is identified, a class-specific UKF can be used when custom dynamics and constraints are known. Adapting the method to track the pose of new classes only requires a trained 2D object detector or labeled 2D bounding box data, along with the approximate size of the objects. The performance of MSL-RAPTOR is first verified on the NOCS-REAL275 dataset, where it achieves results comparable to RGB-D approaches despite not using depth measurements. When tracking a flying drone from onboard another drone, it runs three times faster than the fastest comparable method while reducing median translation and rotation errors by 66% and 23%, respectively.
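The back-end's core operation is a UKF predict-update cycle: sigma points are pushed through the dynamics and measurement models, and their weighted statistics give the posterior. The sketch below is a minimal, hypothetical scalar illustration of that cycle (not the paper's 6-DoF implementation); the state, models `f` and `h`, and noise terms `q` and `r` are all placeholders for this example.

```python
import math

def sigma_points(mean, var, alpha=0.1, kappa=0.0):
    """Generate 2n+1 sigma points and weights for a 1-D Gaussian (n = 1)."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    s = math.sqrt((n + lam) * var)
    pts = [mean, mean + s, mean - s]
    wm = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * 2  # mean weights
    wc = list(wm)
    wc[0] += 1 - alpha ** 2 + 2.0  # covariance weight; beta = 2 (Gaussian prior)
    return pts, wm, wc

def ukf_step(mean, var, z, f, h, q, r):
    """One UKF predict-update cycle for a scalar state.

    f: dynamics model, h: measurement model (both possibly nonlinear),
    q: process noise variance, r: measurement noise variance, z: measurement.
    """
    # Predict: propagate sigma points through the dynamics.
    pts, wm, wc = sigma_points(mean, var)
    xp = [f(p) for p in pts]
    m_pred = sum(w * x for w, x in zip(wm, xp))
    v_pred = sum(w * (x - m_pred) ** 2 for w, x in zip(wc, xp)) + q
    # Update: propagate predicted sigma points through the measurement model.
    pts, wm, wc = sigma_points(m_pred, v_pred)
    zp = [h(p) for p in pts]
    z_mean = sum(w * y for w, y in zip(wm, zp))
    s_innov = sum(w * (y - z_mean) ** 2 for w, y in zip(wc, zp)) + r
    c_xz = sum(w * (x - m_pred) * (y - z_mean) for w, x, y in zip(wc, pts, zp))
    k = c_xz / s_innov  # Kalman gain
    return m_pred + k * (z - z_mean), v_pred - k * s_innov * k
```

With linear `f` and `h` this reduces to the standard Kalman filter; the value of the unscented transform is that the same weighted-sigma-point machinery handles the nonlinear projection from 3D pose to 2D bounding box without linearization.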
B. Ramtoula and A. Caccavale—Equal contribution.
Acknowledgements
This research was supported in part by ONR grant number N00014-18-1-2830, NSF NRI grant 1830402, the Stanford Ford Alliance program, the Mitacs Globalink research award IT15240, and the NSERC Discovery Grant 2019-05165. We are grateful for this support.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ramtoula, B., Caccavale, A., Beltrame, G., Schwager, M. (2021). MSL-RAPTOR: A 6DoF Relative Pose Tracker for Onboard Robotic Perception. In: Siciliano, B., Laschi, C., Khatib, O. (eds) Experimental Robotics. ISER 2020. Springer Proceedings in Advanced Robotics, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-030-71151-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71150-4
Online ISBN: 978-3-030-71151-1
eBook Packages: Intelligent Technologies and Robotics (R0)