Abstract
This article describes a probabilistic approach for improving the accuracy of general object pose estimation algorithms. We propose a histogram filter variant that exploits the exploration capabilities of robots and supports active perception through a next-best-view proposal algorithm. For the histogram-based fusion method we focus on the orientation of the 6 degrees of freedom (DoF) pose, since the position can be processed with common filtering techniques. Object orientations reported by a pose estimator are used to update the hypothesis of the object's actual orientation. We discuss the design of experiments to estimate the error model of a detection method, and describe a suitable representation of the orientation histograms. This allows us to consider priors about likely object poses or symmetries, and to use information gain measures for view selection. The method is validated and compared to alternatives, based on the outputs of different 6 DoF pose estimators, using real-world depth images acquired with different sensors, and on a large synthetic dataset.
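The fusion scheme sketched in the abstract can be illustrated with a minimal histogram filter: a discretized belief over orientation bins is updated multiplicatively with a detection likelihood (Bayes' rule per bin), and candidate viewpoints are ranked by their expected information gain (expected entropy reduction). The sketch below is a simplified 1-D illustration under assumed names; the paper's actual method discretizes the full orientation space and uses an empirically estimated error model of the detector.

```python
import numpy as np

def normalize(p):
    """Rescale a non-negative histogram so its bins sum to one."""
    return p / p.sum()

def update_histogram(belief, likelihood):
    """Bayesian update: posterior is proportional to prior times likelihood, per bin."""
    return normalize(belief * likelihood)

def entropy(p):
    """Shannon entropy of a discrete distribution (zero bins contribute nothing)."""
    q = p[p > 0]
    return -np.sum(q * np.log(q))

def expected_information_gain(belief, sensor_models):
    """Rank candidate views by expected entropy reduction.

    Each sensor model is a matrix L with L[z, theta] = p(observation z | orientation theta),
    i.e. the (assumed known) error model of the detector from that viewpoint.
    """
    gains = []
    for L in sensor_models:
        gain = 0.0
        for z in range(L.shape[0]):
            pz = np.dot(L[z], belief)          # marginal probability of observing z
            if pz > 0:
                posterior = update_histogram(belief, L[z])
                gain += pz * (entropy(belief) - entropy(posterior))
        gains.append(gain)
    return np.array(gains)

# Uniform prior over 8 orientation bins, one detection peaked at bin 3.
belief = normalize(np.ones(8))
detection = np.array([0.05, 0.05, 0.1, 0.5, 0.1, 0.05, 0.05, 0.1])
posterior = update_histogram(belief, detection)

# An informative view (near-deterministic observations) should score higher
# than an uninformative one (observations independent of orientation).
informative = np.eye(4)
uninformative = np.full((4, 4), 0.25)
gains = expected_information_gain(normalize(np.ones(4)), [informative, uninformative])
```

With a uniform prior, the posterior simply follows the (normalized) detection likelihood; the information-gain criterion then prefers the view whose error model best discriminates between the remaining orientation hypotheses.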
Notes
In our experiments the prior was always uniform, but it can easily be changed to any distribution the discretization can represent (which is much more flexible than a parametric model).
We do not apply an additional uncertainty to the estimates after a viewpoint change, as we were using very precise sensor movements to obtain ground truth.
Please note that the box plot drawing library detects outliers based on interquartile range distance, marks them as dots, and excludes them from percentile calculations.
Acknowledgements
The authors would like to thank Anas Al-Nuaimi for helpful discussions, and Laura Beckmann for her help with the real-world ground truth used in the evaluation.
Additional information
This is one of several papers published in Autonomous Robots comprising the Special Issue on Active Perception.
This work has been partly supported by the EC under contract number H2020-ICT-645403-ROBDREAM.
Cite this article
Márton, Z.C., Türker, S., Rink, C. et al. Improving object orientation estimates by considering multiple viewpoints. Auton Robot 42, 423–442 (2018). https://doi.org/10.1007/s10514-017-9633-1