Real-time view-based pose recognition and interpolation for tracking initialization

  • Special Issue
Journal of Real-Time Image Processing

Abstract

In this paper we propose a new approach to real-time view-based pose recognition and interpolation. Pose recognition is particularly useful for identifying camera views in databases, video sequences, video streams, and live recordings. All of these applications require a fast pose recognition process, in many cases in video real time. It should further be possible to extend the database with new material, i.e., to update the recognition system online. The method that we propose is based on P-channels, a special kind of information representation that combines the advantages of histograms and local linear models. Our approach is motivated by its similarity to information representation in biological systems, but its main advantage is its robustness against common distortions such as clutter and occlusion. The recognition algorithm consists of three steps: (1) low-level image features for color and local orientation are extracted at each point of the image; (2) these features are encoded into P-channels by combining similar features within local image regions; (3) the query P-channels are compared to a set of prototype P-channels in a database using a least-squares approach. The algorithm is applied in two scene registration experiments with fisheye camera data, one for pose interpolation from synthetic images and one for finding the nearest view in a set of real images. The method compares favorably to SIFT-based methods, in particular concerning interpolation. The method can be used for initializing pose-tracking systems, either when starting the tracking or when the tracking has failed and the system needs to re-initialize. Due to its real-time performance, the method can also be embedded directly into the tracking system, allowing a sensor fusion unit to choose dynamically between frame-by-frame tracking and pose recognition.
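The three steps named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single scalar feature per pixel, a simplified P-channel layout (one count and one summed offset per bin), and one possible reading of the "least-squares approach" (squared distance for the nearest view, unconstrained least-squares weights for interpolation). The function names `encode_p_channels`, `nearest_prototype`, and `interpolate_pose` are hypothetical.

```python
import numpy as np


def encode_p_channels(values, n_bins, lo=0.0, hi=1.0):
    """Encode scalar features into a simplified P-channel vector.

    Assumption: each bin stores a count (the histogram part) and the sum of
    the offsets of its samples from the bin centre (the local linear part).
    """
    values = np.asarray(values, dtype=float).ravel()
    # Rescale so that the bin centres sit at the integers 0 .. n_bins - 1.
    scaled = (values - lo) / (hi - lo) * n_bins - 0.5
    idx = np.clip(np.rint(scaled).astype(int), 0, n_bins - 1)
    offsets = scaled - idx
    counts = np.bincount(idx, minlength=n_bins).astype(float)
    offset_sums = np.bincount(idx, weights=offsets, minlength=n_bins)
    # Normalise by the sample count so regions of different size compare.
    return np.concatenate([counts, offset_sums]) / max(values.size, 1)


def nearest_prototype(query, prototypes):
    """Return the index of the prototype with the smallest squared error."""
    errors = np.sum((prototypes - query) ** 2, axis=1)
    return int(np.argmin(errors))


def interpolate_pose(query, prototypes, poses):
    """Interpolate a pose from least-squares prototype weights.

    Assumption: the query is modelled as a linear combination of the
    prototype vectors, and the same weights are applied to the poses.
    """
    weights, *_ = np.linalg.lstsq(prototypes.T, query, rcond=None)
    return weights @ poses
```

In this sketch, `prototypes` is an array with one encoded view per row and `poses` holds the corresponding pose parameters per row. Adding a new view amounts to appending a row to each, which is one simple way to picture the online database update mentioned in the abstract.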

Notes

  1. http://www.ist-matris.org.

  2. By features we denote “a numerical property” [14].

  3. By recognition we denote “identification” aka “the process of associating some observations with a particular instance […] that is already known” [14].

  4. There is no formal requirement for regularly spaced channels or spatially invariant basis functions. For the purpose of density estimation, however, these restrictions make sense and simplify understanding.

  5. Sampled in a signal-processing sense, not in a statistical sense.

  6. We based our implementation on the OpenCV library http://sourceforge.net/projects/opencvlibrary/.

  7. Note that the feature extraction needs to be done on the whole image in general, as the localization of relevant areas is unknown at this stage.

References

  1. Agarwal, S., Awan, A., Roth, D.: Learning to detect objects in images via sparse, part-based representation. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1475–1490 (2004)

  2. Berg, A., Berg, T., Malik, J.: Shape matching and object recognition using low distortion correspondence. In: IEEE Comput. Vis. Pattern Recognit., vol. 1, pp. 26–33 (2005). doi:10.1109/CVPR.2005.320

  3. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, New York (1995)

  4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)

  5. Brand, M.: Incremental singular value decomposition of uncertain data with missing values. Technical Report TR-2002-24, Mitsubishi Electric Research Laboratory (2002)

  6. Chen, Q., Defrise, M., Deconinck, F.: Symmetric phase-only matched filtering of Fourier–Mellin transforms for image registration and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 16(12), 1156–1168 (1994)

  7. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)

  8. Dimitriadou, E., Weingessel, A., Hornik, K.: Fuzzy voting in clustering. In: Fuzzy-Neuro Systems, pp. 63–75. Leipziger Universitätsverlag, Germany (1999)

  9. Farnebäck, G.: Spatial domain methods for orientation and velocity estimation. Lic. Thesis LiU-Tek-Lic-1999:13, Department of EE, Linköping University (1999)

  10. Felsberg, M., Forssén, P.-E., Scharr, H.: Channel smoothing: efficient robust smoothing of low-level signal features. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 209–222 (2006)

  11. Felsberg, M., Granlund, G.: P-channels: robust multivariate M-estimation of large datasets. In: International Conference on Pattern Recognition, Hong Kong (2006)

  12. Felsberg, M., Hedborg, J.: Real-time visual recognition of objects and scenes using P-channel matching. In: Proceedings of 15th Scandinavian Conference on Image Analysis. LNCS, vol. 4522, pp. 908–917 (2007)

  13. Ferraro, M., Caelli, T.M.: Lie transformation groups, integral transforms, and invariant pattern recognition. Spat. Vis. 8(4), 33–44 (1994)

  14. Fisher, R.B., Dawson-Howe, K., Fitzgibbon, A., Robertson, C., Trucco, E.: Dictionary of Computer Vision and Image Processing. Wiley, London (2005)

  15. Forssén, P.-E.: Low and medium level vision using channel representations. PhD thesis, Linköping University, Sweden (2004)

  16. Gazzaniga, M.S., Ivry, R.B., Mangun, G.R.: Cognitive Neuroscience, 2nd edn. W. W. Norton & Company, New York (2002)

  17. Gopalsamy, K.: Stability of artificial neural networks with impulses. Appl. Math. Comput. 154(3), 783–813 (2004)

  18. Granlund, G.H.: The complexity of vision. Signal Process. 74(1), 101–126 (1999)

  19. Granlund, G.H.: An associative perception–action structure using a localized space variant information representation. In: Proceedings of Algebraic Frames for the Perception–Action Cycle (AFPAC), Kiel, Germany (2000)

  20. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer, Dordrecht (1995)

  21. Granlund, G.H., Moe, A.: Unrestricted recognition of 3-d objects for robotics using multi-level triplet invariants. Artif. Intell. Mag. 25(2), 51–67 (2004)

  22. Gustafsson, F.: Adaptive Filtering and Change Detection. Wiley, London (2000)

  23. Hol, J., Schön, T.B., Luinge, H., Slycke, P., Gustafsson, F.: Robust real-time tracking by fusing measurements from inertial and vision sensors (2007). doi:10.1007/s11554-007-0040-2

  24. Johansson, B., Elfving, T., Kozlov, V., Censor, Y., Forssén, P.-E., Granlund, G.: The application of an oblique-projected Landweber method to a model of supervised learning. Math. Comput. Model. 43, 892–909 (2006)

  25. Jonsson, E., Felsberg, M.: Reconstruction of probability density functions from channel representations. In: Proceedings of 14th Scandinavian Conference on Image Analysis. LNCS, vol. 3540, pp. 491–500 (2005). doi:10.1007/11499145_50

  26. Jonsson, E., Felsberg, M.: Accurate interpolation in appearance-based pose estimation. In: Proceedings of 15th Scandinavian Conference on Image Analysis. LNCS, vol. 4522, pp. 1–10 (2007)

  27. Knutsson, H., Andersson, M.: Robust N-dimensional orientation estimation using quadrature filters and tensor whitening. In: Proceedings of IEEE International Conference on Acoustics, Speech, & Signal Processing, Adelaide, Australia (1994)

  28. Krüger, N.: Learning object representations using a priori constraints within ORASSYLL. Neural Comput. 13(2), 389–410 (2001)

  29. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  30. Mühlich, M., Mester, R.: A considerable improvement in non-iterative homography estimation using TLS and equilibration. Pattern Recognit. Lett. 22, 1181–1189 (2001)

  31. Murphy-Chutorian, E., Aboutalib, S., Triesch, J.: Analysis of a biologically-inspired system for real-time object recognition. Cogn. Sci. Online 3, 1–14 (2005)

  32. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: IEEE Computer Vision and Pattern Recognition, vol. 2, pp. 2161–2168 (2006). doi:10.1109/CVPR.2006.264

  33. Obdržálek, Š., Matas, J.: Sub-linear indexing for large scale object recognition. In: Clocksin, W.F., Fitzgibbon, A.W., Torr, P.H.S. (eds.) BMVC 2005: Proceedings of the 16th British Machine Vision Conference, vol. 1, pp. 1–10. BMVA, London (2005)

  34. Pontil, M., Verri, A.: Support vector machines for 3d object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 20(6), 637–646 (1998)

  35. Roobaert, D., Zillich, M., Eklundh, J.-O.: A pure learning approach to background-invariant object recognition using pedagogical support vector learning. In: IEEE Comput. Vis. Pattern Recognit., vol. 2, pp. 351–357 (2001)

  36. Skoglund, J., Felsberg, M.: Evaluation of subpixel tracking algorithms. In: International Symposium on Visual Computing. LNCS, vol. 4292, pp. 374–382 (2006)

  37. Skoglund, J., Felsberg, M.: Covariance estimation for SAD block matching. In: Proceedings of 15th Scandinavian Conference on Image Analysis. LNCS, vol. 4522, pp. 372–382 (2007)

  38. Snippe, H.P., Koenderink, J.J.: Discrimination thresholds for channel-coded systems. Biol. Cybern. 66, 543–551 (1992)

  39. Chandaria, J., Stricker, D., Thomas, G.: The MATRIS project: real-time markerless camera tracking for AR and broadcast applications. J. Real-Time Image Process. (2007, in this issue)

  40. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)

  41. Unser, M.: Splines—a perfect fit for signal and image processing. IEEE Signal Process. Mag. 16, 22–38 (1999)

  42. Vedaldi, A.: An open implementation of SIFT. http://vision.ucla.edu/~vedaldi/code/sift/sift.html. Accessed 23 May 2007

Acknowledgments

We thank our project partners for providing the test data used in the experiments. We thank in particular Graham Thomas, Jigna Chandaria, Gabriele Bleser, Reinhard Koch, and Kevin Koeser.

Author information

Corresponding author

Correspondence to Michael Felsberg.

Additional information

This work has been supported by the CENIIT project CAIRIS (http://www.cvl.isy.liu.se/Research/Object/CAIRIS), EC Grants IST-2003-004176 COSPAL and IST-2002-002013 MATRIS. This paper does not represent the opinion of the European Community, and the European Community is not responsible for any use which may be made of its contents.

About this article

Cite this article

Felsberg, M., Hedborg, J. Real-time view-based pose recognition and interpolation for tracking initialization. J Real-Time Image Proc 2, 103–115 (2007). https://doi.org/10.1007/s11554-007-0044-y

  • DOI: https://doi.org/10.1007/s11554-007-0044-y
