Real-time optical flow processing on embedded GPU: an hardware-aware algorithm to implementation strategy | Journal of Real-Time Image Processing


  • Original Research Paper
  • Journal of Real-Time Image Processing

Abstract

Determining the optical flow of a video is a compute-intensive task that is essential to computer vision. To achieve this processing in real time, the whole chain from algorithm to deployment must be designed with efficiency first. Development is usually split into two stages: first, designing an algorithm that meets the precision constraints; then, implementing and optimizing its execution on the target platform. We argue that unifying these two stages improves performance on the embedded processor. This paper is based on an industrial computer vision use case. The objective is to compute dense optical flow in real time on an embedded GPU platform, the Nvidia AGX Xavier. The combined local–global (CLG) optical flow method, chosen initially, is analyzed to understand the convergence speed of its underlying optimization problem. The Jacobi solver is selected for implementation because it is inherently parallel. The whole multi-level processing is then ported to the GPU using several specific optimization strategies. In particular, we use the roofline model to analyze the impact of fusing the solver's iterations. As a result, with a 30 W power budget, our implementation runs at 60 FPS on \(640 \times 512\) images with four-level processing. We hope this example provides feedback on the issues that arise when porting a method to a parallel platform, and that it serves further implementations of computer vision algorithms on specialized hardware.
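The CLG energy combines a data term built from the smoothed spatio-temporal structure tensor \(J = K_\rho * (\nabla_3 f \, \nabla_3 f^\top)\) with an \(\alpha\)-weighted smoothness term. With a 4-neighbour Laplacian and grid spacing 1, its Euler–Lagrange equations give the Jacobi update \(u_i^{k+1} = \big(\sum_{j \in \mathcal{N}(i)} u_j^k - \tfrac{1}{\alpha}(J_{12,i} v_i^k + J_{13,i})\big) / \big(|\mathcal{N}(i)| + \tfrac{1}{\alpha} J_{11,i}\big)\), and symmetrically for \(v\) with \(J_{12}, J_{23}, J_{22}\). Every pixel reads only the previous iterate, which is the parallel property that makes Jacobi attractive on a GPU. The following is a minimal single-level CPU sketch in pure Python, not the authors' GPU implementation; the box filter standing in for the Gaussian \(K_\rho\), the central differences, the Neumann boundary handling and the names `clg_flow`, `jacobi_step` and `structure_tensor` are all assumptions made for illustration.

```python
# Minimal CLG optical flow sketch solved with Jacobi sweeps (pure Python).
# Assumptions, not taken from the paper: central differences on the mean frame,
# a square box filter standing in for the Gaussian K_rho, Neumann boundaries
# (only in-image neighbours are counted) and grid spacing h = 1.
from typing import Dict, List, Tuple

Grid = List[List[float]]


def zeros(h: int, w: int) -> Grid:
    return [[0.0] * w for _ in range(h)]


def clamp(i: int, lo: int, hi: int) -> int:
    return min(max(i, lo), hi)


def derivatives(f1: Grid, f2: Grid) -> Tuple[Grid, Grid, Grid]:
    """Return fx, fy (central differences of the mean frame) and ft = f2 - f1."""
    h, w = len(f1), len(f1[0])
    mean = [[0.5 * (f1[y][x] + f2[y][x]) for x in range(w)] for y in range(h)]
    fx, fy, ft = zeros(h, w), zeros(h, w), zeros(h, w)
    for y in range(h):
        for x in range(w):
            fx[y][x] = 0.5 * (mean[y][clamp(x + 1, 0, w - 1)] - mean[y][clamp(x - 1, 0, w - 1)])
            fy[y][x] = 0.5 * (mean[clamp(y + 1, 0, h - 1)][x] - mean[clamp(y - 1, 0, h - 1)][x])
            ft[y][x] = f2[y][x] - f1[y][x]
    return fx, fy, ft


def box_filter(g: Grid, radius: int) -> Grid:
    """Mean over a (2r+1) x (2r+1) window, truncated at the image border."""
    h, w = len(g), len(g[0])
    out = zeros(h, w)
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = sum(g[yy][xx] for yy in ys for xx in xs) / (len(ys) * len(xs))
    return out


def structure_tensor(fx: Grid, fy: Grid, ft: Grid, radius: int = 1) -> Dict[str, Grid]:
    """Smoothed entries J11, J12, J13, J22, J23 of grad3(f) grad3(f)^T."""
    def prod(a: Grid, b: Grid) -> Grid:
        return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
    pairs = {"11": (fx, fx), "12": (fx, fy), "13": (fx, ft), "22": (fy, fy), "23": (fy, ft)}
    return {k: box_filter(prod(a, b), radius) for k, (a, b) in pairs.items()}


def jacobi_step(u: Grid, v: Grid, J: Dict[str, Grid], alpha: float) -> Tuple[Grid, Grid]:
    """One Jacobi sweep: each pixel reads only the previous iterate (u, v),
    so all pixels can be updated independently, e.g. one GPU thread each."""
    h, w = len(u), len(u[0])
    c = 1.0 / alpha
    nu, nv = zeros(h, w), zeros(h, w)
    for y in range(h):
        for x in range(w):
            su = sv = 0.0
            n = 0
            for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= yy < h and 0 <= xx < w:
                    su += u[yy][xx]
                    sv += v[yy][xx]
                    n += 1
            nu[y][x] = (su - c * (J["12"][y][x] * v[y][x] + J["13"][y][x])) / (n + c * J["11"][y][x])
            nv[y][x] = (sv - c * (J["12"][y][x] * u[y][x] + J["23"][y][x])) / (n + c * J["22"][y][x])
    return nu, nv


def clg_flow(f1: Grid, f2: Grid, alpha: float = 0.1, radius: int = 1,
             iters: int = 200) -> Tuple[Grid, Grid]:
    """Single-level CLG flow from zero initialisation (no pyramid, no warping)."""
    fx, fy, ft = derivatives(f1, f2)
    J = structure_tensor(fx, fy, ft, radius)
    u, v = zeros(len(f1), len(f1[0])), zeros(len(f1), len(f1[0]))
    for _ in range(iters):
        u, v = jacobi_step(u, v, J, alpha)
    return u, v
```

For example, `clg_flow(f1, f2, alpha=0.1, iters=500)` returns per-pixel `u` and `v` grids for two frames of at least 2 pixels. Because a plain Jacobi sweep propagates information by only one pixel, a single-level solve like this converges slowly on large images, which is why such solvers are typically paired with a multi-level scheme.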

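The roofline model bounds attainable throughput by \(P = \min(\pi, I \cdot \beta)\), where \(\pi\) is peak compute, \(\beta\) is memory bandwidth and \(I\) is arithmetic intensity in FLOP per byte. A single Jacobi sweep does little arithmetic per byte loaded, so it sits on the memory-bound slope; fusing \(k\) sweeps into one kernel, with intermediate iterates kept on-chip, divides DRAM traffic by roughly \(k\) and raises \(I\) accordingly. The sketch below turns this into a frame budget for the abstract's setting of 60 FPS on 640 × 512 images with four levels. The per-pixel FLOP and byte counts, the device figures, the halving-per-level pyramid and all function names are illustrative assumptions, not values from the paper or official hardware specifications.

```python
# Back-of-the-envelope roofline budget for a Jacobi-based flow solver.
# All hardware and per-pixel cost figures are hypothetical placeholders,
# not measurements from the paper or official device specifications.


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def jacobi_intensity(fused_iters: int, flops_per_iter: float, bytes_per_pass: float) -> float:
    """Arithmetic intensity (FLOP/byte) when `fused_iters` Jacobi sweeps share one
    DRAM round trip; halo recomputation and cache effects are ignored."""
    return fused_iters * flops_per_iter / bytes_per_pass


def pyramid_pixels(width: int, height: int, levels: int) -> int:
    """Total pixels over a pyramid that halves both dimensions at each level."""
    return sum((width >> lvl) * (height >> lvl) for lvl in range(levels))


def sweeps_per_frame(fps: float, pixels: int, flops_per_iter: float, gflops: float) -> float:
    """Sweeps (same count at every level) that fit in one frame period."""
    frame_time_s = 1.0 / fps
    return gflops * 1e9 * frame_time_s / (pixels * flops_per_iter)


if __name__ == "__main__":
    PEAK_GFLOPS, BANDWIDTH_GBS = 1000.0, 100.0   # hypothetical device
    FLOPS_PER_ITER, BYTES_PER_PASS = 20.0, 36.0  # hypothetical per-pixel costs
    pixels = pyramid_pixels(640, 512, 4)
    for k in (1, 4, 16):
        intensity = jacobi_intensity(k, FLOPS_PER_ITER, BYTES_PER_PASS)
        perf = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
        sweeps = sweeps_per_frame(60.0, pixels, FLOPS_PER_ITER, perf)
        print(f"fused={k:2d}  I={intensity:5.2f} FLOP/B  {perf:7.1f} GFLOP/s  ~{sweeps:6.0f} sweeps/frame")
```

With these placeholder figures the ridge point is \(\pi / \beta = 1000 / 100 = 10\) FLOP/byte, so fusion keeps paying off up to about \(k = 18\) sweeps, after which the kernel becomes compute-bound; in practice, halo overhead and on-chip memory capacity can limit \(k\) earlier.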

Figs. 1–12



Author information

Corresponding author

Correspondence to Mickaël Seznec.


About this article


Cite this article

Seznec, M., Gac, N., Orieux, F. et al. Real-time optical flow processing on embedded GPU: an hardware-aware algorithm to implementation strategy. J Real-Time Image Proc 19, 317–329 (2022). https://doi.org/10.1007/s11554-021-01187-8



  • DOI: https://doi.org/10.1007/s11554-021-01187-8
