Abstract
Determining the optical flow of a video is a compute-intensive task that is essential to computer vision. To achieve this processing in real time, the whole algorithm deployment chain must be designed with efficiency in mind. Development is usually divided into two stages: first, designing an algorithm that meets the precision constraints; then, implementing and optimizing its execution on the target platform. We argue that unifying these stages improves performance on the embedded processor. This paper is based on an industrial computer vision use case. The objective is to compute dense optical flow in real time on an embedded GPU platform, the Nvidia AGX Xavier. The combined local–global (CLG) optical flow method, chosen initially, is analyzed to understand the convergence speed of its underlying optimization problem. The Jacobi solver is selected for implementation because of its parallel nature. The whole multi-level processing is then ported to the GPU using several specific optimization strategies. In particular, we use the roofline model to analyze the impact of fusing the solver's iterations. As a result, within a 30 W power budget, our implementation runs at 60 FPS on \(640 \times 512\) images with four-level processing. We hope this example provides feedback on the issues that arise when porting a method to a parallel platform and helps guide further implementations of computer vision algorithms on specialized hardware.
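Why the Jacobi solver suits a GPU can be illustrated with a minimal single-level NumPy sketch. It assumes the standard CLG energy of Bruhn, Weickert and Schnörr (2005), where the data term uses the Gaussian-smoothed structure tensor \(J_\rho = K_\rho * (\nabla_3 f \, \nabla_3 f^\top)\) and the smoothness term has weight \(\alpha\), discretized with a 4-neighbour Laplacian and Neumann boundaries. Each Jacobi sweep then computes every pixel from the previous iterate only: \(u_i^{k+1} = \big(\sum_{j \in \mathcal{N}(i)} u_j^k - (J_{12} v_i^k + J_{13})/\alpha\big) / \big(|\mathcal{N}(i)| + J_{11}/\alpha\big)\) and \(v_i^{k+1} = \big(\sum_{j \in \mathcal{N}(i)} v_j^k - (J_{12} u_i^k + J_{23})/\alpha\big) / \big(|\mathcal{N}(i)| + J_{22}/\alpha\big)\). Because no pixel depends on another pixel's new value, the updates are independent and map naturally to one GPU thread per pixel. The function names (`clg_jacobi`, `smooth`, `neighbour_sum`), the derivative filters and the parameter values are illustrative assumptions; the sketch omits frame presmoothing, the multi-level pyramid, warping and kernel fusion, and it is not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def smooth(img, sigma):
    """Separable Gaussian blur K_sigma * img with edge replication."""
    if sigma <= 0:
        return img
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # 'valid' convolution removes the padding again: rows first, then columns.
    rows = np.apply_along_axis(lambda s: np.convolve(s, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda s: np.convolve(s, k, mode="valid"), 0, rows)


def neighbour_sum(a):
    """Sum of the 4-neighbours of each pixel; out-of-image neighbours count as 0."""
    p = np.pad(a, 1, mode="constant")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]


def clg_jacobi(f1, f2, alpha, rho=1.0, iters=200):
    """Single-level CLG flow (u, v) from frame f1 to f2 using Jacobi sweeps."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    # Spatial derivatives of the mean frame, temporal derivative as a difference.
    fy, fx = np.gradient(0.5 * (f1 + f2))  # axis 0 is y (rows), axis 1 is x
    ft = f2 - f1
    # Entries of the smoothed structure tensor J_rho (J33 is not needed).
    J11, J12, J22 = smooth(fx * fx, rho), smooth(fx * fy, rho), smooth(fy * fy, rho)
    J13, J23 = smooth(fx * ft, rho), smooth(fy * ft, rho)
    # |N(i)|: 4 inside, 3 on edges, 2 in corners (Neumann boundary).
    n = neighbour_sum(np.ones_like(f1))
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(iters):
        # Both updates read only the previous iterate, so every pixel
        # (and both flow components) can be computed in parallel.
        u_new = (neighbour_sum(u) - (J12 * v + J13) / alpha) / (n + J11 / alpha)
        v_new = (neighbour_sum(v) - (J12 * u + J23) / alpha) / (n + J22 / alpha)
        u, v = u_new, v_new
    return u, v
```

For example, with two 32×32 frames of a Gaussian blob whose second copy is shifted half a pixel to the right, `clg_jacobi(f1, f2, alpha=1e-3, iters=300)` should give a horizontal flow close to 0.5 near the blob centre and a near-zero vertical flow, while identical frames yield zero flow. Jacobi is known to damp smooth, low-frequency error slowly, which is why the iteration count, and hence the cost per iteration, matters so much on the target hardware.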
About this article
Cite this article
Seznec, M., Gac, N., Orieux, F. et al. Real-time optical flow processing on embedded GPU: an hardware-aware algorithm to implementation strategy. J Real-Time Image Proc 19, 317–329 (2022). https://doi.org/10.1007/s11554-021-01187-8
DOI: https://doi.org/10.1007/s11554-021-01187-8