TimeLens-XL: Real-Time Event-Based Video Frame Interpolation with Large Motion | SpringerLink

TimeLens-XL: Real-Time Event-Based Video Frame Interpolation with Large Motion

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15142))


Abstract

Video Frame Interpolation (VFI) aims to predict intermediate frames between consecutive low-frame-rate inputs. To handle the complex real-world motion between frames, event cameras, which capture high-frequency brightness changes at microsecond temporal resolution, are used to aid interpolation, a setting denoted Event-VFI. One critical step of Event-VFI is optical flow estimation. Prior methods that adopt either a two-segment formulation or a parametric trajectory model cannot correctly recover the large, complex motion between frames and suffer from accumulated error in flow estimation. To solve this problem, we propose TimeLens-XL, a physically grounded, lightweight network that decomposes the large motion between two frames into a sequence of small motions for better accuracy. It estimates the entire motion trajectory recursively and samples the bi-directional flow for VFI. Benefiting from the accurate and robust flow prediction, intermediate frames can be efficiently synthesized with simple warping and blending. As a result, the network is extremely lightweight, with only 1/5\(\sim \)1/10 the computational cost and model size of prior works, while also achieving state-of-the-art performance on several challenging benchmarks. To our knowledge, TimeLens-XL is the first real-time (27 FPS) Event-VFI algorithm at a resolution of \(1280\times 720\) on a single RTX 3090 GPU. Furthermore, we have collected a new RGB+Event dataset (HQ-EVFI) consisting of more than 100 challenging scenes with large, complex motions and accurately synchronized, high-quality RGB-EVS streams. HQ-EVFI addresses several limitations present in prior datasets and can serve as a new benchmark. Please visit our project website at https://openimaginglab.github.io/TimeLens-XL/ for the code and dataset.
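
The core idea described above, decomposing a large inter-frame motion into a sequence of small per-step flows that are composed recursively before warping and blending the boundary frames, can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the released implementation: `estimate_step_flow` is a hypothetical stand-in for the learned event-driven flow estimator, sampling is nearest-neighbour rather than bilinear, and the reverse flow at time t is a crude linearisation.

```python
import numpy as np

def compose_flows(f_a, f_b):
    """Chain two backward flows: f_ab(x) = f_a(x) + f_b(x + f_a(x)).

    Nearest-neighbour sampling with border clamping keeps the sketch short;
    a real implementation would use bilinear sampling.
    """
    h, w, _ = f_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    yq = np.clip(np.round(ys + f_a[..., 0]).astype(int), 0, h - 1)
    xq = np.clip(np.round(xs + f_a[..., 1]).astype(int), 0, w - 1)
    return f_a + f_b[yq, xq]

def backward_warp(img, flow):
    """Sample img at x + flow(x), nearest neighbour, clamped at the border."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    yq = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    xq = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return img[yq, xq]

def interpolate(i0, i1, event_chunks, t, estimate_step_flow):
    """Accumulate small per-chunk flows over [0, 1] recursively, then sample
    the bi-directional flows at time t and warp-and-blend the boundary frames.
    """
    n = len(event_chunks)
    # Flow from frame 0 out to each intermediate timestamp, built step by step
    # so that each step only has to model a small motion.
    flow_0_to_k = np.zeros(i0.shape[:2] + (2,))
    flows = [flow_0_to_k]
    for chunk in event_chunks:
        flow_0_to_k = compose_flows(flow_0_to_k, estimate_step_flow(chunk))
        flows.append(flow_0_to_k)
    k = int(round(t * n))                         # nearest accumulated timestamp
    f_t_to_0 = -flows[k]                          # linearised reverse flow (sketch only)
    f_t_to_1 = compose_flows(f_t_to_0, flows[n])  # route t -> 0 -> 1
    # Simple time-weighted blend of the two warped boundary frames.
    return (1 - t) * backward_warp(i0, f_t_to_0) + t * backward_warp(i1, f_t_to_1)
```

With constant per-step flows, the composed flow reduces to their sum in the interior, which makes the recursion easy to sanity-check on a toy sequence.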



Notes

  1. Because TimeLens++ [32] has not released any code or weights, we cannot compare with it directly. Instead, we estimate its computational cost and runtime based on TimeLens [33], marked with asterisks (*).

References

  1. Anderson, R., et al.: Jump: virtual reality video. ACM Trans. Graph. 35(6), 1–13 (2016)
  2. Barron, J.T.: A general and adaptive robust loss function. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4331–4339 (2019)
  3. Brandli, C., Berner, R., Yang, M., Liu, S.C., Delbruck, T.: A \(240\times 180\) 130 dB 3 µs latency global shutter spatiotemporal vision sensor. IEEE J. Solid-State Circuits 49(10), 2333–2341 (2014)
  4. Cheng, X., Chen, Z.: Video frame interpolation via deformable separable convolution. In: AAAI Conference on Artificial Intelligence, vol. 34, pp. 10607–10614 (2020)
  5. Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Image quality assessment: unifying structure and texture similarity. IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI) 44(5), 2567–2581 (2020)
  6. He, W., et al.: TimeReplayer: unlocking the potential of event cameras for video interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17804–17813 (2022)
  7. Hu, P., Niklaus, S., Sclaroff, S., Saenko, K.: Many-to-many splatting for efficient video frame interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3553–3562 (2022)
  8. Hu, Y., Liu, S.C., Delbruck, T.: v2e: from video frames to realistic DVS events. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1312–1321 (2021)
  9. Huang, Z., Zhang, T., Heng, W., Shi, B., Zhou, S.: Real-time intermediate flow estimation for video frame interpolation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. LNCS, vol. 13674, pp. 624–642. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19781-9_36
  10. Jiang, H., Sun, D., Jampani, V., Yang, M.H., Learned-Miller, E., Kautz, J.: Super SloMo: high quality estimation of multiple intermediate frames for video interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9000–9008 (2018)
  11. Jiang, Z., Zhang, Y., Zou, D., Ren, J., Lv, J., Liu, Y.: Learning event-based motion deblurring. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3320–3329 (2020)
  12. Kim, T., Chae, Y., Jang, H.K., Yoon, K.J.: Event-based video frame interpolation with cross-modal asymmetric bidirectional motion fields. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18032–18042 (2023)
  13. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  14. Kodama, K., et al.: 1.22 \(\mu \)m 35.6 Mpixel RGB hybrid event-based vision sensor with 4.88 \(\mu \)m-pitch event pixels and up to 10k event frame rate by adaptive control on event sparsity. In: IEEE International Solid-State Circuits Conference (ISSCC), pp. 92–94. IEEE (2023)
  15. Lee, H., Kim, T., Chung, T.Y., Pak, D., Ban, Y., Lee, S.: AdaCoF: adaptive collaboration of flows for video frame interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5316–5325 (2020)
  16. Lee, S., Choi, N., Choi, W.I.: Enhanced correlation matching based video frame interpolation. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2839–2847 (2022)
  17. Li, Z., Zhu, Z.L., Han, L.H., Hou, Q., Guo, C.L., Cheng, M.M.: AMT: all-pairs multi-field transforms for efficient frame interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9801–9810 (2023)
  18. Lin, S., et al.: Learning event-driven video deblurring and interpolation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 695–710. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_41
  19. Liu, Y., Xie, L., Siyao, L., Sun, W., Qiao, Y., Dong, C.: Enhanced quadratic video interpolation. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12538, pp. 41–56. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66823-5_3
  20. Liu, Y.L., Liao, Y.T., Lin, Y.Y., Chuang, Y.Y.: Deep video frame interpolation using cyclic frame generation. In: AAAI Conference on Artificial Intelligence, vol. 33, pp. 8794–8802 (2019)
  21. Liu, Z., Yeh, R.A., Tang, X., Liu, Y., Agarwala, A.: Video frame synthesis using deep voxel flow. In: IEEE International Conference on Computer Vision (ICCV), pp. 4463–4471 (2017)
  22. Meyer, S., Wang, O., Zimmer, H., Grosse, M., Sorkine-Hornung, A.: Phase-based frame interpolation for video. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1410–1418 (2015)
  23. Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3883–3891 (2017)
  24. Niklaus, S., Liu, F.: Softmax splatting for video frame interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5437–5446 (2020)
  25. Niklaus, S., Mai, L., Liu, F.: Video frame interpolation via adaptive separable convolution. In: IEEE International Conference on Computer Vision (ICCV), pp. 261–270 (2017)
  26. Park, J., Ko, K., Lee, C., Kim, C.-S.: BMBC: bilateral motion estimation with bilateral cost volume for video interpolation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 109–125. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_7
  27. Reda, F., Kontkanen, J., Tabellion, E., Sun, D., Pantofaru, C., Curless, B.: Frame interpolation for large motion. arXiv preprint arXiv:2202.04901 (2022)
  28. Serrano-Gotarredona, T., Linares-Barranco, B.: A \(128\times 128\) 1.5% contrast sensitivity 0.9% FPN 3 µs latency 4 mW asynchronous frame-free dynamic vision sensor using transimpedance preamplifiers. IEEE J. Solid-State Circuits 48(3), 827–838 (2013)
  29. Su, S., Delbracio, M., Wang, J., Sapiro, G., Heidrich, W., Wang, O.: Deep video deblurring for hand-held cameras. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1279–1288 (2017)
  30. Sun, L., et al.: Event-based fusion for motion deblurring with cross-modal attention. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. LNCS, vol. 13678, pp. 412–428. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19797-0_24
  31. Sun, L., et al.: Event-based frame interpolation with ad-hoc deblurring. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18043–18052 (2023)
  32. Tulyakov, S., Bochicchio, A., Gehrig, D., Georgoulis, S., Li, Y., Scaramuzza, D.: Time Lens++: event-based frame interpolation with parametric non-linear flow and multi-scale fusion. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17755–17764 (2022)
  33. Tulyakov, S., et al.: Time Lens: event-based video frame interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16155–16164 (2021)
  34. Wu, C.Y., Singhal, N., Krahenbuhl, P.: Video compression through image interpolation. In: European Conference on Computer Vision (ECCV), pp. 416–431 (2018)
  35. Xiang, X., Tian, Y., Zhang, Y., Fu, Y., Allebach, J.P., Xu, C.: Zooming Slow-Mo: fast and accurate one-stage space-time video super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3370–3379 (2020)
  36. Xu, X., Siyao, L., Sun, W., Yin, Q., Yang, M.H.: Quadratic video interpolation. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 32 (2019)
  37. Zhang, G., Zhu, Y., Wang, H., Chen, Y., Wu, G., Wang, L.: Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5682–5692 (2023)
  38. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 586–595 (2018)


Acknowledgements

This work is partially supported by the National Key R&D Program of China (No. 2022ZD0160201) and CUHK Direct Grants (RCFUS) No. 4055189. This work was done during Yongrui Ma’s and Yutian Chen’s internships at the Shanghai Artificial Intelligence Laboratory.

Author information


Corresponding author

Correspondence to Yongrui Ma.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 11516 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ma, Y., Guo, S., Chen, Y., Xue, T., Gu, J. (2025). TimeLens-XL: Real-Time Event-Based Video Frame Interpolation with Large Motion. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15142. Springer, Cham. https://doi.org/10.1007/978-3-031-72907-2_11


  • DOI: https://doi.org/10.1007/978-3-031-72907-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72906-5

  • Online ISBN: 978-3-031-72907-2

  • eBook Packages: Computer Science (R0)
