Abstract
Video Frame Interpolation (VFI) aims to predict intermediate frames between consecutive low-frame-rate inputs. To handle complex real-world motion between frames, event cameras, which capture high-frequency brightness changes at microsecond temporal resolution, are used to aid interpolation, denoted as Event-VFI. One critical step of Event-VFI is optical flow estimation. Prior methods that adopt either a two-segment formulation or a parametric trajectory model cannot correctly recover large and complex inter-frame motions and suffer from accumulated errors in flow estimation. To solve this problem, we propose TimeLens-XL, a physically grounded lightweight network that decomposes the large motion between two frames into a sequence of small motions for better accuracy. It estimates the entire motion trajectory recursively and samples the bi-directional flow for VFI. Benefiting from the accurate and robust flow prediction, intermediate frames can be efficiently synthesized with simple warping and blending. As a result, the network is extremely lightweight, with only 1/5\(\sim \)1/10 of the computational cost and model size of prior works, while also achieving state-of-the-art performance on several challenging benchmarks. To our knowledge, TimeLens-XL is the first real-time (27 FPS) Event-VFI algorithm at a resolution of \(1280\times 720\) on a single RTX 3090 GPU. Furthermore, we have collected a new RGB+Event dataset (HQ-EVFI) consisting of more than 100 challenging scenes with large complex motions and accurately synchronized high-quality RGB-EVS streams. HQ-EVFI addresses several limitations of prior datasets and can serve as a new benchmark. Please visit our project website at https://openimaginglab.github.io/TimeLens-XL/ for the code and dataset.
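To make the core idea concrete, the following toy NumPy sketch illustrates (not the paper's actual network) how a large displacement can be built up by chaining a sequence of small flows, and how an intermediate frame can be synthesized by warping and blending. Nearest-neighbour sampling, boundary clamping, and the simple distance-based blend are simplifying assumptions for illustration only:

```python
import numpy as np

def compose_flows(small_flows):
    """Chain a sequence of small inter-step flows into one cumulative flow.

    total(x) = f1(x) + f2(x + f1(x)) + ... : each estimation step only has
    to model a small motion, which is the intuition behind decomposing a
    large inter-frame motion into a sequence of small motions.
    """
    h, w = small_flows[0].shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).astype(np.float32)
    total = np.zeros_like(small_flows[0])
    for f in small_flows:
        pos = grid + total  # where each pixel currently maps
        # sample f at the displaced positions (rounded and clamped)
        px = np.clip(np.round(pos[..., 0]).astype(int), 0, w - 1)
        py = np.clip(np.round(pos[..., 1]).astype(int), 0, h - 1)
        total = total + f[py, px]
    return total

def backward_warp(img, flow):
    """Nearest-neighbour backward warping: out(x) = img(x + flow(x))."""
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    px = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    py = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[py, px]

def blend(warp0, warp1, t):
    """Blend two warped frames, weighted by the temporal position t in [0, 1]."""
    return (1.0 - t) * warp0 + t * warp1
```

For a constant flow field, composing two unit steps yields a total displacement of two pixels, matching the intuition that accumulated small flows reproduce the large motion without a single large (and error-prone) estimate.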
Acknowledgements
This work is partially supported by the National Key R&D Program of China (No. 2022ZD0160201) and CUHK Direct Grants (RCFUS) No. 4055189. This work was done during Yongrui Ma and Yutian Chen's internships at the Shanghai Artificial Intelligence Laboratory.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ma, Y., Guo, S., Chen, Y., Xue, T., Gu, J. (2025). TimeLens-XL: Real-Time Event-Based Video Frame Interpolation with Large Motion. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15142. Springer, Cham. https://doi.org/10.1007/978-3-031-72907-2_11
Print ISBN: 978-3-031-72906-5
Online ISBN: 978-3-031-72907-2
eBook Packages: Computer Science, Computer Science (R0)