Abstract
Seeing through dense occlusions and reconstructing the occluded scene is an important but challenging task. Traditional frame-based image de-occlusion methods can fail under extremely dense occlusions, because the limited occluded input frames provide too little valid scene information. Event cameras are bio-inspired vision sensors that asynchronously record brightness changes at each pixel with high temporal resolution. However, synthesizing images solely from event streams is ill-posed, since an event stream records only brightness changes and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, in which event streams provide complete scene information and frames provide color and texture information. An event stream encoder based on a spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently, and a comparison loss is proposed to produce clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that the proposed method achieves state-of-the-art performance.
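The abstract does not specify the SNN encoder's exact design. As a rough illustration only, the sketch below shows a generic leaky integrate-and-fire (LIF) convolutional encoder over temporally binned event tensors in PyTorch; the class name, the two-channel polarity binning, and the decay and threshold values are hypothetical stand-ins, not the authors' architecture.

```python
import torch
import torch.nn as nn


class LIFEventEncoder(nn.Module):
    """Minimal LIF-style encoder for binned event streams (illustrative sketch).

    Each time bin of event counts is convolved into a synaptic current,
    leakily integrated into a membrane potential, and thresholded into
    spikes. Isolated noise events that never push the potential past the
    threshold decay away, which gives a simple form of denoising.
    """

    def __init__(self, in_channels=2, hidden_channels=16,
                 decay=0.8, threshold=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, hidden_channels, 3, padding=1)
        self.decay = decay          # membrane leak factor per time bin
        self.threshold = threshold  # firing threshold

    def forward(self, event_bins):
        # event_bins: (B, T, 2, H, W) -- per-bin positive/negative event counts
        b, t, c, h, w = event_bins.shape
        membrane = None
        spike_seq = []
        for i in range(t):
            current = self.conv(event_bins[:, i])           # synaptic input
            if membrane is None:
                membrane = torch.zeros_like(current)
            membrane = self.decay * membrane + current       # leaky integration
            spikes = (membrane >= self.threshold).float()    # fire
            membrane = membrane - spikes * self.threshold    # soft reset
            spike_seq.append(spikes)
        return torch.stack(spike_seq, dim=1)                 # (B, T, C, H, W)


# Usage sketch: 8 time bins of 2-channel (polarity) event counts.
encoder = LIFEventEncoder()
dummy_events = torch.rand(1, 8, 2, 64, 64)
features = encoder(dummy_events)
```

Note that the hard spike threshold is non-differentiable, so training such an encoder end to end would require a surrogate-gradient method; this detail is omitted from the sketch.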
Acknowledgements
This work was supported by the National Natural Science Funds of China (Nos. 62088102 and 62021002) and the Beijing Natural Science Foundation, China (No. 4222025).
Author information
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Si-Qi Li received the B. Eng. degree in automation from the School of Automation Science and Electrical Engineering, Beihang University, China in 2019. He is currently a Ph.D. candidate at the School of Software, Tsinghua University, China.
His research interests include computer vision and machine learning.
Yue Gao received the B. Sc. degree in electronic and information engineering from Harbin Institute of Technology, China in 2005, and the M. Eng. degree in software engineering and the Ph. D. degree in control science and engineering from Tsinghua University, China in 2008 and 2012, respectively. He is an associate professor with the School of Software, Tsinghua University, China.
His research interests include artificial intelligence and computer vision.
Qiong-Hai Dai received the M. Sc. and Ph. D. degrees in computer science and automation from Northeastern University, China in 1994 and 1996, respectively. He is currently a professor with the Department of Automation and the director of the Institute for Brain and Cognitive Sciences, Tsinghua University, China. He has authored or coauthored over 200 conference and journal articles and two books.
His research interests include computational photography and microscopy, computer vision and graphics, and intelligent signal processing.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, SQ., Gao, Y. & Dai, QH. Image De-occlusion via Event-enhanced Multi-modal Fusion Hybrid Network. Mach. Intell. Res. 19, 307–318 (2022). https://doi.org/10.1007/s11633-022-1350-3