Abstract
Because real paired data are difficult to collect, most existing desmoking methods train their models on synthesized smoke and therefore generalize poorly to real surgical scenarios. A few works have explored real-world single-image desmoking with unpaired learning, but they still struggle to handle dense smoke. In this work, we address both issues by introducing self-supervised surgery video desmoking (SelfSVD). On the one hand, we observe that the frame captured before the activation of high-energy devices is generally clear (named the pre-smoke frame, or PS frame), so it can serve as supervision for the other smoky frames, making real-world self-supervised video desmoking practically feasible. On the other hand, to enhance desmoking performance, we further feed the valuable information from the PS frame into the model, and present a masking strategy and a regularization term to avoid trivial solutions. In addition, we construct a real surgery video dataset for desmoking, which covers a variety of smoky scenes. Extensive experiments on the dataset show that SelfSVD removes smoke more effectively and efficiently than state-of-the-art methods while recovering more photo-realistic details. The dataset, code, and pre-trained models are available at https://github.com/ZcsrenlongZ/SelfSVD.
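The role of the masking strategy can be illustrated with a toy sketch. The abstract does not specify the loss, the mask form, or what the trivial solution is, so everything below is an assumption made for illustration: an L1 loss, random square-patch masking, and a degenerate "model" that simply copies the reference it is given. The function names (`l1_loss`, `mask_reference`, `copy_reference`) are hypothetical and are not taken from the SelfSVD code.

```python
import random


def l1_loss(pred, target):
    """Mean absolute error between two equally sized 2D frames (assumed loss)."""
    total = sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return total / (len(pred) * len(pred[0]))


def mask_reference(ps_frame, patch=2, ratio=0.5, seed=0):
    """Zero out a random subset of patch x patch blocks (assumed mask form)."""
    rng = random.Random(seed)
    h, w = len(ps_frame), len(ps_frame[0])
    masked = [row[:] for row in ps_frame]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            if rng.random() < ratio:
                for y in range(top, min(top + patch, h)):
                    for x in range(left, min(left + patch, w)):
                        masked[y][x] = 0.0
    return masked


def copy_reference(smoky, reference):
    """Degenerate 'model': ignores the smoky frame and returns the reference."""
    return [row[:] for row in reference]


# Toy 4x4 grayscale frames: a clear PS frame and a smoky frame brightened by "smoke".
ps = [[0.1 * (x + y + 1) for x in range(4)] for y in range(4)]
smoky = [[min(1.0, v + 0.4) for v in row] for row in ps]

# With the unmasked PS frame as both input and supervision, copying is a perfect fit.
loss_unmasked = l1_loss(copy_reference(smoky, ps), ps)

# With a masked PS input, copying leaves holes, so the model must also use the smoky frame.
loss_masked = l1_loss(copy_reference(smoky, mask_reference(ps, ratio=0.5)), ps)
```

In this toy setting the copy shortcut reaches zero loss only when the reference is unmasked, which is one plausible reading of why masking the PS input discourages trivial solutions. The regularization term mentioned in the abstract is not modeled here.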
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 62371164 and No. U22B2035.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, R. et al. (2025). Self-Supervised Video Desmoking for Laparoscopic Surgery. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15130. Springer, Cham. https://doi.org/10.1007/978-3-031-73220-1_18
DOI: https://doi.org/10.1007/978-3-031-73220-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73219-5
Online ISBN: 978-3-031-73220-1
eBook Packages: Computer Science, Computer Science (R0)