Abstract
Natural image matting aims to estimate the alpha matte of the foreground in a given image. Various approaches have been explored to address this problem, including interactive matting methods that rely on guidance such as clicks or trimaps, and automatic matting methods tailored to specific object categories. However, existing matting methods are designed for specific objects or forms of guidance and neglect the common requirement of aggregating global and local contexts in image matting. As a result, they often struggle to accurately identify the foreground and generate precise boundaries, which limits their effectiveness in unforeseen scenarios. In this paper, we propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM), which enables robust image matting with arbitrary guidance or no guidance at all. Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance. Then, we introduce a dual-context aggregation network that incorporates global object aggregators and local appearance aggregators to iteratively refine the extracted context features. By performing both global contour segmentation and local boundary refinement, DCAM remains robust to diverse types of guidance and objects. Finally, a matting decoder network fuses the low-level features with the refined context features to estimate the alpha matte. Experimental results on five matting datasets demonstrate that DCAM outperforms state-of-the-art methods on both automatic and interactive matting tasks, highlighting its strong universality and high performance.
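For readers who prefer code, the pipeline described in the abstract can be summarized by the minimal PyTorch sketch below. The module choices (channel widths, an attention-style global object aggregator, a convolutional local appearance aggregator, the number of refinement iterations, and the way guidance is concatenated with the image) are illustrative assumptions rather than the authors' implementation; the sketch only mirrors the described structure of backbone, iterative dual-context aggregation, and matting decoder.

```python
# Minimal structural sketch of the DCAM pipeline described in the abstract.
# All module choices below are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalObjectAggregator(nn.Module):
    """Aggregates global object context via spatial self-attention (assumed form)."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, HW, C)
        out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + out)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class LocalAppearanceAggregator(nn.Module):
    """Refines local boundary appearance with small-kernel convolutions (assumed form)."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.block(x)


class DCAMSketch(nn.Module):
    def __init__(self, in_channels=4, feat=64, iterations=3):
        super().__init__()
        # Semantic backbone: extracts low-level and context features from the
        # concatenated image and guidance map (trimap/click map, or zeros if absent).
        self.low_level = nn.Sequential(
            nn.Conv2d(in_channels, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.context = nn.Sequential(
            nn.Conv2d(feat, feat, 3, stride=4, padding=1), nn.ReLU(inplace=True))
        # Dual-context aggregation: global and local aggregators applied iteratively.
        self.global_agg = GlobalObjectAggregator(feat)
        self.local_agg = LocalAppearanceAggregator(feat)
        self.iterations = iterations
        # Matting decoder: fuses low-level and refined context features.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat * 2, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1))

    def forward(self, image, guidance=None):
        if guidance is None:                            # guidance-free (automatic) mode
            guidance = torch.zeros_like(image[:, :1])
        x = torch.cat([image, guidance], dim=1)
        low = self.low_level(x)
        ctx = self.context(low)
        for _ in range(self.iterations):                # iterative dual-context refinement
            ctx = self.local_agg(self.global_agg(ctx))
        ctx_up = F.interpolate(ctx, size=low.shape[-2:],
                               mode="bilinear", align_corners=False)
        alpha = self.decoder(torch.cat([low, ctx_up], dim=1))
        return torch.sigmoid(alpha)                     # alpha matte in [0, 1]


if __name__ == "__main__":
    model = DCAMSketch()
    image = torch.randn(1, 3, 64, 64)
    trimap = torch.rand(1, 1, 64, 64)                   # any single-channel guidance
    print(model(image, trimap).shape)                   # torch.Size([1, 1, 64, 64])
```

The same forward pass works with `guidance=None`, which is the point of the universal design: the guidance channel is simply zeroed out for automatic matting, while clicks or trimaps are passed through the identical path for interactive matting.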
Data Availability
Upon acceptance, the source code and trained models will be made available by the corresponding author upon reasonable request for academic use, within the limitations of the provided informed consent.
Funding
This work is supported by the National Natural Science Foundation of China (No. 62272134).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest regarding the publication of this manuscript.
About this article
Cite this article
Liu, Q., Lv, X., Yu, W. et al. Dual-context aggregation for universal image matting. Multimed Tools Appl 83, 53119–53137 (2024). https://doi.org/10.1007/s11042-023-17517-w