Abstract
Image Manipulation Detection (IMD) is becoming increasingly important as tampering technologies advance. However, most state-of-the-art (SoTA) methods require high-quality training datasets featuring image- and pixel-level annotations. The effectiveness of these methods suffers when applied to manipulated or noisy samples that differ from the training data. To address these challenges, we present a unified framework that combines unsupervised and weakly supervised approaches for IMD. Our approach introduces a novel pre-processing stage based on a controllable fitting function from Implicit Neural Representation (INR). Additionally, we introduce a new selective pixel-level contrastive learning approach, which concentrates exclusively on high-confidence regions, thereby mitigating uncertainty associated with the absence of pixel-level labels. In weakly supervised mode, we utilize ground-truth image-level labels to guide predictions from an adaptive pooling method, facilitating comprehensive exploration of manipulation regions for image-level detection. The unsupervised model is trained using a self-distillation training method with selected high-confidence pseudo-labels obtained from the deepest layers via different sources. Extensive experiments demonstrate that our proposed method outperforms existing unsupervised and weakly supervised methods. Moreover, it competes effectively against fully supervised methods on novel manipulation detection tasks.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bammey, Q., Gioi, R.G.V., Morel, J.M.: An adaptive neural network for unsupervised mosaic consistency analysis in image forensics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14194–14204 (2020)
Bi, X., Wei, Y., Xiao, B., Li, W.: RRU-Net: the ringed residual u-net for image splicing forgery detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
Bondi, L., Lameri, S., Güera, D., Bestagini, P., Delp, E.J., Tubaro, S.: Tampering detection and localization through clustering of camera-based CNN features. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1855–1864. IEEE (2017)
Chen, K., Hong, L., Xu, H., Li, Z., Yeung, D.Y.: Multisiam: self-supervised multi-instance Siamese representation learning for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7546–7554 (2021)
Chen, X., Dong, C., Ji, J., Cao, J., Li, X.: Image manipulation detection by multi-view multiscale supervision. In: IEEE/CVF International Conference on Computer Vision, pp. 14185–14193 (2021)
Chen, Y., Liu, S., Wang, X.: Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8628–8638 (2021)
Chen, Z., et al.: Videoinr: learning video implicit neural representation for continuous space-time super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2047–2057 (2022)
Choi, C.H., Choi, J.H., Lee, H.K.: CFA pattern identification of digital cameras using intermediate value counting. In: Proceedings of the thirteenth ACM multimedia workshop on Multimedia and Security, pp. 21–26 (2011)
Cozzolino, D., Verdoliva, L.: Noiseprint: a CNN based camera model fingerprint. IEEE Trans. Inf. Forensics Secur. 15, 144–159 (2019)
Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. Adv. Neural. Inf. Process. Syst. 34, 8780–8794 (2021)
Dong, J., Wang, W., Tan, T.: CASIA image tampering detection evaluation database (2010). http://forensics.idealtest.org
Dong, J., Wang, W., Tan, T.: CASIA image tampering detection evaluation database. In: 2013 IEEE China Summit and International Conference on Signal and Information Processing, pp. 422–426. IEEE (2013)
Dupont, E., Goliński, A., Alizadeh, M., Teh, Y.W., Doucet, A.: Coin: compression with implicit neural representations. arXiv preprint arXiv:2103.03123 (2021)
Ergen, T., Kozat, S.S.: Unsupervised anomaly detection with LSTM neural networks. IEEE Trans. Neural Networks Learn. Syst. 31(8), 3127–3141 (2019)
Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol. 96, pp. 226–231 (1996)
Feng, Y., Feng, Y., You, H., Zhao, X., Gao, Y.: MeshNet: mesh neural network for 3D shape representation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8279–8286 (2019)
Ferrara, P., Bianchi, T., De Rosa, A., Piva, A.: Image forgery localization via fine-grained analysis of CFA artifacts. IEEE Trans. Inf. Forensics Secur. 7(5), 1566–1577 (2012)
Fridrich, J., Kodovsky, J.: Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 7(3), 868–882 (2012)
Guan, H., et al.: MFC datasets: large-scale benchmark datasets for media forensic challenge evaluation. In: IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, pp. 63–72. IEEE (2019)
Guillaro, F., Cozzolino, D., Sud, A., Dufour, N., Verdoliva, L.: Trufor: leveraging all-round clues for trustworthy image forgery detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20606–20615 (2023)
Guo, X., Liu, X., Ren, Z., Grosz, S., Masi, I., Liu, X.: Hierarchical fine-grained image forgery detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3155–3165 (2023)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hsu, Y.F., Chang, S.F.: Detecting image splicing using geometry invariants and camera characteristics consistency. In: 2006 IEEE International Conference on Multimedia and Expo, pp. 549–552. IEEE (2006)
Hu, X., Zhang, Z., Jiang, Z., Chaudhuri, S., Yang, Z., Nevatia, R.: SPAN: spatial pyramid attention network for image manipulation localization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 312–328. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_19
Ji, K., Chen, F., Guo, X., Xu, Y., Wang, J., Chen, J.: Uncertainty-guided learning for improving image manipulation detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22456–22465 (2023)
Koch, G., Zemel, R., Salakhutdinov, R., et al.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2. Lille (2015)
Kwan, H.M., Gao, G., Zhang, F., Gower, A., Bull, D.: Hinerv: video compression with hierarchical encoding-based neural representation. In: Advances in Neural Information Processing Systems, vol. 36 (2024)
Kwon, M.J., Nam, S.H., Yu, I.J., Lee, H.K., Kim, C.: Learning jpeg compression artifacts for image manipulation detection and localization. In: International Journal of Computer Vision, pp. 1875–1895 (2022)
Li, J., Chen, Y., Xing, Y.: Memory mechanism for unsupervised anomaly detection. In: The 39th Conference on Uncertainty in Artificial Intelligence (2023)
Li, S., Xia, X., Ge, S., Liu, T.: Selective-supervised contrastive learning with noisy labels. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 316–325 (2022)
Liu, D., Yu, J.: Otsu method and k-means. In: 2009 Ninth International Conference on Hybrid Intelligent Systems, vol. 1, pp. 344–349. IEEE (2009)
Liu, X., Liu, Y., Chen, J., Liu, X.: PSCC-Net: progressive Spatio-channel correlation network for image manipulation detection and localization. IEEE Trans. Circuits Syst. Video Technol. 32(11), 7505–7517 (2022)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
Lu, X., Wang, W., Ma, C., Shen, J., Shao, L., Porikli, F.: See more, know more: Unsupervised video object segmentation with co-attention siamese networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3623–3632 (2019)
Lyu, S., Pan, X., Zhang, X.: Exposing region splicing forgeries with blind local noise estimation. Int. J. Comput. Vision 110, 202–221 (2014)
Mahdian, B., Saic, S.: Using noise inconsistencies for blind image forensics. Image Vis. Comput. 27(10), 1497–1503 (2009)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Molaei, A., et al.: Implicit neural representation in medical imaging: a comparative survey. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2381–2391 (2023)
Niu, Y., Tondi, B., Zhao, Y., Ni, R., Barni, M.: Image splicing detection, localization and attribution via jpeg primary quantization matrix estimation and clustering. IEEE Trans. Inf. Forensics Secur. 16, 5397–5412 (2021)
Novozamsky, A., Mahdian, B., Saic, S.: Imd2020: a large-scale annotated dataset tailored for detecting manipulated images. In: IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, pp. 71–80 (2020)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Pan, X., Zhang, X., Lyu, S.: Exposing image forgery with blind noise estimation. In: Proceedings of the thirteenth ACM Multimedia Workshop on Multimedia and Security, pp. 15–20 (2011)
Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Pathak, D., Shelhamer, E., Long, J., Darrell, T.: Fully convolutional multi-class multiple instance learning. arXiv preprint arXiv:1412.7144 (2014)
Pyatykh, S., Hesser, J., Zheng, L.: Image noise level estimation by principal component analysis. IEEE Trans. Image Process. 22(2), 687–699 (2012)
Qian, Y., Hong, X., Guo, Z., Arandjelović, O., Donovan, C.R.: Semi-supervised crowd counting with contextual modeling: facilitating holistic understanding of crowd scenes. IEEE Trans. Circuits Syst. Video Technol. (2024)
Qiao, T., Zhang, J., Xu, D., Tao, D.: MirrorGAN: learning text-to-image generation by redescription. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1505–1514 (2019)
Radenović, F., Tolias, G., Chum, O.: Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 41(7), 1655–1668 (2018)
Shi, J., Xu, N., Bui, T., Dernoncourt, F., Wen, Z., Xu, C.: A benchmark and baseline for language-driven image editing. In: Proceedings of the Asian Conference on Computer Vision (2020)
Smucny, J., Shi, G., Lesh, T.A., Carter, C.S., Davidson, I.: Data augmentation with mixup: Enhancing performance of a functional neuroimaging-based prognostic deep learning classifier in recent onset psychosis. NeuroImage: Clinical 36, 103214 (2022)
Tao, C., et al.: Siamese image modeling for self-supervised vision representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2132–2141 (2023)
Wang, J., et al.: ObjectFormer for image manipulation detection and localization. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2364–2373 (2022)
Wang, L., et al.: Learning to detect salient objects with image-level supervision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 136–145 (2017)
Wei, Y., et al.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2314–2320 (2016)
Wen, B., Zhu, Y., Subramanian, R., Ng, T.T., Shen, X., Winkler, S.: Coverage - a novel database for copy-move forgery detection. In: IEEE International Conference on Image Processing (ICIP) (2016)
Wu, H., Chen, Y., Zhou, J.: Rethinking image forgery detection via contrastive learning and unsupervised clustering. arXiv preprint arXiv:2308.09307 (2023)
Wu, H., Zhou, J., Tian, J., Liu, J.: Robust image forgery detection over online social network shared images. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13440–13449 (2022)
Wu, Y., AbdAlmageed, W., Natarajan, P.: Mantra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9543–9552 (2019)
Xu, T., et al.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1316–1324 (2018)
Yang, C., Li, H., Lin, F., Jiang, B., Zhao, H.: Constrained R-CNN: a general image manipulation detection model. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. IEEE (2020)
Yang, S., Ding, M., Wu, Y., Li, Z., Zhang, J.: Implicit neural representation for cooperative low-light image enhancement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12918–12927 (2023)
Yoon, J., Yu, S., Bansal, M.: Raccoon: remove, add, and change video content with auto-generated narratives. arXiv preprint arXiv:2405.18406 (2024)
Zhai, Y., Luan, T., Doermann, D., Yuan, J.: Towards generic image manipulation detection with weakly-supervised self-consistency learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22390–22400 (2023)
Zhang, B., Tang, J., Niessner, M., Wonka, P.: 3dshape2vecset: a 3D shape representation for neural fields and generative diffusion models. arXiv preprint arXiv:2301.11445 (2023)
Zhang, H., et al.: Nerd: neural representation of distribution for medical image segmentation. arXiv preprint arXiv:2103.04020 (2021)
Zhang, K., Mo, L., Chen, W., Sun, H., Su, Y.: Magicbrush: a manually annotated dataset for instruction-guided image editing. In: Advances in Neural Information Processing Systems, vol. 36 (2024)
Zhang, L., Bao, C., Ma, K.: Self-distillation: towards efficient and compact neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 44(8), 4388–4403 (2021)
Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3836–3847 (2023)
Zhang, W., Pang, J., Chen, K., Loy, C.C.: Dense Siamese network for dense unsupervised learning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13690, pp. 464–480. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20056-4_27
Zhang, Z., Bui, T.D.: Attention-based selection strategy for weakly supervised object localization. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10305–10311. IEEE (2021)
Zhang, Z., Chang, M.C.: Two-stage dual augmentation with clip for improved text-to-sketch synthesis. In: 2023 IEEE 6th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 1–6. IEEE (2023)
Zhang, Z., Chang, M.C., Bui, T.D.: Improving class activation map for weakly supervised object localization. In: ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2624–2628. IEEE (2022)
Zhang, Z., Li, M., Chang, M.C.: A new benchmark and model for challenging image manipulation detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 7405–7413 (2024)
Acknowledgements
This work is supported by the DARPA Semantic Forensics (SemaFor) Program under contract HR001120C0123 and NSF CCSS-2348046.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Li, M., Li, X., Chang, MC., Hsieh, JW. (2025). Image Manipulation Detection with Implicit Neural Representation and Limited Supervision. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15146. Springer, Cham. https://doi.org/10.1007/978-3-031-73223-2_15
Download citation
DOI: https://doi.org/10.1007/978-3-031-73223-2_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73222-5
Online ISBN: 978-3-031-73223-2
eBook Packages: Computer ScienceComputer Science (R0)