Image Manipulation Detection with Implicit Neural Representation and Limited Supervision

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Image Manipulation Detection (IMD) is becoming increasingly important as tampering technologies advance. However, most state-of-the-art (SoTA) methods require high-quality training datasets featuring image- and pixel-level annotations. The effectiveness of these methods suffers when applied to manipulated or noisy samples that differ from the training data. To address these challenges, we present a unified framework that combines unsupervised and weakly supervised approaches for IMD. Our approach introduces a novel pre-processing stage based on a controllable fitting function from Implicit Neural Representation (INR). Additionally, we introduce a new selective pixel-level contrastive learning approach, which concentrates exclusively on high-confidence regions, thereby mitigating uncertainty associated with the absence of pixel-level labels. In weakly supervised mode, we utilize ground-truth image-level labels to guide predictions from an adaptive pooling method, facilitating comprehensive exploration of manipulation regions for image-level detection. The unsupervised model is trained using a self-distillation training method with selected high-confidence pseudo-labels obtained from the deepest layers via different sources. Extensive experiments demonstrate that our proposed method outperforms existing unsupervised and weakly supervised methods. Moreover, it competes effectively against fully supervised methods on novel manipulation detection tasks.
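The abstract names two ideas that a small example may make more concrete: restricting pixel-level learning to high-confidence regions, and pooling a pixel-level map into an image-level prediction. The sketch below is only an illustration under stated assumptions and is not the authors' implementation. The function names, the thresholds `low` and `high`, and the use of generalized-mean pooling as a stand-in for the paper's "adaptive pooling" are all hypothetical choices made here.

```python
from typing import List, Optional

ProbMap = List[List[float]]


def select_confident_pixels(
    prob_map: ProbMap, low: float = 0.2, high: float = 0.8
) -> List[List[Optional[int]]]:
    """Turn a per-pixel manipulation probability map into pseudo-labels.

    Assumption (not from the paper): a pixel counts as high-confidence when
    its probability is >= `high` (label 1, manipulated) or <= `low`
    (label 0, authentic). Everything in between is marked None so that a
    pixel-level loss could skip it.
    """
    labels: List[List[Optional[int]]] = []
    for row in prob_map:
        out_row: List[Optional[int]] = []
        for p in row:
            if p >= high:
                out_row.append(1)
            elif p <= low:
                out_row.append(0)
            else:
                out_row.append(None)
        labels.append(out_row)
    return labels


def generalized_mean_pool(prob_map: ProbMap, power: float = 3.0) -> float:
    """Pool a pixel-level map into one image-level score.

    Generalized-mean pooling is used here as a hypothetical stand-in for
    adaptive pooling: power=1 gives the plain average, and larger powers
    move the score toward the maximum pixel value.
    """
    values = [p for row in prob_map for p in row]
    return (sum(p ** power for p in values) / len(values)) ** (1.0 / power)


if __name__ == "__main__":
    toy_map = [[0.9, 0.5], [0.1, 0.85]]
    print(select_confident_pixels(toy_map))
    print(generalized_mean_pool(toy_map, power=1.0))
    print(generalized_mean_pool(toy_map, power=50.0))
```

In this toy setting, the uncertain pixel (probability 0.5) is excluded from the pseudo-labels, and raising `power` makes the image-level score more sensitive to small, strongly activated regions.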

Acknowledgements

This work is supported by the DARPA Semantic Forensics (SemaFor) Program under contract HR001120C0123 and NSF CCSS-2348046.

Author information

Corresponding author

Correspondence to Zhenfei Zhang.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Li, M., Li, X., Chang, MC., Hsieh, JW. (2025). Image Manipulation Detection with Implicit Neural Representation and Limited Supervision. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15146. Springer, Cham. https://doi.org/10.1007/978-3-031-73223-2_15

  • DOI: https://doi.org/10.1007/978-3-031-73223-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73222-5

  • Online ISBN: 978-3-031-73223-2

  • eBook Packages: Computer Science, Computer Science (R0)
