
Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-Wise Hidden Bias

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15079)

Included in the following conference series: European Conference on Computer Vision (ECCV)


Abstract

Public datasets play a crucial role in advancing data-centric AI, yet they remain vulnerable to illicit use. This paper presents ‘undercover bias,’ a novel dataset watermarking method that reliably identifies and verifies unauthorized data usage. Our approach is inspired by the observation that trained models often inadvertently learn biased knowledge and can function on bias-only data, even without any information directly related to the target task. Leveraging this, we deliberately embed class-wise hidden bias via unnoticeable watermarks that are unrelated to the target dataset but share its labels. Consequently, a model trained on the watermarked data covertly learns to classify these watermarks. The model’s performance in classifying the watermarks then serves as irrefutable evidence of unauthorized usage, since it cannot be achieved by chance. Our approach offers multiple benefits: 1) stealthy and model-agnostic watermarks; 2) minimal impact on the target task; 3) irrefutable evidence of misuse; and 4) improved applicability in practical scenarios. We validate these benefits through extensive experiments and extend our method to fine-grained classification and image segmentation tasks. Our implementation is available at https://github.com/jjh6297/UndercoverBias.
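
To make the mechanism in the abstract concrete, below is a minimal sketch of how class-wise watermark embedding and ownership verification could look. It is not the paper's implementation (see the linked repository for that): the function names (embed_classwise_watermark, verify_unauthorized_usage), the alpha-blending scheme, and the margin threshold are assumptions for illustration.

```python
import numpy as np

# Minimal conceptual sketch, NOT the authors' implementation. The blending
# scheme, names, and threshold below are illustrative assumptions.

def embed_classwise_watermark(images, labels, patterns, alpha=0.03):
    """Blend a faint, class-specific pattern into each training image.

    images   : float array in [0, 1], shape (N, H, W, C)
    labels   : int array, shape (N,)
    patterns : one fixed pattern per class, shape (K, H, W, C); the patterns
               carry no target-task information, only the class label
    alpha    : small blending weight, kept low so the mark is unnoticeable
    """
    mixed = (1.0 - alpha) * images + alpha * patterns[labels]
    return np.clip(mixed, 0.0, 1.0)


def verify_unauthorized_usage(predict_fn, patterns, margin=2.0):
    """Probe a suspect model with bias-only inputs (the bare patterns).

    A model never trained on the watermarked data should sit near chance
    accuracy (1/K) on these inputs; a model trained on it will classify
    them far above chance, which is the evidence of misuse the abstract
    describes.
    """
    num_classes = patterns.shape[0]
    preds = predict_fn(patterns).argmax(axis=-1)   # (K,) predicted labels
    acc = float((preds == np.arange(num_classes)).mean())
    return acc, acc > margin / num_classes         # flag well-above-chance scores
```

The hard threshold here is only a stand-in: because each bias-only input carries nothing related to the target task, accuracy well above 1/K cannot arise by chance, so the gap over chance level is itself the evidence of misuse.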


Acknowledgements

This work was partly supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (24ZB1200, Research of Human-centered Autonomous Intelligence System Original Technology, 40%), an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2023-00215760, Guide Dog: Development of Navigation AI Technology of a Guidance Robot for the Visually Impaired Person, 30%), and the Korea Institute of Marine Science & Technology Promotion (KIMST) funded by the Korea Coast Guard (RS-2023-00238652, Integrated Satellite-based Applications Development for Korea Coast Guard, 30%).

Author information

Corresponding author

Correspondence to Chan-Hyun Youn.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2530 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jang, J., Han, B., Kim, J., Youn, CH. (2025). Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-Wise Hidden Bias. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15079. Springer, Cham. https://doi.org/10.1007/978-3-031-72664-4_1


  • DOI: https://doi.org/10.1007/978-3-031-72664-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72663-7

  • Online ISBN: 978-3-031-72664-4

  • eBook Packages: Computer Science, Computer Science (R0)
