Abstract
Public datasets play a crucial role in advancing data-centric AI, yet they remain vulnerable to illicit uses. This paper presents ‘undercover bias,’ a novel dataset watermarking method that can reliably identify and verify unauthorized data usage. Our approach is inspired by the observation that trained models often inadvertently learn biased knowledge and can function on bias-only data, even without any information directly related to the target task. Leveraging this, we deliberately embed class-wise hidden bias via unnoticeable watermarks, which are unrelated to the target dataset but share the same labels. Consequently, a model trained on this watermarked data covertly learns to classify these watermarks. The model’s performance in classifying the watermarks serves as irrefutable evidence of unauthorized usage, which cannot be achieved by chance. Our approach presents multiple benefits: 1) stealthy and model-agnostic watermarks; 2) minimal impact on the target task; 3) irrefutable evidence of misuse; and 4) improved applicability in practical scenarios. We validate these benefits through extensive experiments and extend our method to fine-grained classification and image segmentation tasks. Our implementation is available at https://github.com/jjh6297/UndercoverBias.
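To make the verification idea concrete, the minimal Python sketch below illustrates one way class-wise watermark embedding and ownership verification could be wired up, assuming a Keras-style classifier exposing a `predict` method. The uniform-noise patterns, the blending strength `alpha`, and the detection threshold are illustrative assumptions, not the paper's actual watermark construction or statistical test.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_class_watermarks(num_classes, shape=(32, 32, 3)):
    """One fixed pattern per class (hypothetical stand-in for the
    paper's unnoticeable, class-wise watermarks)."""
    return rng.uniform(0.0, 1.0, size=(num_classes, *shape)).astype(np.float32)


def embed_watermark(image, label, watermarks, alpha=0.03):
    """Blend the label's watermark into a training image at low opacity
    so the change stays visually unnoticeable (alpha is an assumed strength)."""
    return np.clip((1.0 - alpha) * image + alpha * watermarks[label], 0.0, 1.0)


def verify_usage(model, watermarks, threshold=None):
    """Ownership check: a model trained on the watermarked data should
    classify the bias-only patterns far above chance (1 / num_classes)."""
    num_classes = watermarks.shape[0]
    preds = model.predict(watermarks).argmax(axis=1)
    accuracy = float((preds == np.arange(num_classes)).mean())
    chance = 1.0 / num_classes
    threshold = 5.0 * chance if threshold is None else threshold
    return accuracy, accuracy >= threshold
```

In this sketch, a dataset owner would release only the watermarked images, keep the per-class patterns secret, and later call `verify_usage` on a suspect model; accuracy on the bias-only inputs well above chance suggests the model was trained on the protected data.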
Acknowledgements
This work was partly supported by the Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (24ZB1200, Research of Human-centered Autonomous Intelligence System Original Technology, 40%), the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2023-00215760, Guide Dog: Development of Navigation AI Technology of a Guidance Robot for the Visually Impaired Person, 30%), and the Korea Institute of Marine Science & Technology Promotion (KIMST) grant funded by the Korea Coast Guard (RS-2023-00238652, Integrated Satellite-based Applications Development for Korea Coast Guard, 30%).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jang, J., Han, B., Kim, J., Youn, CH. (2025). Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-Wise Hidden Bias. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15079. Springer, Cham. https://doi.org/10.1007/978-3-031-72664-4_1
DOI: https://doi.org/10.1007/978-3-031-72664-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72663-7
Online ISBN: 978-3-031-72664-4
eBook Packages: Computer Science (R0)