Abstract
Large language models (LLMs) are pre-trained on extensive corpora to learn facts and aspects of human cognition, which also encode human preferences. However, this process can inadvertently lead these models to acquire the biases and stereotypes prevalent in society. Prior research has typically approached bias from a one-dimensional perspective, concentrating either on locating it or on mitigating it. This limited perspective has made it difficult for the two lines of bias research to complement and build upon one another. In this study, we integrate the processes of locating and mitigating bias within a unified framework. First, we use causal mediation analysis to trace the causal effects of the activations of different components within a large language model. Building on this, we propose LSDM (Least Square Debias Method), a knowledge-editing-based method for mitigating gender bias in occupational pronouns, and compare it against two baselines on three gender bias datasets and seven knowledge competency test datasets. The experimental results indicate that the primary contributors to gender bias are the bottom MLP modules acting on the last token of occupational pronouns and the top attention modules acting on the final word in the sentence. Furthermore, LSDM mitigates gender bias in the model more effectively than the other methods, reducing gender bias in occupational pronouns by 71.4% while fully preserving the model's capabilities in all other aspects.
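To illustrate the kind of knowledge-editing update that a method like LSDM builds on, the sketch below shows a generic rank-one least-squares edit in the style of Meng et al. [18]: a weight matrix of an MLP module is modified so that the key vector computed at the last token of an occupational pronoun maps to a new value vector (e.g., one chosen so that gendered continuations receive balanced probability), while disturbing other keys as little as possible. This is a minimal sketch under those assumptions; the function name, variable names, and the choice of the target value are illustrative, not the paper's published LSDM formulation.

```python
import numpy as np

def rank_one_least_squares_edit(W, k_star, v_star, C):
    """Hypothetical ROME-style rank-one least-squares edit (cf. Meng et al. [18]).

    W      : (d_out, d_in) MLP projection matrix to edit.
    k_star : (d_in,) key vector at the last occupational-pronoun token.
    v_star : (d_out,) desired value vector (e.g., chosen to balance "he"/"she").
    C      : (d_in, d_in) second-moment matrix E[k k^T] of keys from generic text.
    """
    # Direction of the update: C^{-1} k*, which emphasizes k_star while
    # down-weighting directions shared with other, unrelated keys.
    c_inv_k = np.linalg.solve(C, k_star)
    # How far the current output at k_star is from the desired value.
    residual = v_star - W @ k_star
    # Rank-one correction scaled so the edited matrix maps k_star exactly to v_star.
    W_new = W + np.outer(residual, c_inv_k) / (c_inv_k @ k_star)
    return W_new
```

The covariance term C is what makes the edit "least squares" in spirit: it spreads the correction along directions specific to the edited key, so outputs for unrelated inputs, and hence the model's other knowledge, are largely preserved.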
Notes
1. We set ν to be three times larger than the empirical standard deviation of embeddings. Refer to Meng et al. [18] for specifics.
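For concreteness, the hypothetical helper below shows one way this scale could be computed from a model's token-embedding table, following the convention described by Meng et al. [18]; the function name and interface are assumptions made for illustration only.

```python
import numpy as np

def corruption_noise_scale(embedding_matrix: np.ndarray, factor: float = 3.0) -> float:
    """Return nu = factor * (empirical std of the token embeddings).

    embedding_matrix is assumed to be the (vocab_size, d_model) embedding
    table; nu is the scale of the Gaussian noise used to corrupt subject
    embeddings during causal tracing.
    """
    return factor * float(embedding_matrix.std())
```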
References
Bisk, Y., Zellers, R., Gao, J., Choi, Y., et al.: PIQA: Reasoning about physical commonsense in natural language. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7432–7439 (2020)
Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V., Kalai, A.T.: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 29 (2016)
Caliskan, A., Bryson, J.J., Narayanan, A.: Semantics derived automatically from language corpora contain human-like biases. Science 356(6334), 183–186 (2017)
Cheng, S., et al.: Can we edit multimodal large language models? (2023). arXiv:2310.08475
Cheng, S., et al.: Editing language model-based knowledge graph embeddings (2023). arXiv:2301.10405
Choi, J.H., Hickman, K.E., Monahan, A., Schwarcz, D.: ChatGPT goes to law school (2023)
Cohen, D., et al.: Dynamic planning in open-ended dialogue using reinforcement learning (2022). arXiv:2208.02294
Dai, D., Dong, L., Hao, Y., Sui, Z., Wei, F.: Knowledge neurons in pretrained transformers (2021). arXiv:2104.08696
Ferrara, E.: Should ChatGPT be biased? Challenges and risks of bias in large language models (2023). arXiv:2304.03738
Gandikota, R., Materzynska, J., Fiotto-Kaufman, J., Bau, D.: Erasing concepts from diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2426–2436 (2023)
Garg, N., Schiebinger, L., Jurafsky, D., Zou, J.: Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl. Acad. Sci. U.S.A. 115(16), E3635–E3644 (2018)
Geva, M., Bastings, J., Filippova, K., Globerson, A.: Dissecting recall of factual associations in auto-regressive language models (2023). arXiv:2304.14767
Geva, M., Caciularu, A., Wang, K., Goldberg, Y.: Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space (2022). arXiv:2203.14680
Gilson, A., et al.: How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med. Educ. 9(1), e45312 (2023)
Guo, Y., Yang, Y., Abbasi, A.: Auto-debias: debiasing masked language models with automated biased prompts. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1012–1023 (2022)
Kohonen, T.: Correlation matrix memories. IEEE Trans. Comput. C-21, 353–359 (1972)
May, C., Wang, A., Bordia, S., Bowman, S.R., Rudinger, R.: On measuring social biases in sentence encoders (2019). arXiv:1903.10561
Meng, K., Bau, D., Andonian, A., Belinkov, Y.: Locating and editing factual associations in GPT. Adv. Neural. Inf. Process. Syst. 35, 17359–17372 (2022)
Meng, K., Sharma, A.S., Andonian, A., Belinkov, Y., Bau, D.: Mass-editing memory in a transformer (2022). arXiv:2210.07229
Paperno, D., et al.: The LAMBADA dataset: word prediction requiring a broad discourse context (2016). arXiv:1606.06031
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019). https://api.semanticscholar.org/CorpusID:160025533
Ramamurthy, R., et al.: Is reinforcement learning (not) for natural language processing? Benchmarks, baselines, and building blocks for natural language policy optimization (2022). arXiv:2210.01241
Roemmele, M., Bejan, C.A., Gordon, A.S.: Choice of plausible alternatives: an evaluation of commonsense causal reasoning. In: 2011 AAAI Spring Symposium Series (2011)
Rudinger, R., Naradowsky, J., Leonard, B., Van Durme, B.: Gender bias in coreference resolution (2018). arXiv:1804.09301
Sap, M., Rashkin, H., Chen, D., LeBras, R., Choi, Y.: SocialIQA: commonsense reasoning about social interactions (2019). arXiv:1904.09728
Sun, T., et al.: Mitigating gender bias in natural language processing: literature review (2019). arXiv:1906.08976
Talmor, A., Herzig, J., Lourie, N., Berant, J.: CommonsenseQA: a question answering challenge targeting commonsense knowledge (2018). arXiv:1811.00937
Touvron, H., et al.: LLaMA: open and efficient foundation language models (2023). arXiv:2302.13971
Vig, J., et al.: Causal mediation analysis for interpreting neural NLP: the case of gender bias (2020). arXiv:2004.12265
Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: a multi-task benchmark and analysis platform for natural language understanding (2018). arXiv:1804.07461
Wang, B., Komatsuzaki, A.: GPT-J-6B: a 6 billion parameter autoregressive language model (2021)
Webster, K., et al.: Measuring and reducing gendered correlations in pre-trained models (2020). arXiv:2010.06032
Yang, A., et al.: Baichuan 2: open large-scale language models (2023). arXiv:2309.10305
Yao, Y., et al.: Editing large language models: problems, methods, and opportunities (2023). arXiv:2305.13172
Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., Choi, Y.: HellaSwag: can a machine really finish your sentence? (2019). arXiv:1905.07830
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., Chang, K.W.: Gender bias in coreference resolution: evaluation and debiasing methods (2018). arXiv:1804.06876
Ziegler, D.M., et al.: Fine-tuning language models from human preferences (2019). arXiv:1909.08593
Zmigrod, R., Mielke, S.J., Wallach, H., Cotterell, R.: Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1651–1661 (2019)
Cai, Y., Cao, D., Guo, R., Wen, Y., Liu, G., Chen, E.: Locating and mitigating gender bias in large language models (2024). arXiv:2403.14409
Cai, Y., Cao, D., Guo, R., Wen, Y., Liu, G., Chen, E.: Editing knowledge representation of language model via rephrased prefix prompts (2024). arXiv:2403.14381
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cai, Y., Cao, D., Guo, R., Wen, Y., Liu, G., Chen, E. (2024). Locating and Mitigating Gender Bias in Large Language Models. In: Huang, D.S., Si, Z., Zhang, C. (eds.) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol. 14878. Springer, Singapore. https://doi.org/10.1007/978-981-97-5672-8_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5671-1
Online ISBN: 978-981-97-5672-8