Attentive Perturbation: Extending Prefix Tuning to Large Language Models Inner Representations

Falissard, Louis; Affeldt, Séverine; Nadif, Mohamed

doi:10.1007/978-3-031-53969-5_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14505)

Included in the following conference series: International Conference on Machine Learning, Optimization, and Data Science

Abstract

From adapters to prefix-tuning, parameter-efficient fine-tuning (PEFT) has been a well-investigated research field in the past few years, which has led to an entire family of alternative approaches for large language model fine-tuning. All these methods rely on the fundamental idea of introducing additional learnable parameters to the model, while freezing all pre-trained representations during training. Conventional fine-tuning, by contrast, is generally done by refitting all model parameters to the new, supervised objective function. This process requires a considerable amount of computing power, which might not be readily available to everyone. In addition, even with the use of transfer learning, this method requires substantial amounts of data. In this article, we propose a novel and fairly straightforward extension of the prefix-tuning approach to modify both the model’s attention weights and its internal representations. Our proposal introduces a “token-tuning” method relying on soft lookup-based embeddings derived using attention mechanisms. We call this efficient extension “attentive perturbation”, and empirically show that it outperforms other PEFT methods on most natural language understanding tasks in the few-shot learning setting.
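The abstract does not give the method's equations, so the following is only a rough sketch of the general idea it names: a small trainable table is queried by attention (a "soft lookup") and the result is added to the frozen model's hidden states as a perturbation. The function names, the key/value table layout, the scaled dot-product scoring, and the zero initialisation are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_lookup(hidden, keys, values):
    """Hypothetical attention-based soft lookup.

    Each hidden vector acts as a query over a small table of trainable
    (key, value) slots and receives a softmax-weighted mix of the values.
    """
    dim = len(hidden[0])
    mixed_rows = []
    for h in hidden:
        # Scaled dot-product score between this token and every table slot.
        scores = [sum(a * b for a, b in zip(h, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        mixed_rows.append(mixed)
    return mixed_rows


def perturb(hidden, keys, values):
    """Add the soft-lookup output to the (frozen) hidden states."""
    delta = soft_lookup(hidden, keys, values)
    return [[a + b for a, b in zip(h, row)] for h, row in zip(hidden, delta)]


random.seed(0)
d_model, n_tokens, n_slots = 4, 3, 2
# Stand-in for activations of a frozen pre-trained layer (assumed shapes).
hidden = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]
# The only trainable parameters in this sketch: the lookup table.
keys = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_slots)]
values = [[0.0] * d_model for _ in range(n_slots)]  # assumed zero initialisation

perturbed = perturb(hidden, keys, values)
```

With the value table initialised to zero, the perturbation vanishes and the frozen model's behaviour is unchanged at the start of training; only the table entries would be updated, which is what keeps a scheme like this parameter-efficient.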

Acknowledgments

This work was supported by a grant overseen by the French National Research Agency (ANR) (ANR-19-CE23-0002). It also received the labelling of Cap Digital and EuroBiomed competitiveness clusters.

Author information

Authors and Affiliations

Centre Borelli UMR 9010, Université Paris Cité, 75006, Paris, France
Louis Falissard, Séverine Affeldt & Mohamed Nadif

Corresponding author

Correspondence to Louis Falissard.

Editor information

Editors and Affiliations

University of Catania, Catania, Catania, Italy
Giuseppe Nicosia
Newcastle University, Newcastle upon Tyne, UK
Varun Ojha
University of Oxford, Oxford, UK
Emanuele La Malfa
University of Cambridge, Cambridge, UK
Gabriele La Malfa
University of Florida, Gainesville, FL, USA
Panos M. Pardalos
Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton

About this paper

Cite this paper

Falissard, L., Affeldt, S., Nadif, M. (2024). Attentive Perturbation: Extending Prefix Tuning to Large Language Models Inner Representations. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14505. Springer, Cham. https://doi.org/10.1007/978-3-031-53969-5_36

DOI: https://doi.org/10.1007/978-3-031-53969-5_36
Published: 16 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53968-8
Online ISBN: 978-3-031-53969-5
eBook Packages: Computer Science, Computer Science (R0)
