PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

Feng, Tiantian; Narayanan, Shrikanth

doi:10.1109/ACII59096.2023.10388152

arXiv:2306.05350 (cs)

[Submitted on 8 Jun 2023 (v1), last revised 14 Feb 2024 (this version, v2)]

Abstract: Many recent studies have focused on fine-tuning pre-trained models for speech emotion recognition (SER), resulting in promising performance compared to traditional methods that rely largely on low-level, knowledge-inspired acoustic features. These pre-trained speech models learn general-purpose speech representations using self-supervised or weakly-supervised learning objectives from large-scale datasets. Despite the significant advances made in SER through the use of pre-trained architectures, fine-tuning these large pre-trained models for different datasets requires saving a full copy of the weight parameters for each dataset, which makes them impractical to deploy in real-world settings. As an alternative, this work explores parameter-efficient fine-tuning (PEFT) approaches for adapting pre-trained speech models for emotion recognition. Specifically, we evaluate the efficacy of adapter tuning, embedding prompt tuning, and LoRA (low-rank adaptation) on four popular SER testbeds. Our results reveal that LoRA achieves the best fine-tuning performance in emotion recognition while enhancing fairness and requiring only a minimal number of additional weight parameters. Furthermore, our findings offer novel insights into future research directions in SER, distinct from existing approaches that focus on directly fine-tuning the model architecture. Our code is publicly available under: this https URL.
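To illustrate why the low-rank approach named in the abstract adds so few weights, the following is a minimal NumPy sketch of a LoRA-style linear layer. It is not the authors' implementation: the class name `LoRALinear`, the hidden size of 768, and the values of `rank` and `alpha` are assumptions chosen only for illustration. The pre-trained weight stays frozen and only two small matrices would be trained.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a low-rank update (illustrative sketch only).

    Computes y = x W^T + (alpha / rank) * x A^T B^T, where W is frozen and
    only A (rank x in_dim) and B (out_dim x rank) would be trained.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pre-trained weight, kept frozen
        # A gets a small random init; B starts at zero so the layer
        # initially reproduces the frozen model exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return frozen + update

    def num_trainable(self):
        return self.A.size + self.B.size


# Hypothetical 768-dimensional projection, as in many base-size encoders.
W = np.random.default_rng(1).normal(size=(768, 768))
layer = LoRALinear(W, rank=4)
print(layer.num_trainable(), "trainable vs", W.size, "frozen weights")
```

Under these assumed sizes, the per-dataset state to store is the pair `A` and `B` (6,144 values) rather than a full copy of `W` (589,824 values), which is the storage argument the abstract makes for PEFT.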

Comments:	This work was accepted to the 11th International Conference on Affective Computing and Intelligent Interaction (ACII), 2023
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.05350 [cs.SD]
	(or arXiv:2306.05350v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2306.05350
Related DOI:	https://doi.org/10.1109/ACII59096.2023.10388152

Submission history

From: Tiantian Feng
[v1] Thu, 8 Jun 2023 16:53:02 UTC (1,213 KB)
[v2] Wed, 14 Feb 2024 09:26:51 UTC (1,243 KB)
