Voiceprint Recognition and Speech Emotion Analysis Models

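Original

laoge776 2024-03-20 09:00:04 ©Copyright

©Copyright belongs to the author: this is an original work by laoge776 on the 51CTO blog. Please contact the author for permission to repost; otherwise legal liability will be pursued.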

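Voiceprint Recognition

1. VoiceprintRecognition-Pytorch Voiceprint Recognition

1.1 Basic Introduction

This project uses several state-of-the-art voiceprint recognition models, including EcapaTdnn, ResNetSE, ERes2Net and CAM++, and supports several data preprocessing methods: MelSpectrogram, Spectrogram, MFCC and Fbank. It uses ArcFace Loss, i.e. Additive Angular Margin Loss, which corresponds to AAMLoss in the project. AAMLoss normalizes both the feature vectors and the weights and adds an angular margin m to the angle θ; an angular margin acts on the angle more directly than a cosine margin does. The project also supports other loss functions such as AMLoss, ARMLoss and CELoss.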
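The additive angular margin can be illustrated with a minimal NumPy sketch. It is not the project's AAMLoss: the function names, the scale s=30 and the margin m=0.2 are illustrative assumptions, and the case θ+m > π is handled by simply clamping the angle at π rather than with the threshold trick used by common ArcFace implementations. It shows that, for the same embeddings, adding the margin to the true class's angle lowers that class's logit and therefore raises the cross-entropy loss, which is what pushes the model towards tighter speaker clusters.

```python
import numpy as np


def aam_logits(embeddings, weights, labels, margin=0.2, scale=30.0):
    """Simplified AAM logits: s*cos(min(theta + m, pi)) for the true class, s*cos(theta) elsewhere."""
    # L2-normalize the embeddings (N, D) and the class weight vectors (C, D)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)  # (N, C): cosine of the angle theta
    theta = np.arccos(cos)
    # Add the margin m to the angle; clamping at pi keeps cos(theta + m) <= cos(theta)
    target = np.cos(np.minimum(theta + margin, np.pi))
    logits = cos.copy()
    rows = np.arange(len(labels))
    logits[rows, labels] = target[rows, labels]  # only the true class gets the margin
    return scale * logits


def softmax_cross_entropy(logits, labels):
    # Numerically stable mean cross-entropy over the batch
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))    # 4 utterance embeddings of dimension 8
cls_w = rng.normal(size=(3, 8))  # weight vectors for 3 speakers
y = np.array([0, 1, 2, 0])       # true speaker of each utterance
plain = softmax_cross_entropy(aam_logits(emb, cls_w, y, margin=0.0), y)
arc = softmax_cross_entropy(aam_logits(emb, cls_w, y, margin=0.2), y)
print(plain, arc)  # the margin version is never smaller
```

With margin=0.0 the function reduces to a plain normalized-softmax (cosine) classifier, which makes the effect of the margin easy to compare.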

1.2 Project Features

Supported models: EcapaTdnn, TDNN, Res2Net, ResNetSE, ERes2Net, CAM++
Supported pooling layers: AttentiveStatsPool (ASP), SelfAttentivePooling (SAP), TemporalStatisticsPooling (TSP), TemporalAveragePooling (TAP), TemporalStatsPool (TSTP)
Supported loss functions: AAMLoss, AMLoss, ARMLoss, CELoss
Supported preprocessing methods: MelSpectrogram, Spectrogram, MFCC, Fbank
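As a rough illustration of what the simplest pooling layers do, the sketch below turns a (T, D) matrix of frame-level features into one fixed-length utterance vector, either by averaging over time (TAP style) or by concatenating the per-dimension mean and standard deviation (statistics pooling in the TSP/TSTP style). The function names are hypothetical, and the project's layers may differ in details such as using an unbiased standard deviation; the attentive variants (ASP, SAP) additionally weight frames with learned attention, which is not shown.

```python
import numpy as np


def temporal_average_pooling(frames):
    # frames: (T, D) frame-level features -> (D,) utterance-level vector
    return frames.mean(axis=0)


def temporal_statistics_pooling(frames):
    # Concatenate the per-dimension mean and (population) std over time -> (2D,)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


frames = np.array([[1.0, 2.0], [3.0, 4.0]])  # T=2 frames, D=2 features
print(temporal_average_pooling(frames))      # [2. 3.]
print(temporal_statistics_pooling(frames))   # [2. 3. 1. 1.]
```

Either way, utterances of any length map to a vector of the same size, which is what allows two recordings to be compared directly.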

1.2.1 Papers Behind the Models

1.3 Environment

Anaconda 3
Python 3.8
Pytorch 1.13.1
Windows 10 or Ubuntu 18.04

Configuring the environment

[Screenshots]

With the GPU, the run hangs (blocks); the same happens with the models below.

[Screenshot]

Without the GPU

Set the value of use_gpu to false.

[Screenshot]

Run the following command to compare two voiceprints:

python infer_contrast.py --audio_path1=audio/a_1.wav --audio_path2=audio/b_2.wav

The two recordings clearly come from different speakers, yet the reported voiceprint similarity exceeded 0.99, so the recognition failed.

[Screenshot]
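Voiceprint comparison of this kind usually comes down to the cosine similarity between two utterance embeddings, compared against a decision threshold. The sketch below assumes the embeddings have already been extracted; the function names and the 0.6 threshold are illustrative assumptions, not values read from infer_contrast.py. Because cosine similarity only measures direction, a score above 0.99 for two different speakers means the two embeddings point in almost the same direction, which can be checked directly on the extracted vectors.

```python
import numpy as np


def cosine_similarity(emb1, emb2):
    # Cosine of the angle between two utterance-level embeddings, in [-1, 1]
    a = np.asarray(emb1, dtype=float)
    b = np.asarray(emb2, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_same_speaker(emb1, emb2, threshold=0.6):
    # Hypothetical decision rule: accept when the similarity reaches the threshold
    return cosine_similarity(emb1, emb2) >= threshold


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(is_same_speaker([1.0, 2.0], [2.0, 4.0]))    # True
```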

With the GPU

The following error occurred:

[Screenshot]

A search showed that it was caused by a version mismatch.

1.4 Installing the Environment

First install the GPU build of PyTorch; skip this step if it is already installed.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia

Install the mvector library with pip:

python -m pip install mvector -U -i https://pypi.tuna.tsinghua.edu.cn/simple

Training the model

Run the following command to start training (CUDA_VISIBLE_DEVICES=0 exposes only the first GPU, so training runs on a single card):

CUDA_VISIBLE_DEVICES=0 python train.py

Training failed and hung at epoch 448.

[Screenshot]

2. VoiceprintRecognition-PaddlePaddle Voiceprint Recognition (not completed: incompatible gcc version)

2.1 Basic Introduction

Same as the introduction in 1.1.

2.2 Environment

Anaconda 3
Python 3.8
PaddlePaddle 2.4.1
Windows 10 or Ubuntu 18.04

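2.3 Project Features

Supported models: EcapaTdnn, TDNN, Res2Net, ResNetSE, ERes2Net, CAM++
Supported pooling layers: AttentiveStatsPool (ASP), SelfAttentivePooling (SAP), TemporalStatisticsPooling (TSP), TemporalAveragePooling (TAP), TemporalStatsPool (TSTP)
Supported loss functions: AAMLoss, AMLoss, ARMLoss, CELoss
Supported preprocessing methods: MelSpectrogram, Spectrogram, MFCC, Fbank

2.3.1 Papers Behind the Models

2.4 Installing the Environment

First install the GPU build of PaddlePaddle; skip this step if it is already installed.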

conda install paddlepaddle-gpu==2.4.1 cudatoolkit=10.2 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/

Install the ppvector library:

python -m pip install ppvector -U -i https://pypi.tuna.tsinghua.edu.cn/simple

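The error below is caused by the missing CUDA 10.2 package. It could not be found in the local environment, so it has to be downloaded from NVIDIA's official website.

[Screenshot]

Install it using the runfile method.

[Screenshot]

This then failed with an incompatible gcc version error.

[Screenshot]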

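Speech Emotion Analysis

3. SpeechEmotionRecognition-Pytorch Speech Emotion Recognition

3.1 Basic Introduction

A PyTorch-based speech emotion recognition model; its performance is currently only mediocre.

3.2 Environment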

Anaconda 3
Python 3.8
Pytorch 1.13.1
Windows 10 or Ubuntu 18.04

3.3 Project Features

Only Audio_Speech_Actors_01-24.zip is used as the dataset.
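Audio_Speech_Actors_01-24.zip is the speech-audio part of the RAVDESS dataset, in which every file name encodes its metadata as seven two-digit fields: modality, vocal channel, emotion, emotional intensity, statement, repetition and actor (odd actor numbers are male, even ones female). The sketch below derives the emotion label from a file name under that convention; the function name is hypothetical, and this is not necessarily how mser builds its own label list.

```python
from pathlib import Path

# Emotion codes used in the third field of RAVDESS file names
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def parse_ravdess_name(path):
    """Split a RAVDESS file name such as 03-01-06-01-02-01-12.wav into labelled metadata."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path}")
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = fields
    return {
        "emotion": RAVDESS_EMOTIONS[emotion],
        "intensity": "strong" if intensity == "02" else "normal",
        "actor": int(actor),
        "gender": "male" if int(actor) % 2 == 1 else "female",
    }


print(parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav"))
# {'emotion': 'fearful', 'intensity': 'normal', 'actor': 12, 'gender': 'female'}
```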

3.4 Installing the Environment

First install the GPU build of PyTorch.
Then install the mser library:

python -m pip install mser -U -i https://pypi.tuna.tsinghua.edu.cn/simple

[Screenshot]

Set the GPU option to False.

[Screenshot]

Recognition result

[Screenshot]

4. SpeechEmotionRecognition-PaddlePaddle Speech Emotion Recognition

4.1 Basic Introduction

A project based on PaddlePaddle.

4.2 Environment

Anaconda 3
Python 3.8
PaddlePaddle 2.4.0
Windows 10 or Ubuntu 18.04

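4.3 Project Features

Only Audio_Speech_Actors_01-24.zip from the RAVDESS dataset is used.

4.4 Installing the Environment

First install the GPU build of PaddlePaddle.
Then install the ppser library: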

python -m pip install ppser -U -i https://pypi.tuna.tsinghua.edu.cn/simple

5. speechbrain/emotion-recognition-wav2vec2-IEMOCAP

Environment setup

Install from GitHub

(in a Python 3.7+ environment)

git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .

Running the experiment

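# SpeechBrain 1.0 and later expose the same helper as speechbrain.inference.interfaces.foreign_class
from speechbrain.pretrained.interfaces import foreign_class

# Download the pretrained wav2vec2 emotion model and load its custom interface class from custom_interface.py
classifier = foreign_class(source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP", pymodule_file="custom_interface.py", classname="CustomEncoderWav2vec2Classifier")
# Classify the example file from the model repository; returns the class scores, the best score, its index and the label
out_prob, score, index, text_lab = classifier.classify_file("speechbrain/emotion-recognition-wav2vec2-IEMOCAP/anger.wav")
print(text_lab)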
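The four values returned by classify_file are related: conceptually, score is the highest entry of out_prob, index is the position of that entry, and text_lab is the label stored at that position. The sketch below reproduces that relationship with NumPy without calling SpeechBrain; the decode function, the LABELS list and its order are hypothetical. Whether out_prob holds probabilities or log-probabilities does not change which index is selected.

```python
import numpy as np

# Hypothetical label order, for illustration only
LABELS = ["neu", "ang", "hap", "sad"]


def decode(out_prob, labels=LABELS):
    # Pick the best class: its score, its index and its text label
    out_prob = np.asarray(out_prob, dtype=float)
    index = int(np.argmax(out_prob))
    score = float(out_prob[index])
    return score, index, labels[index]


score, index, text_lab = decode([0.05, 0.80, 0.10, 0.05])
print(score, index, text_lab)  # 0.8 1 ang
```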