Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement

Lay, Bunlong; Welker, Simon; Richter, Julius; Gerkmann, Timo

arXiv:2302.14748 (eess)

[Submitted on 28 Feb 2023 (v1), last revised 30 May 2023 (this version, v2)]

Abstract: Recently, score-based generative models have been successfully employed for the task of speech enhancement. A stochastic differential equation is used to model the iterative forward process, where at each step environmental noise and white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice it stops earlier and thus reaches only an approximation of the noisy mixture. This results in a discrepancy between the terminating distribution of the forward process and the prior used for solving the reverse process at inference. In this paper, we address this discrepancy and propose a forward process based on a Brownian bridge. We show that such a process leads to a reduction of the mismatch compared to previous diffusion processes. More importantly, we show that our approach improves on the baseline process in objective metrics while using only half of the iteration steps and having one fewer hyperparameter to tune.
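
The mismatch described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes the baseline mean follows an Ornstein-Uhlenbeck-type exponential interpolation from the clean signal toward the noisy mixture, and it uses the textbook mean of a Brownian bridge pinned at both ends. The function names, the stiffness value `gamma=1.5`, the final time `T=1.0`, and the random stand-in signals are all illustrative assumptions.

```python
import numpy as np

def ou_mean(x0, y, t, gamma=1.5):
    """Mean of an assumed OU-type forward process drifting from x0 toward y.

    The weight on x0 decays as exp(-gamma * t), so it only vanishes as t -> infinity.
    """
    w = np.exp(-gamma * t)
    return w * x0 + (1.0 - w) * y

def bridge_mean(x0, y, t, T=1.0):
    """Mean of a Brownian bridge pinned at x0 (t = 0) and y (t = T)."""
    return (1.0 - t / T) * x0 + (t / T) * y

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16000)            # hypothetical stand-in for clean speech
y = x0 + 0.5 * rng.standard_normal(16000)  # hypothetical stand-in for the noisy mixture

T = 1.0
ou_gap = np.linalg.norm(ou_mean(x0, y, T) - y)          # residual mean mismatch at t = T
bridge_gap = np.linalg.norm(bridge_mean(x0, y, T) - y)  # the bridge mean ends at y

print(f"OU-type mean gap at T:     {ou_gap:.4f}")
print(f"Brownian bridge gap at T:  {bridge_gap:.4f}")
```

Under these assumptions, the exponential interpolation leaves a residual of size exp(-gamma * T) times the distance between the clean signal and the mixture, whereas the bridge mean coincides with the mixture at the final time. The sketch compares means only and says nothing about the diffusion coefficient or the stopping time used in the paper.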

Comments:	5 pages, 2 figures, Accepted to Interspeech 2023
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2302.14748 [eess.AS]
	(or arXiv:2302.14748v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2302.14748

Submission history

From: Bunlong Lay
[v1] Tue, 28 Feb 2023 16:45:42 UTC (344 KB)
[v2] Tue, 30 May 2023 13:05:55 UTC (299 KB)
