Abstract
We present a new model-based monaural speech separation technique for separating two speech signals from a single recording of their mixture. This work addresses a fundamental limitation of current model-based monaural speech separation techniques: the assumption that the data used in the training and test phases of the separation model have the same energy level. To overcome this limitation, a gain-adapted minimum mean square error estimator is derived which estimates the sources under differing signal-to-signal ratios. Specifically, the speakers’ gains are incorporated as unknown parameters into the separation model, and the estimator is then derived in terms of the source distributions and the signal-to-signal ratio. Experimental results show that the proposed system significantly improves separation performance compared with a similar model without gain adaptation, as well as with a maximum likelihood estimator with gain estimation.
References
Camastra, F., & Vinciarelli, A. (2007). Machine learning for audio, image and video analysis: Theory and applications (advanced information and knowledge processing). New York: Springer.
Wang, D., & Brown, G. J. (Eds.) (2006). Computational auditory scene analysis: Principles, algorithms, and applications. New York: IEEE/Wiley-Interscience.
Ellis, D. (2006). Model-based scene analysis. In D. Wang & G. Brown (Eds.), Computational auditory scene analysis: Principles, algorithms, and applications. New York: Wiley/IEEE Press.
Roweis, S. (2000). One microphone source separation. In Proc. Neural Inf. Process. Syst. (pp. 793–799).
Roweis, S. T. (2003). Factorial models and refiltering for speech separation and denoising. In EUROSPEECH–03 (Vol. 7, pp. 1009–1012), May.
Radfar, M. H., & Dansereau, R. M. (2007). Single channel speech separation using soft mask filtering. IEEE Transactions on Audio, Speech and Language Processing, 15(8), 2299–2310, Nov.
Radfar, M. H., & Dansereau, R. M. (2007). Long-term gain estimation in model-based single channel speech separation. In Proc. IEEE workshop on applications of signal processing to audio and acoustics (WASPAA2007). New Paltz, New York, October.
Radfar, M. H., & Dansereau, R. M. (2007). Single channel speech separation using minimum mean square error estimation of sources’ log spectra. In Proc. IEEE international workshop on machine learning for signal processing (MLSP 2007). Thessaloniki, Greece, Aug.
Radfar, M. H., Dansereau, R. M., & Sayadiyan, A. (2006). Performance evaluation of three features for model-based single channel speech separation problem. In Interspeech 2006, Intern. Conf. on Spoken Language Processing (ICSLP’2006 Pittsburgh, USA) (pp. 17–21), Sept.
Radfar, M. H., & Dansereau, R. M. (2007). Single channel speech separation using maximum a posteriori estimation. In Proc. international conference on spoken language processing (Interspeech– ICSLP 07). Antwerp, Belgium, Aug.
Schmidt, M. N., & Olsson, R. K. (2007). Linear regression on sparse features for single-channel speech separation. In Proc. IEEE workshop on applications of signal processing to audio and acoustics (WASPAA2007) (pp. 26–29). New Paltz, New York, October.
Reddy, A. M., & Raj, B. (2007). Soft mask methods for single-channel speaker separation. IEEE Transactions on Audio, Speech and Language Processing, 15(6), 1766–1776, Aug.
Weiss, R., & Ellis, D. (2006). Estimating single-channel source separation masks: Relevance vector machine classifiers vs. pitch-based masking. In Proc. workshop on statistical and perceptual audition SAPA-06 (pp. 31–36), Oct.
Beierholm, T., Pedersen, B. D., & Winther, O. (2004). Low complexity Bayesian single channel source separation. In Proc. ICASSP–04 (Vol. 5, pp. 529–532), May.
Kristjansson, T., Attias, H., & Hershey, J. (2004). Single microphone source separation using high resolution signal reconstruction. In Proc. ICASSP–04 (pp. 817–820), May.
Radfar, M. H., Dansereau, R. M., & Sayadiyan, A. (2007). Maximum likelihood estimation of vocal-tract-related filter characteristics for single channel speech separation. EURASIP Journal on Audio, Speech, and Music Processing, 2007, Article ID 84186, 15 pages. doi:10.1155/2007/84186.
Radfar, M. H., Dansereau, R. M., & Sayadiyan, A. (2007). Monaural speech segregation based on fusion of source-driven with model-driven techniques. Speech Communication, 49(6), 464–476, June.
Reyes-Gomez, M. J., Ellis, D., & Jojic, N. (2004). Multiband audio modeling for single channel acoustic source separation. In Proc. ICASSP–04 (Vol. 5, pp. 641–644), May.
Reddy, A. M., & Raj, B. (2004). A minimum mean squared error estimator for single channel speaker separation. In INTERSPEECH–2004 (pp. 2445–2448), Oct.
Brown, G. J., & Wang, D. L. (2005). Separation of speech by computational auditory scene analysis. In Speech enhancement (pp. 371–402). New York: Springer.
Brown, G. J., & Cooke, M. (1994). Auditory scene analysis. Computer Speech and Language, 8(4), 297–336.
Cooke, M., & Ellis, D. P. W. (2001). The auditory organization of speech and other sources in listeners and computational models. Speech Communication, 35(3), 141–177, October.
Ellis, D. P. W. (1999). Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures. Speech Communication, 27(3), 281–298, April.
Nakatani, T., & Okuno, H. G. (1999). Harmonic sound stream segregation using localization and its application to speech stream segregation. Speech Communication, 27(3), 209–222, April.
Brown, G. J., & Wang, D. L. (2005). Separation of speech by computational auditory scene analysis. In J. Benesty, S. Makino, & J. Chen (Eds.), Speech enhancement (pp. 371–402). New York: Springer.
Darwin, C. J., & Carlyon, R. P. (1995). Auditory grouping. In B. C. J. Moore (Ed.), Hearing (Handbook of perception and cognition, Vol. 6, pp. 387–424). London: Academic.
Wang, D. L., & Brown, G. J. (1999). Separation of speech from interfering sounds based on oscillatory correlation. IEEE Transactions on Neural Networks, 10, 684–697, May.
Hu, G., & Wang, D. L. (2004). Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Transactions on Neural Networks, 15(5), 1135–1150, Sept.
Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge: MIT.
Li, Y., Amari, S., Cichocki, A., Ho, D. W. C., & Xie, S. (2006). Underdetermined blind source separation based on sparse representation. IEEE Transactions on Signal Processing, 54(2), 423–437, Feb.
Theis, F. J., Puntonet, C. G., & Lang, E. W. (2006). Median-based clustering for underdetermined blind signal processing. IEEE Signal Processing Letters, 13(2), 96–99, Feb.
Bofill, P., & Zibulevsky, M. (2001). Underdetermined blind source separation using sparse representations. Signal Processing, 81, 2353–2362.
Jutten, C., & Herault, J. (1991). Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24, 1–10.
Comon, P. (1994). Independent component analysis, a new concept? Signal Processing, 36, 287–314.
Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129–1159.
Amari, S. I., & Cardoso, J. F. (1997). Blind source separation–semiparametric statistical approach. IEEE Transactions on Signal Processing, 45(11), 2692–2700.
Luo, Y., Wang, W., Chambers, J. A., Lambotharan, S., & Proudler, I. K. (2006). Exploitation of source non-stationarity in underdetermined blind source separation with advanced clustering techniques. IEEE Transactions on Signal Processing, 54(6), 2198–2212, June.
Lewicki, M. S., & Sejnowski, T. J. (1998). Learning nonlinear overcomplete representations for efficient coding. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems (Vol. 10). Cambridge: MIT.
Schmidt, M. N., & Olsson, R. K. (2006). Single-channel speech separation using sparse non-negative matrix factorization. In Proc. Interspeech 2006, Intern. Conf. on Spoken Language Processing (ICSLP’2006 Pittsburgh), Sept.
Virtanen, T. (2003). Sound source separation using sparse coding with temporal continuity objective. In Proc. Int. Comput. Music Conference (pp. 231–234).
Jang, G. J., & Lee, T. W. (2003). A probabilistic approach to single channel source separation. In Proc. Advances in Neural Inform. Process. Systems (pp. 1173–1180).
Radfar, M. H., Banihashemi, A. H., Dansereau, R. M., & Sayadiyan, A. (2006). A non-linear minimum mean square error estimator for the mixture-maximization approximation. Electronics Letters, 42(12), 75–76, June.
Ephraim, Y. (1992). Gain-adapted hidden Markov models for recognition of clean and noisy speech. IEEE Transactions on Signal Processing, 40(6), 1303–1316, Jun.
Zhao, D. Y., & Kleijn, W. B. (2007). HMM-based gain modeling for enhancement of speech in noise. IEEE Transactions on Audio, Speech and Language Processing, 15(3), 882–892, March.
Benaroya, L., Bimbot, F., & Gribonval, R. (2006). Audio source separation with a single sensor. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 191–199, Jan.
Parsons, T. W. (1976). Separation of speech from interfering speech by means of harmonic selection. Journal of the Acoustical Society of America, 60, 911–918, Aug.
Kameoka, H., Nishimoto, T., & Sagayama, S. (2004). Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear Kalman filtering. In INTERSPEECH–2004 (Vol. 1, pp. 2433–2436), Oct.
de Cheveigné, A., & Kawahara, H. (1999). Multiple period estimation and pitch perception model. Speech Communication, 27, 175–185, April.
Kwon, Y. H., Park, D. J., & Ihm, B. C. (2000). Simplified pitch detection algorithm of mixed speech signals. In Proc. ISCAS–2000 (Vol. 3, pp. 722–725), May.
Wu, M., Wang, D. L., & Brown, G. J. (2003). A multipitch tracking algorithm for noisy speech. IEEE Transactions on Speech and Audio Processing, 11(3), 229–241, May.
Tolonen, T., & Karjalainen, M. (2000). A computationally efficient multipitch analysis model. IEEE Transactions on Speech and Audio Processing, 8(6), 708–716, Nov.
Chazan, D., Stettiner, Y., & Malah, D. (1993). Optimal multi-pitch estimation using the EM algorithm for co-channel speech separation. In Proc. ICASSP–93 (pp. 728–731), April.
Weintraub, M. (1986). A computational model for separating two simultaneous talkers. In Proc. ICASSP–86 (Vol. 11, pp. 81–84), April.
Hanson, B. A., & Wong, D. Y. (1984). The harmonic magnitude suppression (HMS) technique for intelligibility enhancement in the presence of interfering speech. In Proc. ICASSP–84 (Vol. 9, pp. 65–68), Mar.
Morgan, D. P., George, E. B., Lee, L. T., & Kay, S. M. (1997). Cochannel speaker separation by harmonic enhancement and suppression. IEEE Transactions on Speech and Audio Processing, 5(5), 407–424, Sept.
Kanjilal, P. P., & Palit, S. (1994). Extraction of multiple periodic waveforms from noisy data. In Proc. ICASSP-94 ( Vol. 2, pp. 361–364), April.
Ephraim, Y., & Merhav, N. (1992). Lower and upper bounds on the minimum mean-square error in composite source signal estimation. IEEE Transactions on Information Theory, 38(6), 1709–1724, Nov.
Nadas, A., Nahamoo, D., & Picheny, M. A. (1989). Speech recognition using noise-adaptive prototypes. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(10), 1495–1503, Oct.
Papoulis, A. (1991). Probability, random variables, and stochastic processes. New York: McGraw-Hill.
Bradie, B. (2006). A friendly introduction to numerical analysis. Englewood Cliffs: Pearson Prentice Hall.
Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Englewood Cliffs: Prentice-Hall.
Cooke, M. P., Barker, J., Cunningham, S. P., & Shao, X. (2005). An audio-visual corpus for speech perception and automatic speech recognition. Journal of the Acoustical Society of America, Nov.
Spiegel, M. R. (1998). Schaum’s mathematical handbook of formulas and tables (2nd edn). New York: McGraw-Hill, June.
Acknowledgements
The authors wish to thank the Natural Sciences and Engineering Research Council (NSERC) of Canada for funding this project. Also, the authors would like to thank the reviewers for their valuable comments.
Additional information
A preliminary version of this paper was presented at the IEEE Workshop on Machine Learning for Signal Processing (MLSP) held in Thessaloniki, Greece in August 2007.
Appendices
Appendix A: Proof of (24)
In this appendix we show that \(p(x_1^r(d)|{\mathbf y},\theta,k_1^r,k_2^r)\) is given by Eq. 24. Using Bayes’ theorem and the probability chain rule, and noting that the sources are independent, we can express \(p(x_1^r(d)|{\mathbf y},\theta,k_1^r,k_2^r)\) in terms of the observation probability and the prior probabilities of the sources. Thus, we have
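In outline, and under the assumption that the observation enters only through \(y^r(d)\), this decomposition has the familiar Bayes form

\[
p\bigl(x_1^r(d)\,\big|\,{\mathbf y},\theta,k_1^r,k_2^r\bigr)=\frac{p\bigl(y^r(d)\,\big|\,x_1^r(d),\theta,k_2^r\bigr)\,p\bigl(x_1^r(d)\,\big|\,k_1^r\bigr)}{\displaystyle\int p\bigl(y^r(d)\,\big|\,x_1^r(d),\theta,k_2^r\bigr)\,p\bigl(x_1^r(d)\,\big|\,k_1^r\bigr)\,dx_1^r(d)},
\]

where the likelihood \(p(y^r(d)\,|\,x_1^r(d),\theta,k_2^r)\) is obtained by marginalizing the second source over its prior \(p(x_2^r(d)\,|\,k_2^r)\).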
Substituting Eqs. 14 and 15 into Eq. 27, we obtain
In order to evaluate the integrals in Eq. 28, we consider the following conditions.
Thus, under condition I, we have
The term in the \(\exp(\cdot)\) function in Eq. 30 can be rewritten as
Noting that the last term in Eq. 31 is independent of \(x_1(d)\) and is hence cancelled out, we get
The integration in the denominator is simply calculated as
Substituting Eq. 33 into Eq. 32, we arrive at a Gaussian distribution in the form
In a similar fashion, under condition II Eq. 28 reduces to
Hence, Eqs. 34 and 35 give Eq. 24 in Section 3.
Appendix B: Quadratic Algorithm for Finding \(\theta^*\)
In this appendix, the procedure for finding \(\theta^*\) in Eq. 22 is described. The goal is to find the value at which the global maximum of \(Q(\theta)\) occurs. Since \(Q(\theta)\) is well approximated by a concave quadratic near its peak, a quadratic optimization approach can be used to find \(\theta^*\) in a small number of iterations (see Table 1 for the numerical results). Figure 10 shows a typical form of \(Q(\theta)\), which eases understanding of the algorithm presented in Table 2. Briefly, the algorithm works as follows. First, the coordinates of three points of \(Q(\theta)\) are determined, say \(\{(\theta_l,A),(\theta_c,C),(\theta_r,B)\}\). Then, a quadratic function of the form \(f(x)=ax^2+bx+c\) is fitted to these three points, and the maximizer \(x^*=-\frac{b}{2a}\) of \(f(x)\) is obtained in terms of \(\{(\theta_l,A),(\theta_c,C),(\theta_r,B)\}\) using a function called Quadratic(\(\cdot\)), that is, \(x^*=\mathrm{Quadratic}(\theta_l,A,\theta_r,B,\theta_c,C)\). Next, using \(x^*\) and \(Q(x^*)\), the coordinates \(\{(\theta_l,A),(\theta_c,C),(\theta_r,B)\}\) are updated until a value of \(\theta_c\) is reached such that \(Q(\theta_l)\le Q(\theta_c)\ge Q(\theta_r)\), as illustrated by the sketch below.
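For concreteness, the following Python sketch implements a successive parabolic interpolation of this kind; quadratic_vertex plays the role of Quadratic(\(\cdot\)), while the bracket-update rule and the function names are our own illustrative simplification of Table 2, not a verbatim transcription.

def quadratic_vertex(tl, A, tr, B, tc, C):
    """Vertex x* = -b/(2a) of the parabola f(x) = a*x^2 + b*x + c
    interpolating (tl, A), (tc, C), (tr, B)."""
    num = (tc - tl) ** 2 * (C - B) - (tc - tr) ** 2 * (C - A)
    den = (tc - tl) * (C - B) - (tc - tr) * (C - A)
    if den == 0.0:              # three collinear samples: no curvature to fit
        return tc
    return tc - 0.5 * num / den

def maximize_q(Q, tl, tc, tr, tol=1e-4, max_iter=50):
    """Successive parabolic interpolation for a unimodal Q(theta),
    starting from a bracket with Q(tl) <= Q(tc) >= Q(tr)."""
    A, C, B = Q(tl), Q(tc), Q(tr)
    for _ in range(max_iter):
        x = quadratic_vertex(tl, A, tr, B, tc, C)
        if abs(x - tc) < tol:   # vertex agrees with current centre: converged
            return tc
        q = Q(x)
        if x < tc:
            if q >= C:          # new point is best: it becomes the centre
                tr, B, tc, C = tc, C, x, q
            else:               # keep the centre, tighten the left edge
                tl, A = x, q
        else:
            if q >= C:
                tl, A, tc, C = tc, C, x, q
            else:
                tr, B = x, q
    return tc

For example, maximize_q(lambda t: -(t - 0.3) ** 2, -1.0, 0.0, 1.0) recovers the maximizer 0.3 in a single parabolic step, reflecting the fast convergence reported in Table 1.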
Appendix C: Derivation of (10)
In this appendix, we show how the computation of the right-hand side of Eq. 9 leads to Eq. 10, that is
The procedure is similar to that presented in [42] except that the source signal gains, \(g_1\) and \(g_2\), are incorporated. Let \(\dot{x}_i^r=\bigl|\mathcal{F}_{\!D}\bigl(\{X_i(t)\}_{t=(r-1)M}^{(r-1)M+N-1}\bigr)\bigr|\), \(i\in\{1,2\}\), and \(\dot{y}^r=\bigl|\mathcal{F}_{\!D}\bigl(\{Y(t)\}_{t=(r-1)M}^{(r-1)M+N-1}\bigr)\bigr|\) represent the magnitudes of the D-point discrete Fourier transforms of the sources and the observation signal, respectively. Let \(\phi^r=\angle\mathcal{F}_{\!D}\bigl(\{X_1(t)\}_{t=(r-1)M}^{(r-1)M+N-1}\bigr)-\angle\mathcal{F}_{\!D}\bigl(\{X_2(t)\}_{t=(r-1)M}^{(r-1)M+N-1}\bigr)=[\phi^r(0),\ldots,\phi^r(d),\ldots,\phi^r(D-1)]^{\top}\) denote the phase difference between the sources, where \(\angle\) denotes the phase operator. Given Eq. 2, the relation between the log magnitude \(y^r(d)=\log_{10}\dot{y}^r(d)\) and the source magnitudes \(\dot{x}_i^r(d)\), \(i\in\{1,2\}\), is given by
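In sketch form, taking Eq. 2 to be the gain-weighted additive mixture \(Y(t)=g_1X_1(t)+g_2X_2(t)\), the squared mixture magnitude obeys the law-of-cosines relation

\[
\bigl(\dot{y}^r(d)\bigr)^2=\bigl(g_1\dot{x}_1^r(d)\bigr)^2+\bigl(g_2\dot{x}_2^r(d)\bigr)^2+2g_1g_2\,\dot{x}_1^r(d)\,\dot{x}_2^r(d)\cos\phi^r(d),
\]

and \(y^r(d)\) is one half the base-10 logarithm of the right-hand side.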
The goal is to obtain the MMSE estimate of \(y^r(d)\) given \(x_1^r(d)\), \(x_2^r(d)\), \(g_1\), and \(g_2\). Mathematically, the MMSE estimator is expressed as
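In outline, this is the conditional mean

\[
\hat{y}^r(d)=E\bigl[y^r(d)\,\big|\,x_1^r(d),x_2^r(d),g_1,g_2\bigr],
\]

which minimizes the mean square error among all estimators built from these quantities.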
From Eq. 37, and initially assuming that \(\dot{x}_1^r(d)\), \(\dot{x}_2^r(d)\), \(g_1\), and \(g_2\) are given, the only random variable on the right-hand side of Eq. 37 is \(\phi^r(d)\) which, as shown in [42], can be modeled by a uniform distribution over the interval \([-\pi,\pi]\); that is, \(p\bigl(\phi^r(d)\bigr)=\frac{1}{2\pi}\), where \(p\bigl(\phi^r(d)\bigr)\) denotes the PDF of \(\phi^r(d)\). Therefore,
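Written out under the sketch of Eq. 37 above, the expectation becomes

\[
\hat{y}^r(d)=\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{2}\log_{10}\Bigl[\bigl(g_1\dot{x}_1^r(d)\bigr)^2+\bigl(g_2\dot{x}_2^r(d)\bigr)^2+2g_1g_2\,\dot{x}_1^r(d)\,\dot{x}_2^r(d)\cos\phi^r(d)\Bigr]\,d\phi^r(d).
\]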
The above integration is computed using an integration table (e.g., [63, p. 111]) and the result is
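The key identity here is, in sketch form, the classical result

\[
\frac{1}{2\pi}\int_{-\pi}^{\pi}\ln\bigl(a^2+b^2+2ab\cos\phi\bigr)\,d\phi=2\ln\max(a,b),\qquad a,b>0,
\]

which, with \(a=g_1\dot{x}_1^r(d)\) and \(b=g_2\dot{x}_2^r(d)\), yields

\[
\hat{y}^r(d)=\log_{10}\max\bigl(g_1\dot{x}_1^r(d),\,g_2\dot{x}_2^r(d)\bigr).
\]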
Noting that \(\log_{10}g_1=h(\theta)\), \(\log_{10}g_2=h(-\theta)\), and \(\log_{10}\bigl(\dot{x}_i^r(d)\bigr)=x_i^r(d)\), \(i\in\{1,2\}\), Eq. 40 can be rewritten as
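that is, in sketch form,

\[
\hat{y}^r(d)=\max\bigl(x_1^r(d)+h(\theta),\;x_2^r(d)+h(-\theta)\bigr),
\]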
which is identical to Eq. 10.