J Acoust Soc Am. 2018 May;143(5):3058.
doi: 10.1121/1.5038273.

Glimpsing speech interrupted by speech-modulated noise


Rachel E Miller et al. J Acoust Soc Am. 2018 May.

Abstract

Everyday environments frequently present speech in modulated noise backgrounds, such as those produced by a competing talker. Under such conditions, temporal glimpses of speech may be preserved at favorable signal-to-noise ratios during the amplitude dips of the masker. Speech recognition is determined, in part, by these speech glimpses. However, properties of the noise when it dominates the speech may also be important. This study interrupted speech to provide either high-intensity or low-intensity speech glimpses derived from measurements of speech-on-speech masking. The interrupted intervals were deleted and then filled with steady-state noise or with one of four types of noise amplitude modulated by the same sentence or a different sentence. Noise was presented at two different levels. Interruption by silence was also examined. Speech recognition was best with high-intensity glimpses and improved when the noise was modulated by the missing high-intensity segments. Additional noise conditions revealed significant interactions between the noise level and the glimpsed speech level. Overall, high-intensity speech segments and their amplitude modulation (AM) are crucial for speech recognition. Speech recognition is further influenced by the properties of the competing noise (i.e., level and AM), which interact with the glimpsed speech level. Acoustic properties of both speech-dominated and noise-dominated intervals of speech-noise mixtures determine speech recognition.
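The abstract mentions noise amplitude modulated by the same or a different sentence. As a rough illustration only, the Python sketch below builds a speech-modulated noise (SMN) by imposing a sentence's amplitude envelope on Gaussian noise. The envelope method (full-wave rectification plus a moving-average smoother), the 30 Hz smoothing value, and the helper names `moving_average`, `rms`, and `speech_modulated_noise` are illustrative assumptions, not the study's documented procedure.

```python
import math
import random


def moving_average(x, win):
    """Centered moving average, zero-padded at the edges, via a running sum."""
    half = win // 2
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + win)
        out.append((prefix[hi] - prefix[lo]) / win)
    return out


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def speech_modulated_noise(speech, fs, smooth_hz=30.0, seed=0):
    """Hypothetical SMN: Gaussian noise carrying the envelope of `speech`.

    ASSUMPTION: envelope = |speech| smoothed by a moving average whose first
    spectral null sits near `smooth_hz`; the study's method may differ.
    """
    win = max(1, min(len(speech), round(fs / smooth_hz)))
    envelope = moving_average([abs(s) for s in speech], win)
    rng = random.Random(seed)
    smn = [e * rng.gauss(0.0, 1.0) for e in envelope]
    # Scale the SMN so its overall RMS matches the whole sentence.
    level = rms(smn)
    gain = rms(speech) / level if level > 0 else 0.0
    return [gain * v for v in smn]
```

Passing the target sentence would give noise modulated by the same sentence, and passing another sentence would give noise modulated by a different one; how the Preceding and Time-Compressed variants were derived is not described here, so they are not sketched.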


Figures

FIG. 1.
Waveforms for an example sentence are displayed to demonstrate the speech interruption method and the modulated noise types: (a) uninterrupted speech channel, (b) speech-modulated noise (SMN) channel used to define interruptions, (c) running local SNR based on comparison of (a) and (b), (d) +SNR speech, interrupted to preserve intervals (i.e., glimpses) above 0 dB SNR, (e) −SNR speech, interrupted to preserve intervals below 0 dB SNR, and (f) the different replacement SMNs. Replacement SMNs are paired according to the modulation source: unmodulated, modulated by the target sentence (Target, Preceding), or modulated by a different sentence (Random, Time-Compressed). The vertical lines in (a)–(e) mark the boundaries of one interval, which is replaced by noise in (d) and preserved as speech in (e). The modulated noises displayed in (f) were scaled to the average sentence level.
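The interruption method in Fig. 1 can be approximated as follows: compare short-time levels of the speech and the SMN channel to get a running local SNR, then keep speech where that SNR is above 0 dB (+SNR) or at or below it (−SNR), and fill the remaining intervals. This is a hypothetical sketch; the non-overlapping frames, the `frame` length, the handling of exactly 0 dB, and all function names are assumptions.

```python
import math


def frame_levels_db(x, frame, eps=1e-12):
    """Mean-square level (dB) of consecutive non-overlapping frames."""
    levels = []
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        ms = sum(v * v for v in seg) / len(seg)
        levels.append(10.0 * math.log10(ms + eps))
    return levels


def local_snr_db(speech, masker, frame):
    """Running local SNR: speech level minus masker level, frame by frame."""
    return [s - m for s, m in zip(frame_levels_db(speech, frame),
                                  frame_levels_db(masker, frame))]


def interrupt(speech, masker, fill, frame, keep_positive=True, criterion_db=0.0):
    """Keep frames on one side of the SNR criterion; copy `fill` elsewhere.

    keep_positive=True  -> +SNR speech (frames above the criterion kept)
    keep_positive=False -> -SNR speech (frames at or below it kept)
    ASSUMPTION: `fill` has the same length as `speech`.
    """
    snr = local_snr_db(speech, masker, frame)
    out = []
    for k, value in enumerate(snr):
        keep = value > criterion_db if keep_positive else value <= criterion_db
        src = speech if keep else fill
        out.extend(src[k * frame:(k + 1) * frame])
    return out
```

A list of zeros as `fill` would correspond to interruption by silence, while a replacement SMN of the same length would correspond to the noise-filled conditions.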
FIG. 2.
Keyword recognition accuracy for the experimental conditions. (A) +SNR conditions preserved speech at positive local SNRs; (B) −SNR conditions preserved speech at negative local SNRs. Replacement noise levels were scaled to either the sentence or segment levels. Error bars = standard error of the mean. Asterisks indicate significantly different performance between the two scaling types (Bonferroni-adjusted p < 0.05). UNMOD refers to unmodulated noise. TARG (target SMN) and PRE (preceding SMN) had noise modulated by the target sentence. RAND (random SMN) and TC (time-compressed SMN) had noise modulated by a different sentence.
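Fig. 2 compares replacement noise scaled to the sentence level with noise scaled to the segment level. A minimal sketch of the two scalings follows, assuming levels are RMS values and reading "segment level" as the RMS of the deleted speech samples pooled across intervals; both readings, and the function names, are assumptions rather than the study's stated procedure.

```python
import math


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_to_rms(noise, target_rms):
    """Return `noise` scaled so that its RMS equals `target_rms`."""
    current = rms(noise)
    return [v * target_rms / current for v in noise] if current > 0 else list(noise)


def replacement_noise(noise, speech, replaced_idx, scaling="sentence"):
    """Scale replacement noise to the sentence level or the replaced-segment level.

    'sentence': RMS of the whole, uninterrupted sentence.
    'segment' : RMS of the deleted speech samples (ASSUMED reading; pools
                all replaced intervals together).
    """
    if scaling == "sentence":
        target = rms(speech)
    elif scaling == "segment":
        target = rms([speech[i] for i in replaced_idx])
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")
    return scale_to_rms(noise, target)
```

Under this reading, segment-scaled noise in +SNR conditions takes the lower level of the deleted low-intensity speech, while sentence-scaled noise always sits at the sentence level.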
FIG. 3.
The top panels display difference scores between noise and silent replacement for the different noise type conditions. +SNR conditions preserved speech at positive local SNRs, while −SNR conditions preserved speech at negative local SNRs. The bottom row displays the average level of the replacement noise relative to the average level of the original, uninterrupted target sentence. Note that all sentence-scaled conditions were equal to the average sentence level and are therefore at 0 dB. Error bars = standard error of the mean. UNMOD refers to unmodulated noise. TARG (target SMN) and PRE (preceding SMN) had noise modulated by the target sentence. RAND (random SMN) and TC (time-compressed SMN) had noise modulated by a different sentence.
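The bottom row of Fig. 3 reports the replacement-noise level relative to the uninterrupted sentence. Assuming "average level" refers to RMS level (an assumption), the usual decibel ratio would be:

```latex
L_{\mathrm{rel}} = 20\log_{10}\!\left(\frac{\mathrm{RMS}_{\mathrm{noise}}}{\mathrm{RMS}_{\mathrm{sentence}}}\right)\ \mathrm{dB}
```

For example, sentence-scaled noise (equal RMS) gives 0 dB, consistent with the note on sentence-scaled conditions, while noise at half the sentence RMS gives 20 log10(0.5) ≈ −6.0 dB.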


