The effect of target/masker fundamental frequency contour similarity on masked-speech recognition

Lauren Calandruccio, Peter A. Wasiuk, Emily Buss, Lori J. Leibold, Jessica Kong, Ann Holmes, Jacob Oleson

Research output: Contribution to journal: Article

1 Citation (Scopus)

Abstract

Greater informational masking is observed when the target and masker speech are more perceptually similar. Fundamental frequency (f0) contour, or the dynamic movement of f0, is thought to provide cues for segregating target speech presented in a speech masker. Most of the data demonstrating this effect have been collected using digitally modified stimuli. Less work has been done exploring the role of f0 contour for speech-in-speech recognition when all of the stimuli have been produced naturally. The goal of this project was to explore the importance of target and masker f0 contour similarity by manipulating the speaking style of talkers producing the target and masker speech streams. Sentence recognition thresholds were evaluated for target and masker speech that was produced with either flat, normal, or exaggerated speaking styles; performance was also measured in speech spectrum shaped noise and for conditions in which the stimuli were processed through an ideal-binary mask. Results confirmed that similarities in f0 contour depth elevated speech-in-speech recognition thresholds; however, when the target and masker had similar contour depths, targets with normal f0 contours were more resistant to masking than targets with flat or exaggerated contours. Differences in energetic masking across stimuli cannot account for these results.

Original language: English (US)
Pages (from-to): 1065-1076
Number of pages: 12
Journal: Journal of the Acoustical Society of America
Volume: 146
Issue number: 2
DOI: 10.1121/1.5121314
State: Published - Aug 1 2019

Fingerprint

speech recognition
stimuli
target masking
masking
sentences
thresholds
Fundamental Frequency
cues
masks
Stimulus

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

Cite this

The effect of target/masker fundamental frequency contour similarity on masked-speech recognition. / Calandruccio, Lauren; Wasiuk, Peter A.; Buss, Emily; Leibold, Lori J.; Kong, Jessica; Holmes, Ann; Oleson, Jacob.

In: Journal of the Acoustical Society of America, Vol. 146, No. 2, 01.08.2019, p. 1065-1076.

@article{fff371298325467094aabf205bbcea24,
title = "The effect of target/masker fundamental frequency contour similarity on masked-speech recognition",
author = "Lauren Calandruccio and Wasiuk, {Peter A.} and Emily Buss and Leibold, {Lori J.} and Jessica Kong and Ann Holmes and Jacob Oleson",
year = "2019",
month = "8",
day = "1",
doi = "10.1121/1.5121314",
language = "English (US)",
volume = "146",
pages = "1065--1076",
journal = "Journal of the Acoustical Society of America",
issn = "0001-4966",
publisher = "Acoustical Society of America",
number = "2",

}

TY - JOUR

T1 - The effect of target/masker fundamental frequency contour similarity on masked-speech recognition

AU - Calandruccio, Lauren

AU - Wasiuk, Peter A.

AU - Buss, Emily

AU - Leibold, Lori J.

AU - Kong, Jessica

AU - Holmes, Ann

AU - Oleson, Jacob

PY - 2019/8/1

Y1 - 2019/8/1

UR - http://www.scopus.com/inward/record.url?scp=85070665326&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85070665326&partnerID=8YFLogxK

U2 - 10.1121/1.5121314

DO - 10.1121/1.5121314

M3 - Article

C2 - 31472562

AN - SCOPUS:85070665326

VL - 146

SP - 1065

EP - 1076

JO - Journal of the Acoustical Society of America

JF - Journal of the Acoustical Society of America

SN - 0001-4966

IS - 2

ER -