Effects of natural variability in cross-modal temporal correlations on audiovisual speech recognition benefit

Research output: Contribution to journal › Conference article

Abstract

In audiovisual (AV) speech, correlations over time between visible mouth movements and the amplitude envelope of auditory speech help to reduce uncertainty as to when peaks in the auditory signal will occur. Previous studies demonstrated greater AV benefit to speech detection in noise for sentences with higher cross-modal correlations than for sentences with lower cross-modal correlations. This study examined whether the mechanisms that underlie AV detection benefits have downstream effects on speech recognition in noise. Participants were presented with 72 sentences in noise, in auditory-only and AV conditions, at either their 50% auditory speech recognition threshold in noise (SRT-50) or at a signal-to-noise ratio (SNR) 6 dB poorer than their SRT-50, and were asked to repeat each sentence. Mean AV benefit across participants was calculated for each sentence. Pearson correlations and mixed modeling were used to examine whether variability in AV benefit across sentences was related to natural variation in the degree of cross-modal correlation across sentences. In the more difficult listening condition, higher cross-modal correlations were associated with greater AV sentence recognition benefit. The relationship was strongest in the 0.8-2.2 kHz and 0.8-6 kHz frequency regions. These results demonstrate that cross-modal correlations contribute to variability in AV speech recognition in noise.
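The per-sentence analysis summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's analysis code: it assumes the acoustic envelope (from a single frequency band, such as the 0.8-2.2 kHz region) and a mouth-opening track have already been extracted and resampled to a common frame rate, it defines AV benefit as the simple difference in proportion correct (AV minus auditory-only), and it omits the mixed modeling entirely. All function names and dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def cross_modal_correlation(envelope, mouth_opening):
    """Hypothetical per-sentence measure: r over time between a band-limited
    acoustic amplitude envelope and a mouth-opening track, both assumed to be
    already extracted and sampled on the same frame grid."""
    return pearson_r(envelope, mouth_opening)


def mean_av_benefit(a_only, av):
    """Per-sentence AV benefit averaged across participants, assumed here to be
    the simple difference AV minus auditory-only proportion correct."""
    if len(a_only) != len(av) or not a_only:
        raise ValueError("need one A-only and one AV score per participant")
    return fmean(v - a for a, v in zip(a_only, av))


def benefit_vs_correlation(sentences):
    """Across sentences, correlate mean AV benefit with cross-modal correlation.

    `sentences` is a list of dicts with hypothetical keys
    'envelope', 'mouth', 'a_only', and 'av'.
    """
    corr = [cross_modal_correlation(s["envelope"], s["mouth"]) for s in sentences]
    benefit = [mean_av_benefit(s["a_only"], s["av"]) for s in sentences]
    return pearson_r(corr, benefit)


# Toy values for illustration only; these are NOT data from the study.
toy = [
    {"envelope": [0.1, 0.5, 0.9, 0.4], "mouth": [0.2, 0.6, 0.8, 0.3],
     "a_only": [0.3, 0.4], "av": [0.7, 0.8]},
    {"envelope": [0.1, 0.5, 0.9, 0.4], "mouth": [0.5, 0.4, 0.6, 0.5],
     "a_only": [0.3, 0.5], "av": [0.5, 0.6]},
    {"envelope": [0.1, 0.5, 0.9, 0.4], "mouth": [0.8, 0.4, 0.1, 0.6],
     "a_only": [0.4, 0.4], "av": [0.45, 0.5]},
]

if __name__ == "__main__":
    print(f"r(cross-modal correlation, AV benefit) = {benefit_vs_correlation(toy):.3f}")
```

The same Pearson computation is used at two levels here: within a sentence (over time, envelope versus mouth opening) and across sentences (cross-modal correlation versus mean AV benefit). Repeating the first step with envelopes from different frequency bands would be one way to compare regions such as 0.8-2.2 kHz and 0.8-6 kHz; a mixed model with participant and sentence effects would be needed to go beyond this simple sketch.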

Original language: English (US)
Pages (from-to): 2260-2264
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
DOI: 10.21437/Interspeech.2019-2931
State: Published - Jan 1 2019
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: Sep 15 2019 → Sep 19 2019

Keywords

  • Audiovisual
  • Multimodal
  • Speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{8a97518969da43cbb4c4709025bbecbc,
title = "Effects of natural variability in cross-modal temporal correlations on audiovisual speech recognition benefit",
abstract = "In audiovisual (AV) speech, correlations over time between visible mouth movements and the amplitude envelope of auditory speech help to reduce uncertainty as to when peaks in the auditory signal will occur. Previous studies demonstrated greater AV benefit to speech detection in noise for sentences with higher cross-modal correlations than for sentences with lower cross-modal correlations. This study examined whether the mechanisms that underlie AV detection benefits have downstream effects on speech recognition in noise. Participants were presented with 72 sentences in noise, in auditory-only and AV conditions, at either their 50{\%} auditory speech recognition threshold in noise (SRT-50) or at a signal-to-noise ratio (SNR) 6 dB poorer than their SRT-50, and were asked to repeat each sentence. Mean AV benefit across participants was calculated for each sentence. Pearson correlations and mixed modeling were used to examine whether variability in AV benefit across sentences was related to natural variation in the degree of cross-modal correlation across sentences. In the more difficult listening condition, higher cross-modal correlations were associated with greater AV sentence recognition benefit. The relationship was strongest in the 0.8--2.2 kHz and 0.8--6 kHz frequency regions. These results demonstrate that cross-modal correlations contribute to variability in AV speech recognition in noise.",
keywords = "Audiovisual, Multimodal, Speech recognition",
author = "Kaylah Lalonde",
year = "2019",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2019-2931",
language = "English (US)",
volume = "2019-September",
pages = "2260--2264",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY  - JOUR

T1  - Effects of natural variability in cross-modal temporal correlations on audiovisual speech recognition benefit

AU  - Lalonde, Kaylah

PY  - 2019/1/1

Y1  - 2019/1/1

AB  - In audiovisual (AV) speech, correlations over time between visible mouth movements and the amplitude envelope of auditory speech help to reduce uncertainty as to when peaks in the auditory signal will occur. Previous studies demonstrated greater AV benefit to speech detection in noise for sentences with higher cross-modal correlations than for sentences with lower cross-modal correlations. This study examined whether the mechanisms that underlie AV detection benefits have downstream effects on speech recognition in noise. Participants were presented with 72 sentences in noise, in auditory-only and AV conditions, at either their 50% auditory speech recognition threshold in noise (SRT-50) or at a signal-to-noise ratio (SNR) 6 dB poorer than their SRT-50, and were asked to repeat each sentence. Mean AV benefit across participants was calculated for each sentence. Pearson correlations and mixed modeling were used to examine whether variability in AV benefit across sentences was related to natural variation in the degree of cross-modal correlation across sentences. In the more difficult listening condition, higher cross-modal correlations were associated with greater AV sentence recognition benefit. The relationship was strongest in the 0.8-2.2 kHz and 0.8-6 kHz frequency regions. These results demonstrate that cross-modal correlations contribute to variability in AV speech recognition in noise.

KW  - Audiovisual

KW  - Multimodal

KW  - Speech recognition

UR  - http://www.scopus.com/inward/record.url?scp=85074731223&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85074731223&partnerID=8YFLogxK

U2  - 10.21437/Interspeech.2019-2931

DO  - 10.21437/Interspeech.2019-2931

M3  - Conference article

AN  - SCOPUS:85074731223

VL  - 2019-September

SP  - 2260

EP  - 2264

JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN  - 2308-457X

ER  - 