Recognition of speech spectrograms

Beth G. Greene, David B. Pisoni, Thomas D. Carrell

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

The performance of eight naive observers in learning to identify speech spectrograms was studied over a 2-month period. Single tokens from a 50-word phonetically balanced (PB) list were recorded by several talkers and displayed on a Spectraphonics Speech Spectrographic Display system. Identification testing occurred immediately after daily training sessions. After approximately 20 h of training, naive subjects correctly identified the 50 PB words from a single talker over 95% of the time. Generalization tests with the same words were then carried out with different tokens from the original talker, new tokens from another male talker, a female talker, and finally, a synthetic talker. The generalization results for these talkers showed recognition performance at 91%, 76%, 76%, and 48%, respectively. Finally, generalization tests with a novel set of PB words produced by the original talker were also carried out to examine in detail the perceptual strategies and visual features that subjects abstracted from the training set. Our results demonstrate that, even without formal training in phonetics or acoustics, naive observers can learn to identify visual displays of speech at very high levels of accuracy. Analysis of subjects’ performance in a verbal protocol task demonstrated that they rely on salient visual correlates of many phonetic features in speech.

Original language: English (US)
Pages (from-to): 32-43
Number of pages: 12
Journal: Journal of the Acoustical Society of America
Volume: 76
Issue number: 1
DOI: 10.1121/1.391035
State: Published - Jul 1984

Fingerprint

spectrograms
education
phonetics
display devices
lists
learning
acoustics
Talkers

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

Cite this

Recognition of speech spectrograms. / Greene, Beth G.; Pisoni, David B.; Carrell, Thomas D.

In: Journal of the Acoustical Society of America, Vol. 76, No. 1, 07.1984, p. 32-43.

Greene, Beth G. ; Pisoni, David B. ; Carrell, Thomas D. / Recognition of speech spectrograms. In: Journal of the Acoustical Society of America. 1984 ; Vol. 76, No. 1. pp. 32-43.
@article{5011d9b90e2f4eb7998d4d1d564c68c2,
title = "Recognition of speech spectrograms",
author = "Greene, {Beth G.} and Pisoni, {David B.} and Carrell, {Thomas D.}",
year = "1984",
month = "7",
doi = "10.1121/1.391035",
language = "English (US)",
volume = "76",
pages = "32--43",
journal = "Journal of the Acoustical Society of America",
issn = "0001-4966",
publisher = "Acoustical Society of America",
number = "1",

}

TY - JOUR

T1 - Recognition of speech spectrograms

AU - Greene, Beth G.

AU - Pisoni, David B.

AU - Carrell, Thomas D.

PY - 1984/7

Y1 - 1984/7

UR - http://www.scopus.com/inward/record.url?scp=0021461482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0021461482&partnerID=8YFLogxK

U2 - 10.1121/1.391035

DO - 10.1121/1.391035

M3 - Article

C2 - 6747109

AN - SCOPUS:0021461482

VL - 76

SP - 32

EP - 43

JO - Journal of the Acoustical Society of America

JF - Journal of the Acoustical Society of America

SN - 0001-4966

IS - 1

ER -