The performance of eight naive observers in learning to identify speech spectrograms was studied over a 2-month period. Single tokens from a 50-word phonetically balanced (PB) list were recorded by several talkers and displayed on a Spectraphonics Speech Spectrographic Display system. Identification testing occurred immediately after daily training sessions. After approximately 20 h of training, naive subjects correctly identified the 50 PB words from a single talker over 95% of the time. Generalization tests with the same words were then carried out with different tokens from the original talker, new tokens from another male talker, a female talker, and, finally, a synthetic talker. Recognition performance in these four generalization tests was 91%, 76%, 76%, and 48%, respectively. Finally, generalization tests with a novel set of PB words produced by the original talker were also carried out to examine in detail the perceptual strategies and visual features that subjects abstracted from the training set. Our results demonstrate that, even without formal training in phonetics or acoustics, naive observers can learn to identify visual displays of speech at very high levels of accuracy. Analysis of subjects’ performance in a verbal protocol task demonstrated that they rely on salient visual correlates of many phonetic features in speech.
ASJC Scopus subject areas
- Arts and Humanities (miscellaneous)
- Acoustics and Ultrasonics