An optimal set of flesh points on tongue and lips for speech-movement classification

Jun Wang, Ashok K Samal, Panying Rong, Jordan R. Green

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Purpose: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements. Method: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases during the recording. We used a machine-learning classifier (support vector machine) to classify the speech stimuli on the basis of articulatory movements. We then compared classification accuracies of the flesh-point combinations to determine an optimal set of sensors. Results: When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with the full set (including T2: the tongue-body front; and T3: the tongue-body front). Conclusion: We identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%–95%) equivalent to that using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.
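
The comparison of flesh-point combinations described in the Method can be illustrated with a minimal sketch. Only the six sensor labels come from the abstract; everything else is an assumption. The data are synthetic (not from the study), a nearest-centroid classifier stands in for the support vector machine so that the example needs only the standard library, and the fold scheme and the helper names (`sensor_subsets`, `cv_accuracy`, `make_toy_data`) are hypothetical.

```python
from itertools import combinations
import random

SENSORS = ["T1", "T2", "T3", "T4", "UL", "LL"]  # labels taken from the abstract


def sensor_subsets(sensors):
    """Yield every non-empty combination of sensors, smallest first."""
    for size in range(1, len(sensors) + 1):
        yield from combinations(sensors, size)


def features(sample, subset):
    """Concatenate the per-sensor feature lists for the chosen sensors."""
    return [value for name in subset for value in sample[name]]


def cv_accuracy(samples, labels, subset, n_folds=5):
    """k-fold accuracy of a nearest-centroid classifier (a stand-in, not an SVM)."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(samples)) if i % n_folds != fold]
        test = [i for i in range(len(samples)) if i % n_folds == fold]
        # Class centroids are computed from the training fold only.
        centroids = {}
        for label in set(labels[i] for i in train):
            rows = [features(samples[i], subset) for i in train if labels[i] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        for i in test:
            x = features(samples[i], subset)
            predicted = min(
                centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
            correct += predicted == labels[i]
    return correct / len(samples)


def make_toy_data(n_per_class=40, seed=0):
    """Synthetic two-class data (assumption): only T1 and LL separate the classes."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(2 * n_per_class):
        label = i % 2
        shift = 3.0 if label else -3.0
        sample = {}
        for name in SENSORS:
            mean = shift if name in ("T1", "LL") else 0.0
            sample[name] = [rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)]
        samples.append(sample)
        labels.append(label)
    return samples, labels


samples, labels = make_toy_data()
scores = {s: cv_accuracy(samples, labels, s) for s in sensor_subsets(SENSORS)}
best = max(scores.values())
# Among the subsets that reach the best accuracy, prefer the smallest one.
smallest_best = min((s for s, acc in scores.items() if acc == best), key=len)
print(len(scores), smallest_best, best)
```

With six sensors there are 63 non-empty subsets, so an exhaustive comparison is cheap. Preferring the smallest subset that matches the best accuracy mirrors the abstract's goal of finding a reduced sensor set that performs on par with the full set.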

Original language: English (US)
Pages (from-to): 15-26
Number of pages: 12
Journal: Journal of Speech, Language, and Hearing Research
Volume: 59
Issue number: 1
DOI: 10.1044/2015_JSLHR-S-14-0112
State: Published - Feb 2016

Fingerprint

Lip
Tongue
recording
stimulus
Electromagnetic Phenomena
learning
Flesh
Sensor

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Speech and Hearing

Cite this

An optimal set of flesh points on tongue and lips for speech-movement classification. / Wang, Jun; Samal, Ashok K; Rong, Panying; Green, Jordan R.

In: Journal of Speech, Language, and Hearing Research, Vol. 59, No. 1, 02.2016, p. 15-26.

@article{2ea22a4ff9c846e78fbd6b52598c46d9,
title = "An optimal set of flesh points on tongue and lips for speech-movement classification",
abstract = "Purpose: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements. Method: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases during the recording. We used a machine-learning classifier (support vector machine) to classify the speech stimuli on the basis of articulatory movements. We then compared classification accuracies of the flesh-point combinations to determine an optimal set of sensors. Results: When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with the full set (including T2: the tongue-body front; and T3: the tongue-body front). Conclusion: We identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91{\%}–95{\%}) equivalent to that using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.",
author = "Jun Wang and Samal, {Ashok K} and Panying Rong and Green, {Jordan R.}",
year = "2016",
month = "2",
doi = "10.1044/2015_JSLHR-S-14-0112",
language = "English (US)",
volume = "59",
pages = "15--26",
journal = "Journal of Speech, Language, and Hearing Research",
issn = "1092-4388",
publisher = "American Speech-Language-Hearing Association (ASHA)",
number = "1",
}

TY - JOUR

T1 - An optimal set of flesh points on tongue and lips for speech-movement classification

AU - Wang, Jun

AU - Samal, Ashok K

AU - Rong, Panying

AU - Green, Jordan R.

PY - 2016/2

Y1 - 2016/2

AB - Purpose: The authors sought to determine an optimal set of flesh points on the tongue and lips for classifying speech movements. Method: The authors used electromagnetic articulographs (Carstens AG500 and NDI Wave) to record tongue and lip movements from 13 healthy talkers who articulated 8 vowels, 11 consonants, a phonetically balanced set of words, and a set of short phrases during the recording. We used a machine-learning classifier (support vector machine) to classify the speech stimuli on the basis of articulatory movements. We then compared classification accuracies of the flesh-point combinations to determine an optimal set of sensors. Results: When data from the 4 sensors (T1: the vicinity between the tongue tip and tongue blade; T4: the tongue-body back; UL: the upper lip; and LL: the lower lip) were combined, phoneme and word classifications were most accurate and were comparable with the full set (including T2: the tongue-body front; and T3: the tongue-body front). Conclusion: We identified a 4-sensor set (T1, T4, UL, and LL) that yielded a classification accuracy (91%–95%) equivalent to that using all 6 sensors. These findings provide an empirical basis for selecting sensors and their locations for scientific and emerging clinical applications that incorporate articulatory movements.

UR - http://www.scopus.com/inward/record.url?scp=84959197729&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959197729&partnerID=8YFLogxK

U2 - 10.1044/2015_JSLHR-S-14-0112

DO - 10.1044/2015_JSLHR-S-14-0112

M3 - Article

C2 - 26564030

AN - SCOPUS:84959197729

VL - 59

SP - 15

EP - 26

JO - Journal of Speech, Language, and Hearing Research

JF - Journal of Speech, Language, and Hearing Research

SN - 1092-4388

IS - 1

ER -