Across-speaker articulatory normalization for speaker-independent silent speech recognition

Jun Wang, Ashok K Samal, Jordan R. Green

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Silent speech interfaces (SSIs), which recognize speech from articulatory information (i.e., without using audio information), have the potential to enable persons with laryngectomy or a neurological disease to produce natural-sounding synthesized speech using their tongue and lips. Current approaches to SSIs have largely relied on speaker-dependent recognition models to minimize the negative effects of talker variation on recognition accuracy. Speaker-independent approaches are needed to reduce the large amount of training data required from each user; because of the logistical difficulty of data collection, often only limited articulatory samples are available from persons with moderate to severe speech impairments. This paper reports an across-speaker articulatory normalization approach based on Procrustes matching, a bidimensional regression technique for removing translational, scaling, and rotational effects from spatial data. A dataset of short functional sentences was collected from seven English talkers. A support vector machine was then trained to classify sentences based on normalized tongue and lip movements. Speaker-independent classification accuracy (tested using leave-one-subject-out cross-validation) improved significantly after normalization, from 68.63% to 95.90%. These results support the feasibility of a speaker-independent SSI that uses Procrustes matching as the basis for across-speaker articulatory normalization.
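The normalization step can be illustrated with a short sketch. The Python below is a minimal, illustrative implementation of Procrustes matching for 2-D articulatory point sets, removing the translational, scaling, and rotational effects the abstract describes; the function name and array shapes are assumptions rather than the authors' code (scipy.spatial.procrustes offers a comparable two-shape alignment).

```python
import numpy as np

def procrustes_normalize(points, reference=None):
    """Normalize a set of 2-D articulatory points (e.g., tongue and
    lip sensor positions) by removing translation, scale, and, when a
    reference shape is given, rotation (orthogonal Procrustes).

    points, reference: (n_points, 2) arrays. Illustrative sketch only.
    """
    # Translation: move the shape's centroid to the origin.
    centered = points - points.mean(axis=0)
    # Scale: divide by the Frobenius norm so overall size is 1.
    scaled = centered / np.linalg.norm(centered)
    if reference is None:
        return scaled
    ref = reference - reference.mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    # Rotation: the orthogonal matrix that best aligns the shape with
    # the reference, obtained from the SVD of the cross-covariance.
    u, _, vt = np.linalg.svd(scaled.T @ ref)
    rotation = u @ vt
    return scaled @ rotation

# Example: a scaled, shifted copy of a shape aligns back onto it.
a = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = procrustes_normalize(2.5 * a + 3.0, reference=a)
```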

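The classification setup can be sketched the same way: an SVM evaluated with leave-one-subject-out cross-validation, as in the abstract. The feature matrix below is random placeholder data, since this record does not specify how the normalized movement trajectories were converted to feature vectors.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Placeholder data: one fixed-length feature vector per sentence
# production; the paper's actual feature construction is not shown here.
rng = np.random.default_rng(0)
n_speakers, sentences_per_speaker, n_features = 7, 10, 40
X = rng.normal(size=(n_speakers * sentences_per_speaker, n_features))
y = np.tile(np.arange(sentences_per_speaker), n_speakers)          # sentence labels
groups = np.repeat(np.arange(n_speakers), sentences_per_speaker)   # speaker IDs

# Leave-one-subject-out: each fold trains on six speakers and tests
# on the held-out speaker, matching the evaluation described above.
scores = cross_val_score(SVC(kernel="rbf"), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print("per-speaker accuracies:", scores.round(2), "mean:", scores.mean())
```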
Original language: English (US)
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
Pages: 1179-1183
Number of pages: 5
State: Published - 2014
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: Sep 14, 2014 - Sep 18, 2014



Keywords

  • Procrustes analysis
  • Silent speech recognition
  • Speech kinematics
  • Support vector machine

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Wang, J., Samal, A. K., & Green, J. R. (2014). Across-speaker articulatory normalization for speaker-independent silent speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1179-1183). International Speech Communication Association.
