Limitations of Visual Speech Recognition

Newman, Jacob, Theobald, Barry and Cox, Stephen (2010) Limitations of Visual Speech Recognition. In: Proceedings of the International Conference on Auditory-Visual Speech Processing, 2010-01-01.


In this paper we investigate the limits of automated lip-reading systems and consider the improvement that could be gained if additional information from other (non-visible) speech articulators were available to the recogniser. Hidden Markov model (HMM) speech recognisers are trained using electromagnetic articulography (EMA) data drawn from the MOCHA-TIMIT data set. Articulatory information is systematically withheld from the recogniser, and its performance is tested and compared with that of a typical state-of-the-art lip-reading system. We find that, as expected, the performance of the recogniser degrades as articulatory information is lost, and that a typical lip-reading system achieves a level of performance similar to that of an EMA-based recogniser that uses information only from the front of the tongue forwards. Our results show that there is significant information in the articulator positions towards the back of the mouth that could be exploited were it available, but even this is insufficient to match the performance of an acoustic speech recogniser.
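
The idea of systematically withholding articulatory information can be illustrated with a minimal sketch. It is not the authors' implementation: the articulator names, their front-to-back ordering, and the helper functions `withhold_from_back` and `ablation_feature_sets` are all hypothetical, chosen only to show how progressively smaller feature sets might be built before training a recogniser on each.

```python
# Hypothetical front-to-back ordering of articulator channels.
# This list is an assumption for illustration, not taken from the paper.
ARTICULATORS = [
    "upper_lip",
    "lower_lip",
    "lower_incisor",
    "tongue_tip",
    "tongue_body",
    "tongue_dorsum",
    "velum",
]


def withhold_from_back(frame, n_withheld):
    """Return a copy of one frame with the n_withheld rearmost articulators removed.

    `frame` maps an articulator name to its (x, y) position.
    """
    if not 0 <= n_withheld < len(ARTICULATORS):
        raise ValueError("n_withheld must leave at least one articulator")
    kept = ARTICULATORS[: len(ARTICULATORS) - n_withheld]
    return {name: frame[name] for name in kept}


def ablation_feature_sets():
    """List the feature sets from the full set down to the frontmost channel only."""
    return [ARTICULATORS[:k] for k in range(len(ARTICULATORS), 0, -1)]


# Example frame with made-up coordinates (not real EMA measurements).
frame = {name: (float(i), float(-i)) for i, name in enumerate(ARTICULATORS)}
front_only = withhold_from_back(frame, 3)
```

In this sketch, `front_only` keeps the lips, lower incisor, and tongue tip, which loosely corresponds to "the front of the tongue forwards". A separate recogniser would then be trained and scored on each set returned by `ablation_feature_sets()`.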

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Nicola Talbot
Date Deposited: 14 Mar 2011 10:06
Last Modified: 21 Jul 2021 23:40
