Confusion modelling for lip-reading

Howell, Dominic (2015) Confusion modelling for lip-reading. Doctoral thesis, University of East Anglia.

[thumbnail of D_Howell_Thesis.pdf]
Download (16MB) | Preview


Lip-reading is mostly used as a means of communication by people with hearing di�fficulties. Recent work has explored the automation of this process, with the aim
of building a speech recognition system entirely driven by lip movements. However, this work has so far produced poor results because of factors such as high variability
of speaker features, diffi�culties in mapping from visual features to speech sounds, and high co-articulation of visual features.
The motivation for the work in this thesis is inspired by previous work in dysarthric speech recognition [Morales, 2009]. Dysathric speakers have poor control over their
articulators, often leading to a reduced phonemic repertoire. The premise of this thesis is that recognition of the visual speech signal is a similar problem to recog-
nition of dysarthric speech, in that some information about the speech signal has been lost in both cases, and this brings about a systematic pattern of errors in the
decoded output.
This work attempts to exploit the systematic nature of these errors by modelling them in the framework of a weighted finite-state transducer cascade. Results
indicate that the technique can achieve slightly lower error rates than the conventional approach. In addition, it explores some interesting more general questions for
automated lip-reading.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Jackie Webb
Date Deposited: 26 Jun 2015 11:14
Last Modified: 26 Jun 2015 11:14

Actions (login required)

View Item View Item