Howell, Dominic (2015) Confusion modelling for lip-reading. Doctoral thesis, University of East Anglia.
Abstract
Lip-reading is mostly used as a means of communication by people with hearing difficulties. Recent work has explored the automation of this process, with the aim of building a speech recognition system entirely driven by lip movements. However, this work has so far produced poor results because of factors such as high variability of speaker features, difficulties in mapping from visual features to speech sounds, and high co-articulation of visual features.

The motivation for the work in this thesis is inspired by previous work in dysarthric speech recognition [Morales, 2009]. Dysarthric speakers have poor control over their articulators, often leading to a reduced phonemic repertoire. The premise of this thesis is that recognition of the visual speech signal is a similar problem to recognition of dysarthric speech, in that some information about the speech signal has been lost in both cases, and this brings about a systematic pattern of errors in the decoded output.
This work attempts to exploit the systematic nature of these errors by modelling them in the framework of a weighted finite-state transducer cascade. Results indicate that the technique can achieve slightly lower error rates than the conventional approach. In addition, the thesis explores some more general questions for automated lip-reading.
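To make the confusion-modelling idea concrete, the following is a minimal, self-contained sketch of a confusion channel composed with a small lexicon, analogous in spirit to one stage of a weighted finite-state transducer cascade. The viseme classes, costs, and lexicon here are invented for illustration and are not taken from the thesis; a real system would use a WFST toolkit and learned confusion weights.

```python
import math

# Hypothetical confusion channel: cost (negative log-probability) of an
# observed viseme class corresponding to each candidate phoneme.
# Bilabials /b/, /p/, /m/ look alike on the lips, so one viseme class
# maps to all three. All values are illustrative only.
CHANNEL = {
    "B": {"b": 0.2, "p": 0.3, "m": 0.5},
    "A": {"ae": 0.1, "eh": 0.4},
    "T": {"t": 0.3, "d": 0.3, "n": 0.4},
}

# Toy lexicon acting as the next transducer in the cascade:
# word -> (phoneme sequence, unigram cost).
LEXICON = {
    "bat": (["b", "ae", "t"], 0.5),
    "pan": (["p", "ae", "n"], 0.7),
    "mad": (["m", "ae", "d"], 0.9),
}

def decode(visemes):
    """Compose the observed viseme sequence with the confusion channel
    and lexicon, returning the minimum-cost word and its total cost
    (a simplified stand-in for WFST composition + shortest path)."""
    best_word, best_cost = None, math.inf
    for word, (phones, word_cost) in LEXICON.items():
        if len(phones) != len(visemes):
            continue  # this toy model allows no insertions/deletions
        cost = word_cost
        for v, p in zip(visemes, phones):
            arc_costs = CHANNEL.get(v, {})
            if p not in arc_costs:
                cost = math.inf  # no arc: word is unreachable
                break
            cost += arc_costs[p]
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word, best_cost
```

Given the ambiguous viseme sequence `["B", "A", "T"]`, all three words are reachable through the channel, but `"bat"` wins on total cost; the systematic viseme-to-phoneme ambiguity is resolved by the later stages of the cascade rather than forced at decode time.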
| Item Type: | Thesis (Doctoral) |
|---|---|
| Faculty \ School: | Faculty of Science > School of Computing Sciences |
| Depositing User: | Jackie Webb |
| Date Deposited: | 26 Jun 2015 11:14 |
| Last Modified: | 26 Jun 2015 11:14 |
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/53395 |