Extraction of visual features for lipreading

Matthews, I., Cootes, T. F., Bangham, J. A., Cox, S. J. and Harvey, R. W. ORCID: https://orcid.org/0000-0001-9925-8316 (2002) Extraction of visual features for lipreading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2). pp. 198-213. ISSN 0162-8828

Full text not available from this repository.


The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as that of the head, convey additional information. Humans integrate speech cues from many sources, and this improves intelligibility, especially when the acoustic signal is degraded. The paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively. The third, bottom-up, method uses a nonlinear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multitalker visual speech recognition task of isolated letters.
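As a rough illustration of what "features from a principal component analysis of shape" can mean, the sketch below projects flattened lip-contour landmark coordinates onto their leading principal modes. It is not the authors' implementation: the function name `pca_shape_features`, the synthetic elliptical "lip" contours, and the choice of three modes are all assumptions made for the example, and it omits the contour fitting and shape alignment a real system would need.

```python
import numpy as np

def pca_shape_features(shapes, n_modes):
    """Project landmark shapes onto their leading principal modes.

    shapes: (n_frames, 2 * n_points) array, each row a flattened contour
            (all x coordinates followed by all y coordinates).
    Returns the mean shape, the modes, per-frame features, and mode variances.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    features = centered @ modes.T
    variances = singular_values[:n_modes] ** 2 / (shapes.shape[0] - 1)
    return mean_shape, modes, features, variances

# Synthetic data (an assumption for illustration): 50 frames of a
# 20-point ellipse whose height varies, mimicking mouth opening.
rng = np.random.default_rng(0)
n_frames, n_points = 50, 20
theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
opening = rng.uniform(0.2, 1.0, size=n_frames)
x = np.tile(np.cos(theta), (n_frames, 1))
y = opening[:, None] * np.sin(theta)
shapes = np.hstack([x, y]) + rng.normal(scale=0.01, size=(n_frames, 2 * n_points))

mean_shape, modes, features, variances = pca_shape_features(shapes, n_modes=3)
```

In this toy setting the first feature tracks the mouth-opening parameter closely; per-frame feature vectors of this kind are what would then be passed to a sequence model such as a hidden Markov model.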

Item Type: Article
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Smart Emerging Technologies; Faculty of Science > Research Groups > Interactive Graphics and Audio
Depositing User: Vishal Gautam
Date Deposited: 07 Mar 2011 14:08
Last Modified: 27 Oct 2023 00:42
URI: https://ueaeprints.uea.ac.uk/id/eprint/22167
DOI: 10.1109/34.982900
