Visual units and confusion modelling for automatic lip-reading

Howell, Dominic; Cox, Stephen; Theobald, Barry

doi:10.1016/j.imavis.2016.03.003

Howell, Dominic, Cox, Stephen and Theobald, Barry (2016) Visual units and confusion modelling for automatic lip-reading. Image and Vision Computing, 51. pp. 1-12. ISSN 0262-8856

PDF (Accepted_manuscript) - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy.

Item Type:	Article
Uncontrolled Keywords:	lip-reading, speech recognition, visemes, weighted finite state transducers, confusion matrices, confusion modelling
Faculty \ School:	Faculty of Science > School of Computing Sciences; Faculty of Science
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing; Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025)
Depositing User:	Pure Connector
Date Deposited:	20 Apr 2016 11:00
Last Modified:	15 Oct 2025 12:30
URI:	https://ueaeprints.uea.ac.uk/id/eprint/58310
DOI:	10.1016/j.imavis.2016.03.003
