The Challenge of Multispeaker Lip-Reading

Cox, Stephen; Harvey, Richard; Lan, Yuxuan; Newman, Jacob; Theobald, Barry-John

Cox, Stephen, Harvey, Richard, Lan, Yuxuan, Newman, Jacob and Theobald, Barry-John (2008) The Challenge of Multispeaker Lip-Reading. In: International Conference on Auditory-Visual Speech Processing, 2008-09-26 - 2008-09-29.

Full text not available from this repository.

Abstract

In speech recognition, the problem of speaker variability has been well studied. Common approaches to dealing with it include normalising for a speaker's vocal tract length and learning a linear transform that moves the speaker-independent models closer to a new speaker. In pure lip-reading (no audio) the problem has been less well studied. Results are often presented that are based on speaker-dependent (single speaker) or multispeaker (speakers in the test-set are also in the training-set) data, situations that are of limited use in real applications. This paper shows the danger of not using different speakers in the training- and test-sets. Firstly, we present classification results on a new single-word database, AVletters 2, which is a high-definition version of the well-known AVletters database. By careful choice of features, we show that it is possible for the performance of visual-only lip-reading to be very close to that of audio-only recognition for the single-speaker and multispeaker configurations. However, in the speaker-independent configuration, the performance of the visual-only channel degrades dramatically. By applying multidimensional scaling (MDS) to both the audio features and visual features, we demonstrate that lip-reading visual features, when compared with the MFCCs commonly used for audio speech recognition, have inherently small variation within a single speaker across all classes spoken. However, visual features are highly sensitive to the identity of the speaker, whereas audio features are relatively invariant.
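The difference between the multispeaker and speaker-independent configurations named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `(speaker, letter, repetition)` tuples, the function names, and the use of leave-one-speaker-out as the speaker-independent protocol. The abstract does not state the exact protocol used in the paper.

```python
from collections import defaultdict


def multispeaker_split(samples, test_every=3):
    """Multispeaker: every speaker contributes to both training and test sets."""
    per_speaker = defaultdict(list)
    for sample in samples:
        per_speaker[sample[0]].append(sample)
    train, test = [], []
    for items in per_speaker.values():
        for i, item in enumerate(items):
            # Hold out every `test_every`-th utterance of each speaker.
            (test if i % test_every == 0 else train).append(item)
    return train, test


def speaker_independent_splits(samples):
    """Speaker-independent (assumed leave-one-speaker-out):
    the held-out speaker never appears in the training set."""
    speakers = sorted({sample[0] for sample in samples})
    for held_out in speakers:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def speakers_of(subset):
    """Return the set of speaker identifiers present in a subset."""
    return {sample[0] for sample in subset}


# Hypothetical toy data: 3 speakers x 3 letters x 3 repetitions.
samples = [
    (speaker, letter, repetition)
    for speaker in ("A", "B", "C")
    for letter in "abc"
    for repetition in range(3)
]

ms_train, ms_test = multispeaker_split(samples)
si_folds = list(speaker_independent_splits(samples))
```

In the multispeaker split the training and test sets share the same speakers, so a classifier can exploit speaker-specific appearance. In the speaker-independent folds the two speaker sets are disjoint, which is the condition under which the abstract reports that visual-only performance degrades.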

Item Type:	Conference or Workshop Item (Paper)
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Health Computing
Related URLs:	http://www.isca-speech.org/archive_open/...
Depositing User:	Nicola Talbot
Date Deposited:	14 Mar 2011 09:04
Last Modified:	04 Feb 2026 00:45
URI:	https://ueaeprints.uea.ac.uk/id/eprint/26021