Improving computer lipreading via DNN sequence discriminative training techniques

Thangthai, Kwanchiva; Harvey, Richard

doi:10.21437/Interspeech.2017-106

Thangthai, Kwanchiva and Harvey, Richard ORCID: https://orcid.org/0000-0001-9925-8316 (2017) Improving computer lipreading via DNN sequence discriminative training techniques. In: Proceedings of Interspeech 2017. ISCA, pp. 3657-3661.

PDF (Accepted manuscript) - Accepted Version

Abstract

Although there have been some promising results in computer lipreading, there has been a paucity of data on which to train automatic systems. However, the recent emergence of the TCD-TIMIT corpus, with around 6000 words, 59 speakers and seven hours of recorded audio-visual speech, allows the deployment of more recent techniques from audio speech recognition such as Deep Neural Networks (DNNs) and sequence discriminative training. In this paper we combine the DNN with a Hidden Markov Model (HMM) in the so-called hybrid DNN-HMM configuration, which we train using a variety of sequence discriminative training methods. This is then followed by a weighted finite state transducer. The conclusion is that the DNN offers a very substantial improvement over a conventional classifier which uses a Gaussian Mixture Model (GMM) to model the densities, even when that classifier is optimised with Speaker Adaptive Training. Sequence discriminative training offers further improvements, depending on the precise variety employed, of the order of 10% in word accuracy. Putting these two results together implies that lipreading is moving from something of rather esoteric interest to becoming a practical reality in the foreseeable future.

Item Type:	Book Section
Uncontrolled Keywords:	visual-only speech recognition,computer lipreading
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025)
Depositing User:	Pure Connector
Date Deposited:	08 Jun 2017 05:09
Last Modified:	18 Jun 2026 21:56
URI:	https://ueaeprints.uea.ac.uk/id/eprint/63708
DOI:	10.21437/Interspeech.2017-106
