Improved speaker independent lip reading using speaker adaptive training and deep neural networks

Almajai, Ibrahim, Cox, Stephen, Harvey, Richard and Lan, Yuxuan (2016) Improved speaker independent lip reading using speaker adaptive training and deep neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The Institute of Electrical and Electronics Engineers (IEEE), pp. 2722-2726.

Full text: PDF (Accepted manuscript, 367 kB)

Abstract

Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-sized vocabulary (around 1000 words) is now realistic. However, recognition of previously unseen speakers remains very challenging, because of the large variation in lip shapes across speakers and the lack of large, tracked databases of visual features, which are expensive to produce. By adapting a technique that is well established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be significantly reduced. Furthermore, we show that error rates can be reduced still further by the additional use of Deep Neural Networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.

Item Type: Book Section
Faculty \ School: Faculty of Science > School of Computing Sciences; Faculty of Science
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Depositing User: Pure Connector
Date Deposited: 11 May 2017 05:08
Last Modified: 06 Feb 2025 13:20
URI: https://ueaeprints.uea.ac.uk/id/eprint/63479
DOI: 10.1109/ICASSP.2016.7472172
