Improved speaker independent lip reading using speaker adaptive training and deep neural networks

Almajai, Ibrahim, Cox, Stephen, Harvey, Richard and Lan, Yuxuan (2016) Improved speaker independent lip reading using speaker adaptive training and deep neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE Press, pp. 2722-2726.

PDF (Accepted manuscript) - Submitted Version

Abstract

Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-sized vocabulary (around 1000 words) is realistic. However, the recognition of previously unseen speakers has been found to be a very challenging task, because of the large variation in lip-shapes across speakers and the lack of large, tracked databases of visual features, which are very expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be very significantly reduced. Furthermore, we show that error rates can be reduced even further by the additional use of Deep Neural Networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
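The speech-recognition technique adapted here is speaker adaptive training, which in acoustic modelling is typically implemented as a per-speaker affine transform (e.g. CMLLR/fMLLR) applied to the features before decoding. The sketch below illustrates the core idea only, under stated simplifications: it estimates the transform by least squares against speaker-independent "canonical" features rather than by maximum likelihood under a GMM-HMM, and all names (`estimate_affine_transform`, `adapt`) and the synthetic data are hypothetical, not from the paper.

```python
import numpy as np

def estimate_affine_transform(feats, targets):
    """Least-squares estimate of an affine transform (A, b) mapping a
    speaker's features toward speaker-independent targets:
        targets ~= feats @ A.T + b
    Simplification: real SAT/CMLLR maximises GMM likelihood, not
    squared error; this is only an illustrative stand-in."""
    n, d = feats.shape
    X = np.hstack([feats, np.ones((n, 1))])          # append bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)  # solve X @ W ~= targets
    A, b = W[:d].T, W[d]
    return A, b

def adapt(feats, A, b):
    """Apply the per-speaker transform to normalise features
    before decoding with the speaker-independent models."""
    return feats @ A.T + b

# Simulate a speaker whose lip features are an affine distortion of
# the canonical (speaker-independent) feature space.
rng = np.random.default_rng(0)
si_feats = rng.normal(size=(200, 3))        # canonical features
A_true = np.array([[1.2, 0.1, 0.0],
                   [0.0, 0.9, 0.2],
                   [0.1, 0.0, 1.1]])
speaker_feats = si_feats @ A_true.T + 0.5   # this speaker's features

# Estimate and apply the adaptation transform.
A, b = estimate_affine_transform(speaker_feats, si_feats)
adapted = adapt(speaker_feats, A, b)
err = np.abs(adapted - si_feats).max()
```

Because the simulated distortion is exactly affine and noise-free, the least-squares estimate recovers it and `err` is at numerical-precision level; with real lip features the transform would only partially compensate for speaker variation.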

Item Type: Book Section
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Pure Connector
Date Deposited: 11 May 2017 05:08
Last Modified: 25 Jul 2019 04:25
URI: https://ueaeprints.uea.ac.uk/id/eprint/63479
DOI: 10.1109/ICASSP.2016.7472172
