Improved speaker independent lip reading using speaker adaptive training and deep neural networks

Almajai, Ibrahim, Cox, Stephen, Harvey, Richard and Lan, Yuxuan (2016) Improved speaker independent lip reading using speaker adaptive training and deep neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The Institute of Electrical and Electronics Engineers (IEEE), pp. 2722-2726.

Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-sized vocabulary (around 1000 words) is realistic. However, the recognition of previously unseen speakers has been found to be a very challenging task because of the large variation in lip shapes across speakers and the lack of large, tracked databases of visual features, which are very expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be very significantly reduced. Furthermore, we show that error rates can be reduced even further by the additional use of Deep Neural Networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.

Item Type: Book Section
Faculty \ School: Faculty of Science > School of Computing Sciences
Faculty of Science
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Depositing User: Pure Connector
Date Deposited: 11 May 2017 05:08
Last Modified: 19 Apr 2023 21:37
DOI: 10.1109/ICASSP.2016.7472172