Almajai, Ibrahim, Cox, Stephen, Harvey, Richard ORCID: https://orcid.org/0000-0001-9925-8316 and Lan, Yuxuan (2016) Improved speaker independent lip reading using speaker adaptive training and deep neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The Institute of Electrical and Electronics Engineers (IEEE), pp. 2722-2726.
PDF (Accepted manuscript): Accepted Version, 367kB
Abstract
Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-sized vocabulary (around 1000 words) is realistic. However, the recognition of previously unseen speakers has been found to be very challenging, because of the large variation in lip shapes across speakers and the lack of large, tracked databases of visual features, which are expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be reduced very significantly. Furthermore, we show that error rates can be reduced even further by the additional use of Deep Neural Networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
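For readers unfamiliar with speaker adaptive training, the core idea is to estimate a per-speaker feature transform that maps each speaker's features into a common, speaker-independent space before training and recognition. The Python sketch below is a minimal illustration of that idea only: the function names, the toy data, and the least-squares fit are assumptions for illustration, not the paper's implementation. Full speaker adaptive training in speech recognition typically estimates the transform (e.g. CMLLR/fMLLR) by maximising model likelihood rather than by least squares.

```python
import numpy as np

# Minimal sketch of the speaker-adaptation idea: learn, for each speaker,
# an affine transform (A, b) that maps that speaker's visual features into
# a "canonical", speaker-independent space, then recognise on the
# transformed features. Real SAT/CMLLR estimates the transform by
# maximising HMM likelihood; least squares is used here purely to keep
# the illustration self-contained.

def estimate_affine_transform(feats, targets):
    """feats: (T, d) one speaker's features; targets: (T, d) canonical
    features. Returns A (d, d) and b (d,) minimising ||A x + b - target||^2."""
    T, d = feats.shape
    X = np.hstack([feats, np.ones((T, 1))])   # append a bias column
    # Solve the least-squares problem for the stacked transform [A^T; b]
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    A, b = W[:d].T, W[d]
    return A, b

def adapt(feats, A, b):
    """Apply the per-speaker transform before recognition."""
    return feats @ A.T + b

# Toy usage: 200 frames of 40-dim visual features (dimensions are arbitrary)
rng = np.random.default_rng(0)
speaker_feats = rng.normal(size=(200, 40))
canonical_feats = speaker_feats @ rng.normal(size=(40, 40)) * 0.1 + 0.5
A, b = estimate_affine_transform(speaker_feats, canonical_feats)
normalised = adapt(speaker_feats, A, b)
```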
| Item Type: | Book Section |
|---|---|
| Faculty \ School: | Faculty of Science > School of Computing Sciences |
| UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies |
| Depositing User: | Pure Connector |
| Date Deposited: | 11 May 2017 05:08 |
| Last Modified: | 19 Apr 2023 21:37 |
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/63479 |
| DOI: | 10.1109/ICASSP.2016.7472172 |