Almajai, Ibrahim, Cox, Stephen, Harvey, Richard and Lan, Yuxuan (2016) Improved speaker independent lip reading using speaker adaptive training and deep neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The Institute of Electrical and Electronics Engineers (IEEE), pp. 2722-2726.
Abstract
Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-sized vocabulary (around 1000 words) is realistic. However, the recognition of previously unseen speakers has been found to be a very challenging task, because of the large variation in lip-shapes across speakers and the lack of large, tracked databases of visual features, which are very expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error-rates for speaker-independent lip-reading can be very significantly reduced. Furthermore, we show that error-rates can be even further reduced by the additional use of Deep Neural Networks (DNN). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
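The abstract does not spell out how speaker adaptive training (SAT) removes speaker variation. In speech recognition, SAT is commonly built on per-speaker feature-space affine transforms (e.g. fMLLR), so that a shared model is trained on features with speaker-specific differences reduced. The sketch below is only an illustration of that general idea, assuming a much simpler per-speaker mean and variance normalization in place of fMLLR; it is not the method used in the paper. The function name `per_speaker_cmvn` and the toy feature values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_speaker_cmvn(frames, speakers):
    """Normalize each feature dimension to zero mean and unit variance per speaker.

    Hypothetical, simplified stand-in for speaker-adaptive feature
    normalization (NOT fMLLR-based SAT as used in speech recognition).

    frames:   list of feature vectors (lists of floats), one per video frame.
    speakers: list of speaker IDs, aligned index-for-index with frames.
    """
    # Group frame indices by speaker so statistics are computed per speaker.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speakers):
        by_speaker[spk].append(idx)

    dim = len(frames[0])
    out = [None] * len(frames)
    for idxs in by_speaker.values():
        # Per-dimension mean and (population) standard deviation for this speaker.
        stats = []
        for d in range(dim):
            column = [frames[i][d] for i in idxs]
            mu = mean(column)
            sigma = pstdev(column) or 1.0  # avoid division by zero on constant dims
            stats.append((mu, sigma))
        # Apply the speaker's own affine normalization to each of its frames.
        for i in idxs:
            out[i] = [(frames[i][d] - mu) / sigma for d, (mu, sigma) in enumerate(stats)]
    return out


# Two toy speakers whose raw features differ by a large offset and scale.
frames = [[1.0, 10.0], [3.0, 14.0], [100.0, 0.0], [104.0, 2.0]]
speakers = ["a", "a", "b", "b"]
print(per_speaker_cmvn(frames, speakers))
# [[-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```

After normalization the two speakers' features occupy the same range, which is the kind of speaker-specific variation a speaker-independent model would otherwise have to absorb.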
| Item Type: | Book Section | 
|---|---|
| Faculty \ School: | Faculty of Science > School of Computing Sciences; Faculty of Science | 
| UEA Research Groups: | Faculty of Science > Research Groups > Visual Computing and Signal Processing; Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025) | 
| Depositing User: | Pure Connector | 
| Date Deposited: | 11 May 2017 05:08 | 
| Last Modified: | 16 Oct 2025 13:34 | 
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/63479 | 
| DOI: | 10.1109/ICASSP.2016.7472172 | 