Taylor, Sarah, Kato, Akihiro, Milner, Ben and Matthews, Iain (2016) Audio-to-Visual Speech Conversion using Deep Neural Networks. In: Proceedings of the Interspeech Conference 2016. International Speech Communication Association, USA, pp. 1482-1486.
PDF (Accepted manuscript) - Accepted Version (665kB)
Abstract
We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
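To make the sliding-window idea concrete, the following is a minimal sketch of the inference step described above: a window of acoustic features is fed to a trained network, each predicted window of visual features is accumulated onto the frames it covers, and overlapping predictions are averaged into a smooth trajectory. This is an illustration only, not the authors' implementation; `predict` stands in for the trained DNN, and the window sizes and visual feature dimension are placeholder values, not those used in the paper.

```python
import numpy as np

def audio_to_visual(acoustic, predict, in_win=11, out_win=5, vis_dim=30):
    """Sliding-window audio-to-visual mapping with overlap-averaging.

    acoustic : (T, A) array of per-frame acoustic feature vectors.
    predict  : callable taking a flattened (in_win * A,) input window and
               returning a flattened (out_win * vis_dim,) output window.
               (Hypothetical stand-in for the trained DNN.)
    """
    T, A = acoustic.shape
    half_in, half_out = in_win // 2, out_win // 2

    # Pad the input so a full window exists around every frame.
    padded = np.pad(acoustic, ((half_in, half_in), (0, 0)), mode="edge")

    vis_sum = np.zeros((T, vis_dim))
    vis_count = np.zeros((T, 1))

    for t in range(T):
        # One input window centred on frame t.
        x = padded[t : t + in_win].reshape(-1)
        y = predict(x).reshape(out_win, vis_dim)

        # Accumulate the predicted output window onto the frames it covers,
        # clipping at the sequence boundaries.
        lo, hi = t - half_out, t + half_out + 1
        src_lo = max(0, -lo)
        lo, hi = max(lo, 0), min(hi, T)
        vis_sum[lo:hi] += y[src_lo : src_lo + (hi - lo)]
        vis_count[lo:hi] += 1

    # Average overlapping predictions to give a continuous,
    # smoothly varying visual feature trajectory.
    return vis_sum / vis_count
```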
| Item Type: | Book Section | 
|---|---|
| Faculty \ School: | Faculty of Science; Faculty of Science > School of Computing Sciences | 
| UEA Research Groups: | Faculty of Science > Research Groups > Visual Computing and Signal Processing; Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks | 
| Depositing User: | Pure Connector | 
| Date Deposited: | 24 Sep 2016 01:06 | 
| Last Modified: | 16 Oct 2025 08:33 | 
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/60483 | 
| DOI: | 10.21437/Interspeech.2016-483 | 