Taylor, Sarah, Kato, Akihiro, Milner, Ben and Matthews, Iain (2016) Audio-to-Visual Speech Conversion using Deep Neural Networks. In: Proceedings of the Interspeech Conference 2016. International Speech Communication Association, USA, pp. 1482-1486.
PDF (Accepted manuscript), Accepted Version (665kB)
Abstract
We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
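The overlap-averaging step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' code: the window sizes, feature dimensions, and the `predict_window` callable (standing in for the trained sliding window DNN) are assumptions made for the example.

```python
import numpy as np

def overlap_average(acoustic_feats, predict_window, in_win=11, out_win=5, out_dim=30):
    """Slide a window over acoustic frames, predict a window of visual
    features at each step, and average overlapping predictions.

    acoustic_feats: (T, D_audio) array of acoustic feature frames.
    predict_window: callable mapping a flattened input window to a
                    (out_win * out_dim,) vector of visual features
                    (hypothetical stand-in for the trained network).
    """
    T = acoustic_feats.shape[0]
    half_in = in_win // 2
    half_out = out_win // 2

    accum = np.zeros((T, out_dim))   # summed visual predictions per frame
    counts = np.zeros((T, 1))        # number of predictions covering each frame

    # Pad the audio at the edges so every frame has a full input window.
    padded = np.pad(acoustic_feats, ((half_in, half_in), (0, 0)), mode="edge")

    for t in range(T):
        window = padded[t:t + in_win].reshape(-1)        # input window centred on frame t
        pred = predict_window(window).reshape(out_win, out_dim)
        for k in range(out_win):                         # scatter predicted window onto the timeline
            frame = t - half_out + k
            if 0 <= frame < T:
                accum[frame] += pred[k]
                counts[frame] += 1

    # Averaging the overlapping predictions yields a smoothly varying trajectory.
    return accum / np.maximum(counts, 1)
```

In practice `predict_window` would wrap a forward pass of the trained network; any regression model with matching input and output sizes could be slotted in for experimentation.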
| Item Type: | Book Section |
|---|---|
| Faculty \ School: | Faculty of Science; Faculty of Science > School of Computing Sciences |
| UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies |
| Depositing User: | Pure Connector |
| Date Deposited: | 24 Sep 2016 01:06 |
| Last Modified: | 20 Apr 2023 01:11 |
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/60483 |
| DOI: | 10.21437/Interspeech.2016-483 |