Audio-to-Visual Speech Conversion using Deep Neural Networks

Taylor, Sarah; Kato, Akihiro; Milner, Ben; Matthews, Iain

doi:10.21437/Interspeech.2016-483

Taylor, Sarah, Kato, Akihiro, Milner, Ben and Matthews, Iain (2016) Audio-to-Visual Speech Conversion using Deep Neural Networks. In: Proceedings of the Interspeech Conference 2016. International Speech Communication Association, USA, pp. 1482-1486.

Abstract

We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
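
The abstract describes two steps around the network: each output position receives a window of acoustic frames and predicts a window of visual frames, and the overlapping visual windows are then averaged into one continuous trajectory. Below is a minimal NumPy sketch of that windowing and overlap-averaging, assuming centred windows, edge padding at the sequence ends, and illustrative window lengths and feature dimensions. The paper's actual features, window sizes and network architecture are not given on this page, and the helper names `make_input_windows` and `overlap_average` and the random linear map standing in for the trained DNN are all hypothetical.

```python
import numpy as np


def make_input_windows(acoustic, win):
    """Stack `win` consecutive frames around each frame t (edge-padded).

    acoustic: array of shape (T, D). Returns shape (T, win * D), where row t
    holds frames t - win//2 ... t + win - 1 - win//2, flattened frame by frame.
    """
    half = win // 2
    padded = np.pad(acoustic, ((half, win - 1 - half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + win].ravel() for t in range(len(acoustic))])


def overlap_average(window_preds, out_win, n_frames, d_v):
    """Average overlapping visual-window predictions into one (T, d_v) trajectory.

    Row t of window_preds (shape (T, out_win * d_v)) is assumed to predict
    visual frames t - out_win//2 ... t + out_win - 1 - out_win//2.
    """
    half = out_win // 2
    acc = np.zeros((n_frames, d_v))
    count = np.zeros((n_frames, 1))
    for t, row in enumerate(window_preds):
        frames = row.reshape(out_win, d_v)
        for k in range(out_win):
            idx = t - half + k
            if 0 <= idx < n_frames:  # discard predictions that fall off either end
                acc[idx] += frames[k]
                count[idx] += 1
    # Every frame receives at least its own centred prediction, so count >= 1.
    return acc / count


# Illustrative run with hypothetical sizes: frames, acoustic dims, visual dims.
rng = np.random.default_rng(0)
T, d_a, d_v = 50, 13, 6
in_win, out_win = 11, 5  # hypothetical input/output window lengths
acoustic = rng.standard_normal((T, d_a))
X = make_input_windows(acoustic, in_win)  # (T, in_win * d_a)
# Hypothetical stand-in for the trained DNN: a random linear map.
W = rng.standard_normal((in_win * d_a, out_win * d_v)) * 0.01
Y = X @ W  # (T, out_win * d_v): one visual window per frame
visual = overlap_average(Y, out_win, T, d_v)
print(visual.shape)
```

As a sanity check on the averaging step, feeding in the ground-truth visual windows themselves (which are built by the same windowing function) reconstructs the original trajectory exactly. With a real network, the overlapping windows disagree slightly, and averaging them is what smooths the animation.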

Item Type:	Book Section
Faculty \ School:	Faculty of Science; Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	Pure Connector
Date Deposited:	24 Sep 2016 01:06
Last Modified:	05 Feb 2026 12:30
URI:	https://ueaeprints.uea.ac.uk/id/eprint/60483