Audio-to-Visual Speech Conversion using Deep Neural Networks

Taylor, Sarah, Kato, Akihiro, Milner, Ben and Matthews, Iain (2016) Audio-to-Visual Speech Conversion using Deep Neural Networks. In: Proceedings of the Interspeech Conference 2016. International Speech Communication Association, pp. 1482-1486.


    We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
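    The averaging of overlapping visual predictions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `average_overlapping_windows`, the stride of one frame between windows, the window length, and the toy numbers are all assumptions made for illustration, and the nested lists stand in for the output of a trained network.

```python
def average_overlapping_windows(window_predictions):
    """Blend per-window predictions into one trajectory by averaging overlaps.

    Assumption: window_predictions[i][k] is the predicted visual feature
    vector for frame i + k, so window i covers frames i .. i + K - 1
    (windows advance by one frame at a time).
    """
    if not window_predictions:
        return []
    num_windows = len(window_predictions)
    window_len = len(window_predictions[0])
    dim = len(window_predictions[0][0])
    num_frames = num_windows + window_len - 1

    # Accumulate every prediction made for each frame, then divide by
    # the number of windows that covered that frame.
    sums = [[0.0] * dim for _ in range(num_frames)]
    counts = [0] * num_frames
    for i, window in enumerate(window_predictions):
        for k, frame in enumerate(window):
            t = i + k
            counts[t] += 1
            for d, value in enumerate(frame):
                sums[t][d] += value
    return [[s / counts[t] for s in sums[t]] for t in range(num_frames)]


# Toy stand-in for network output: 3 windows of 2 frames, 1-D features.
predictions = [
    [[0.0], [2.0]],
    [[4.0], [6.0]],
    [[8.0], [10.0]],
]
trajectory = average_overlapping_windows(predictions)
```

    In this toy case the two middle frames are each covered by two windows, so their values are the mean of two predictions, while the first and last frames are covered by one window only. Averaging in this way is what smooths the frame-to-frame transitions in the output.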

    Item Type: Book Section
    Faculty \ School: Faculty of Science > School of Computing Sciences
    Depositing User: Pure Connector
    Date Deposited: 24 Sep 2016 02:06
    Last Modified: 26 Apr 2018 10:31
    DOI: 10.21437/Interspeech.2016-483
