Book Section #60483

Taylor, Sarah, Kato, Akihiro, Milner, Ben and Matthews, Iain (2016) UNSPECIFIED. In: Proc. Interspeech 2016. International Speech Communication Association, pp. 1482-1486.



    We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
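The abstract describes predicting a window of visual features for each window of acoustic features and then averaging the overlapping visual predictions into one smooth trajectory. A minimal NumPy sketch of that overlap-averaging step is below; the window length, hop size, and feature dimensions are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def average_overlapping_windows(window_preds, hop, n_frames):
    """Average overlapping per-window predictions into a frame sequence.

    window_preds : array of shape (n_windows, win_len, n_feats), one visual
        prediction per acoustic window; consecutive windows start `hop`
        frames apart (hop < win_len gives the overlap being averaged).
    Returns an (n_frames, n_feats) array: each output frame is the mean of
    every window prediction that covers it.
    """
    n_windows, win_len, n_feats = window_preds.shape
    acc = np.zeros((n_frames, n_feats))    # running sum of predictions
    count = np.zeros((n_frames, 1))        # how many windows cover each frame
    for w in range(n_windows):
        start = w * hop
        end = min(start + win_len, n_frames)
        acc[start:end] += window_preds[w, : end - start]
        count[start:end] += 1
    # Avoid division by zero for any frame no window reached
    return acc / np.maximum(count, 1)

# Hypothetical usage: 4 windows of 3 frames, hop of 1, 2 visual features
preds = np.ones((4, 3, 2))
out = average_overlapping_windows(preds, hop=1, n_frames=6)
```

Because adjacent windows disagree slightly at their edges, averaging them acts as a simple smoother, which is what yields the continuous, smoothly varying animation the abstract refers to.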

    Item Type: Book Section
    Faculty \ School: Faculty of Science > School of Computing Sciences
    Depositing User: Pure Connector
    Date Deposited: 24 Sep 2016 02:06
    Last Modified: 25 Jul 2018 01:12
    DOI: 10.21437/Interspeech.2016-483
