Book Section #60483

Taylor, Sarah, Kato, Akihiro, Milner, Ben and Matthews, Iain (2016) Audio-to-Visual Speech Conversion Using Deep Neural Networks. In: Interspeech 2016. International Speech Communication Association, pp. 1482-1486.

PDF (Accepted manuscript) - Submitted Version, 649kB

    Abstract

    We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation. We outperform a baseline HMM inversion approach in both objective and subjective evaluations and perform a thorough analysis of our results.
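    A minimal sketch of the overlap-averaging scheme the abstract describes, in Python/NumPy. The window sizes, feature dimensions, function name, and the linear stand-in for the trained DNN are illustrative assumptions, not the configuration reported in the paper.

    import numpy as np

    def sliding_window_predict(acoustic, model, ctx=5, out_ctx=2, visual_dim=30):
        """Map acoustic features (T, A) to visual features (T, visual_dim).

        For each frame t, a window of 2*ctx+1 acoustic frames is fed to
        `model`, which returns a window of 2*out_ctx+1 visual frames
        centred on t; overlapping predictions are accumulated and averaged
        to give a smoothly varying output. All sizes here are hypothetical.
        """
        T = acoustic.shape[0]
        accum = np.zeros((T, visual_dim))
        counts = np.zeros((T, 1))
        for t in range(T):
            # Clamp the input window at the sequence boundaries.
            idx = np.clip(np.arange(t - ctx, t + ctx + 1), 0, T - 1)
            pred = model(acoustic[idx].ravel())
            pred = pred.reshape(2 * out_ctx + 1, visual_dim)
            for j, u in enumerate(range(t - out_ctx, t + out_ctx + 1)):
                if 0 <= u < T:          # drop predictions falling off the ends
                    accum[u] += pred[j]
                    counts[u] += 1
        return accum / counts           # average the overlapping predictions

    # Toy usage: a random linear map stands in for the trained network.
    rng = np.random.default_rng(0)
    A, V, ctx, out_ctx = 13, 30, 5, 2
    W = 0.01 * rng.normal(size=((2 * ctx + 1) * A, (2 * out_ctx + 1) * V))
    visual = sliding_window_predict(rng.normal(size=(100, A)),
                                    lambda x: x @ W, ctx, out_ctx, V)
    print(visual.shape)                 # (100, 30)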

    Item Type: Book Section
    Faculty \ School: Faculty of Science > School of Computing Sciences
    Depositing User: Pure Connector
    Date Deposited: 24 Sep 2016 02:06
    Last Modified: 25 Jul 2018 01:12
    URI: https://ueaeprints.uea.ac.uk/id/eprint/60483
    DOI: 10.21437/Interspeech.2016-483
