Visual speech synthesis using dynamic visemes, contextual features and DNNs

Thangthai, Ausdang; Milner, Ben; Taylor, Sarah

doi:10.21437/Interspeech.2016-1084



Thangthai, Ausdang, Milner, Ben and Taylor, Sarah (2016) Visual speech synthesis using dynamic visemes, contextual features and DNNs. In: Proceedings of the Interspeech Conference 2016. International Speech Communication Association, USA, pp. 2458-2462.


Abstract

This paper examines methods to improve visual speech synthesis from a text input using a deep neural network (DNN). Two representations of the input text are considered: phoneme sequences and dynamic viseme sequences. From these sequences, contextual features are extracted that include information at linguistic levels ranging from the frame level up to the utterance level. These features are extracted over a broad sliding window that captures context, and they form the input to the DNN, which estimates the visual features. Experiments first compare the accuracy of the estimated visual features against an HMM baseline and establish that both the phoneme and dynamic viseme systems outperform it, with the best performance obtained by a combined phoneme-dynamic viseme system. An investigation into the features then reveals the importance of frame-level information, which avoids discontinuities in the visual feature sequence and produces smooth, realistic output.
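The abstract describes contextual features taken from a sliding window over the phoneme (or dynamic viseme) sequence, combined with frame-level information, and used as DNN input. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the one-hot encoding, the window half-width of 2 frames, the edge padding, the two frame-position features, and all function names (`one_hot`, `sliding_window`, `frame_position`, `dnn_input`) are hypothetical choices, and the DNN itself is omitted.

```python
import numpy as np


def one_hot(labels, inventory):
    """Encode a per-frame label sequence as one-hot rows over an assumed symbol inventory."""
    index = {sym: i for i, sym in enumerate(inventory)}
    out = np.zeros((len(labels), len(inventory)))
    for t, lab in enumerate(labels):
        out[t, index[lab]] = 1.0
    return out


def sliding_window(frame_feats, half_width):
    """Concatenate each frame with its +/- half_width neighbouring frames.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    padded = np.pad(frame_feats, ((half_width, half_width), (0, 0)), mode="edge")
    width = 2 * half_width + 1
    return np.stack([padded[t:t + width].reshape(-1) for t in range(len(frame_feats))])


def frame_position(labels):
    """Hypothetical frame-level features: relative position inside the current
    segment (0..1, frame-centred) and the segment length in frames.

    Segments are runs of identical consecutive labels (a simplification).
    """
    feats = np.zeros((len(labels), 2))
    start = 0
    for end in range(1, len(labels) + 1):
        # Close the current segment at the end of the sequence or on a label change.
        if end == len(labels) or labels[end] != labels[start]:
            length = end - start
            for t in range(start, end):
                feats[t] = ((t - start + 0.5) / length, length)
            start = end
    return feats


def dnn_input(labels, inventory, half_width=2):
    """Stack the windowed one-hot context with the frame-level position features."""
    context = sliding_window(one_hot(labels, inventory), half_width)
    return np.hstack([context, frame_position(labels)])


frames = ["sil", "sil", "p", "p", "p", "aa", "aa", "sil"]
X = dnn_input(frames, ["sil", "p", "aa"])
print(X.shape)  # (8, 17): 5 frames x 3 symbols of context + 2 position features
```

Under the same assumptions, a combined phoneme and dynamic viseme input could be formed by computing the windowed one-hot context for each label stream separately and concatenating the results with `np.hstack`.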

Item Type:	Book Section
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	Pure Connector
Date Deposited:	24 Sep 2016 01:06
Last Modified:	05 Feb 2026 12:30
URI:	https://ueaeprints.uea.ac.uk/id/eprint/60485
DOI:	10.21437/Interspeech.2016-1084
