A Mouth Full of Words: Visually Consistent Acoustic Redubbing

Taylor, Sarah, Theobald, Barry-John and Matthews, Iain (2015) A Mouth Full of Words: Visually Consistent Acoustic Redubbing. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The Institute of Electrical and Electronics Engineers (IEEE), AUS, pp. 4904-4908. ISBN 978-1-4673-6997-8

PDF (preprint) - Accepted Version

Abstract

This paper introduces a method for automatic redubbing of video that exploits the many-to-many mapping of phoneme sequences to lip movements modelled as dynamic visemes [1]. For a given utterance, the corresponding dynamic viseme sequence is sampled to construct a graph of possible phoneme sequences that synchronize with the video. When composed with a pronunciation dictionary and language model, this produces a vast number of word sequences that are in sync with the original video, literally putting plausible words into the mouth of the speaker. We demonstrate that traditional, many-to-one, static visemes lack flexibility for this application as they produce significantly fewer word sequences. This work explores the natural ambiguity in visual speech and offers insight for automatic speech recognition and the importance of language modeling.
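
The pipeline described in the abstract (visual units expanded into candidate phoneme sequences, then filtered through a pronunciation dictionary) can be illustrated with a toy sketch. This is not the authors' implementation: the unit labels `DV1` and `DV2`, the phoneme strings attached to them, and the dictionary entries are all invented for illustration, the search is brute-force enumeration rather than a graph composition, and no language model is applied.

```python
from itertools import product

# Hypothetical inventory: each visual unit maps to the phoneme strings that
# could plausibly accompany the same lip movement. A unit may span more than
# one phoneme. All entries are invented for illustration.
UNIT_TO_PHONEME_STRINGS = {
    "DV1": [("b", "ae"), ("p", "ae"), ("m", "eh")],
    "DV2": [("t",), ("d",), ("n",)],
}

# Hypothetical pronunciation dictionary: phoneme tuple -> word.
PRONUNCIATIONS = {
    ("b", "ae", "t"): "bat",
    ("b", "ae", "d"): "bad",
    ("b", "ae", "n"): "ban",
    ("p", "ae", "t"): "pat",
    ("p", "ae", "d"): "pad",
    ("p", "ae", "n"): "pan",
    ("m", "eh", "t"): "met",
    ("m", "eh", "n"): "men",
}


def phoneme_sequences(unit_sequence):
    """Yield every phoneme sequence consistent with the visual unit sequence."""
    options = [UNIT_TO_PHONEME_STRINGS[unit] for unit in unit_sequence]
    for combination in product(*options):
        # Concatenate the per-unit phoneme strings into one flat sequence.
        yield tuple(ph for chunk in combination for ph in chunk)


def candidate_words(unit_sequence):
    """Return dictionary words whose pronunciation matches some sequence."""
    return sorted(
        PRONUNCIATIONS[seq]
        for seq in phoneme_sequences(unit_sequence)
        if seq in PRONUNCIATIONS
    )


words = candidate_words(["DV1", "DV2"])
print(words)
```

Even in this tiny example, two visual units admit several distinct words, which is the ambiguity the abstract refers to. A fuller treatment would score whole word sequences with a language model instead of listing single words.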

Item Type: Book Section
Additional Information: Print ISSN: 1520-6149 Electronic ISSN: 2379-190X
Uncontrolled Keywords: audio-visual speech,dynamic visemes,acoustic redubbing
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Depositing User: Pure Connector
Date Deposited: 06 Jan 2016 13:02
Last Modified: 28 Apr 2023 00:17
URI: https://ueaeprints.uea.ac.uk/id/eprint/56081
DOI: 10.1109/ICASSP.2015.7178903
