Reconstructing intelligible audio speech from visual speech features

Milner, Ben; Le Cornu, Thomas

Milner, Ben and Le Cornu, Thomas (2015) Reconstructing intelligible audio speech from visual speech features. In: Interspeech 2015, 2015-09-06 - 2015-09-10.

PDF (interspeech-v2a-2015) - Accepted Version

Abstract

This work describes an investigation into the feasibility of producing intelligible audio speech from only visual speech features. The proposed method aims to estimate a spectral envelope from visual features which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners, and then compared to the intelligibility of the video signal only and when combined with the reconstructed audio.

Item Type:	Conference or Workshop Item (Paper)
Faculty \ School:	Faculty of Science > School of Computing Sciences; Faculty of Science
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	Pure Connector
Date Deposited:	21 Jan 2016 10:00
Last Modified:	14 May 2026 15:25
URI:	https://ueaeprints.uea.ac.uk/id/eprint/56718
DOI:
