Improving Lip-reading Performance for Robust Audiovisual Speech Recognition using DNNs

Thangthai, Kwanchiva, Harvey, Richard, Cox, Stephen and Theobald, Barry-John (2015) Improving Lip-reading Performance for Robust Audiovisual Speech Recognition using DNNs. In: FAAVSP - The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing. UNSPECIFIED, AUT, pp. 127-131.

Full text not available from this repository.

Abstract

This paper presents preliminary experiments using the Kaldi toolkit to investigate audiovisual speech recognition (AVSR) in noisy environments using deep neural networks (DNNs). In particular, we use a single-speaker, large-vocabulary, continuous audiovisual speech corpus to compare the performance of visual-only, audio-only and audiovisual speech recognition. The models trained using the Kaldi toolkit are compared with models trained using conventional hidden Markov models (HMMs). In addition, we compare the performance of the speech recognizer both with and without visual features over nine SNR levels of babble noise, ranging from 20 dB down to -20 dB. The results show that the DNN outperforms conventional HMMs in all experimental conditions, especially for the lip-reading-only system, which achieves a gain of 37.19% accuracy (84.67% absolute word accuracy). Moreover, the DNN provides an effective improvement of 10 and 12 dB SNR for the unimodal and bimodal speech recognition systems, respectively. However, integrating the visual features using simple feature fusion is effective only at SNRs of 5 dB and above; below this, the degradation in accuracy of the audiovisual system is similar to that of the audio-only recognizer.

Index Terms: lip-reading, speech reading, audiovisual speech recognition
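The abstract describes combining audio and visual features by simple feature fusion, i.e. concatenating the two feature streams frame by frame before recognition. The paper does not give implementation details, so the following is only a minimal sketch of that idea: the feature dimensions, frame rates, and the repeat-upsampling of the visual stream are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Early (feature-level) fusion: align frame rates, then concatenate.

    audio_feats:  (T_a, D_a) array, e.g. MFCC-like frames at ~100 fps
    visual_feats: (T_v, D_v) array, e.g. lip features at ~25 fps
    The visual stream is upsampled by nearest-frame repetition so that
    every audio frame has a matching visual frame. (All rates and
    dimensions here are assumptions for illustration.)
    """
    t_audio = audio_feats.shape[0]
    t_visual = visual_feats.shape[0]
    # Map each audio frame index to the nearest earlier visual frame.
    idx = np.minimum((np.arange(t_audio) * t_visual) // t_audio,
                     t_visual - 1)
    upsampled = visual_feats[idx]
    # Concatenate along the feature dimension: (T_a, D_a + D_v).
    return np.hstack([audio_feats, upsampled])
```

For example, fusing 100 audio frames of dimension 13 with 25 visual frames of dimension 20 yields a (100, 33) feature matrix, which would then be fed to the recognizer in place of the audio-only features.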

Item Type: Book Section
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: Pure Connector
Date Deposited: 08 Jun 2017 05:09
Last Modified: 01 Jun 2020 23:35
URI: https://ueaeprints.uea.ac.uk/id/eprint/63709
