Maximising audio-visual speech correlation

Almajai, Ibrahim and Milner, Ben (2007) Maximising audio-visual speech correlation. In: Auditory-Visual Speech Processing (AVSP2007), 2007-08-31 - 2007-09-03.

Full text not available from this repository.


The aim of this work is to investigate a selection of audio and visual speech features and to find pairs that maximise audio-visual correlation. Two audio speech features are analysed: filterbank vectors and the first four formant frequencies. Three visual features are also considered: active appearance model (AAM), 2-D DCT and cross-DCT. Audio and visual speech features are extracted from a database of 200 sentences, and multiple linear regression is used to measure the audio-visual correlation. Results show that filterbank features exhibit a multiple correlation of around R=0.8 with visual features, while formant frequencies are substantially less correlated with visual features: R=0.6 for formants 1 and 2, and less than R=0.4 for formants 3 and 4. The three visual features show almost identical correlation with the audio features, varying in multiple correlation by less than 0.1, even though the methods of visual feature extraction are very different. Measuring the audio-visual correlation within each phoneme and then averaging across all phonemes increases the correlation to R=0.9.
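As an illustration of the measurement described in the abstract, the following is a minimal sketch of how a multiple correlation R might be computed with multiple linear regression, and how a per-phoneme average could be formed. It is not the authors' implementation. The direction of the regression (predicting one audio coefficient from a visual feature vector), the function names `multiple_correlation` and `phoneme_averaged_correlation`, and the frame-level phoneme labels are all assumptions made for this example.

```python
import numpy as np


def multiple_correlation(visual, audio_channel):
    """Multiple correlation R between one audio coefficient and a visual vector.

    visual: (n_frames, n_visual_dims) array (hypothetical visual features).
    audio_channel: (n_frames,) array (e.g. one filterbank channel; assumed).
    """
    # Add an intercept column and fit by ordinary least squares.
    design = np.column_stack([np.ones(len(visual)), visual])
    coef, *_ = np.linalg.lstsq(design, audio_channel, rcond=None)
    predicted = design @ coef

    ss_res = np.sum((audio_channel - predicted) ** 2)
    ss_tot = np.sum((audio_channel - audio_channel.mean()) ** 2)
    if ss_tot == 0.0:
        return 0.0  # constant target: correlation is undefined, report 0
    r_squared = 1.0 - ss_res / ss_tot
    return float(np.sqrt(max(r_squared, 0.0)))


def phoneme_averaged_correlation(visual, audio_channel, labels):
    """Fit a separate regression per phoneme label and average the R values."""
    values = []
    for phoneme in np.unique(labels):
        mask = labels == phoneme
        # Skip phonemes with too few frames to fit the regression (assumed rule).
        if mask.sum() > visual.shape[1] + 1:
            values.append(multiple_correlation(visual[mask], audio_channel[mask]))
    return float(np.mean(values)) if values else float("nan")
```

Because a separate regression is fitted for each phoneme, the within-phoneme fit has more freedom than a single global fit, which is consistent with the higher averaged correlation reported in the abstract.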

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Smart Emerging Technologies
Faculty of Science > Research Groups > Interactive Graphics and Audio
Depositing User: Vishal Gautam
Date Deposited: 04 Apr 2011 12:06
Last Modified: 20 Jun 2023 14:32
