Integrated Pitch and MFCC Extraction for Speech Reconstruction and Speech Recognition Applications

Shao, Xu, Milner, Ben P. and Cox, Stephen J. (2003) Integrated Pitch and MFCC Extraction for Speech Reconstruction and Speech Recognition Applications. In: Eurospeech-2003 — 8th European Conference on Speech Communication and Technology, 2003-09-01 - 2003-09-04.

Full text not available from this repository.

Abstract

This paper proposes an integrated speech front-end for both speech recognition and speech reconstruction applications. Speech is first decomposed into a set of frequency bands by an auditory model. This output is then used to extract both robust pitch estimates and MFCC vectors. Initial tests used a 128-channel auditory model, but results show that this can be reduced significantly, to between 23 and 32 channels. A detailed analysis of the pitch classification accuracy and the RMS pitch error shows the system to be more robust than both comb-function- and LPC-based pitch extraction. Speech recognition results show that the auditory-based cepstral coefficients give very similar performance to conventional MFCCs. Spectrograms and informal listening tests also reveal that speech reconstructed from the auditory-based cepstral coefficients and pitch has similar quality to that reconstructed from conventional MFCCs and pitch.
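
To illustrate the general idea of deriving both cepstral features and pitch from a single auditory filterbank pass, the sketch below is a minimal Python example, not the authors' implementation (the full text is not available here). It assumes an ERB-spaced Butterworth filterbank as a crude stand-in for the paper's auditory model, log channel energies followed by a DCT as the auditory-based cepstra, and autocorrelation of the summed rectified channel outputs for pitch; the channel count, frame sizes, voicing threshold, and 50-400 Hz search range are assumed values.

    """Illustrative sketch only: shared auditory-style filterbank feeding
    both cepstral features and a pitch estimate. All parameter values are
    assumptions, not figures taken from the paper."""
    import numpy as np
    from scipy.signal import butter, lfilter
    from scipy.fft import dct


    def erb_space(f_low, f_high, n_channels):
        """ERB-rate spaced centre frequencies (Glasberg & Moore scale)."""
        erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
        erb_inv = lambda e: (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
        return erb_inv(np.linspace(erb(f_low), erb(f_high), n_channels))


    def auditory_frontend(x, fs, n_channels=32, frame_len=400, hop=160, n_ceps=13):
        """Return (cepstra, pitch_hz) per frame from one filterbank pass.
        frame_len=400 / hop=160 correspond to 25 ms / 10 ms at 16 kHz (assumed)."""
        centres = erb_space(100.0, 0.45 * fs, n_channels)
        bands = []
        for fc in centres:
            bw = 24.7 * (4.37 * fc / 1000.0 + 1.0)          # ERB bandwidth
            lo = max(fc - bw / 2.0, 1.0) / (fs / 2.0)
            hi = min(fc + bw / 2.0, 0.49 * fs) / (fs / 2.0)
            b, a = butter(2, [lo, hi], btype="band")         # stand-in for a gammatone channel
            bands.append(lfilter(b, a, x))
        bands = np.array(bands)                              # (channels, samples)

        n_frames = 1 + (bands.shape[1] - frame_len) // hop
        lag_min, lag_max = int(fs / 400), int(fs / 50)       # assumed 50-400 Hz pitch range
        cepstra, pitch = [], []
        for i in range(n_frames):
            seg = bands[:, i * hop:i * hop + frame_len]
            # Auditory-based cepstra: log channel energies followed by a DCT.
            log_e = np.log(np.sum(seg ** 2, axis=1) + 1e-10)
            cepstra.append(dct(log_e, type=2, norm="ortho")[:n_ceps])
            # Pitch: autocorrelation of the summed half-wave-rectified channels.
            summed = np.sum(np.maximum(seg, 0.0), axis=0)
            summed = summed - summed.mean()
            ac = np.correlate(summed, summed, mode="full")[frame_len - 1:]
            lag = lag_min + np.argmax(ac[lag_min:lag_max])
            voiced = ac[lag] > 0.3 * ac[0]                   # crude voicing decision (assumed threshold)
            pitch.append(fs / lag if voiced else 0.0)
        return np.array(cepstra), np.array(pitch)

Because both outputs come from the same filterbank, the cepstra can drive a recogniser while the pitch track supports reconstruction, which is the integration the abstract describes; the specific filter shapes and pitch logic here are placeholders.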

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Faculty of Science > Research Groups > Data Science and AI
Depositing User: Vishal Gautam
Date Deposited: 23 Jul 2011 19:34
Last Modified: 10 Dec 2024 01:15
URI: https://ueaeprints.uea.ac.uk/id/eprint/23132
DOI:
