Shao, Xu, Milner, Ben P. and Cox, Stephen J. (2003) Integrated Pitch and MFCC Extraction for Speech Reconstruction and Speech Recognition Applications. In: Eurospeech-2003 — 8th European Conference on Speech Communication and Technology, 2003-09-01 - 2003-09-04.
Full text not available from this repository.

Abstract
This paper proposes an integrated speech front-end for both speech recognition and speech reconstruction applications. Speech is first decomposed into a set of frequency bands by an auditory model, whose output is then used to extract both robust pitch estimates and MFCC vectors. Initial tests used a 128-channel auditory model, but results show that this can be reduced significantly, to between 23 and 32 channels. A detailed analysis of pitch classification accuracy and RMS pitch error shows the system to be more robust than both comb-function and LPC-based pitch extraction. Speech recognition results show that the auditory-based cepstral coefficients give very similar performance to conventional MFCCs. Spectrograms and informal listening tests also indicate that speech reconstructed from the auditory-based cepstral coefficients and pitch has similar quality to that reconstructed from conventional MFCCs and pitch.
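The abstract describes a front-end in which a single auditory filterbank feeds both pitch estimation and cepstral coefficient extraction. The sketch below is an illustration of that general idea only, not the authors' system: it substitutes a conventional mel filterbank for the paper's auditory model and a simple autocorrelation peak search for its pitch extractor, and the frame size, channel count, lag range and voicing threshold are all assumptions chosen for the example.

```python
# Illustrative sketch of a joint filterbank-cepstrum + pitch front-end.
# NOT the authors' implementation: mel filters stand in for the auditory model,
# and the autocorrelation pitch search is a generic placeholder.
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, sr, fmin=50.0, fmax=None):
    """Triangular filters on a mel-spaced grid (stand-in for the auditory channels)."""
    fmax = fmax or sr / 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        if mid > lo:
            fb[i, lo:mid] = (np.arange(lo, mid) - lo) / (mid - lo)
        if hi > mid:
            fb[i, mid:hi] = (hi - np.arange(mid, hi)) / (hi - mid)
    return fb

def frame_features(frame, fb, sr, n_ceps=13, f0_range=(60.0, 400.0)):
    """Cepstral coefficients and an autocorrelation pitch estimate for one frame."""
    n_fft = (fb.shape[1] - 1) * 2
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)) ** 2
    log_energies = np.log(fb @ spectrum + 1e-10)          # per-channel log energies
    ceps = dct(log_energies, type=2, norm='ortho')[:n_ceps]

    # Crude pitch estimate: autocorrelation peak inside the allowed lag range.
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(sr / f0_range[1]), int(sr / f0_range[0])
    lag = lo + np.argmax(ac[lo:hi])
    f0 = sr / lag if ac[lag] > 0.3 * ac[0] else 0.0        # 0.0 marks an unvoiced frame
    return ceps, f0

# Example: 25 ms frames at 8 kHz with a 23-channel filterbank (the channel count
# mirrors the 23-32 range reported in the abstract; the rest is arbitrary).
sr, frame_len = 8000, 200
fb = mel_filterbank(n_filters=23, n_fft=256, sr=sr)
frame = np.random.randn(frame_len)   # placeholder for a real speech frame
ceps, f0 = frame_features(frame, fb, sr)
```

The point of sharing one filterbank between the two tasks, as the paper proposes, is that a single front-end can serve recognition (cepstral coefficients) and reconstruction (cepstral coefficients plus pitch) without running separate analyses.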
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Faculty \ School: | Faculty of Science > School of Computing Sciences |
| UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies; Faculty of Science > Research Groups > Data Science and AI |
| Depositing User: | Vishal Gautam |
| Date Deposited: | 23 Jul 2011 19:34 |
| Last Modified: | 10 Dec 2024 01:15 |
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/23132 |
| DOI: | |