Milner, Ben P. and Shao, Xu (2006) Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end. Speech Communication, 48 (6). pp. 697-715. ISSN 0167-6393
Full text not available from this repository.
Abstract
The aim of this work is to enable a noise-free time-domain speech signal to be reconstructed from a stream of MFCC vectors and fundamental frequency and voicing estimates, such as may be received in a distributed speech recognition system. To facilitate reconstruction, both a sinusoidal model and a source-filter model of speech are compared by listening tests and spectrogram analysis, with the result that the former provides higher quality speech reconstruction. Analysis of the sinusoidal model shows that for clean speech reconstruction, both a noise-free spectral envelope and a robust estimate of the fundamental frequency and voicing are necessary. Investigation into fundamental frequency estimation reveals that an auditory model based approach gives superior performance over other methods of estimation. This leads to the proposal of an integrated front-end which uses the auditory model for both fundamental frequency and voicing estimation, and as the filterbank stage in MFCC extraction, and thereby reduces computation. Applying spectral subtraction to the auditory model parameters improves the spectral envelope estimates needed for clean speech reconstruction. Experiments on the Aurora connected digits database show that the auditory model-based MFCCs give comparable performance to that attained with conventional MFCCs. Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.
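As a rough illustration of the sinusoidal model mentioned in the abstract, the sketch below builds one voiced frame by summing harmonics of the fundamental frequency, with each harmonic's amplitude read from a spectral envelope. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `synthesize_voiced_frame`, the callable `envelope` (a stand-in for an envelope recovered from MFCC vectors), the 8 kHz sample rate, and the zero phases are all illustrative choices that do not come from this record. Unvoiced frames and frame-to-frame smoothing are left out.

```python
import math


def synthesize_voiced_frame(f0_hz, envelope, sample_rate=8000, num_samples=80):
    """Sum harmonics of f0_hz, weighting each by the envelope at its frequency.

    envelope is a hypothetical callable mapping a frequency in Hz to a linear
    amplitude; in a real system it would be derived from the MFCC vector.
    All harmonic phases are assumed to be zero for simplicity.
    """
    nyquist = sample_rate / 2.0

    # Collect the harmonic frequencies that lie strictly below Nyquist.
    harmonics = []
    k = 1
    while k * f0_hz < nyquist:
        harmonics.append(k * f0_hz)
        k += 1

    # Evaluate the sum of cosines at each sample instant.
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(
            envelope(freq) * math.cos(2.0 * math.pi * freq * t)
            for freq in harmonics
        )
        frame.append(sample)
    return frame


# Usage: a flat envelope and a 100 Hz fundamental give a pulse-like waveform
# that repeats every 80 samples at an 8 kHz sample rate.
frame = synthesize_voiced_frame(100.0, lambda freq: 1.0, num_samples=160)
```

The sketch shows why both inputs matter for reconstruction: the envelope sets the amplitude of every harmonic, and the fundamental frequency sets where the harmonics fall, so an error in either one changes the whole frame.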
Item Type: | Article |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Smart Emerging Technologies; Faculty of Science > Research Groups > Interactive Graphics and Audio |
Depositing User: | Vishal Gautam |
Date Deposited: | 01 Jun 2011 19:16 |
Last Modified: | 22 Apr 2023 01:24 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/22225 |
DOI: | 10.1016/j.specom.2005.10.004 |