Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end

Milner, Ben P.; Shao, Xu

doi:10.1016/j.specom.2005.10.004

Milner, Ben P. and Shao, Xu (2006) Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end. Speech Communication, 48 (6). pp. 697-715. ISSN 0167-6393

Abstract

The aim of this work is to enable a noise-free time-domain speech signal to be reconstructed from a stream of MFCC vectors and fundamental frequency and voicing estimates, such as may be received in a distributed speech recognition system. To facilitate reconstruction, both a sinusoidal model and a source-filter model of speech are compared by listening tests and spectrogram analysis, with the result that the former provides higher quality speech reconstruction. Analysis of the sinusoidal model shows that for clean speech reconstruction, both a noise-free spectral envelope and a robust estimate of the fundamental frequency and voicing are necessary. Investigation into fundamental frequency estimation reveals that an auditory model based approach gives superior performance over other methods of estimation. This leads to the proposal of an integrated front-end which uses the auditory model for both fundamental frequency and voicing estimation, and as the filterbank stage in MFCC extraction, and thereby reduces computation. Applying spectral subtraction to the auditory model parameters improves the spectral envelope estimates needed for clean speech reconstruction. Experiments on the Aurora connected digits database show that the auditory model-based MFCCs give comparable performance to that attained with conventional MFCCs. Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.
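As a rough illustration of two ideas named in the abstract, the sketch below shows (a) a basic magnitude-domain spectral subtraction and (b) sinusoidal synthesis of one voiced frame from a fundamental frequency and a spectral envelope. It is not the authors' implementation. The function names, the zero-phase harmonics, the subtraction floor, the 8 kHz sample rate and the 160-sample frame length are all assumptions made for the example, and the subtraction is applied to generic channel magnitudes rather than to the auditory model parameters used in the paper.

```python
import math


def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Subtract a per-channel noise magnitude estimate from noisy magnitudes.

    Results are clamped to floor * noisy value so no channel goes negative.
    The floor value is an illustrative assumption, not taken from the paper.
    """
    return [max(n - d, floor * n) for n, d in zip(noisy_mag, noise_mag)]


def synthesize_voiced_frame(f0_hz, envelope, sample_rate=8000, num_samples=160):
    """Synthesize one voiced frame as a sum of harmonics of f0.

    `envelope` is a hypothetical callable mapping frequency in Hz to a linear
    amplitude (for example, an envelope estimated from MFCCs). Harmonics
    strictly below the Nyquist frequency are used, all with zero phase
    (an assumption; a real system needs a phase model).
    """
    nyquist = sample_rate / 2
    num_harmonics = math.ceil(nyquist / f0_hz) - 1
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = sum(
            envelope(k * f0_hz) * math.cos(2 * math.pi * k * f0_hz * t)
            for k in range(1, num_harmonics + 1)
        )
        frame.append(sample)
    return frame


# Example usage with a flat envelope and a 100 Hz fundamental.
cleaned = spectral_subtraction([1.0, 0.5], [0.2, 0.6])
frame = synthesize_voiced_frame(100.0, lambda freq_hz: 1.0)
```

In this sketch an unvoiced frame is not handled; a common choice, again an assumption here, is to drive the same envelope with noise-like components instead of harmonics of the fundamental.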

Item Type:	Article
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	Vishal Gautam
Date Deposited:	01 Jun 2011 19:16
Last Modified:	09 Feb 2026 09:30
URI:	https://ueaeprints.uea.ac.uk/id/eprint/22225
DOI:	10.1016/j.specom.2005.10.004