Pitch prediction from MFCC vectors for speech reconstruction

Shao, X. and Milner, B. P. (2004) Pitch prediction from MFCC vectors for speech reconstruction. In: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2005-03-18 - 2005-03-23.

Full text not available from this repository.

Abstract

The paper proposes a technique for reconstructing an acoustic speech signal solely from a stream of Mel-frequency cepstral coefficients (MFCCs). Previous speech reconstruction methods have required an additional pitch element, but this work proposes two maximum a posteriori (MAP) methods for predicting pitch from the MFCC vectors themselves. The first method is based on a Gaussian mixture model (GMM), while the second utilises the temporal correlation available from a hidden Markov model (HMM) framework. A formal measurement of both frame classification accuracy and RMS pitch error shows that an HMM-based scheme with 5 clusters per state is able to classify correctly over 94% of frames and has an RMS pitch error of 3.1 Hz in comparison to a reference pitch. Informal listening tests and analysis of spectrograms reveal that speech reconstructed solely from the MFCC vectors is almost indistinguishable from that using the reference pitch.
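
As an illustration of the GMM-based idea, the sketch below predicts a pitch value from a single MFCC vector using a GMM fitted to joint [MFCC, pitch] vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the posterior-weighted conditional mean used here is a commonly used estimator for joint GMMs and is an assumption. The names `gaussian_logpdf` and `predict_pitch` are hypothetical, the GMM parameters are assumed to be already trained, and voiced/unvoiced classification is ignored.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def predict_pitch(x, weights, means, covs):
    """Predict pitch from one MFCC vector with a joint GMM (assumed layout).

    x       : (D,) MFCC vector
    weights : (K,) mixture weights
    means   : (K, D + 1) joint means, pitch stored in the last element
    covs    : (K, D + 1, D + 1) joint covariances, pitch in the last row/column
    """
    d = x.shape[0]
    k_total = len(weights)
    log_post = np.empty(k_total)
    cond_means = np.empty(k_total)
    for k in range(k_total):
        mu_x, mu_f = means[k][:d], means[k][d]
        cov_xx = covs[k][:d, :d]   # MFCC covariance block
        cov_fx = covs[k][d, :d]    # pitch/MFCC cross-covariance
        # Unnormalised log posterior of cluster k given the MFCC vector.
        log_post[k] = np.log(weights[k]) + gaussian_logpdf(x, mu_x, cov_xx)
        # Conditional mean of pitch given x within cluster k.
        cond_means[k] = mu_f + cov_fx @ np.linalg.solve(cov_xx, x - mu_x)
    # Normalise the posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post @ cond_means)
```

With a single cluster the prediction reduces to a linear regression of pitch on the MFCC vector. With several clusters, each cluster's regression is weighted by how well that cluster explains the observed MFCCs.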

Item Type:	Conference or Workshop Item (Other)
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing; Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	EPrints Services
Date Deposited:	01 Oct 2010 13:41
Last Modified:	28 Mar 2025 13:23
URI:	https://ueaeprints.uea.ac.uk/id/eprint/3050
DOI:	10.1109/ICASSP.2004.1325931
