Prediction of fundamental frequency and voicing from mel-frequency cepstral coefficients for unconstrained speech reconstruction

Milner, Ben; Shao, Xu

doi:10.1109/TASL.2006.876880

Milner, Ben and Shao, Xu (2007) Prediction of fundamental frequency and voicing from mel-frequency cepstral coefficients for unconstrained speech reconstruction. IEEE Transactions on Audio, Speech, and Language Processing, 15 (1). pp. 24-33. ISSN 1558-7916

Full text not available from this repository.

Abstract

This work proposes a method for predicting the fundamental frequency and voicing of a frame of speech from its mel-frequency cepstral coefficient (MFCC) vector representation. This information is subsequently used to enable a speech signal to be reconstructed solely from a stream of MFCC vectors and has particular application in distributed speech recognition systems. Prediction is achieved by modeling the joint density of fundamental frequency and MFCCs. This is first modeled using a Gaussian mixture model (GMM) and then extended by using a set of hidden Markov models to link together a series of state-dependent GMMs. Prediction accuracy is measured on unconstrained speech input for both a speaker-dependent system and a speaker-independent system. A fundamental frequency prediction error of 3.06% is obtained on the speaker-dependent system in comparison to 8.27% on the speaker-independent system. On the speaker-dependent system, 5.22% of frames have voicing errors compared to 8.82% on the speaker-independent system. Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced, with the quality of the speaker-dependent speech being slightly higher owing to the more accurate fundamental frequency and voicing predictions.
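
As an illustration of the joint-density idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a GMM has already been trained on joint vectors z = [MFCC, f0], and it uses the conditional mean of f0 given the MFCC vector as the prediction; the abstract does not state which estimator the paper uses, so that choice is an assumption. The function name predict_f0, the parameter layout (f0 stored as the last dimension), and any parameter values passed in are hypothetical. Voicing classification and the HMM extension are not covered.

```python
import numpy as np


def predict_f0(x, weights, means, covs):
    """Conditional mean E[f0 | x] under a joint GMM over z = [x, f0].

    x:       (D,) MFCC vector for one frame
    weights: (K,) mixture weights
    means:   (K, D+1) component means, f0 in the last position (assumed layout)
    covs:    (K, D+1, D+1) full component covariances, same layout
    """
    D = x.shape[0]
    K = len(weights)
    log_post = np.empty(K)    # unnormalised log posterior of each component given x
    cond_means = np.empty(K)  # per-component conditional mean of f0 given x

    for k in range(K):
        mu_x, mu_f = means[k, :D], means[k, D]
        S_xx = covs[k, :D, :D]   # MFCC covariance block
        S_fx = covs[k, D, :D]    # cross-covariance between f0 and MFCCs

        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)   # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)

        # log of w_k * N(x; mu_x, S_xx), using the marginal over the MFCC part
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + D * np.log(2.0 * np.pi)
        )
        # Gaussian conditioning: mu_f + S_fx S_xx^{-1} (x - mu_x)
        cond_means[k] = mu_f + S_fx @ sol

    # Normalise posteriors in a numerically stable way
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of the per-component conditional means
    return float(post @ cond_means)
```

In this sketch each mixture component contributes a linear regression from MFCCs to f0, and the component posteriors given the observed MFCC vector decide how those regressions are blended.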

Item Type:	Article
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	Vishal Gautam
Date Deposited:	07 Mar 2011 13:51
Last Modified:	26 Apr 2026 13:23
URI:	https://ueaeprints.uea.ac.uk/id/eprint/22049
DOI:	10.1109/TASL.2006.876880
