Milner, Ben and Shao, Xu (2007) Prediction of fundamental frequency and voicing from mel-frequency cepstral coefficients for unconstrained speech reconstruction. IEEE Transactions on Audio, Speech, and Language Processing, 15 (1). pp. 24-33. ISSN 1558-7916
Full text not available from this repository.
Abstract
This work proposes a method for predicting the fundamental frequency and voicing of a frame of speech from its mel-frequency cepstral coefficient (MFCC) vector representation. This information is then used to reconstruct a speech signal solely from a stream of MFCC vectors, which has particular application in distributed speech recognition systems. Prediction is achieved by modeling the joint density of fundamental frequency and MFCCs. This density is first modeled using a Gaussian mixture model (GMM), and the model is then extended by using a set of hidden Markov models to link together a series of state-dependent GMMs. Prediction accuracy is measured on unconstrained speech input for both a speaker-dependent system and a speaker-independent system. A fundamental frequency prediction error of 3.06% is obtained on the speaker-dependent system, compared with 8.27% on the speaker-independent system. On the speaker-dependent system 5.22% of frames have voicing errors, compared with 8.82% on the speaker-independent system. Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced, with the quality of the speaker-dependent speech being slightly higher owing to its more accurate fundamental frequency and voicing predictions.
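The abstract does not state the exact prediction rule. One common way to predict a target from a joint-density GMM is the minimum mean-square-error (MMSE) estimate: each component's conditional mean of F0 given the MFCC vector, weighted by that component's posterior probability. The following is a minimal sketch under that assumption, using NumPy. The function name `predict_f0` and the layout that places F0 as the last dimension of each joint vector are hypothetical, the sketch is not necessarily the estimator used in the paper, and it omits both the HMM extension and the voicing decision.

```python
import numpy as np


def predict_f0(mfcc, weights, means, covs):
    """MMSE estimate of F0 given one MFCC vector under a joint-density GMM.

    Hypothetical layout (an assumption, not from the paper): each joint vector
    is [mfcc_1, ..., mfcc_D, f0], so F0 is the last dimension.
    weights: (K,), means: (K, D+1), covs: (K, D+1, D+1).
    """
    x = np.asarray(mfcc, dtype=float)
    d = x.shape[0]
    log_post = np.empty(len(weights))
    cond_means = np.empty(len(weights))
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu_x, mu_f = mu[:d], mu[d]
        s_xx = cov[:d, :d]  # MFCC-MFCC covariance block
        s_fx = cov[d, :d]   # F0-MFCC cross-covariance row
        diff = x - mu_x
        # Solve the linear system rather than inverting s_xx explicitly.
        sol = np.linalg.solve(s_xx, diff)
        _, logdet = np.linalg.slogdet(s_xx)
        # log(w_k) + log N(x; mu_x, s_xx): unnormalised log posterior of component k.
        log_post[k] = np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + diff @ sol)
        # Conditional mean of F0 given x for component k.
        cond_means[k] = mu_f + s_fx @ sol
    # Normalise the component posteriors in the log domain for numerical stability.
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()
    return float(post @ cond_means)


if __name__ == "__main__":
    # Toy two-component model with D = 2 MFCCs (made-up numbers, not paper data).
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 0.0, 100.0], [10.0, 10.0, 200.0]])
    covs = np.stack([np.eye(3), np.eye(3)])
    print(predict_f0([0.0, 0.0], weights, means, covs))
```

With full covariance matrices, each component acts as a local linear regression from MFCCs to F0; with zero cross-covariance, the estimate reduces to a posterior-weighted average of the component F0 means.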
Item Type: | Article |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies |
Depositing User: | Vishal Gautam |
Date Deposited: | 07 Mar 2011 13:51 |
Last Modified: | 27 Oct 2023 00:38 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/22049 |
DOI: | 10.1109/TASL.2006.876880 |