Milner, Ben P., Shao, Xu and Darch, Jonathan (2005) Fundamental Frequency and Voicing Prediction from MFCCs for Speech Reconstruction from Unconstrained Speech. In: 9th European Conference on Speech Communication and Technology, 2005-09-04 - 2005-09-08.
Full text not available from this repository.

Abstract
This work proposes a method to predict the fundamental frequency and voicing of a frame of speech from its MFCC representation. This is particularly useful in distributed speech recognition systems, where the ability to predict fundamental frequency and voicing allows a time-domain speech signal to be reconstructed solely from the MFCC vectors. Prediction is achieved by modeling the joint density of MFCCs and fundamental frequency with a combined hidden Markov model-Gaussian mixture model (HMM-GMM) framework. Prediction results are presented on unconstrained speech using both a speaker-dependent database and a speaker-independent database. Spectrogram comparisons of the reconstructed and original speech are also made. For the speaker-dependent task, the percentage fundamental frequency prediction error is 3.1%; for the speaker-independent task, it rises to 8.3%.
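As an illustration of how a joint density over MFCCs and fundamental frequency can yield a prediction, the sketch below computes the conditional mean of fundamental frequency given an MFCC vector under a joint Gaussian mixture model. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not state which estimator is used, the HMM state sequence and the voicing decision are omitted, and the function name `predict_f0` and the toy parameters are hypothetical.

```python
import numpy as np

def predict_f0(x, weights, means, covs):
    """Conditional-mean estimate of F0 given an MFCC vector x.

    Assumes (hypothetically) a joint GMM over z = [x, f], where x is the
    d-dimensional MFCC vector and f is the scalar fundamental frequency.
    weights: (K,), means: (K, d+1), covs: (K, d+1, d+1).
    """
    d = x.shape[0]
    n_comp = len(weights)
    log_post = np.empty(n_comp)    # unnormalised log posterior of each component
    cond_means = np.empty(n_comp)  # E[f | x, component k]
    for k in range(n_comp):
        mu_x, mu_f = means[k][:d], means[k][d]
        s_xx = covs[k][:d, :d]     # MFCC covariance block
        s_fx = covs[k][d, :d]      # cross-covariance between F0 and MFCCs
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)
        _, logdet = np.linalg.slogdet(s_xx)
        # log of weight times the marginal Gaussian likelihood of x
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + d * np.log(2.0 * np.pi)
        )
        # Gaussian conditional mean: mu_f + S_fx S_xx^{-1} (x - mu_x)
        cond_means[k] = mu_f + s_fx @ sol
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()
    return float(post @ cond_means)

# Toy one-component model with a 2-dimensional "MFCC" vector (made-up numbers).
weights = np.array([1.0])
means = np.array([[0.0, 0.0, 100.0]])
covs = np.array([[[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 4.0]]])
print(predict_f0(np.array([2.0, 0.0]), weights, means, covs))  # 102.0
```

In an HMM-GMM setting, one would presumably hold a separate mixture of this kind for each HMM state and select the mixture frame by frame from a decoded state sequence; that step is left out here.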
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Faculty / School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies; Faculty of Science > Research Groups > Data Science and AI |
Depositing User: | Vishal Gautam |
Date Deposited: | 14 Jul 2011 07:31 |
Last Modified: | 10 Dec 2024 01:15 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/22048 |