Fundamental Frequency and Voicing Prediction from MFCCs for Speech Reconstruction from Unconstrained Speech

Milner, Ben P., Shao, Xu and Darch, Jonathan (2005) Fundamental Frequency and Voicing Prediction from MFCCs for Speech Reconstruction from Unconstrained Speech. In: 9th European Conference on Speech Communication and Technology, 2005-09-04 - 2005-09-08.

Full text not available from this repository.

Abstract

This work proposes a method to predict the fundamental frequency and voicing of a frame of speech from its MFCC representation. This is particularly useful in distributed speech recognition systems, where the ability to predict fundamental frequency and voicing allows a time-domain speech signal to be reconstructed solely from the MFCC vectors. Prediction is achieved by modeling the joint density of MFCCs and fundamental frequency with a combined hidden Markov model-Gaussian mixture model (HMM-GMM) framework. Prediction results are presented on unconstrained speech using both a speaker-dependent database and a speaker-independent database. Spectrogram comparisons of the reconstructed and original speech are also made. The results show a percentage fundamental frequency prediction error of 3.1% for the speaker-dependent task, rising to 8.3% for the speaker-independent task.
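
As an illustration of how a joint density over MFCCs and fundamental frequency can be used for prediction, the following is a minimal sketch, not the authors' implementation. It assumes a GMM has already been trained on joint vectors z = [MFCC, f0] from voiced frames, and it assumes the prediction is the conditional mean of f0 given the MFCC vector (the abstract does not say which estimator the paper uses). The function name `predict_f0` and all parameter values are hypothetical. The HMM part of the framework, which would select a state-specific GMM for each frame, and the voicing decision are omitted.

```python
import numpy as np


def predict_f0(x, weights, means, covs):
    """Conditional-mean estimate of f0 given MFCC vector x.

    Assumes a joint GMM over z = [x, f0] (hypothetical, pre-trained):
      weights: K mixture weights
      means:   K arrays of length d + 1
      covs:    K full covariance matrices of shape (d + 1, d + 1)
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    log_post = []    # unnormalised log posterior of each component given x
    cond_means = []  # E[f0 | x, component k]
    for w, mu, sigma in zip(weights, means, covs):
        mu_x, mu_f = mu[:d], mu[d]
        s_xx = sigma[:d, :d]   # MFCC covariance block
        s_fx = sigma[d, :d]    # cross-covariance between f0 and MFCCs
        diff = x - mu_x
        solved = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # log of w_k * N(x; mu_x, s_xx)
        log_post.append(
            np.log(w) - 0.5 * (diff @ solved + logdet + d * np.log(2 * np.pi))
        )
        # Conditional mean of a joint Gaussian
        cond_means.append(mu_f + s_fx @ solved)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # stabilised normalisation
    post /= post.sum()
    return float(post @ np.array(cond_means))


# Toy usage with made-up parameters: 2-D "MFCC" vectors, two components.
weights = [0.5, 0.5]
means = [np.array([-10.0, 0.0, 100.0]), np.array([10.0, 0.0, 200.0])]
covs = [np.eye(3), np.eye(3)]
f0_hat = predict_f0([-10.0, 0.0], weights, means, covs)
```

Each component contributes its own linear regression from MFCCs to f0, weighted by how likely that component is given the observed MFCC vector. In the toy usage, the input lies at the centre of the first component, so the estimate is dominated by that component's mean f0.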

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Faculty of Science > Research Groups > Data Science and AI
Depositing User: Vishal Gautam
Date Deposited: 14 Jul 2011 07:31
Last Modified: 10 Dec 2024 01:15
URI: https://ueaeprints.uea.ac.uk/id/eprint/22048