Speech feature extraction and reconstruction

Milner, Ben (2008) Speech feature extraction and reconstruction. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition, Chapter 6. Springer, pp. 107–130.


Abstract

This chapter is concerned with feature extraction and back-end speech reconstruction, and is particularly aimed at distributed speech recognition (DSR) and the work carried out by the ETSI Aurora group. Feature extraction is examined first, beginning with a basic implementation of mel-frequency cepstral coefficients (MFCCs). Additional processing, in the form of noise and channel compensation, is then explained, with the aim of increasing speech recognition accuracy in real-world environments. Source and channel coding issues relevant to DSR are also briefly discussed. Back-end speech reconstruction using a sinusoidal model is explained, and it is shown how this is made possible by transmitting additional source information (voicing and fundamental frequency) from the terminal device. An alternative method of back-end speech reconstruction is then explained, in which the voicing and fundamental frequency are predicted from the received MFCC vectors. This enables speech to be reconstructed solely from the MFCC vector stream, with no explicit transmission of voicing and fundamental frequency.
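The MFCC front end described in the abstract can be sketched roughly as follows. This is a minimal illustration of the standard pipeline (windowing, power spectrum, mel filterbank, log, DCT), not the chapter's or the ETSI Aurora group's actual implementation; the filter count, frame length, and sample rate below are assumptions for demonstration only.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale warping of frequency in Hz
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse mel-scale warping
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, frame_len, sr):
    # Triangular filters spaced evenly on the mel scale,
    # covering 0 Hz up to the Nyquist frequency
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ctr, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, ctr):
            fbank[i - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[i - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fbank

def mfcc(frame, sr, n_filters=23, n_ceps=13):
    # Windowed frame -> power spectrum -> mel filterbank energies
    # -> log -> DCT-II, keeping the first n_ceps coefficients
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    fbank = mel_filterbank(n_filters, len(frame), sr)
    log_energies = np.log(fbank @ spec + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_energies

# Usage: one 50 ms frame of a 440 Hz tone at an assumed 8 kHz sample rate
frame = np.sin(2 * np.pi * 440.0 * np.arange(400) / 8000.0)
ceps = mfcc(frame, 8000)   # a 13-dimensional MFCC vector
```

Back-end reconstruction then has to invert (approximately) this lossy chain, which is why the sinusoidal-model approach needs voicing and fundamental-frequency information, either transmitted alongside the MFCCs or predicted from them.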

Item Type: Book Section
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Faculty of Science > Research Groups > Data Science and AI
Depositing User: Vishal Gautam
Date Deposited: 11 Mar 2011 17:18
Last Modified: 10 Dec 2024 01:09
URI: https://ueaeprints.uea.ac.uk/id/eprint/23776
DOI: 10.1007/978-1-84800-143-5_6
