Harding, Philip and Milner, Ben (2017) Estimating acoustic speech features in low signal-to-noise ratios using a statistical framework. Computer Speech and Language, 42. 1–19. ISSN 0885-2308
PDF (Accepted manuscript), Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Accurate estimation of acoustic speech features from noisy speech and from different speakers is an ongoing problem in speech processing. Many methods have been proposed to estimate acoustic features, but errors increase as signal-to-noise ratios fall. This work proposes a robust statistical framework to estimate an acoustic speech vector (comprising voicing, fundamental frequency and spectral envelope) from an intermediate feature that is extracted from a noisy time-domain speech signal. The initial approach is accurate in clean conditions but deteriorates in noise and when the speaker changes. Adaptation methods are then developed to adjust the acoustic models to the noise conditions and speaker. Evaluations are carried out in stationary and nonstationary noises and at SNRs from -5 dB to clean conditions. Comparison with conventional methods of estimating fundamental frequency, voicing and spectral envelope shows that the proposed framework has the lowest errors in all conditions tested.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | voicing, fundamental frequency, spectral envelope, noise adaptation, speaker adaptation |
Faculty \ School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies |
Depositing User: | Pure Connector |
Date Deposited: | 24 Sep 2016 00:19 |
Last Modified: | 19 Apr 2023 16:30 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/59988 |
DOI: | 10.1016/j.csl.2016.08.001 |