Hidden Markov model-based speech enhancement

Kato, Akihiro (2017) Hidden Markov model-based speech enhancement. Doctoral thesis, University of East Anglia.

    Abstract

    This work proposes a method of model-based speech enhancement that uses a network of
    HMMs first to decode noisy speech and then to synthesise a set of features that enables
    a speech production model to reconstruct clean speech. The motivation is to remove the
    distortion, residual noise and musical noise that are associated with conventional
    filtering-based methods of speech enhancement.
    STRAIGHT forms the speech production model for speech reconstruction and requires
    a time-frequency spectral surface, aperiodicity and a fundamental frequency contour.
    HMM-based synthesis is used to estimate the time-frequency surface and aperiodicity
    once the model and state sequences have been obtained from HMM decoding of the input
    noisy speech. Fundamental frequency was found to be best estimated using the PEFAC
    method rather than by synthesis from the HMMs.
    For robust HMM decoding in noisy conditions, the HMMs must model noisy speech.
    Noise adaptation is therefore investigated to achieve this, and its effect on the
    reconstructed speech is measured. Even with such noise adaptation to match the HMMs
    to the noisy conditions, decoding errors arise, both in terms of incorrect decoding
    and time alignment errors. Confidence measures are developed to identify such errors,
    and compensation methods are then developed to conceal them in the enhanced speech
    signal.
    Speech quality and intelligibility are first analysed in terms of PESQ and NCM,
    showing the superiority of the proposed method over conventional methods at low
    SNRs. A three-way subjective MOS listening test then shows that the proposed method
    overwhelmingly surpasses the conventional methods across all noise conditions, and
    a subjective word recognition test shows that the proposed method has an advantage
    in speech intelligibility over the conventional methods at low SNRs.

    Item Type: Thesis (Doctoral)
    Faculty \ School: Faculty of Science > School of Computing Sciences
    Depositing User: Jackie Webb
    Date Deposited: 28 Jun 2017 15:32
    Last Modified: 28 Jun 2017 15:34
    URI: https://ueaeprints.uea.ac.uk/id/eprint/63950
