Websdale, Danny and Milner, Ben (2017) A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation. In: Proceedings of Interspeech 2017. ISCA, SWE, pp. 2003-2007.
PDF (Accepted manuscript), Accepted Version, 281kB
Abstract
This work proposes and compares perceptually motivated loss functions for deep-learning-based binary mask estimation in speech separation. Previous loss functions have focused on maximising the classification accuracy of mask estimation; we instead propose loss functions that aim to maximise the hit minus false-alarm (HIT-FA) rate, which is known to correlate more closely with speech intelligibility. The baseline loss function is binary cross-entropy (CE), a standard loss function for binary mask estimation that maximises classification accuracy. We first propose a loss function that maximises the HIT-FA rate instead of classification accuracy. We then propose a second loss function that is a hybrid of CE and HIT-FA, providing a balance between classification accuracy and HIT-FA rate. Evaluations of the perceptually motivated loss functions on the GRID database show improvements in HIT-FA rate and ESTOI across babble and factory noise. Further tests then explore the application of the perceptually motivated loss functions to a larger-vocabulary dataset.
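The abstract names the three losses but does not give their exact formulation. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation. HIT is the proportion of target-dominated time-frequency units (label 1) estimated as 1, and FA is the proportion of noise-dominated units (label 0) wrongly estimated as 1. It is assumed that the network outputs a probability per unit and that a "soft" HIT-FA, which uses these probabilities in place of hard decisions, makes the measure differentiable. The hybrid is assumed to be a weighted sum controlled by a hypothetical weight `alpha`. Plain Python is used only to make the arithmetic explicit; a training setup would express the same operations in an autodiff framework.

```python
import math
from typing import Sequence

EPS = 1e-7  # guards against log(0) and division by zero


def binary_cross_entropy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean binary cross-entropy between a binary target mask and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, EPS), 1.0 - EPS)  # clip so both logs are finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)


def soft_hit_fa(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Soft HIT minus FA rate (assumption: probabilities stand in for hard labels).

    HIT: share of target-dominated units (label 1) estimated as 1.
    FA:  share of noise-dominated units (label 0) estimated as 1.
    """
    hits = sum(t * p for t, p in zip(y_true, y_pred))
    false_alarms = sum((1.0 - t) * p for t, p in zip(y_true, y_pred))
    n_target = sum(y_true)
    n_noise = len(y_true) - n_target
    return hits / (n_target + EPS) - false_alarms / (n_noise + EPS)


def hit_fa_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Loss that falls as the soft HIT-FA rate rises (0 for a perfect mask)."""
    return 1.0 - soft_hit_fa(y_true, y_pred)


def hybrid_loss(y_true: Sequence[float], y_pred: Sequence[float], alpha: float = 0.5) -> float:
    """Hypothetical CE/HIT-FA hybrid: alpha weights CE, (1 - alpha) weights HIT-FA loss."""
    return alpha * binary_cross_entropy(y_true, y_pred) + (1.0 - alpha) * hit_fa_loss(y_true, y_pred)


if __name__ == "__main__":
    ideal_mask = [1, 1, 0, 0]
    estimate = [0.9, 0.8, 0.2, 0.1]
    print(f"CE      = {binary_cross_entropy(ideal_mask, estimate):.4f}")
    print(f"HIT-FA  = {soft_hit_fa(ideal_mask, estimate):.4f}")
    print(f"hybrid  = {hybrid_loss(ideal_mask, estimate):.4f}")
```

For the example mask, the soft HIT is (0.9 + 0.8) / 2 = 0.85 and the soft FA is (0.2 + 0.1) / 2 = 0.15, so the soft HIT-FA is about 0.70. Unlike CE, which weights every unit equally, this measure normalises hits and false alarms separately by the number of target-dominated and noise-dominated units. That separation is why it can favour a different operating point than plain classification accuracy.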
Item Type: | Book Section |
---|---|
Faculty \ School: | Faculty of Science > School of Computing Sciences |
UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies |
Depositing User: | Pure Connector |
Date Deposited: | 11 Jul 2017 08:04 |
Last Modified: | 22 Mar 2024 02:06 |
URI: | https://ueaeprints.uea.ac.uk/id/eprint/64073 |
DOI: | 10.21437/Interspeech.2017-1504 |