Using visual speech information in masking methods for audio speaker separation

Khan, Faheem; Milner, Ben P.; Le Cornu, Thomas

doi:10.1109/TASLP.2018.2835719

Khan, Faheem, Milner, Ben P. and Le Cornu, Thomas (2018) Using visual speech information in masking methods for audio speaker separation. IEEE Transactions on Audio, Speech, and Language Processing, 26 (10). pp. 1742-1754. ISSN 1558-7916

PDF (Accepted manuscript) - Accepted Version

Abstract

This work examines whether visual speech information can be effective within audio masking-based speaker separation to improve the quality and intelligibility of the target speech. Two visual-only methods of generating an audio mask for speaker separation are first developed. These use a deep neural network to map visual speech features to an audio feature space from which both visually-derived binary masks and visually-derived ratio masks are estimated, before application to the speech mixture. Secondly, an audio ratio masking method forms a baseline approach for speaker separation which is extended to exploit visual speech information to form audio-visual ratio masks. Speech quality and intelligibility tests are carried out on the visual-only, audio-only and audio-visual masking methods of speaker separation at mixing levels from -10 dB to +10 dB. These reveal substantial improvements in the target speech when applying the visual-only and audio-only masks, but with highest performance occurring when combining audio and visual information to create the audio-visual masks.
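
The difference between the binary and ratio masks mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses the common textbook definitions of the two mask types, and it assumes the target and interferer magnitude spectra are known, whereas the paper estimates the audio features from visual speech features with a deep neural network. The function names, the 0 dB threshold and the `eps` constant are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid division by zero


def binary_mask(target_mag, interferer_mag, threshold_db=0.0):
    """Return 1 where the local target-to-interferer ratio exceeds the threshold, else 0."""
    local_snr_db = 20.0 * np.log10((target_mag + EPS) / (interferer_mag + EPS))
    return (local_snr_db > threshold_db).astype(float)


def ratio_mask(target_mag, interferer_mag):
    """Return a soft mask in [0, 1]: the target's share of the summed magnitudes."""
    return target_mag / (target_mag + interferer_mag + EPS)


def apply_mask(mask, mixture_spec):
    """Scale each time-frequency cell of the mixture spectrogram by the mask."""
    return mask * mixture_spec


# Toy 2x2 magnitude "spectrograms" (frequency x time), purely illustrative values.
target = np.array([[2.0, 0.5], [1.0, 3.0]])
interferer = np.ones((2, 2))
mixture = target + interferer  # simplification: magnitudes assumed to add

separated_binary = apply_mask(binary_mask(target, interferer), mixture)
separated_ratio = apply_mask(ratio_mask(target, interferer), mixture)
```

A binary mask keeps or discards each time-frequency cell outright, while a ratio mask attenuates each cell in proportion to the estimated target contribution.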

Item Type:	Article
Faculty \ School:	Faculty of Science > School of Computing Sciences
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	LivePure Connector
Date Deposited:	20 Jun 2018 11:30
Last Modified:	14 May 2026 11:45
URI:	https://ueaeprints.uea.ac.uk/id/eprint/67404
DOI:	10.1109/TASLP.2018.2835719
