Using audio and visual information for single channel speaker separation

Khan, Faheem and Milner, Ben (2015) Using audio and visual information for single channel speaker separation. In: Interspeech 2015, 2015-09-06 - 2015-09-10.

PDF (interspeech2015-ss-v3) - Accepted Version


This work proposes a method that exploits both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins with an effective audio-only method of speaker separation, namely the soft mask method, and modifies its operation so that visual speech information can improve the separation process. The audio input is taken from a single channel and contains the mixture of speakers, whereas a separate set of visual features is extracted for each speaker. This allows the separation process to use not only the audio speech but also the visual speech of each speaker in the mixture. Experimental results compare the proposed audio-visual speaker separation with audio-only and visual-only methods using both speech quality and speech intelligibility metrics.
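As a rough illustration of the audio-only starting point, the sketch below computes a generic per-bin soft (ratio) mask and applies it to a mixture spectrum. The abstract does not say how the paper derives its mask or how the visual features modify it, so this sketch assumes that estimates of the target and interfering speakers' power in each time-frequency bin are already available. The names `soft_mask` and `apply_mask`, the `eps` parameter, and the toy spectra are hypothetical and not taken from the paper.

```python
from typing import List

# Hypothetical representation: a spectrum is a list of frames,
# each frame a list of non-negative values, one per frequency bin.
Spectrum = List[List[float]]


def soft_mask(target_power: Spectrum, interferer_power: Spectrum,
              eps: float = 1e-12) -> Spectrum:
    """Generic ratio mask in [0, 1]: target / (target + interferer) per bin.

    Assumes the per-bin power estimates are given; eps avoids division
    by zero in bins where both estimates are zero (mask becomes 0 there).
    """
    return [
        [t / (t + i + eps) for t, i in zip(t_frame, i_frame)]
        for t_frame, i_frame in zip(target_power, interferer_power)
    ]


def apply_mask(mixture_mag: Spectrum, mask: Spectrum) -> Spectrum:
    """Scale each bin of the mixture magnitude spectrum by its mask value."""
    return [
        [m * g for m, g in zip(m_frame, g_frame)]
        for m_frame, g_frame in zip(mixture_mag, mask)
    ]


if __name__ == "__main__":
    # Toy 2-frame x 2-bin example with made-up power estimates.
    target = [[4.0, 0.0], [1.0, 3.0]]
    interferer = [[0.0, 2.0], [1.0, 1.0]]
    mixture = [[2.0, 1.5], [1.4, 2.0]]

    mask = soft_mask(target, interferer)
    print([[round(g, 2) for g in row] for row in mask])  # [[1.0, 0.0], [0.5, 0.75]]
    print(apply_mask(mixture, mask))
```

Bins dominated by the target get a mask near 1 and bins dominated by the interferer a mask near 0, while shared bins are split in proportion to the estimated powers. In a typical pipeline the masked magnitude would be combined with the mixture phase and inverted back to a waveform; that resynthesis step is omitted here.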

Item Type: Conference or Workshop Item (Paper)
Faculty \ School: Faculty of Science > School of Computing Sciences
Faculty of Science
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Depositing User: Pure Connector
Date Deposited: 23 Dec 2015 13:00
Last Modified: 20 Apr 2023 01:15


