Using audio and visual information for single channel speaker separation

Khan, Faheem; Milner, Ben

Khan, Faheem and Milner, Ben (2015) Using audio and visual information for single channel speaker separation. In: Interspeech 2015, 2015-09-06 - 2015-09-10.

PDF (interspeech2015-ss-v3) - Accepted Version

Abstract

This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single channel and includes the mixture of speakers, whereas a separate set of visual features is extracted from each speaker. This allows modification of the separation process to include not only the audio speech but also visual speech from each speaker in the mixture. Experimental results are presented that compare the proposed audio-visual speaker separation with audio-only and visual-only methods using both speech quality and speech intelligibility metrics.

Item Type:	Conference or Workshop Item (Paper)
Faculty \ School:	Faculty of Science > School of Computing Sciences; Faculty of Science
UEA Research Groups:	Faculty of Science > Research Groups > Visual Computing and Signal Processing (former - to 2025); Faculty of Science > Research Groups > Smart Emerging Technologies (former - to 2025); Faculty of Science > Research Groups > Data Science and AI; Faculty of Science > Research Groups > Cyber Intelligence and Networks
Depositing User:	Pure Connector
Date Deposited:	23 Dec 2015 13:00
Last Modified:	14 May 2026 15:25
URI:	https://ueaeprints.uea.ac.uk/id/eprint/55880