Khan, Faheem and Milner, Ben (2015) Using audio and visual information for single channel speaker separation. In: Interspeech 2015, 2015-09-06 - 2015-09-10.
PDF (interspeech2015-ss-v3) - Accepted Version (241kB)
Abstract
This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single channel and contains the mixture of speakers, whereas a separate set of visual features is extracted from each speaker. This allows the separation process to be modified to include not only the audio speech but also visual speech from each speaker in the mixture. Experimental results are presented that compare the proposed audio-visual speaker separation with audio-only and visual-only methods using both speech quality and speech intelligibility metrics.
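The abstract names the soft mask method as the audio-only baseline. As a rough illustration of that general idea only (not the paper's audio-visual system), the sketch below applies a ratio-style soft mask, built from per-speaker magnitude estimates, to a single-channel mixture spectrogram. The use of SciPy, the function name `soft_mask_separate`, and the placeholder magnitude estimates are assumptions for demonstration purposes.

```python
# Minimal sketch of soft-mask separation on a single-channel mixture.
# Assumes per-speaker magnitude estimates are already available (e.g. from
# speaker-dependent models); this is illustrative only, not the paper's method.
import numpy as np
from scipy.signal import stft, istft

def soft_mask_separate(mixture, est_mag_a, est_mag_b, fs=16000, nperseg=512):
    """Build a soft (ratio) mask from per-speaker magnitude estimates,
    apply it to the mixture STFT, and resynthesise speaker A."""
    _, _, mix_stft = stft(mixture, fs=fs, nperseg=nperseg)
    eps = 1e-8
    # Soft mask: speaker A's share of the total estimated energy in each bin.
    mask_a = est_mag_a / (est_mag_a + est_mag_b + eps)
    # Apply the mask to the complex mixture spectrogram (mixture phase is kept).
    sep_stft = mask_a * mix_stft
    _, sep_a = istft(sep_stft, fs=fs, nperseg=nperseg)
    return sep_a

if __name__ == "__main__":
    fs = 16000
    mixture = np.random.randn(fs)  # stand-in for a 1 s single-channel mixture
    _, _, mix_stft = stft(mixture, fs=fs, nperseg=512)
    # Hypothetical per-speaker magnitude estimates, same shape as the mixture STFT.
    est_a = np.abs(np.random.randn(*mix_stft.shape))
    est_b = np.abs(np.random.randn(*mix_stft.shape))
    separated = soft_mask_separate(mixture, est_a, est_b)
    print(separated.shape)
```

In the paper's setting, the per-speaker estimates would additionally be informed by visual speech features from each speaker; here they are random placeholders purely to show how a soft mask is formed and applied.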
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Faculty \ School: | Faculty of Science > School of Computing Sciences; Faculty of Science |
| UEA Research Groups: | Faculty of Science > Research Groups > Interactive Graphics and Audio; Faculty of Science > Research Groups > Smart Emerging Technologies; Faculty of Science > Research Groups > Data Science and AI |
| Depositing User: | Pure Connector |
| Date Deposited: | 23 Dec 2015 13:00 |
| Last Modified: | 10 Dec 2024 01:15 |
| URI: | https://ueaeprints.uea.ac.uk/id/eprint/55880 |
| DOI: | |