Language identification using visual features

Newman, Jacob L. and Cox, Stephen J. (2012) Language identification using visual features. IEEE Transactions on Audio, Speech and Language Processing, 20 (7). pp. 1936-1947.



Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments) or impossible (no audio signal is available). Research in this field is also beneficial to the related field of automatic lip-reading. This paper introduces several methods for VLID. They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we obtain an error rate of under 10% in discriminating between Arabic and English on 19 speakers, using about 30 s of visual speech.
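The phonotactic approach the abstract alludes to scores a recognised phone (or viseme) sequence under per-language n-gram models and picks the language whose model fits best. A minimal sketch of that idea, using add-one-smoothed bigram models over toy symbol strings (all training data here is hypothetical, purely for illustration):

```python
from collections import Counter
from math import log

def train_bigram_lm(phone_seqs):
    """Train an add-one-smoothed phone-bigram model from training sequences."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for seq in phone_seqs:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams, len(vocab)

def log_prob(seq, model):
    """Smoothed log-likelihood of a phone sequence under one language model."""
    bigrams, unigrams, v = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(
        log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(padded, padded[1:])
    )

def identify(seq, models):
    """Return the language whose bigram model best explains the sequence."""
    return max(models, key=lambda lang: log_prob(seq, models[lang]))

# Toy, made-up "phone strings" standing in for recogniser output.
models = {
    "English": train_bigram_lm(["thecatsat", "sheseesthesea"]),
    "Arabic":  train_bigram_lm(["alkitab", "assalamalaik"]),
}
print(identify("thesea", models))  # the bigrams th/he/es favour the English model
```

In a real phonotactic LID or VLID system the input symbols come from a phone (or viseme) recogniser rather than raw letters, and higher-order n-grams with better smoothing are typical, but the decision rule is the same maximum-likelihood comparison.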

Item Type: Article
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Smart Emerging Technologies
Faculty of Science > Research Groups > Interactive Graphics and Audio
Depositing User: Stephen Cox
Date Deposited: 29 Apr 2013 13:50
Last Modified: 01 Feb 2024 01:38
DOI: 10.1109/TASL.2012.2191956
