Phoneme-to-viseme mappings: the good, the bad, and the ugly

Bear, Helen L. and Harvey, Richard (2017) Phoneme-to-viseme mappings: the good, the bad, and the ugly. Speech Communication, 95. pp. 40-67. ISSN 0167-6393

PDF (Accepted manuscript) - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.



Visemes are the visual equivalent of phonemes. Although not precisely defined, a working definition of a viseme is “a set of phonemes which have identical appearance on the lips”. Therefore a phoneme falls into one viseme class, but a viseme may represent many phonemes: a many-to-one mapping. This mapping introduces ambiguity between phonemes when using viseme classifiers. Not only is this ambiguity damaging to the performance of audio-visual classifiers operating on real expressive speech, but there is also considerable choice between possible mappings. In this paper we explore the issue of this choice of viseme-to-phoneme map. We show that there is a definite difference in performance between viseme-to-phoneme mappings and explore why some maps appear to work better than others. We also devise a new algorithm for constructing phoneme-to-viseme mappings from labeled speech data. These new visemes, ‘Bear’ visemes, are shown to perform better than previously known units.
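
The many-to-one mapping described in the abstract, and the ambiguity it causes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the phoneme symbols, the viseme class names, and the grouping itself are hypothetical and are not the 'Bear' visemes or any mapping evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical phoneme-to-viseme map, for illustration only.
# Each phoneme belongs to exactly one viseme class (many-to-one).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar",
    "ae": "V_open",
}


def to_visemes(phonemes):
    """Translate a phoneme sequence into its viseme sequence."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def invert(mapping):
    """Group phonemes by viseme class (the one-to-many direction)."""
    inverse = defaultdict(list)
    for phoneme, viseme in mapping.items():
        inverse[viseme].append(phoneme)
    return dict(inverse)


# Two different phoneme sequences ("pat" and "bad", in this toy notation)
# collapse to the same viseme sequence, so a viseme classifier alone
# cannot tell them apart.
pat = to_visemes(["p", "ae", "t"])
bad = to_visemes(["b", "ae", "d"])
print(pat == bad)
print(invert(PHONEME_TO_VISEME)["V_bilabial"])
```

Choosing a different grouping changes which sequences become indistinguishable, which is one way to picture why the choice of map can affect classifier performance.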

Item Type: Article
Uncontrolled Keywords: lipreading, speaker-dependent, viseme, phoneme, resolution, speech recognition, classification, visual speech, visual units
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Interactive Graphics and Audio
Faculty of Science > Research Groups > Smart Emerging Technologies
Depositing User: Pure Connector
Date Deposited: 05 Aug 2017 05:06
Last Modified: 19 Apr 2023 22:32
DOI: 10.1016/j.specom.2017.07.001

