Language identification using text, audio and video feature mapping

Dai, Zhuoyi (2018) Language identification using text, audio and video feature mapping. Doctoral thesis, University of East Anglia.



Unlike text language identification techniques, which are now quite mature, audio and video language identification techniques still face many challenges. One of the main challenges is that, for a variety of reasons, there are not enough audio and video datasets.
However, text data are sufficient for experiments, and many text databases are freely available for research. This leads to an interesting question: can we identify the language of an unknown video or audio recording from its relationship to known text languages? Answering this question requires us to examine two issues: language identification and language mapping.
For language identification, we compare two methods: zipping (compression-based) classification and N-gram modelling. An advantage of zipping classification is that it tolerates a lack of long training texts and can be applied to a wide variety of problems without modification. The N-gram model, however, offers high classification accuracy and efficiency, which makes it worth considering. We also evaluate an audio classification method based on MPEG compression and compare it with the general-purpose zipping tools and the N-gram model.
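The two identification approaches can be sketched on text in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: zlib's DEFLATE stands in for the general zipping tools, an unknown text is assigned to the corpus whose compressed size grows least when the text is appended to it, and the N-gram side assumes a character trigram model with add-one smoothing. The function names and the toy corpora are hypothetical.

```python
import math
import zlib
from collections import Counter


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib (DEFLATE) compression of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def zip_classify(query: str, corpora: dict[str, str]) -> str:
    """Compression-based ("zipping") classification.

    The unknown text is appended to each reference corpus; the language whose
    corpus encodes the appended text most cheaply (smallest growth in
    compressed size) is returned.
    """
    def extra_bytes(corpus: str) -> int:
        return compressed_size(corpus + query) - compressed_size(corpus)

    return min(corpora, key=lambda lang: extra_bytes(corpora[lang]))


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """All overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramModel:
    """Character n-gram model with add-one (Laplace) smoothing (assumed form)."""

    def __init__(self, corpus: str, n: int = 3):
        self.n = n
        self.counts = Counter(char_ngrams(corpus, n))
        self.total = sum(self.counts.values())
        # One extra vocabulary slot is reserved for unseen n-grams.
        self.vocab = len(self.counts) + 1

    def log_prob(self, text: str) -> float:
        """Log-probability of `text` as a bag of independent n-grams."""
        return sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in char_ngrams(text, self.n)
        )


def ngram_classify(query: str, corpora: dict[str, str], n: int = 3) -> str:
    """Pick the language whose n-gram model gives `query` the highest score."""
    models = {lang: NGramModel(text, n) for lang, text in corpora.items()}
    return max(models, key=lambda lang: models[lang].log_prob(query))
```

With `corpora = {"en": "the cat sat on the mat " * 20, "fr": "le chat est sur le tapis " * 20}`, both `zip_classify` and `ngram_classify` label `"the cat sat on the mat"` as `"en"`. The zipping side needs no trained model at all, which fits the tolerance for short training data described above, while the N-gram side must build a table per language but then scores a query with cheap lookups.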
For language mapping, we first use the Robinson-Foulds tree distance to measure the distances between language trees. We also use Sammon mapping and Shepard’s interpolation to map the language distance results from higher to lower dimensions, and try to find the optimal language relationships in a specific dimension.
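The two mapping steps can be sketched in plain Python. The abstract does not say which Robinson-Foulds variant is used, so this sketch assumes rooted trees written as nested tuples and counts the clusters (leaf sets under internal nodes) that appear in only one of the two trees; other definitions use unrooted bipartitions or halve the count. The Sammon mapping here is plain gradient descent on Sammon's stress with a backtracking step size, not Sammon's original pseudo-Newton update. All names (`clusters`, `robinson_foulds`, `sammon_stress`, `sammon`) are hypothetical.

```python
import math
import random


def clusters(tree):
    """Leaf sets of every internal node of a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):  # a leaf, e.g. a language code
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def robinson_foulds(t1, t2):
    """Rooted Robinson-Foulds distance: number of clusters found in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


def sammon_stress(D, Y):
    """Sammon's stress between target distances D and the layout Y."""
    n = len(D)
    err = total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(Y[i], Y[j])
            err += (D[i][j] - d) ** 2 / D[i][j]
            total += D[i][j]
    return err / total


def sammon(D, dim=2, iters=300, init=None, seed=0):
    """Gradient-descent Sammon mapping with a backtracking step size.

    D must be symmetric with strictly positive off-diagonal entries.
    Returns the low-dimensional layout and its final stress.
    """
    n = len(D)
    rng = random.Random(seed)
    if init is None:
        Y = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    else:
        Y = [list(p) for p in init]
    dim = len(Y[0])
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    stress, step = sammon_stress(D, Y), 1.0
    for _ in range(iters):
        # dE/dy_i = -(2/c) * sum_j (D_ij - d_ij) / (D_ij * d_ij) * (y_i - y_j)
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = max(math.dist(Y[i], Y[j]), 1e-12)
                coef = -2.0 / c * (D[i][j] - d) / (D[i][j] * d)
                for k in range(dim):
                    grad[i][k] += coef * (Y[i][k] - Y[j][k])
        # Halve the step until the stress actually decreases.
        while step > 1e-12:
            cand = [[Y[i][k] - step * grad[i][k] for k in range(dim)] for i in range(n)]
            new = sammon_stress(D, cand)
            if new < stress:
                break
            step /= 2
        else:
            break  # no improving step found: treat as converged
        Y, stress, step = cand, new, step * 1.5
    return Y, stress
```

For example, `robinson_foulds` on `(("en", "de"), ("fr", "es"))` and `(("en", "fr"), ("de", "es"))` returns 4, since each tree has two clusters the other lacks. Because Sammon's stress divides by every target distance, a matrix of tree distances with zeros between distinct items (identical topologies) would need those items merged or offset before mapping.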

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Science > School of Computing Sciences
Date Deposited: 16 Oct 2019 15:12
Last Modified: 16 Oct 2019 15:12
