Moving to continuous classifications of bilingualism through machine learning trained on language production

Coco, Moreno I., Smith, Giuditta, Spelorzi, Roberta and Garraffa, Maria ORCID: https://orcid.org/0000-0003-1767-424X (2024) Moving to continuous classifications of bilingualism through machine learning trained on language production. Bilingualism: Language and Cognition. ISSN 1366-7289

PDF (cocosmithspelorzigarraffa_ACCEPTED_VERSION) - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Recent conceptualisations of bilingualism are moving away from strict categorisations, towards continuous approaches. This study supports this trend by combining empirical psycholinguistics data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class they belonged to (“monolingual”, “attriters” and “heritage”). All classes can be predicted above chance (>33%), even if the classifier's performance substantially varies, with monolinguals identified much better (f-score >70%) than attriters (f-score <50%), which are instead the most confusable class. Further analyses of the classification errors expressed in the confusion matrices qualify that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the most identifying features for the classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than sets of a priori established classes.
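
As a reading aid for the metrics named in the abstract, the sketch below shows how per-class f-scores and the chance level for three balanced classes can be derived from a confusion matrix. It is not the authors' script (that is available at the OSF link in this record); the function name `per_class_f1` and every count in `CONFUSION` are invented for illustration and are not results from the study.

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# All counts are made up for illustration; they are NOT the study's results.
LABELS = ["monolingual", "attriter", "heritage"]
CONFUSION = [
    [40, 5, 5],    # true monolingual
    [8, 22, 20],   # true attriter (often predicted as heritage)
    [6, 14, 30],   # true heritage
]


def per_class_f1(matrix):
    """Return the F1 score of each class from a square confusion matrix."""
    n = len(matrix)
    scores = []
    for k in range(n):
        true_positives = matrix[k][k]
        predicted_as_k = sum(matrix[i][k] for i in range(n))  # column sum
        actually_k = sum(matrix[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


# With three equally sized classes, random guessing is correct 1 time in 3.
chance = 1 / len(LABELS)
f1_scores = per_class_f1(CONFUSION)

for label, score in zip(LABELS, f1_scores):
    print(f"{label}: f-score = {score:.2f}")
print(f"chance level = {chance:.2f}")
```

In a matrix of this shape, a class whose off-diagonal counts concentrate in one other column (here, attriters predicted as heritage) is the kind of confusion pattern the abstract describes.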

Item Type: Article
Additional Information: Data availability statement: The data and script to illustrate the analysis supporting the findings of this study are available in Open Science Framework at https://osf.io/w24p3/.
Uncontrolled Keywords: attrition, bilingualism, classification, heritage speakers, support vector machine, education, language and linguistics, linguistics and language, sdg 17 - partnerships for the goals
Faculty \ School: Faculty of Medicine and Health Sciences > School of Health Sciences
UEA Research Groups: Faculty of Medicine and Health Sciences > Research Centres > Lifespan Health
Depositing User: LivePure Connector
Date Deposited: 26 Mar 2024 11:30
Last Modified: 22 Jul 2024 12:31
URI: https://ueaeprints.uea.ac.uk/id/eprint/94759
DOI: 10.1017/S1366728924000361
