Hybrid expert ensembles for identifying unreliable data in citizen science

Wessels, Pieter, Moran, Nick, Johnston, Ali and Wang, Wenjia (2019) Hybrid expert ensembles for identifying unreliable data in citizen science. Engineering Applications of Artificial Intelligence, 81. pp. 200-212. ISSN 0952-1976

PDF (Accepted manuscript) - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Citizen science utilises public resources for scientific research. BirdTrack is one such project, established in 2004 by the British Trust for Ornithology (BTO) so that the public can log their bird observations through its web or mobile applications. It has accumulated over 40 million observations. However, the veracity of these observations needs to be checked, and the current process involves time-consuming interventions by human experts. This research therefore aims to develop a more efficient system that automatically identifies unreliable observations in a large volume of records. This paper presents a novel approach, a Hybrid Expert Ensemble System (HEES), that combines an Expert System (ES) with machine-induced models to perform this task. The ES is built on human expertise and used as a base member of the ensemble. The other members are decision trees induced from county-based data. The HEES uses accuracy and diversity as criteria for selecting its members, with the aim of improving its accuracy and reliability. The experiments were carried out using the county-based data, and the results indicate that (1) the performance of the expert system is reasonable for some counties but varies considerably in others, and (2) an HEES is more accurate and reliable than both the Expert System and the other individual models, with a Sensitivity of 85% for correctly identifying unreliable observations and a Specificity of 99% for reliable observations. These results demonstrate that the proposed approach can serve as an alternative or additional means of validating the observations in a timely and cost-effective manner, and that it has the potential to be applied in other citizen science projects where huge amounts of data need to be checked effectively and efficiently.
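The ensemble idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection procedure, so the greedy rule below (keep the expert system as a fixed member, filter candidates by an accuracy threshold, then add the candidates that disagree most with the current members) is an assumption, as are majority voting as the combination rule and all function and parameter names. Each model is treated as a plain function that returns 1 for an unreliable record and 0 for a reliable one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]
Model = Callable[[Record], int]  # hypothetical interface: 1 = unreliable, 0 = reliable


def accuracy(model: Model, records: Sequence[Record], labels: Sequence[int]) -> float:
    """Fraction of records the model labels correctly."""
    return sum(model(r) == y for r, y in zip(records, labels)) / len(labels)


def disagreement(a: Model, b: Model, records: Sequence[Record]) -> float:
    """Simple pairwise diversity: fraction of records on which two models differ."""
    return sum(a(r) != b(r) for r in records) / len(records)


def select_members(expert: Model, candidates: Sequence[Model],
                   records: Sequence[Record], labels: Sequence[int],
                   size: int, min_accuracy: float = 0.5) -> List[Model]:
    """Assumed greedy selection: the expert is always kept, then the most
    diverse sufficiently accurate candidates are added one at a time."""
    members: List[Model] = [expert]
    pool = [m for m in candidates if accuracy(m, records, labels) >= min_accuracy]
    while pool and len(members) < size:
        best = max(pool, key=lambda m: sum(disagreement(m, s, records)
                                           for s in members) / len(members))
        members.append(best)
        pool.remove(best)
    return members


def majority_vote(members: Sequence[Model], record: Record) -> int:
    """Flag a record as unreliable when more than half of the members do."""
    votes = sum(m(record) for m in members)
    return 1 if 2 * votes > len(members) else 0


def sensitivity_specificity(predictions: Sequence[int],
                            labels: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity: share of unreliable records caught.
    Specificity: share of reliable records left unflagged."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    tn = sum(p == 0 and y == 0 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

In this sketch the expert system and each county-level decision tree would be wrapped as a `Model` function, and selection would be evaluated on held-out labelled records rather than the training data.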

Item Type: Article
Uncontrolled Keywords: hybrid ensemble, expert systems, citizen science, data validation, classification, diversity
Faculty \ School: Faculty of Science > School of Computing Sciences
UEA Research Groups: Faculty of Science > Research Groups > Data Science and Statistics
Depositing User: LivePure Connector
Date Deposited: 08 Jan 2019 15:31
Last Modified: 22 Oct 2022 00:59
URI: https://ueaeprints.uea.ac.uk/id/eprint/69493
DOI: 10.1016/j.engappai.2019.01.004

