Incorporating topic membership in review rating prediction from unstructured data: A gradient boosting approach

Yang, Nan, Korfiatis, Nikolaos ORCID: https://orcid.org/0000-0001-6377-4837, Zissis, Dimitris ORCID: https://orcid.org/0000-0002-6957-3494 and Spanaki, Konstantina (2024) Incorporating topic membership in review rating prediction from unstructured data: A gradient boosting approach. Annals of Operations Research, 339 (1-2). pp. 631-662. ISSN 0254-5330

PDF (Yang_etal_2024_AnnOpeRes) - Published Version
Available under License Creative Commons Attribution.


Abstract

Rating prediction is a crucial element of business analytics, as it enables decision-makers to assess service performance based on expressive customer feedback. Enhancing rating score predictions and demand forecasting by incorporating performance features from verbatim text fields, particularly in service quality measurement and customer satisfaction modelling, is a key objective in various areas of analytics. A range of methods has been identified in the literature for improving the predictability of customer feedback, including simple bag-of-words-based approaches and advanced supervised machine learning models, which are designed to work with response variables such as Likert-based rating scores. This paper presents a dynamic model that incorporates values from topic membership, an outcome variable from Latent Dirichlet Allocation (LDA), with sentiment analysis in an Extreme Gradient Boosting (XGBoost) model used for rating prediction. The results show that, by incorporating features from simple unsupervised machine learning approaches (LDA-based), an 86% prediction accuracy (AUC based) can be achieved on objective rating values. At the same time, a combination of polarity and single-topic membership can yield an even higher accuracy when compared with sentiment text detection tasks at both the document and sentence levels. This study carries significant practical implications, since sentiment analysis tasks often require dictionary coverage and domain-specific adjustments depending on the task at hand. To further investigate this result, we used Shapley Additive Values to determine the additive predictability of topic membership values in combination with sentiment-based methods, using a dataset of customer reviews from food delivery services.
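
The abstract reports accuracy on an AUC basis. As a minimal sketch of what such a figure measures, the Python snippet below computes the area under the ROC curve for a binary outcome as the fraction of (positive, negative) pairs that the model scores in the right order. The binary high/low rating labels, the example scores, and the function name `roc_auc` are illustrative assumptions; the abstract does not state how ratings were encoded, and this is not the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive review scored above negative review
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(positives) * len(negatives))


# Hypothetical data (assumption): 1 = high rating, 0 = low rating;
# scores stand in for the predicted probabilities of a rating model.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value such as the reported 86% describes ranking quality rather than the share of reviews classified correctly.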

Item Type: Article
Additional Information: Publisher Copyright: © The Author(s) 2023.
Uncontrolled Keywords: latent dirichlet allocation, machine learning, online reviews, sentiment analysis, xgboost, decision sciences(all), management science and operations research, /dk/atira/pure/subjectarea/asjc/1800
Faculty \ School: Faculty of Social Sciences > Norwich Business School
Depositing User: LivePure Connector
Date Deposited: 02 Oct 2024 12:30
Last Modified: 18 Dec 2024 01:38
URI: https://ueaeprints.uea.ac.uk/id/eprint/96862
DOI: 10.1007/s10479-023-05336-z
