A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period

Burns, Dan, Richardson, Kathryn and Driessens, Corine (2024) A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period. NIHR Open Research, 4 (67). ISSN 2633-4402

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

Machine learning methodologies are becoming increasingly popular in healthcare research. This shift towards integrated data science approaches requires professional development of the existing healthcare data analyst workforce, and educational resources need to be developed to support a smooth transition. Real healthcare datasets, although vital for healthcare data analysis and training, face many barriers to access, including financial, ethical, and patient confidentiality concerns. Synthetic datasets that mimic real-world complexities offer a simple solution. The synthetic dataset presented here mirrors routinely collected primary care data on heart attacks and strokes in the adult population. It enhances the training experience because the data incorporate many of the practical challenges encountered in routinely collected primary care data, such as missing data, informative censoring, interactions, irrelevant variables, and noise. By openly sharing this synthetic dataset, we aim to contribute a transformative asset for professional training in health and social care data analysis. The dataset covers demographics, lifestyle variables, comorbidities, systolic blood pressure, hypertension treatment, family history of cardiovascular disease, respiratory function, and experience of heart attack and/or stroke. The methods used to simulate each variable are described in detail to ensure a realistic representation of patient data. This initiative aims to close the gap in the availability of sophisticated healthcare datasets for training, fostering professional development in the health and social care research workforce.

Item Type: Article
Faculty \ School: Faculty of Medicine and Health Sciences > Norwich Medical School
UEA Research Groups: Faculty of Science > Research Groups > Norwich Epidemiology Centre
Faculty of Medicine and Health Sciences > Research Groups > Norwich Epidemiology Centre
Faculty of Medicine and Health Sciences > Research Groups > Health Promotion
Faculty of Medicine and Health Sciences > Research Centres > Public Health
Depositing User: LivePure Connector
Date Deposited: 04 Nov 2024 17:30
Last Modified: 27 Aug 2025 16:30
URI: https://ueaeprints.uea.ac.uk/id/eprint/97476
