Application of machine learning techniques yields improvements in the predictive ability of urine biomarker panels for prostate cancer; analysis of the Movember GAP1 Urine Biomarker project.

Connell, Shea (2021) Application of machine learning techniques yields improvements in the predictive ability of urine biomarker panels for prostate cancer; analysis of the Movember GAP1 Urine Biomarker project. Doctoral thesis, University of East Anglia.


Abstract

Prostate cancer is a considerable clinical problem worldwide, with large variation in clinical outcome among patients with apparently similar disease. The diagnostic and prognostic tools currently available to clinicians lack both sensitivity and specificity, and do not take into account the molecular variability of the disease. The successful development of non-invasive prognostic biomarker tests has the potential to benefit the large number of patients who have a clinical suspicion of prostate cancer but who ultimately do not require invasive investigation and stressful follow-up.

The Movember Global Action Plan 1 (GAP1) Urine Biomarker Consortium aimed to develop a multi-modal urine test for the accurate discrimination of disease status. The consortium of 12 collaborating institutes collected 1,258 urine samples that were subsequently assayed by a range of biochemical techniques. The main aim of this thesis was to apply statistical learning techniques to these data in order to robustly develop prognostic models for prostate cancer.

The Prostate Urine Risk (PUR) model was developed using solely NanoString data from cell-free RNA samples, and showed strong utility for predicting the outcome of an initial prostate biopsy (AUCs > 0.70 for Gleason 3+4 and 4+3). PUR also performed remarkably well in an active surveillance sub-cohort, identifying patients at a higher apparent risk of disease progression, with a hazard ratio of 8.23 (95% CI: 3.26 to 20.81).

The effects of altering the statistical methodology applied to the data were quantified; ensemble algorithms proved the best approach for capturing the most information. Using these findings, a machine learning framework was designed to produce multivariable risk prediction models incorporating strong internal validation, compliant with the TRIPOD reporting guidelines.

This framework was used to construct three risk models, each integrating information from different fractions of urine. All showed strong potential for clinical utility, reporting AUCs in excess of 0.8 for predicting Gleason 3+4, and approaching AUC = 0.9 for ruling out the presence of any cancer on biopsy. The net benefit of adopting these risk models was determined via simulation of a population-level cohort, in which each model showed the potential to produce large reductions in the number of unnecessary biopsies currently undertaken.

In conclusion, the analyses presented here demonstrate the large amount of information that can be captured within urine. If these models are validated in future studies using the proposed clinical trial designs, they could dramatically change the treatment pathway for prostate cancer, reducing costs to healthcare systems and, ultimately, unnecessary stress to patients.

Item Type: Thesis (Doctoral)
Faculty \ School: Faculty of Medicine and Health Sciences > Norwich Medical School
Depositing User: Chris White
Date Deposited: 07 Jun 2021 13:24
Last Modified: 07 Jun 2021 13:24
URI: https://ueaeprints.uea.ac.uk/id/eprint/80210
