Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: longitudinal cohort study using cardiovascular disease as exemplar
BMJ 2020;371 doi: https://doi.org/10.1136/bmj.m3919 (Published 04 November 2020). Cite this as: BMJ 2020;371:m3919
- Yan Li, doctoral student of statistical epidemiology1,
- Matthew Sperrin, senior lecturer in health data science1,
- Darren M Ashcroft, professor of pharmacoepidemiology2 3,
- Tjeerd Pieter van Staa, professor in health e-research1 4 5
- 1Health e-Research Centre, Health Data Research UK North, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- 2Centre for Pharmacoepidemiology and Drug Safety, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- 3NIHR Greater Manchester Patient Safety Translational Research Centre, School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- 4Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, Netherlands
- 5Alan Turing Institute, Headquartered at the British Library, London, UK
- Correspondence to: T P van Staa tjeerd.vanstaa@manchester.ac.uk (or @HeRC_Tweets on Twitter)
- Accepted 10 September 2020
Abstract
Objective To assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease and the effects of censoring on risk predictions.
Design Longitudinal cohort study from 1 January 1998 to 31 December 2018.
Setting and participants 3.6 million patients from the Clinical Practice Research Datalink registered at 391 general practices in England with linked hospital admission and mortality records.
Main outcome measures Model performance, including discrimination, calibration, and consistency of individual risk predictions for the same patients among models with comparable population level performance. 19 different prediction techniques were applied, including 12 families of machine learning models (grid searched for best models), three Cox proportional hazards models (locally fitted, QRISK3, and Framingham), three parametric survival models, and one logistic model.
Results The various models had similar population level performance (C statistics of about 0.87 and similar calibration). However, the predictions for individual risks of cardiovascular disease varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks. A patient with a risk of 9.5-10.5% predicted by QRISK3 had a risk of 2.9-9.2% in a random forest and 2.4-7.2% in a neural network. The differences in predicted risks between QRISK3 and a neural network ranged between –23.2% and 0.1% (95% range). Models that ignored censoring (that is, assumed censored patients to be event free) substantially underestimated risk of cardiovascular disease. Of the 223 815 patients with a cardiovascular disease risk above 7.5% with QRISK3, 57.8% would be reclassified below 7.5% when using another model.
Conclusions A variety of models predicted risks for the same patients very differently despite similar model performances. Logistic models and commonly used machine learning models should not be directly applied to the prediction of long term risks without considering censoring. Survival models that consider censoring and that are explainable, such as QRISK3, are preferable. The level of consistency within and between models should be routinely assessed before they are used for clinical decision making.
Introduction
Risk prediction models are used routinely in healthcare practice to identify patients at high risk and to guide treatment decisions, so that appropriate healthcare resources can be allocated to those patients who most need care.1 These risk prediction models are usually built with statistical regression techniques. Examples include the Framingham risk score (developed from a US cohort with prospectively collected data)2 and QRISK3 (developed from a large UK cohort using retrospective electronic health records).3 Recently, machine learning models have gained considerable popularity. The English National Health Service has invested £250m ($323m; €275m) to further embed machine learning in healthcare.4 A recent viewpoint article suggested that machine learning technology is about to start a revolution with the potential to transform the whole healthcare system.5 Several studies have suggested that machine learning models could outperform statistical models in terms of calibration and discrimination.6789 However, another viewpoint raises the concern that these approaches cannot provide explainable reasons behind their predictions, potentially leading to inappropriate actions,10 and a recent review found no evidence that machine learning models had better model performance than logistic models.11 Interpretation of this review is difficult, however, as it included models from mostly small sample sizes and with different outcomes and predictors. Machine learning has established strengths in image recognition that could help in diagnosing diseases in healthcare,12131415 but censoring (patients lost to follow-up), which is common in risk prediction, does not exist in image recognition. Many commonly used machine learning models do not take censoring into account by default.16
The objective of this study was to assess the robustness and consistency of a variety of machine learning and statistical models on individual risk prediction and the effects of censoring on risk predictions. We used cardiovascular disease as an exemplar. We defined robustness of individual risk prediction as the level of consistency in the prediction of risks for individual patients with models that have comparable population level performance metrics.171819
Methods
Data source
We derived the study cohort from the Clinical Practice Research Datalink (CPRD GOLD), which includes data from about 6.9% of the population in England.20 It has also been linked to Hospital Episode Statistics, Office for National Statistics mortality records, and Townsend deprivation scores,3 providing additional information about hospital admissions (including date and discharge diagnoses) and cause specific mortality.20 CPRD includes patients’ electronic health records from general practice, capturing detailed information such as demographics (age, sex, and ethnicity), symptoms, tests, diagnoses, prescribed treatments, health related behaviours, and referrals to secondary care.20 CPRD is a well established representative cohort of the UK population, and thousands of studies have used it,2122 including a validation of the QRISK2 model and an analysis of machine learning.823
Study population
This study used the same selection criteria for the study population, risk factors, and cardiovascular disease outcomes as were used for QRISK3.318 Follow-up of patients started at the date of the patient’s registration with the practice, their 25th birthday, or 1 January 1998 (whichever was latest) and ended at the date of death, incident cardiovascular disease, date of leaving the practice, or last date of data collection (whichever was earliest). The index date for measurement of cardiovascular disease risk was randomly chosen from the follow-up period to capture time relevant practice variability with a better spread of calendar time and age.24 This was different from QRISK3, for which a single calendar time date was mostly used.18 The main inclusion criteria were age between 25 and 84 years, no history of cardiovascular disease, and no prescription for a statin before the index date. The outcome of interest was the 10 year risk of developing cardiovascular disease. The definition of the primary clinical outcome (cardiovascular disease) was the same as for QRISK3 (that is, coronary heart disease, ischaemic stroke, or transient ischaemic attack).3
We extracted two main cohorts from the study population—one overall cohort including all patients with at least one day of follow-up and one cohort with censored patients removed. The cohort without censoring excluded patients who were lost to follow-up before developing cardiovascular disease by year 10. The analysis of the cohort without censoring aimed to investigate the effects of ignoring censoring on patients’ individual risk predictions. This cohort mimics the methods used by some machine learning studies—that is, only patients or practices with full 10 years’ follow-up were selected.8
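As an illustration only (this is not the authors' code), the two cohorts could be derived along the following lines with pandas; the column names exit_date, index_date, and cvd_event are hypothetical.

```python
# A minimal sketch of deriving the two cohorts; column names are hypothetical
import pandas as pd

def derive_cohorts(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (overall cohort, cohort without censoring)."""
    # Overall cohort: at least one day of follow-up after the index date
    overall = df[df["exit_date"] > df["index_date"]].copy()

    years = (overall["exit_date"] - overall["index_date"]).dt.days / 365.25

    # Cohort without censoring: keep patients who developed cardiovascular
    # disease within 10 years or who remained under follow-up (event free)
    # for the full 10 years; drop patients lost to follow-up before year 10
    event_within_10y = overall["cvd_event"] & (years <= 10)
    full_follow_up = years >= 10
    return overall, overall[event_within_10y | full_follow_up].copy()
```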
Cardiovascular disease risk factors
The cardiovascular disease risk factors at the index date included sex; age; body mass index; smoking history; total cholesterol to high density lipoprotein cholesterol ratio; systolic blood pressure and its standard deviation; history of prescribing of atypical antipsychotic drugs; blood pressure treatment or regular oral glucocorticoids; clinical history of systemic lupus erythematosus, atrial fibrillation, chronic kidney disease (stage 3, 4, or 5), erectile dysfunction, migraine, rheumatoid arthritis, severe mental illness, or type 1 or 2 diabetes mellitus; family history of angina or heart attack in a first degree relative aged under 60 years; ethnicity; and Townsend deprivation score.3 The same predictors from QRISK33 were used for all model fitting except for Framingham,25 which used fewer and different predictors.
Machine learning and Cox models
The study considered 19 models: 12 families of machine learning models, three Cox proportional hazards models (locally fitted, QRISK3, and Framingham), three parametric survival models (assuming Weibull, Gaussian, and logistic distributions), and a statistical logistic model (fitted in a statistical causal-inference framework). The machine learning models included a logistic model (fitted in an automated machine learning framework),26 random forest,27 and neural network28 from the R package “Caret”29; a logistic model, random forest, neural network, extra-tree model,30 and gradient boosting classifier30 from the Python package “Sklearn”31; and a logistic model, random forest, neural network, and autoML32 from the Python package “h2o.”33 The autoML package selects the best model from a broad spectrum of candidate models.32 Details of these models are summarised in eTable 1. The study used the machine learning algorithms from different software packages, with a grid search over hyper-parameters and cross validation, to acquire a series of high performing machine learning models (sketched below); this mimics the reality that practitioners may subjectively select different packages for model fitting and end up with a different best model. The study treated models from the same machine learning algorithm but different software packages as different model families, as the settings (hyper-parameters) that these packages use to control model fitting often differ, which might result in a different best performing model through the grid search process.
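A minimal sketch of this process for one family (the Sklearn random forest), assuming a predictor matrix X and a binary 10 year outcome y already exist; the grid values are illustrative, not those used in the study:

```python
# Grid search on hyper-parameters with cross validation (illustrative grid)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 500],
    "max_depth": [5, 10, None],
    "min_samples_leaf": [1, 50],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # discrimination (C statistic for a binary outcome)
    cv=2,               # twofold cross validation, as in the study
    n_jobs=-1,
)
search.fit(X, y)                     # X, y assumed to exist
best_model = search.best_estimator_  # best performing model of this family
```

Repeating this per family and per package yields the series of "best" models whose individual predictions are then compared.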
Statistical analysis
We used the Markov chain Monte Carlo method with a monotone style to impute missing values 10 times for ethnicity (54.3% missing in the overall cohort), body mass index (40.3%), Townsend score (0.1%), systolic blood pressure (26.9%), standard deviation of systolic blood pressure (53.9%), ratio of total cholesterol to high density lipoprotein cholesterol (65.0%), and smoking status (25.2%)18 (only these variables had missing values). We randomly split the overall cohort (which contained 10 imputations) into an overall derivation cohort (75%) and an overall testing cohort (25%). Using a grid search over hyper-parameters with twofold cross validation (estimating calibration and discrimination), we fitted a total of 1200 machine learning models and retained those with the highest discrimination (C statistic); these were derived from the 12 model families applied to 100 samples, each of similar sample size to another machine learning study.8 We then estimated the individual cardiovascular disease risk predictions (averaged over the missing value imputations) and model performance of all models by using the overall testing cohort. The sample splitting and model fitting process is shown in eFigure 1.
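A sketch of the splitting and averaging steps under assumed names (imputed, a list of 10 imputed DataFrames; predictors, the QRISK3 predictor columns; best_model, a winner of the grid search above):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import train_test_split

# 75%/25% split of patients (the same split across all 10 imputations)
train_ids, test_ids = train_test_split(
    imputed[0].index, train_size=0.75, random_state=0)

risk_per_imputation = []
for df in imputed:
    model = clone(best_model)  # refit the chosen configuration from scratch
    model.fit(df.loc[train_ids, predictors], df.loc[train_ids, "cvd"])
    risk_per_imputation.append(
        model.predict_proba(df.loc[test_ids, predictors])[:, 1])

# Individual risk prediction averaged over the 10 imputations
mean_risk = np.mean(risk_per_imputation, axis=0)
```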
We compared distributions of risk predictions for the same individuals among models. We plotted the differences in individual cardiovascular disease risk predictions between models against deciles of the cardiovascular disease risk predictions from QRISK3. We also produced Bland-Altman plots—a graphical method to compare two measurement techniques across the full spectrum of values.34 These plot the differences in individual risk predictions between two models against the average of the two predictions.34
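The Bland-Altman comparison reduces to a few lines; this sketch uses the 95% range of differences (2.5th to 97.5th centiles) rather than normal-theory limits, matching the choice explained under the study limitations below:

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(risk_a: np.ndarray, risk_b: np.ndarray) -> None:
    """Plot differences between two models' predicted risks against their mean."""
    mean = (risk_a + risk_b) / 2
    diff = risk_a - risk_b
    lo, hi = np.percentile(diff, [2.5, 97.5])  # 95% range of differences
    plt.scatter(mean, diff, s=2, alpha=0.3)
    for level in (lo, diff.mean(), hi):
        plt.axhline(level, linestyle="--")
    plt.xlabel("Mean predicted 10 year CVD risk of the two models")
    plt.ylabel("Difference in predicted risk (model A - model B)")
    plt.show()
```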
We used R to fit the models from “Caret” and Python to fit models from “Sklearn” and “h2o.”2930 We used SAS procedures to extract the raw data, create analysis datasets, and generate tables and graphs.35
Patient and public involvement
No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans or implementation of the study. No patients were asked to advise on interpretation or writing up of results.
Results
The overall study population included 3.66 million patients from 391 general practices. The cohort without censoring was considerably smaller (0.45 million) than the overall cohort. Table 1 shows the baseline characteristics of the two study populations, which were split into derivation and validation cohorts. The average age was higher in the cohort without censoring (owing to younger patients leaving the practice as shown in eFigure 11).
Table 1 Baseline characteristics of two study populations (patients aged 25-84 years without history of cardiovascular disease (CVD) or previous statin use). Values are numbers (percentages) unless stated otherwise
Table 2 shows the model performance of the machine learning and statistical models. All models had very similar discrimination (C statistics of about 0.87) and calibration (Brier scores of about 0.03; eTables 2-4 and eFigures 3-4).
Table 2 Performance indicators of machine learning and statistical models in overall cohort
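For a binary outcome, the two population level metrics reported here can be computed as below (a simplification: the paper's survival-framework versions additionally account for censoring); y_true and risk are assumed arrays of observed outcomes and predicted risks:

```python
from sklearn.metrics import roc_auc_score, brier_score_loss

c_statistic = roc_auc_score(y_true, risk)     # discrimination; about 0.87 here
brier_score = brier_score_loss(y_true, risk)  # prediction accuracy; about 0.03 here
```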
Figure 1 shows the variability in individual risk predictions across the models for patients with predicted cardiovascular disease risks of 9.5-10.5% by QRISK3. Patients with a predicted cardiovascular disease risk between 9.5% and 10.5% with QRISK3 had a risk of 2.2-5.8% with the Caret logistic model, 2.9-9.2% with the Caret random forest, 2.4-7.2% with the Caret neural network, and 3.1-9.3% with the Sklearn random forest. The calibration plot (fig 2) shows that models that ignored censoring were miscalibrated (that is, predicted risks were lower than observed risks).
Fig 1 Distribution of individual risk predictions with machine learning and statistical models in overall cohort for patients with predicted cardiovascular disease risks of 9.5-10.5% in QRISK3
Fig 2 Calibration slope of machine learning models and statistical models in overall cohort in survival framework (observed events account for censoring). CVD=cardiovascular disease
Figure 3 plots the differences of individual cardiovascular disease risk predictions with the different models stratified by deciles of cardiovascular disease risk predictions of QRISK3. The largest range of inconsistencies in risk predictions was found in patients with highest predicted risks of cardiovascular disease. Low risk of cardiovascular disease was generally predicted consistently between and within models. We observed similar trends when using a different reference model (eFigure 5.2).
Fig 3 95% range of individual risk predictions with machine learning and statistical models stratified by deciles of predicted cardiovascular disease (CVD) risks with QRISK3 in overall cohort
Figure 4 shows the Bland-Altman plot of QRISK3 against the neural network. We found large inconsistencies in risk predictions between the models. The differences in predicted risks between QRISK3 and the neural network ranged between −23.2% and 0.1% (95% range). The regression line shows a similar finding to figure 3, with the largest differences in the higher risk groups. More comparisons between specific models can be found in eFigure 6 and eFigure 7. We found similar inconsistency of risk predictions among models when using a logistic model as the reference (eFigure 2.1). The removal of censored patients changed the magnitude but not the variability of individual cardiovascular disease risk predictions (eFigure 2.2).
Fig 4 Bland-Altman analysis comparing QRISK3 with neural network model
We found substantial reclassification across a treatment threshold when using a different type of prediction model. Of 691 664 patients with a cardiovascular disease risk of 7.5% or lower, as predicted by QRISK3, 13.6% would be reclassified above 7.5% when using another model (table 3). Of the 223 815 patients with a cardiovascular disease risk above 7.5%, 57.8% would be reclassified below 7.5% when using another model. We also found high levels of reclassification with a different reference model (as shown in table 3) or a different threshold (eTable 7).
Table 3 Reclassification of individual risk predictions with machine learning and statistical models
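The reclassification figures can be reproduced from two vectors of predicted risks with a calculation along these lines (a sketch; the 7.5% treatment threshold follows the text):

```python
import numpy as np

def reclassified(ref_risk: np.ndarray, other_risk: np.ndarray,
                 threshold: float = 0.075) -> tuple[float, float]:
    """Fractions of patients crossing the threshold under a comparison model."""
    above = ref_risk > threshold
    up = np.mean(other_risk[~above] > threshold)    # low risk reclassified upwards
    down = np.mean(other_risk[above] <= threshold)  # high risk reclassified downwards
    return up, down
```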
We did several sensitivity analyses, with consistent findings of high levels of inconsistency in individual risk predictions between and within models. The same machine learning algorithm with different settings (hyper-parameters) selected from different software packages yielded different individual cardiovascular disease risk predictions (eTable 8 and eFigure 8). An evaluation of generalisability, developing and testing models in different regions of England, showed similarly high levels of inconsistency in cardiovascular disease risk predictions (eTable 10 and eFigure 9). Including more predictors in the models did not lower the level of inconsistency in cardiovascular disease risk predictions (eTable 11 and eFigure 10).
Discussion
We found that the predictions of cardiovascular disease risks for individual patients varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks (when using similar predictors). Logistic models and the machine learning models that ignored censoring substantially underestimated risk of cardiovascular disease.
Comparison with other studies
Despite claims that machine learning models can revolutionise risk prediction and potentially replace traditional statistical regression models in other areas,53637 this study of prediction of cardiovascular disease risk found that they have similar model performance to traditional statistical methods and share similar uncertainty in individual risk predictions. Strengths of machine learning models may include their ability to automatically model non-linear associations and interactions between different risk factors.3839 They may also find new data patterns.30 They have the acknowledged strength of automating model building with better performance in specific classification tasks (for example, image recognition).30 However, a critical question is whether risk prediction models provide accurate and consistent risk predictions for individual patients. Previous research has found that a traditional risk prediction model such as QRISK3 has considerable uncertainty in individual risk prediction, although it has very good model performance at the population level.1819 This uncertainty is related to unmeasured heterogeneity between clinical sites and modelling choices such as the inclusion of secular trends.1819 Our study found that machine learning models share this uncertainty, as models with comparable population level performance yielded very different individual risk predictions. Consequently, different treatment decisions could be made by arbitrarily selecting another modelling technique.
Censoring of patients is an unavoidable problem in prediction models for long term risks, as patients frequently move away or die. However, many popular machine learning models ignore censoring, as the default framework is the analysis of a binary outcome rather than a time to event survival outcome. A UK Biobank study of risk prediction for cardiovascular disease did not report how censoring was dealt with,7 like several other studies.394041 Another machine learning study incorrectly excluded censored patients.8 Random survival forest is a machine learning model that takes account of censoring.42 Innovative techniques are being developed that incorporate statistical censoring approaches into the machine learning framework.1643 However, to our knowledge no current software packages can handle large datasets for these methods. This study shows that directly applying popular machine learning models without considering censoring (especially to data with substantial censoring) will substantially bias risk predictions. The miscalibration was large compared with observed life table estimates. This is consistent with a recent study that reported loss of information due to lack of consideration of censoring with the random forest method.6
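A small sketch of the bias being described, contrasting the naive event proportion (censored patients counted as event free) with a Kaplan-Meier estimate that handles censoring; time (years of follow-up) and event (1 if cardiovascular disease observed, 0 if censored) are assumed arrays, and the lifelines package stands in here for a survival framework:

```python
import numpy as np
from lifelines import KaplanMeierFitter

# Naive 10 year risk: patients censored before year 10 count as event free,
# which underestimates the risk
naive_risk = np.mean((event == 1) & (time <= 10))

# Kaplan-Meier 10 year risk: censored patients contribute follow-up time
# only while they are observed
km = KaplanMeierFitter().fit(time, event)
km_risk = 1 - km.survival_function_at_times(10).iloc[0]
# km_risk exceeds naive_risk whenever patients are censored before year 10
```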
Models with similar C statistics gave varying estimates of individual risks for the same patients. A fundamental challenge with the C statistic is that it applies to the population level but not to individual patients.1844 The C statistic measures the ability of a model to discriminate between cases and non-cases. It is the proportion of case and non-case pairs that are correctly ranked by the model. This means that for a high C statistic, patients with observed events should have a higher predicted risk than patients without observed events.38 The C statistic concerns the rank of the predicted probability rather than the probability itself. For example, a model may predict all events with probabilities between 50.2% and 50.3% and all non-events with a probability of 50%, which would result in perfect discrimination, but the predicted probabilities would not be clinically useful. When a large number of patients have lower risks (which is often the case for cardiovascular disease risk prediction), the C statistic becomes less informative about the discrimination of models, especially in patients at high risk. For example, two patients with very low risk (say 1% and 1.5%) may have a similar effect on the C statistic as two patients with high risk (say 10% and 20%), given that their differences in rank are the same (but the latter two are of greater clinical interest). Therefore, C statistics do not tell us whether a model discriminates specific patients at high risk correctly or consistently compared with other models. C statistics have also been shown to be insensitive to changes in the model.44 The evaluation of consistency in individual risk predictions between models may thus be important in assessing their clinical usefulness in identifying patients at high risk.
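The worked example in this paragraph can be made concrete: predicted probabilities that are clinically useless can still give a perfect C statistic, because only the ranking matters.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([1, 1, 1, 0, 0, 0])                    # events and non-events
p = np.array([0.503, 0.502, 0.502, 0.5, 0.5, 0.5])  # useless as risk estimates
print(roc_auc_score(y, p))  # 1.0: perfect discrimination, terrible calibration
```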
This study considered a total of 22 predictors that had been selected by the developers of QRISK on the basis of their likely causal effect on cardiovascular disease.3 Other machine learning studies have used considerably more predictors. As an example, a study using the UK Biobank included 473 predictors in the machine learning models.7 An unresolved question in risk prediction is what type of variables, and how many of them, should be included in models, as consensus and guidelines for choosing variables for risk prediction models are lacking.45 More information incorporated into a model may increase the model performance of risk prediction at the population level. For example, the C statistic is related to both the effects of predictors and the variation of predictors among patients with and without events.46 Including more predictors in a model may increase the C statistic merely because of greater variation of predictors. On the other hand, inclusion of non-causal predictors may lower the accuracy of the risk prediction by adding noise, increasing the risk of over-fitting, and leading to more data quality challenges.47 Also, a very large number of predictors may limit the clinical utility of these machine learning models, as more predictors need to be measured before a prediction can be made. Further research is needed to establish whether the focus of risk prediction should be on consistently measured causal risk factors or on variables that may be recorded inconsistently between clinicians or electronic health records systems.
Guidelines for the development and validation of risk prediction models (called TRIPOD) focus on the assessment of population level performance but do not consider consistency in individual risk predictions by prediction models with comparable population level performance.48 Arguably, the clinical utility of risk prediction models should be based, as has been done with blood pressure devices for instance, on the consistent risk prediction (reliability) for a particular patient rather than on broad population level performance.49 If models with comparable performance provide different predictions for a patient with certain risk factors, an explanation for these discrepant predictions is needed.50 Explainable artificial intelligence has been described as methods and techniques in the application of artificial intelligence such that the results of the solution can be understood by human experts.51 This contrasts with the concept of the “black box” in machine learning, whereby predictions cannot be explained. Arguably, a survival model that is explainable (such as QRISK3, which is based on established causal predictors) may be preferable to black box models that are high dimensional (include many predictors) but that provide inconsistent results for individual patients. Better standards are needed on how to develop and test machine learning algorithms.14
Strengths and limitations of study
The major strength of this study was that a large number of different machine learning models with varying hyper-parameters, using different packages from different programming languages, were fitted to a large population based primary care cohort. However, the study has several limitations. We considered only predictors from QRISK3 so as to compare models on the basis of equal information, but sensitivity analyses showed similar findings of inconsistencies in cardiovascular disease risk prediction independent of the number of predictors. Furthermore, more hyper-parameters in the machine learning models could have been considered in the grid search process. However, the fitted models already achieved reasonably high model performance, which indicates that the main hyper-parameters had been covered in the grid search process. Several machine learning algorithms were not included in this study, such as support vector machines and survival random forests, as the current software packages for these models cannot cope with large datasets.52535455 The Bland-Altman graph used the 95% range of differences rather than the 95% confidence interval, as the differences in predicted risk (including log transformed) did not follow a normal distribution (which is a required assumption for calculating the Bland-Altman 95% confidence interval). Another limitation is that this study concerned cardiovascular disease risk prediction in primary care, and findings may not be generalisable to other outcomes or settings. However, the robustness of individual risk predictions within and between models with comparable population level performance is rarely, if ever, evaluated. Our findings indicate the importance of assessing this.
Conclusions
A variety of models predicted cardiovascular disease risks for the same patients very differently despite similar model performances. Applying logistic models and commonly used machine learning models without considering censoring results in substantially biased risk predictions and limited usefulness for the prediction of long term risks. The level of consistency within and between models should be assessed before they are used for clinical decision making and should be considered in the TRIPOD guidelines.
What is already known on this topic
Risk prediction models are widely used in clinical practice (such as QRISK or Framingham for cardiovascular disease)
Multiple techniques can be used for these predictions, and recent studies claim that machine learning models can outperform models such as QRISK
What this study adds
Nineteen different prediction techniques (including 12 machine learning models and seven statistical models) yielded similar population level performance
However, cardiovascular disease risk predictions for the same patients varied substantially between models
Models that ignored censoring (including commonly used machine learning models) yielded biased risk predictions
Footnotes
Contributors: YL designed the study, did all statistical analysis, produced all tables and figures, and wrote the main manuscript text and supplementary materials. MS supervised the study, provided quality control on statistical analysis, reviewed all statistical results, and reviewed and edited the main manuscript text. DMA reviewed and edited the main manuscript text and supplementary materials. TPvS designed and supervised the study, provided quality control of all aspects of the paper, and wrote the main manuscript text. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. TPvS is the guarantor.
Funding: This study was funded by the China Scholarship Council (to cover costs of doctoral studentship of YL at the University of Manchester). The funder did not participate in the research or review any details of this study; the other authors are independent of the funder.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support to YL from the China Scholarship Council; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: This study is based on data from Clinical Practice Research Datalink (CPRD) obtained under licence from the UK Medicines and Healthcare products Regulatory Agency. The protocol for this work was approved by the independent scientific advisory committee for CPRD research (No 19_054R). The data are provided by patients and collected by the NHS as part of their care and support. The Office for National Statistics (ONS) is the provider of the ONS data contained within the CPRD data. Hospital Episode Statistics data and the ONS data (copyright 2014) are re-used with the permission of the Health and Social Care Information Centre.
Data sharing: This study is based on CPRD data and is subject to a full licence agreement, which does not permit data sharing outside of the research team. Code lists are available from the corresponding author.
The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Dissemination to participants and related patient and public communities: Dissemination to research participants is not possible as data were anonymised.
Provenance and peer review: Not commissioned; externally peer reviewed.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.