Development of rapid and effective risk prediction models for stroke in the Chinese population: a cross-sectional study

Yuexin Qiu; Shiqi Cheng; Yuhang Wu; Wei Yan; Songbo Hu; Yiying Chen; Yan Xu; Xiaona Chen; Junsai Yang; Xiaoyun Chen; Huilie Zheng

doi:10.1136/bmjopen-2022-068045

Public health

Original research

Yuexin Qiu¹,², Shiqi Cheng³, Yuhang Wu⁴, Wei Yan⁵, Songbo Hu¹,², Yiying Chen⁵, Yan Xu⁵, Xiaona Chen⁵, Junsai Yang¹,², Xiaoyun Chen¹,², Huilie Zheng¹,² (ORCID: http://orcid.org/0000-0003-2774-0757)

¹School of Public Health, Nanchang University, Nanchang, Jiangxi, China
²Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
³Neurosurgery Department, Nanchang University Second Affiliated Hospital, Nanchang, Jiangxi, China
⁴Department of Epidemiology and Health Statistics, Central South University, Changsha, Hunan, China
⁵Institute of Chronic Non-communicable Diseases, Center for Disease Control and Prevention of Jiangxi Province, Nanchang, Jiangxi, China

Correspondence to Dr Huilie Zheng; zhenghuilie{at}ncu.edu.cn

Abstract

Objectives The purpose of this study was to use easily obtained and directly observable clinical features to establish predictive models to identify patients at increased risk of stroke.

Setting and participants A total of 46 240 valid records were obtained from 8 research centres and 14 communities in Jiangxi province, China, between February and September 2018.

Primary and secondary outcome measures The area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were calculated to test the performance of the five models (logistic regression (LR), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost) and gradient boosting DT). The calibration curve was used to show calibration performance.

Results The results indicated that XGBoost (AUC: 0.924, accuracy: 0.873, sensitivity: 0.776, specificity: 0.916) and RF (AUC: 0.924, accuracy: 0.872, sensitivity: 0.778, specificity: 0.913) demonstrated excellent performance in predicting stroke. Physical inactivity, hypertension, meat-based diet and high salt intake were important prediction features of stroke.

Conclusion The five machine learning models all had good predictive and discriminatory performance for stroke. RF and XGBoost performed slightly better than LR, although LR was easier to interpret and less prone to overfitting. This work provides a rapid and accurate tool for stroke risk assessment, which can help to improve the efficiency of stroke screening services and the management of high-risk groups.

stroke
epidemiology
statistics & research methods

Data availability statement

Data are available upon reasonable request. The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to privacy concerns.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

The study used machine learning algorithms with some simple and readily available clinical features for rapid stroke prediction.
The study compared five different algorithms to find the best model, adding to the limited research on stroke risk prediction in China.
Data were collected from 51 483 participants in Jiangxi province using the multistage stratified random cluster sampling method.
The study was cross-sectional, which might have introduced some bias.
Generalisation of the study findings to populations of different ages and to populations outside China should be made with caution.

Introduction

Stroke is a leading cause of death and disability worldwide.1 2 China has one of the heaviest stroke burdens in the world, and this burden has increased over the past 30 years.3 Over 76% of strokes occur in people without a history of stroke, and the mortality and disability associated with stroke significantly affect patients’ lives.4 The Global Burden of Disease Study reported that, between 1990 and 2010, stroke incidence decreased by 12% in countries with good health services and practical strategies for preventing cerebrovascular risk factors.5 Prevention of stroke and its risk factors is an essential priority for global public health, especially for low-income and middle-income countries such as China.

Early stroke screening is an essential basis for effective prevention. However, stroke screening is limited by expensive examinations and heavy workforce demands. It is unrealistic to ask doctors to make wide-scale diagnoses of stroke using modalities such as ECG, CT and MRI of the brain. In addition, high-risk individuals often lack awareness of their risk and seek testing only after a suspected cerebrovascular event. To reduce the incidence of stroke, it is vital to develop a simple and accurate screening method.

Machine learning has received intense attention for its disease prediction capabilities, which draw on a range of classification techniques.6–8 Currently, most machine learning algorithms in this field have been developed as tools for predicting the prognosis of stroke and the occurrence of stroke alongside other conditions or complications, for example, predicting stroke-associated pneumonia in Chinese patients with acute ischaemic stroke or outcomes in acute stroke.9–11 In contrast, there is a lack of research on the construction of stroke risk prediction models, especially in China. Previous Western studies have assessed traditional risk factors (smoking, diabetes, etc) and developed risk algorithms that, judging by their performance, provide valid measures of absolute stroke risk in general populations free of stroke or transient ischaemic attack.12–14 However, it remains questionable whether these models can be reasonably applied to the Chinese population or to other populations. A well-known example is the Framingham Stroke Risk Score (FSRS).15 The FSRS was later modified specifically for the Chinese population, but the predictive power of the modified model has not been satisfactory.16

The development of appropriate disease prediction algorithms is technically challenging. To date, many classical machine learning algorithms have been applied to stroke risk assessment. Li et al17 used the generalised linear model, Bayes model and decision tree (DT) model to predict the risk of ischaemic stroke and other thromboembolisms in people with atrial fibrillation. Zhang et al18 employed a variety of filter-based feature selection models to improve the ineffective feature selection in existing research on stroke risk detection. Yu et al19 developed a simple, convenient model to predict the risk of stroke among middle-aged and elderly Chinese adults using retrospective cohort datasets. Nevertheless, the sample size was relatively small for developing a prediction model, and the only variables used to build the model were sex, age, hypertension and total cholesterol (TC). Li et al20 developed a logistic regression (LR) model, naïve Bayesian model, Bayesian network model, DT model, neural network model, random forest (RF) model, bagged DT model, voting model and boosting model with DTs to improve stroke risk level classification methods in China. In their study, the outcome of the prediction model was the risk level of stroke-free individuals, as determined by the National Stroke Center’s screening and intervention project, rather than the occurrence of stroke. These studies performed well in stroke prediction, but they cannot fully address the practical issues facing population-level efforts to prevent stroke, especially in China. Therefore, we aimed to establish a machine learning-based model to predict stroke occurrence using a sizeable Chinese population and easily obtained and directly observable clinical features.

Materials and methods

Study population

This study was supported by the National Stroke Center’s screening and intervention project for individuals at high risk of stroke. A total of 51 483 participants (stroke: 18 435; stroke free: 33 048) were recruited in Jiangxi province, China, from February to September 2018. For stroke cases, we collected electronic health records from eight research centres selected by the National Stroke Center. Stroke was defined by the WHO clinical criteria for stroke.21 The controls were permanent residents without stroke who had lived at the investigation site for more than 6 months; they were all from the 14 counties (cities, districts) randomly selected by the multistage cluster sampling method in the catchment areas or nearby areas of the hospitals where cases were recruited. For controls, stroke was ruled out by a neurologist during the interview and investigation, based on the history of stroke, an assessment of neurological symptoms and signs, and auxiliary examinations.

We prespecified 13 common features related to stroke that have been reported in previous studies.22–25 In selecting them, we considered cost, public acceptance, availability in practice and whether prevention can be achieved by intervening on these predictive factors. High homocysteine was considered because the China Stroke Primary Prevention Trial has shown that a high homocysteine concentration increases the risk of stroke.26 The final 13 features comprised basic predictors (age, sex and area), recognised major risk factors (hypertension, smoking, diabetes mellitus, dyslipidaemia and physical inactivity), and other modifiable risk factors and characteristics of interest (alcohol intake, high salt intake, meat-based diet, cardiac causes and high homocysteine). Of the 51 483 records, 5243 were excluded. The exclusion criteria were as follows: age less than 20 years; lack of anthropometric information; missing or abnormal values; and a history of stroke. A total of 46 240 records were therefore included in this study, as shown in figure 1.

Figure 1

Flow diagram of the study population selected from 8 research centres and 14 counties (cities, districts) in Jiangxi, China.

Data collection

Thirteen variables were included in this study. Cardiac disease was defined as abnormal ECG results or a history of atrial fibrillation, cardiomyopathy, heart failure, ischaemic heart disease, rheumatic heart disease or valvular disease diagnosed by a doctor in a secondary or higher hospital. Hypertension was defined as a history of hypertension diagnosed by a secondary or higher hospital or a blood pressure (mean of three measurements) of 140/90 mm Hg or higher. Blood pressure was measured at the time of admission. Diabetes was defined as a history of diabetes or a fasting blood glucose concentration greater than 7.0 mmol/L at the first encounter. Smoking status was defined as cumulative smoking for more than 6 months in a lifetime (current smoking and former smoking). Alcohol intake was classified as never, low or moderate, or high (more than three times a week and 100 mL each time). Physically active individuals were defined as those involved in moderate or strenuous activity three times or more per week for 0.5 hours or more each time, or those engaged in moderate or severe physical labour. High salt intake and a meat-based diet were defined by a self-reported daily dietary preference for salty food and for meat, respectively. For obesity, we assessed body mass index (BMI). Individuals with BMI≥30 kg/m² were defined as obese.24 Dyslipidaemia was defined according to the Chinese guidelines for the prevention and treatment of dyslipidaemia in adults as follows27: triglycerides≥2.26 mmol/L, TC≥6.22 mmol/L, low-density lipoprotein cholesterol≥4.14 mmol/L and high-density lipoprotein cholesterol<1.04 mmol/L. According to the WHO standard, the average level of homocysteine for healthy adults is 5–15 µmol/L, with a homocysteine level>15 µmol/L representing high homocysteine.28 Study participants were classified into urban and rural populations based on their areas of residence.
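The threshold-based definitions above can be sketched as a small encoding step. This is only an illustration under stated assumptions: the field names and the `encode_features` helper are hypothetical, and "140/90 mm Hg or higher" is read here as either component reaching its threshold, which the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """Raw values for one participant (all field names are hypothetical)."""
    systolic_mmhg: float            # mean of three readings
    diastolic_mmhg: float           # mean of three readings
    diagnosed_hypertension: bool    # prior diagnosis at a secondary or higher hospital
    fasting_glucose_mmol_l: float
    diagnosed_diabetes: bool
    homocysteine_umol_l: float


def encode_features(m: Measurements) -> dict:
    """Turn raw measurements into binary features using the quoted thresholds."""
    return {
        # Assumption: either systolic >= 140 or diastolic >= 90 counts as hypertension.
        "hypertension": m.diagnosed_hypertension
        or m.systolic_mmhg >= 140
        or m.diastolic_mmhg >= 90,
        # History of diabetes, or fasting glucose strictly above 7.0 mmol/L.
        "diabetes": m.diagnosed_diabetes or m.fasting_glucose_mmol_l > 7.0,
        # Homocysteine strictly above 15 umol/L.
        "high_homocysteine": m.homocysteine_umol_l > 15,
    }


# Hypothetical participant, not taken from the study data.
example = encode_features(Measurements(138, 92, False, 6.5, False, 15.0))
```

Note that a value exactly at 15 µmol/L is not flagged as high homocysteine, because the definition uses a strict inequality.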

Patient and public involvement

This research was performed without patient involvement. Patients were not invited to comment on the study design or contribute to the writing or editing of the paper.

Feature preprocessing

The χ² test and Student’s t-test were used for discrete and continuous parameters, respectively. For the independent features of stroke, multivariate LR analysis with backwards stepwise selection was used to calculate the OR with 95% CI. All variables were tested for correlation with each other.
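For readers unfamiliar with the two univariate tests named above, the following minimal sketch computes both statistics from summary data. The function names are hypothetical, the 2×2 counts are invented for illustration, and the ± values reported for age are assumed to be standard deviations; the authors' actual software calls are not described in the text.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table
                 exposed  unexposed
    stroke          a         b
    stroke free     c         d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def student_t(mean1, sd1, n1, mean2, sd2, n2) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical counts, not taken from the study.
chi2_stat = chi_square_2x2(30, 70, 10, 90)

# Age summary as reported in the Results (mean, SD assumed, group size).
t_age = student_t(66.31, 12.17, 14_360, 60.64, 11.23, 31_880)
```

Each statistic would then be compared with its reference distribution (χ² with 1 degree of freedom, t with n1+n2−2 degrees of freedom) to obtain a p value.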

Construction of machine learning models

In this study, we used five popular machine learning algorithms to predict the probability of a binary outcome (stroke or stroke free): LR,29 RF,30 DT,31 gradient boosting DT (GBDT)32 and extreme gradient boosting (XGBoost).33 First, we randomly split our dataset into two groups: the training sets (75%) for model development and the validation sets (25%) for performance evaluation. Second, we specified a range of hyperparameters for each algorithm. Within these ranges, candidate models for predicting stroke occurrence in the population were fitted by grid search on the training data and evaluated by 10-fold cross-validation. Third, when several hyperparameter combinations were optimal and the choice affected the model’s efficiency, we selected the combination that led to the highest efficiency. More details about the features used and the parameter combinations in the models are shown in table 1. Fourth, each model used its best hyperparameters and was evaluated on the validation sets. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity and overall accuracy were used to compare the predictive power of the models; the closer the AUC was to 1, the better the model discriminated. The calibration curve was used to show the agreement between the predicted and observed risks of the five models. All variables were tested for correlation with each other, and a heatmap was generated with R (V.4.0.3, R Foundation for Statistical Computing). The R packages ‘polycor’ and ‘ggplot2’ were used for correlation analysis; the other statistical analyses were performed with Python (V.3.8, Python Software Foundation). All model results in this study can be reproduced by using a fixed random seed.
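The workflow described above (a seeded 75/25 split, then grid search scored by 10-fold cross-validation on the training portion) can be sketched in plain Python as follows. This is not the authors' code: the function names, the `score_fn` callback and the hyperparameter names `max_depth` and `n_estimators` are hypothetical, and the toy scorer stands in for fitting a real model and computing its AUC on the held-out fold.

```python
import itertools
import random


def train_validation_split(n, train_fraction=0.75, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and split them 75/25."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_folds(indices, k=10):
    """Yield (fit_idx, holdout_idx) pairs; every index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, holdout in enumerate(folds):
        fit = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield fit, holdout


def grid_search(param_grid, indices, score_fn, k=10):
    """Return the combination with the best mean cross-validated score.

    score_fn(params, fit_idx, holdout_idx) -> float is supplied by the caller;
    it should train on fit_idx and return a score (for example AUC) on holdout_idx.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, fit, hold) for fit, hold in k_folds(indices, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, fit_idx, holdout_idx):
    """Illustrative scorer only: prefers max_depth near 4 and fewer trees."""
    return -abs(params["max_depth"] - 4) - 0.0001 * params["n_estimators"]


train_idx, valid_idx = train_validation_split(1000, seed=42)
grid = {"max_depth": [2, 4, 8], "n_estimators": [100, 300]}  # hypothetical ranges
best_params, best_score = grid_search(grid, train_idx, toy_score)
```

Keeping the validation indices out of the grid search, as here, means the final evaluation uses data that played no part in choosing the hyperparameters.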

Table 1

The choice of hyperparameters for each model

Results

Demographic features

A total of 46 240 records (21 095 women and 25 145 men) were selected for this analysis, comprising 14 360 records with stroke and 31 880 records without stroke. The mean ages were 66.31±12.17 years for patients who had a stroke and 60.64±11.23 years for stroke-free participants. The characteristics of the participants are presented in table 2.

Table 2

Characteristics of variables in stroke and stroke-free groups
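As a quick consistency check on the cohort numbers, the short sketch below reconciles the recruited and analysed counts reported in the text. The per-group exclusion counts it derives are arithmetic consequences of those figures, not values stated by the authors.

```python
# Counts as reported in "Study population" and "Demographic features".
recruited = {"stroke": 18_435, "stroke_free": 33_048}
included = {"stroke": 14_360, "stroke_free": 31_880}

# Derived values (not reported directly in the paper).
excluded = {group: recruited[group] - included[group] for group in recruited}
total_recruited = sum(recruited.values())
total_included = sum(included.values())
stroke_share = included["stroke"] / total_included

print(excluded)                   # exclusions per group
print(sum(excluded.values()))     # should match the 5243 excluded records
print(round(stroke_share, 3))     # proportion of analysed records with stroke
```

The analysed sample is therefore roughly 31% stroke cases, which reflects the hospital-based recruitment of cases rather than the prevalence of stroke in the general population.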

Univariate and multivariate LR analyses of stroke

In univariate analysis, sex, age, cardiac causes, hypertension, diabetes mellitus, smoking, alcohol intake, physical inactivity, high salt intake, meat-based diet, dyslipidaemia and high homocysteine were all significantly associated with stroke in Jiangxi province (p<0.001). In contrast, there was no significant difference between stroke and stroke-free participants by area of residence (urban or rural). In multivariate LR analysis (table 3), all parameters were included except for area. Female sex was associated with lower odds of stroke (OR 0.534, 95% CI 0.501 to 0.569), whereas all the other parameters were independent positive predictors of stroke.

Table 3

Univariate and multivariate logistic regression analysis of variables in predicting stroke
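The reported OR for female sex can be read on the log scale, where intervals from LR are usually symmetric. The sketch below assumes the interval is a Wald 95% CI (the paper does not state the interval type) and recovers the implied standard error of the log OR; the variable names are illustrative.

```python
import math

# Female sex, as reported in the multivariate LR analysis.
or_point, ci_low, ci_high = 0.534, 0.501, 0.569

# A Wald interval is exp(beta ± 1.96 × SE), so on the log scale the point
# estimate sits midway between the limits and the half-width is 1.96 × SE.
log_mid = (math.log(ci_low) + math.log(ci_high)) / 2
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

recovered_or = math.exp(log_mid)  # should be close to the reported 0.534
```

If the recovered value agrees with the reported point estimate to rounding, the interval is consistent with a Wald construction, and an OR below 1 with an upper limit below 1 indicates lower odds of stroke for women after adjustment.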

Performance of machine learning algorithms

The predictive performance of the five machine learning models in the validation sets is detailed in table 4 and figure 2. The differences between the ROC curves were slight. XGBoost (AUC: 0.924, accuracy: 0.873, sensitivity: 0.776, specificity: 0.916) and RF (AUC: 0.924, accuracy: 0.872, sensitivity: 0.778, specificity: 0.913) performed best in predicting stroke.

Table 4

Predictive performance comparison of the five types of machine learning algorithms in the validation sets

Figure 2

Performance characteristic curves for five models (logistic regression (LR), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost) and gradient boosting decision tree (GBDT)). ROC, receiver operating characteristic.
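The reported accuracy, sensitivity and specificity are linked: accuracy is the prevalence-weighted mean of sensitivity and specificity. The sketch below checks the reported figures against that identity, assuming the validation sets have the same share of stroke cases as the full sample (14 360 of 46 240), which the text does not state explicitly.

```python
# Assumption: stroke share in the validation sets equals that of the full sample.
prevalence = 14_360 / 46_240

reported = {
    "XGBoost": {"sensitivity": 0.776, "specificity": 0.916, "accuracy": 0.873},
    "RF": {"sensitivity": 0.778, "specificity": 0.913, "accuracy": 0.872},
}


def implied_accuracy(sensitivity, specificity, prevalence):
    """Accuracy = sensitivity × prevalence + specificity × (1 − prevalence)."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


implied = {
    name: implied_accuracy(m["sensitivity"], m["specificity"], prevalence)
    for name, m in reported.items()
}
differences = {name: abs(implied[name] - reported[name]["accuracy"]) for name in reported}
```

Under this assumption the implied accuracies agree with the reported values to within about 0.001. Because accuracy depends on prevalence in this way, the same sensitivity and specificity would give a different accuracy in a population with a different proportion of stroke cases.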

Figure 3 presents a graphical representation of calibration, showing agreement between the predicted and observed risk of the five models. The figure demonstrates that the calibration curves of all models are close to perfect calibration.

Figure 3

Calibration curve showing the agreement between predicted (x-axis) and observed (y-axis) risk of five models. The predicted probability of stroke is divided evenly into 10 bins. The diagonal dotted line represents a perfect prediction by an ideal model. DT, decision tree; GBDT, gradient boosting decision tree; LR, logistic regression; RF, random forest; XGBoost, extreme gradient boosting.
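The points on a calibration curve of this kind can be computed as sketched below. This is an illustration only: the `calibration_bins` helper is hypothetical, the bins are assumed to be of equal width (the caption could also be read as equal-frequency bins), and the labels and probabilities are toy values rather than study data.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width probability bins and return
    (mean predicted risk, observed event rate, count) for each non-empty bin."""
    totals = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probs, events, count]
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        totals[b][0] += p
        totals[b][1] += y
        totals[b][2] += 1
    return [(s / n, e / n, n) for s, e, n in totals if n]


# Toy labels (1 = stroke) and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.05, 0.08, 0.92, 0.95, 0.55, 0.52]
curve = calibration_bins(y_true, y_prob)
```

Plotting the first element of each tuple against the second gives the curve; points on the diagonal indicate that the mean predicted risk in a bin matches the observed stroke rate in that bin.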

All variables were tested for correlation, as shown in figure 4. Sex and smoking were strongly correlated (correlation coefficient>0.8).

Figure 4

Results of correlation analysis between all variables.

Moreover, the relative importance of the variables in XGBoost and RF, based on information gain values, is shown in figure 5. Both models showed the same general pattern: physical inactivity contributed the most to the prediction of stroke, followed by hypertension, a meat-based diet and high salt intake.

Figure 5

Relative importance ranking of each input variable for prediction of stroke by extreme gradient boosting (XGBoost) and random forest (RF).
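To make the idea of information gain concrete, the sketch below computes the entropy reduction produced by a single binary split. It is a textbook illustration with hypothetical counts, not the authors' importance calculation: tree libraries typically aggregate a split-level quantity over all trees, and may use a different impurity or loss measure than entropy.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (in bits) of a binary label distribution."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(left, right):
    """Entropy reduction from splitting a node into two children.

    left and right are (stroke, stroke_free) counts in each child node.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = entropy(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * entropy(*left) + (n_right / n) * entropy(*right)
    return parent - children


# Hypothetical splits: one that separates the classes perfectly, one that does not help.
gain_perfect = information_gain((50, 0), (0, 50))
gain_useless = information_gain((25, 25), (25, 25))
```

A feature that is repeatedly chosen for splits with large gains accumulates a high importance score, which is the sense in which physical inactivity ranks first in figure 5.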

Discussion

In this study, we used machine learning algorithms to examine the performance of five classifiers with 12 non-invasive and easily obtained clinical features for the rapid and accurate identification of individuals who had a stroke. All models in our study showed excellent predictive performance, especially RF and XGBoost. This suggests that using machine learning algorithms with some simple and readily available clinical features for rapid stroke prediction is reasonable and feasible. This method is especially suitable for low-income or middle-income areas with heavy stroke burdens, such as China.

RF and XGBoost seem to be the machine learning algorithms of choice in most similar studies.10 34–36 Previous work suggests that advanced machine learning techniques such as RF and XGBoost can make better use of the information in analytical databases and enable the development and validation of predictive models with better performance.7 In our study, RF and XGBoost showed considerable predictive power. RF had slightly higher sensitivity than XGBoost (0.778 vs 0.776), so it detected slightly more patients who had a stroke, whereas XGBoost had slightly higher specificity (0.916 vs 0.913), so it identified more stroke-free participants. During training, the hyperparameters of each algorithm except LR were tuned by adjusting the grid search values to optimise model performance. We decided not to tune the parameters of the LR model to keep the model specification as simple as possible for comprehensibility. The more complex models also have costs. The RF model contained a large number of DTs (n=291), which required substantial memory and training time. In addition, RF is a black-box model whose internal operation cannot be directly controlled, which hinders interpretation. XGBoost also incurred considerable computational cost. It is worth noting that a classical model such as LR also showed solid predictive performance compared with these complex machine learning algorithms. The LR model is easier to use and interpret and less prone to overfitting, but it is sensitive to multicollinearity among the independent variables. The correlation analysis indicated a strong correlation between sex and smoking. However, we retained both variables because we were more concerned with the models’ predictive power for stroke than with estimating the effects of individual risk factors.

We have provided more details of similar studies in recent years in table 5. Compared with other studies,14 19 20 37–41 the models we used have stronger prediction and discrimination performance with higher AUCs, which may be due to the inclusion of more variables in this study. Many machine learning models are sensitive to imbalanced data. Our sample included a large number of patients who had a stroke recruited from hospitals, which reduced the influence of class imbalance on the classification results. We found that physical inactivity was the most predictive feature of stroke in both the RF and XGBoost models, followed by hypertension, meat-based diet and high salt intake. Hypertension has generally been considered the most important risk factor for stroke,22 42 which seems to differ from our results. However, a large-scale case–control study23 showed that physical inactivity rather than hypertension was the most important risk factor in China. This also indicates that each region should establish a prediction model that reflects its own geographic and ethnic characteristics based on its own data.43 In addition, studies19 40 have reported that age was a significant risk predictor for stroke, whereas it was not highly predictive of stroke in our models. The use of age groups may have obscured the contribution of age to stroke in this study. Homocysteine was used as a new predictor to develop a predictive model for stroke. Our results suggest that high homocysteine may not have important predictive ability for stroke. A meta-analysis44 reported that elevated homocysteine levels were associated with an increased risk of stroke across subtypes, which suggests that risk prediction models built only for overall stroke (ischaemic and haemorrhagic stroke combined) may underestimate the importance of homocysteine levels for specific subtypes, especially ischaemic stroke.
The China Stroke Primary Prevention Trial has shown that a high homocysteine concentration increases the risk of stroke. We included this feature because of this interest and its ease of detection. Our study showed that high homocysteine was an independent predictor of stroke and that its association with hypertension was not significant, so we retained this feature in the final model. Whether a more cost-effective or more streamlined version of the model should be considered is an interesting question for future public health work.

Table 5

Summary of this study and other similar research findings

The trend in prediction models is towards simplicity and non-invasiveness. In resource-poor settings, the burden of stroke is disproportionately high.45 46 Models that depend on laboratory testing are difficult to use in such settings. Several large-scale studies have demonstrated that effective measures to prevent stroke must involve targeted lifestyle interventions.22–24 45

In this study, we used a real dataset of stroke cases from hospitals, and all cases were diagnosed by doctors, which is more reliable than self-reported diagnosis. In addition, the use of data from multiple centres should reduce selection bias and support the reliability of our models in identifying stroke. The models we developed are simple, non-invasive, cost-saving and time-saving, and easy to apply outside the clinical setting. We have included enough clinical features to promote stroke screening and prevention in nonprofessional populations. However, some limitations of this study need to be acknowledged. First, this study did not distinguish between ischaemic and haemorrhagic strokes. There are some notable differences in risk factors between ischaemic and haemorrhagic stroke.47 Therefore, further studies are needed to develop separate predictive models for ischaemic and haemorrhagic stroke. Second, the models are based on machine learning algorithms, so the important features they identify may be difficult to interpret clinically. Third, this study was based on a single province in China, which may limit its applicability to other populations; future studies should include a broader population. Fourth, the prediction variables obtained retrospectively may leak information to the fitted models, which should be treated with caution during evaluation. The results should be confirmed in a prospective study. Fifth, the overall accuracy of our model in predicting stroke in the general population is likely to be overly optimistic.

Conclusion

In this study, we demonstrate that the five machine learning models developed using 12 easily obtained, non-invasive clinical features all have good predictive and discriminative performance for stroke. The more sophisticated models, such as RF and XGBoost, perform slightly better than LR, although LR is easier to interpret and less prone to overfitting. This work provides a rapid and accurate stroke risk assessment tool that can help to improve the efficiency of stroke screening services and the management of high-risk populations.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by Xuanwu Hospital Capital Medical University (no. 024 [2015]). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We would like to thank the researchers who participated in this survey.

References

1. Campbell BCV, De Silva DA, Macleod MR, et al. Ischaemic stroke. Nat Rev Dis Primers 2019;5:70. doi:10.1038/s41572-019-0118-8
2. Campbell BCV, Khatri P. Stroke. Lancet 2020;396:129–42. doi:10.1016/S0140-6736(20)31179-X
3. Wang W, Jiang B, Sun H, et al. Prevalence, incidence, and mortality of stroke in China: results from a nationwide population-based survey of 480 687 adults. Circulation 2017;135:759–71. doi:10.1161/CIRCULATIONAHA.116.025250
4. Saver JL, Carroll JD, Smalling R, et al. Letter by Saver et al regarding article, “Guidelines for the prevention of stroke in patients with stroke and transient ischemic attack: a guideline for healthcare professionals from the American Heart Association/American Stroke Association.” Stroke 2015;46:e85–6. doi:10.1161/STROKEAHA.115.007311
5. Feigin VL, Forouzanfar MH, Krishnamurthi R, et al. Global and regional burden of stroke during 1990-2010: findings from the Global Burden of Disease Study 2010. Lancet 2014;383:245–54. doi:10.1016/s0140-6736(13)61953-4
6. Pei D, Gong Y, Kang H, et al. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019;19:41. doi:10.1186/s12911-019-0790-3
7. Liu W-C, Li Z-Q, Luo Z-W, et al. Machine learning for the prediction of bone metastasis in patients with newly diagnosed thyroid cancer. Cancer Med 2021;10:2802–11. doi:10.1002/cam4.3776
8. Zhu J, Zheng J, Li L, et al. Application of machine learning algorithms to predict central lymph node metastasis in T1-T2, non-invasive, and clinically node negative papillary thyroid carcinoma. Front Med (Lausanne) 2021;8:635771. doi:10.3389/fmed.2021.635771
9. Kostev K, Wu T, Wang Y, et al. Predicting the risk of stroke in patients with late-onset epilepsy: a machine learning approach. Epilepsy Behav 2021;122:108211. doi:10.1016/j.yebeh.2021.108211
10. Li X, Wu M, Sun C, et al. Using machine learning to predict stroke-associated pneumonia in Chinese acute ischaemic stroke patients. Eur J Neurol 2020;27:1656–63. doi:10.1111/ene.14295
11. Heo J, Yoon JG, Park H, et al. Machine learning-based model for prediction of outcomes in acute stroke. Stroke 2019;50:1263–5. doi:10.1161/STROKEAHA.118.024293
12. Chambless LE, Heiss G, Shahar E, et al. Prediction of ischemic stroke risk in the Atherosclerosis Risk in Communities Study. Am J Epidemiol 2004;160:259–69. doi:10.1093/aje/kwh189
13. Hippisley-Cox J, Coupland C, Brindle P. Derivation and validation of QStroke score for predicting risk of ischaemic stroke in primary care and comparison with other risk scores: a prospective open cohort study. BMJ 2013;346:f2573. doi:10.1136/bmj.f2573
14. Dufouil C, Beiser A, McLure LA, et al. Revised Framingham stroke risk profile to reflect temporal trends. Circulation 2017;135:1145–59. doi:10.1161/CIRCULATIONAHA.115.021275
15. D’Agostino RB, Wolf PA, Belanger AJ, et al. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke 1994;25:40–3. doi:10.1161/01.str.25.1.40
16. Huang JY, et al. Modified Framingham stroke profile in the prediction of the risk of stroke among Chinese. Chinese Journal of Cerebrovascular Diseases 2013;10:228–32.
17. Li X, et al. Integrated machine learning approaches for predicting ischemic stroke and thromboembolism in atrial fibrillation. American Medical Informatics Association Annual Symposium (AMIA); 2017
18. Zhang Y, Zhou Y, Zhang D, et al. A stroke risk detection: improving hybrid feature selection method. J Med Internet Res 2019;21:e12437. doi:10.2196/12437
19. Yu Q, Wu Y, Jin Q, et al. Development and internal validation of a multivariable prediction model for 6-year risk of stroke: a cohort study in middle-aged and elderly Chinese population. BMJ Open 2021;11:e048734. doi:10.1136/bmjopen-2021-048734
20. Li X, Bian D, Yu J, et al. Using machine learning models to improve stroke risk level classification methods of China National Stroke Screening. BMC Med Inform Decis Mak 2019;19:261. doi:10.1186/s12911-019-0998-2
21. Hatano S. Experience from a multicentre stroke register: a preliminary report. Bull World Health Organ 1976;54:541–53.
↵
1. O’Donnell MJ,
2. Xavier D,
3. Liu L, et al
. Risk factors for ischaemic and intracerebral haemorrhagic stroke in 22 countries (the INTERSTROKE study): a case-control study. Lancet 2010;376:112–23. doi:10.1016/S0140-6736(10)60834-3
OpenUrl CrossRef PubMed Web of Science
O’Donnell MJ, Chin SL, Rangarajan S, et al. Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet 2016;388:761–75. doi:10.1016/S0140-6736(16)30506-2
Owolabi MO, Sarfo F, Akinyemi R, et al. Dominant modifiable risk factors for stroke in Ghana and Nigeria (SIREN): a case-control study. Lancet Glob Health 2018;6:e436–46. doi:10.1016/S2214-109X(18)30002-0
Cotlarciuc I, Malik R, Holliday EG, et al. Effect of genetic variants associated with plasma homocysteine levels on stroke risk. Stroke 2014;45:1920–4. doi:10.1161/STROKEAHA.114.005208
Zhao M, Wang X, He M, et al. Homocysteine and stroke risk: modifying effect of methylenetetrahydrofolate reductase C677T polymorphism and folic acid intervention. Stroke 2017;48:1183–90. doi:10.1161/STROKEAHA.116.015324
Joint Committee for Developing Chinese Guidelines on Prevention and Treatment of Dyslipidemia in Adults. Chinese guidelines on prevention and treatment of dyslipidemia in adults. Zhonghua Xin Xue Guan Bing Za Zhi 2007;35:390–419.
Anniwaer J, Liu M-Z, Xue K-D, et al. Homocysteine might increase the risk of recurrence in patients presenting with primary cerebral infarction. Int J Neurosci 2019;129:654–9. doi:10.1080/00207454.2018.1517762
Hosmer DW, Lemeshow S. n.d. Applied logistic regression.
Breiman L. Random forests. Mach Learn 2001;45:5–32. doi:10.1023/A:1010933404324
Barros RC, Basgalupp MP, de Carvalho ACPLF, et al. A hyper-heuristic evolutionary algorithm for automatically designing decision-tree algorithms. GECCO ’12; Philadelphia, Pennsylvania, USA. New York, NY, USA, July 7, 2012. doi:10.1145/2330163.2330335
Cherkassky V, Ma Y. Another look at statistical learning theory and regularization. Neural Netw 2009;22:958–69. doi:10.1016/j.neunet.2009.04.005
Chen T, Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Francisco, California, USA: Association for Computing Machinery, 2016:785–94. doi:10.1145/2939672.2939785
Mogensen UB, Ishwaran H, Gerds TA. Evaluating random forests for survival analysis using prediction error curves. J Stat Softw 2012;50:1–23. doi:10.18637/jss.v050.i11
Mosca E, Alfieri R, Merelli I, et al. A multilevel data integration resource for breast cancer study. BMC Syst Biol 2010;4:76. doi:10.1186/1752-0509-4-76
Zhang Y, Wang Y, Xu J, et al. Comparison of prediction models for acute kidney injury among patients with hepatobiliary malignancies based on XGBoost and LASSO-logistic algorithms. Int J Gen Med 2021;14:1325–35. doi:10.2147/IJGM.S302795
Yao Q, Zhang J, Yan K, et al. Development and validation of a 2-year new-onset stroke risk prediction model for people over age 45 in China. Medicine (Baltimore) 2020;99:e22680. doi:10.1097/MD.0000000000022680
Lee S, Lee H, Kim HS, et al. Incidence, risk factors, and prediction of myocardial infarction and stroke in farmers: a Korean nationwide population-based study. J Prev Med Public Health 2020;53:313–22. doi:10.3961/jpmph.20.156
Lee J-W, Lim H-S, Kim D-W, et al. The development and implementation of stroke risk prediction model in National Health Insurance Service’s personal health record. Comput Methods Programs Biomed 2018;153:253–7. doi:10.1016/j.cmpb.2017.10.007
Chien K-L, Su T-C, Hsu H-C, et al. Constructing the prediction model for the risk of stroke in a Chinese population: report from a cohort study in Taiwan. Stroke 2010;41:1858–64. doi:10.1161/STROKEAHA.110.586222
Chun M, Clarke R, Cairns BJ, et al. Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults. J Am Med Inform Assoc 2021;28:1719–27. doi:10.1093/jamia/ocab068
Lawes CMM, Bennett DA, Feigin VL, et al. Blood pressure and stroke: an overview of published reviews. Stroke 2004;35:1024.
Menotti A, Lanti M, Agabiti-Rosei E, et al. Riskard 2005. New tools for prediction of cardiovascular disease risk derived from Italian population studies. Nutr Metab Cardiovasc Dis 2005;15:426–40. doi:10.1016/j.numecd.2005.07.007
He Y, Li Y, Chen Y, et al. Homocysteine level and risk of different stroke types: a meta-analysis of prospective observational studies. Nutr Metab Cardiovasc Dis 2014;24:1158–65. doi:10.1016/j.numecd.2014.05.011
Feigin VL, Roth GA, Naghavi M, et al. Global burden of stroke and risk factors in 188 countries, during 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet Neurol 2016;15:913–24. doi:10.1016/S1474-4422(16)30073-4
Jia L, Quan M, Fu Y, et al. Dementia in China: epidemiology, clinical management, and research advances. Lancet Neurol 2020;19:81–92. doi:10.1016/S1474-4422(19)30290-X
Boehme AK, Esenwa C, Elkind MSV. Stroke risk factors, genetics, and prevention. Circ Res 2017;120:472–95. doi:10.1161/CIRCRESAHA.116.308398

Footnotes

Contributors YQ: Conceptualisation (lead), writing—original draft (lead), formal analysis (lead), writing—review and editing (equal). YW: Writing—original draft (lead), writing—review and editing (equal). SH: Conceptualisation (supporting), formal analysis (supporting), writing—review and editing (equal). WY: Methodology (lead), formal analysis (supporting), writing—review and editing (equal). YC: Conceptualisation (supporting), project administration (equal). YX: Data curation (equal), project administration (equal). XC: Investigation (equal), project administration (equal). JY: Writing—review and editing (equal). XC: Writing—review and editing (equal). SC: Conceptualisation (supporting), supervision (equal). HZ: Conceptualisation (supporting), supervision (equal). YQ is the lead study investigator. HZ is the guarantor.
Funding The study was supported by the Natural Science Foundation of Jiangxi Province (20202BABL216044), the National Natural Science Foundation of China (Grant No.: 81960618), the Regional Project of the National Natural Science Foundation of China (Grant No.: 82260388), Key Projects of the Jiangxi Provincial Department of Education (GJJ210118), the Project of the Jiangxi Provincial Health Commission (202130385) and Key Projects of the Jiangxi Provincial Administration of Traditional Chinese Medicine (2022Z017).
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.

