Original research
Development of rapid and effective risk prediction models for stroke in the Chinese population: a cross-sectional study
  1. Yuexin Qiu1,2,
  2. Shiqi Cheng3,
  3. Yuhang Wu4,
  4. Wei Yan5,
  5. Songbo Hu1,2,
  6. Yiying Chen5,
  7. Yan Xu5,
  8. Xiaona Chen5,
  9. Junsai Yang1,2,
  10. Xiaoyun Chen1,2,
  11. Huilie Zheng1,2
  1. School of Public Health, Nanchang University, Nanchang, Jiangxi, China
  2. Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
  3. Neurosurgery Department, Nanchang University Second Affiliated Hospital, Nanchang, Jiangxi, China
  4. Department of Epidemiology and Health Statistics, Central South University, Changsha, Hunan, China
  5. Institute of Chronic Non-communicable Diseases, Center for Disease Control and Prevention of Jiangxi Province, Nanchang, Jiangxi, China
  1. Correspondence to Dr Huilie Zheng; zhenghuilie{at}ncu.edu.cn

Abstract

Objectives The purpose of this study was to use easily obtained and directly observable clinical features to establish predictive models to identify patients at increased risk of stroke.

Setting and participants A total of 46 240 valid records were obtained from 8 research centres and 14 communities in Jiangxi province, China, between February and September 2018.

Primary and secondary outcome measures The area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were calculated to evaluate the performance of five models: logistic regression (LR), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost) and gradient boosting decision tree (GBDT). Calibration curves were used to assess calibration performance.
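For readers less familiar with these measures, the following is a minimal Python sketch of how sensitivity, specificity, accuracy and AUC can be computed from true labels and predicted risk scores. The labels, scores, function names and the 0.5 decision threshold are illustrative assumptions; they are not the study's data, code or cut-off.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity and accuracy at a given score threshold.

    The threshold of 0.5 is an assumption for illustration only.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, y_score):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = stroke, 0 = no stroke.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_score)
auc_value = auc(y_true, y_score)
print(metrics, auc_value)
```

Sensitivity, specificity and accuracy depend on the chosen threshold, whereas AUC summarises discrimination across all thresholds.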

Results The results indicated that XGBoost (AUC: 0.924, accuracy: 0.873, sensitivity: 0.776, specificity: 0.916) and RF (AUC: 0.924, accuracy: 0.872, sensitivity: 0.778, specificity: 0.913) demonstrated excellent performance in predicting stroke. Physical inactivity, hypertension, meat-based diet and high salt intake were important prediction features of stroke.

Conclusion All five machine learning models showed good predictive and discriminatory performance for stroke. RF and XGBoost performed slightly better than LR, although LR was easier to interpret and less prone to overfitting. This work provides a rapid and accurate tool for stroke risk assessment, which can help to improve the efficiency of stroke screening services and the management of high-risk groups.

  • stroke
  • epidemiology
  • statistics & research methods

Data availability statement

Data are available upon reasonable request. The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to privacy concerns.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Footnotes

  • Contributors YQ: Conceptualisation (lead), writing—original draft (lead), formal analysis (lead), writing—review and editing (equal). YW: Writing—original draft (lead), writing—review and editing (equal). SH: Conceptualisation (supporting), formal analysis (supporting), writing—review and editing (equal). WY: Methodology (lead), formal analysis (supporting), writing—review and editing (equal). YC: Conceptualisation (supporting), project administration (equal). YX: Data curation (equal), project administration (equal). XC: Investigation (equal), project administration (equal). JY: Writing—review and editing (equal). XC: Writing—review and editing (equal). SC: Conceptualisation (supporting), supervision (equal). HZ: Conceptualisation (supporting), supervision (equal). YQ is the lead study investigator. HZ is the guarantor.

  • Funding The study was supported by the Natural Science Foundation of Jiangxi Province (20202BABL216044), the National Natural Science Foundation of China (Grant No.: 81960618), the Regional Project of the National Natural Science Foundation of China (Grant No.: 82260388), Key Projects of the Jiangxi Provincial Department of Education (GJJ210118), the Project of the Jiangxi Provincial Health Commission (202130385) and Key Projects of the Jiangxi Provincial Administration of Traditional Chinese Medicine (2022Z017).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.