ERIC Number: ED636565
Record Type: Non-Journal
Publication Date: 2021
Pages: 108
Abstractor: As Provided
ISBN: 979-8-3798-4452-3
ISSN: N/A
EISSN: N/A
Predictors of Early Postsecondary STEM Persistence of High-Achieving Students: An Explanatory Study Using Machine Learning Techniques
Nesibe Karakis
ProQuest LLC, Ph.D. Dissertation, Purdue University
This study investigated high-achieving and non-high-achieving students' persistence in STEM fields using nationally representative data from the High School Longitudinal Study of 2009 for the years 2009, 2012, 2013, 2013-2014, and 2016. The results indicated that approximately 70% of high-achieving and non-high-achieving students continued their initial STEM degrees within 3 years of college enrollment. The study revealed that the most important predictors of STEM persistence were math proficiency level, school belonging, school engagement, school motivation, school problems, science self-efficacy, credits earned in computer sciences, GPA in STEM courses, credits earned in STEM courses, and credits earned in Advanced Placement/International Baccalaureate (AP/IB) courses. Math proficiency was the most important variable for both high-achieving and non-high-achieving students. Although combined AP/IB credits were among the most important variables for both groups, they were about twice as important for high-achieving students (6.86% vs. 3.37%). Among the demographic variables (gender, ethnicity, urbanicity, and socioeconomic status), socioeconomic status was the most important in models predicting STEM persistence, and it had higher importance for non-high-achieving students. Furthermore, the persistence rate of Hispanic students differed from that of other underrepresented populations: non-high-achieving Hispanic students had the highest persistence rate, similar to that of well-represented populations (i.e., White, Asian). The machine learning methods used in the study, random forest and artificial neural network, provided good accuracy for both achievement groups. Random forest accuracy was over 82% with the Synthetic Minority Over-Sampling Technique (SMOTE) dataset, while artificial neural network accuracy was over 92%.
[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 800-521-0600. Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
Descriptors: College Students, STEM Education, Academic Persistence, High Achievement, Predictor Variables, Postsecondary Education, Control Groups, Mathematics, Hispanic American Students
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A