ERIC Number: ED657408
Record Type: Non-Journal
Publication Date: 2024
Pages: 146
Abstractor: As Provided
ISBN: 979-8-3827-7752-8
ISSN: N/A
EISSN: N/A
Essays on Statistics and Data Science Education
Emma Mary Klugman
ProQuest LLC, Ph.D. Dissertation, Harvard University
Statistics & data science are growing, rapidly evolving, and increasingly important for an informed citizenry in a data-saturated world. In this dissertation, I address two central questions: (1) who is taking statistics? and (2) what are statistics courses teaching? I estimate that 920,000 US students take statistics in high school each year, but this population has not yet been well studied. Using a rich set of survey responses describing 15,727 students' demographics, career interests and values, STEM identity, grades, and test scores, my first study compares four groups of high-school course-takers: those who take statistics, calculus, both, and neither. I then employ latent profile analysis to shed light on who these students are, showing that students with different profiles take statistics at surprisingly similar rates: statistics is an important part of the academic pathway for a wide range of students and serves a demographically diverse population. In my second study, I build upon tools from natural language processing and psychometric measurement to develop a human-in-the-loop methodology for measuring latent constructs in large text corpora, and present a framework for doing so. I construct a lexicon-based instrument to measure the extent to which syllabi from college statistics and data science courses align with a vision for modernizing instruction set forth in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) project and across 145 journal articles spanning almost a century. In so doing, I illustrate an approach that researchers can take in bringing measurement questions to text data, a method that I believe strikes a useful balance between interpretability, communicability, validity, and scalability. My final study applies these instruments to 32,483 syllabi from US statistics and data science courses taught between 2010 and 2018. I find a modest overall increase in modern approaches over this decade.
Finally, I explore differences between institution types using multilevel models, finding that private and four-year institutions, as well as those with higher admissions rates and larger Pell-recipient populations, have more modern syllabi, though two-year institutions and schools serving fewer Pell recipients seem to be gaining ground. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
Descriptors: Data Science, Statistics Education, High School Students, Profiles, Alignment (Education), College Mathematics, Course Descriptions, Methods, Measurement, Course Content, Institutional Characteristics, Student Characteristics
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: High Schools; Secondary Education; Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A