Peer reviewed
ERIC Number: EJ1453622
Record Type: Journal
Publication Date: 2024-Dec
Pages: 28
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1560-4292
EISSN: EISSN-1560-4306
Examining the Effect of Assessment Construct Characteristics on Machine Learning Scoring of Scientific Argumentation
Kevin C. Haudek; Xiaoming Zhai
International Journal of Artificial Intelligence in Education, v34 n4 p1482-1509 2024
Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open-response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has suggested that features of the assessment construct (i.e., complexity, diversity, and structure) are critical to ML scoring accuracy, yet exactly how these features are associated with machine scoring accuracy remains unknown. This study investigated how features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and to score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38-0.89), indicating good to almost perfect performance. We found that higher levels of "Complexity" and "Diversity" of the assessment task were associated with decreased model performance; similarly, the relationship between levels of "Structure" and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher-complexity items and multidimensional assessments.
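The record does not specify the authors' actual model or features, so the following is a minimal sketch of the workflow the abstract describes: train a text classifier per item on human-scored responses, then measure machine-human agreement with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The TF-IDF/logistic-regression pipeline, the example responses, and the score labels below are illustrative assumptions, not the paper's method.

# A minimal sketch of the scoring-and-agreement workflow described in the
# abstract. The model choice (TF-IDF + logistic regression), the example
# responses, and the labels are illustrative assumptions; the paper's
# actual features and algorithm are not given in this record.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

# Hypothetical human-scored training responses; scores stand in for
# learning-progression levels assigned by expert coders.
train_texts = [
    "The rock layers support the claim because older layers are deeper.",
    "I think the claim is right.",
    "The data show more growth with light, so the evidence supports the claim.",
    "Because plants need light.",
]
train_scores = [2, 1, 2, 1]

# One scoring model per assessment item, trained on its human-scored set.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_scores)

# Machine-human agreement on held-out responses, reported as Cohen's
# kappa = (p_o - p_e) / (1 - p_e); the abstract reports a mean of 0.60
# (range 0.38-0.89) across the 17 items.
test_texts = [
    "The graph shows taller plants in light, which supports the claim.",
    "It is true.",
]
human_scores = [2, 1]
machine_scores = model.predict(test_texts)
print("Cohen's kappa:", cohen_kappa_score(human_scores, machine_scores))

Any text classifier with a fit/predict interface could stand in for the pipeline above; the essential step for this study is the evaluation, since the reported kappas (good to almost perfect by common benchmarks) are exactly this machine-versus-human-consensus comparison.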
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://bibliotheek.ehb.be:2123/
Publication Type: Journal Articles; Reports - Research
Education Level: Middle Schools; Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A