ERIC Number: EJ1422796
Record Type: Journal
Publication Date: 2024
Pages: 21
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1088-8438
EISSN: EISSN-1532-799X
Text Complexity of Chinese Elementary School Textbooks: Analysis of Text Linguistic Features Using Machine Learning Algorithms
Purpose: This study sought to 1) identify linguistic features important for Chinese text complexity with a theory-based and systematic approach, and 2) address how feature sets and algorithms affect the performance of Chinese text complexity models. Method: Texts from Chinese language arts textbooks from Grades 1 to 6 (N = 1,478) in Mainland China were analyzed. The predictor variables were 265 linguistic features of texts: 154 lexical features and 111 sentence and discourse features. The outcome variable was the complexity level of texts; a one-semester scale was applied, yielding 12 levels in total (two semesters per grade). Results: Features in the categories of character and word frequency, character and word semantic features, lexical diversity, part-of-speech syntactic categories, and referential cohesion were found to be the most important. With the important features identified, we found that text complexity models with features at all levels outperformed those with features at only one level. Models using the two machine learning algorithms (Random Forest Regression and Support Vector Regression) outperformed those using Linear Regression. Conclusion: This work clarifies important linguistic features for Chinese text complexity and points to the necessity of considering features across levels and using machine learning algorithms in future text complexity research.
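Note: The one-semester scale described in the Method can be illustrated with a minimal sketch. The abstract states only that Grades 1 to 6 with two semesters per grade give 12 levels; the numbering below (Grade 1, first semester = level 1, through Grade 6, second semester = level 12) is an assumption, and the name `complexity_level` is hypothetical rather than taken from the study. The sketch also checks that the two feature groups sum to the reported 265 predictors.

```python
# Assumed numbering: level 1 = Grade 1, semester 1; level 12 = Grade 6, semester 2.
# The abstract gives the counts (6 grades, 2 semesters, 12 levels) but not this formula.
GRADES = range(1, 7)
SEMESTERS = (1, 2)


def complexity_level(grade: int, semester: int) -> int:
    """Map a (grade, semester) pair to a level on the 12-point scale (hypothetical helper)."""
    if grade not in GRADES or semester not in SEMESTERS:
        raise ValueError("grade must be 1-6 and semester must be 1 or 2")
    return (grade - 1) * len(SEMESTERS) + semester


# Feature counts as reported in the abstract.
LEXICAL_FEATURES = 154
SENTENCE_DISCOURSE_FEATURES = 111
TOTAL_FEATURES = LEXICAL_FEATURES + SENTENCE_DISCOURSE_FEATURES

# Enumerate every level in textbook order.
levels = [complexity_level(g, s) for g in GRADES for s in SEMESTERS]
```

Under this assumed numbering, a text from the second semester of Grade 3 would be labeled `complexity_level(3, 2)`, and the regression models named in the Results would predict that label from the 265 features.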
Descriptors: Difficulty Level, Textbooks, Algorithms, Artificial Intelligence, Foreign Countries, Predictor Variables, Word Frequency, Semantics, Syntax, Speech Communication, Elementary School Students, Language Variation, Chinese, Language Arts, Textbook Evaluation, Orthographic Symbols, Computational Linguistics, Discourse Analysis
Routledge. Available from: Taylor & Francis, Ltd. 530 Walnut Street Suite 850, Philadelphia, PA 19106. Tel: 800-354-1420; Tel: 215-625-8900; Fax: 215-207-0050; Web site: http://www.tandf.co.uk/journals
Publication Type: Journal Articles; Reports - Research
Education Level: Elementary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: China
Grant or Contract Numbers: N/A