ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	13
Since 2006 (last 20 years)	26

Descriptor

Correlation	28
Essays	28
Interrater Reliability	18
Scoring	18
Writing Evaluation	15
Second Language Learning	13
Evaluators	12
Computer Assisted Testing	11
English (Second Language)	11
Foreign Countries	11
Comparative Analysis	10
Statistical Analysis	9
Scores	8
Writing Tests	8
Language Tests	7
Reliability	7
Computer Software	5
Accuracy	4
Automation	4
College Students	4
Computational Linguistics	4
Grammar	4
Language Usage	4
Regression (Statistics)	4
Second Language Instruction	4

Publication Type

Journal Articles	24
Reports - Research	22
Tests/Questionnaires	3
Dissertations/Theses -…	2
Reports - Evaluative	2
Speeches/Meeting Papers	2
Information Analyses	1
Reports - Descriptive	1

Education Level

Higher Education	11
Postsecondary Education	9
Secondary Education	3
Grade 11	1
High Schools	1

Location

China	4
Hong Kong	3
Iran	1
Israel	1
Ohio	1
Turkey	1

Assessments and Surveys

Test of English as a Foreign…	3
Graduate Record Examinations	2
Test of Standard Written…	1

Showing 1 to 15 of 28 results

Correlating What We Know: A Mixed Methods Study of Reflection and Writing in First-Year Writing Assessment

Peer reviewed

Pruchnic, Jeff; Barton, Ellen; Primeau, Sarah; Trimble, Thomas; Varty, Nicole; Foster, Tanina – Composition Forum, 2021

Over the past two decades, reflective writing has occupied an increasingly prominent position in composition theory, pedagogy, and assessment as researchers have described the value of reflection and reflective writing in college students' development of higher-order writing skills, such as genre conventions (Yancey, "Reflection";…

Descriptors: Reflection, Correlation, Essays, Freshman Composition

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Autoscoring Essays Based on Complex Networks

Peer reviewed

Ke, Xiaohua; Zeng, Yongqiang; Luo, Haijiao – Journal of Educational Measurement, 2016

This article presents a novel method, the Complex Dynamics Essay Scorer (CDES), for automated essay scoring using complex network features. Texts produced by college students in China were represented as scale-free networks (e.g., a word adjacency model) from which typical network features, such as the in-/out-degrees, clustering coefficient (CC),…

Descriptors: Scoring, Automation, Essays, Networks

Diagnosing Linguistic Problems in English Academic Writing of University Students: An Item Bank Approach

Peer reviewed

Xie, Qin – Language Assessment Quarterly, 2020

This article describes the steps we went through in designing and validating an item bank to diagnose linguistic problems in the English academic writing of university students in Hong Kong. Test items adopt traditional item formats (e.g., MCQ, grammatical judgment tasks, and error correction) but are based on authentic language materials…

Descriptors: English for Academic Purposes, Second Language Learning, Second Language Instruction, Item Analysis

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment

Peer reviewed

Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020

Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…

Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers began investigating automatic scoring systems in writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Interrater Reliability Estimators Commonly Used in Scoring Language Assessments: A Monte Carlo Investigation of Estimator Accuracy

Peer reviewed

Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J. – Language Assessment Quarterly, 2014

Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…

Descriptors: Interrater Reliability, Correlation, Generalization, Scoring

Does Indirect Writing Assessment Have Any Relevance to Direct Writing Assessment? Focus on Validity and Reliability

Peer reviewed
PDF on ERIC

Kural, Faruk – Journal of Language and Linguistic Studies, 2018

The present paper, which is a study based on the midterm exam results of 53 university English prep-school students, examines the correlation between a direct writing test, measured holistically by multiple-trait scoring, and two indirect writing tests used in a competence exam, one of which is a multiple-choice cloze test and the other a rewrite test…

Descriptors: Writing Evaluation, Cloze Procedure, Comparative Analysis, Essays

Linguistic Features of Humor in Academic Writing

Peer reviewed
PDF on ERIC

Skalicky, Stephen; Berger, Cynthia M.; Crossley, Scott A.; McNamara, Danielle S. – Advances in Language and Literary Studies, 2016

A corpus of 313 freshman college essays was analyzed in order to better understand the forms and functions of humor in academic writing. Human ratings of humor and wordplay were statistically aggregated using Factor Analysis to provide an overall "Humor" component score for each essay in the corpus. In addition, the essays were also…

Descriptors: Discourse Analysis, Academic Discourse, Humor, Writing (Composition)

Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"

Peer reviewed

Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…

Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests

Writing Assessment as a Predictor of Performance in an Executive Graduate Degree Program

Forkner, Carl B. – ProQuest LLC, 2013

The compressed time from matriculation to graduation that is prevalent in executive graduate degree programs means that traditional methods of identifying deficiencies in student writing during the course of study may not provide remediation in time to enable student success. This study examined a writing-intensive, 10-month executive graduate degree…

Descriptors: Writing Evaluation, Graduate Students, Computer Assisted Testing, Predictive Validity

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of "WriteToLearn"

Peer reviewed
PDF on ERIC

Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016

This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Grounding Lexical Diversity in Human Judgments

Peer reviewed

Jarvis, Scott – Language Testing, 2017

The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…

Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

Source

ETS Research Report Series	4
Applied Measurement in…	3
Journal of Educational…	2
Language Assessment Quarterly	2
ProQuest LLC	2
Advances in Language and…	1
Applied Linguistics	1
CALICO Journal	1
Composition Forum	1
Education and Information…	1
Educational Research and…	1
Educational Testing Service	1
English Teaching	1
IEEE Transactions on Learning…	1
International Journal of…	1
Iranian Journal of Language…	1
Journal of Language and…	1
Language Testing	1
New Horizons in Education	1

Author

Coniam, David	2
Gentile, Claudia	2
Kantor, Robert	2
Lee, Yong-Won	2
Attali, Yigal	1
Barton, Ellen	1
Ben-Simon, Anat	1
Berger, Cynthia M.	1
Breland, Hunter M.	1
Breyer, F. Jay	1
Bridgeman, Brent	1
Chaudhary, Banshi D.	1
Cohen, Yoav	1
Crossley, Scott A.	1
Davey, Tim	1
Duchnowski, Matthew P.	1
Escoffery, David S.	1
Ferrara, Steve	1
Forkner, Carl B.	1
Foster, Tanina	1
Gaynor, Judith L.	1
Guo, Xiuyan	1
Haberman, Shelby J.	1
Hodge, Kari J.	1
Jarvis, Scott	1