ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	6
Since 2006 (last 20 years)	8

Descriptor

Comparative Analysis	8
Evaluators	8
Writing Tests	8
Essays	6
English (Second Language)	5
Language Tests	5
Scoring	5
Second Language Learning	5
Correlation	4
Writing Evaluation	4
Computer Assisted Testing	3
Foreign Countries	3
Interrater Reliability	3
Language Proficiency	3
Scores	3
College Students	2
Computer Software	2
Factor Analysis	2
Prompting	2
Regression (Statistics)	2
Reliability	2
Statistical Analysis	2
Accuracy	1
Artificial Intelligence	1
Attention Deficit…	1

Source

Applied Measurement in…	2
ETS Research Report Series	2
JALT CALL Journal	1
Language Assessment Quarterly	1
Language Teaching Research…	1
Language Testing	1

Author

Attali, Yigal	3
Breyer, F. Jay	1
Buzick, Heather	1
Ferrara, Steve	1
Flor, Michael	1
Heidari, Jamshid	1
Khodabandeh, Farzaneh	1
Li, Jiuliang	1
Lorenz, Florian	1
Oliveri, Maria Elena	1
Osama Koraishi	1
Sinharay, Sandip	1
Soleimani, Hassan	1
Steedle, Jeffrey T.	1
Zhang, Mo	1

Publication Type

Journal Articles	8
Reports - Research	8
Tests/Questionnaires	1

Education Level

Higher Education	3
Postsecondary Education	3

Location

China	1
Iran	1

Assessments and Surveys

Test of English as a Foreign…	2
Graduate Record Examinations	1
International English…	1

Showing all 8 results

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed
PDF on ERIC

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Direct link

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Establishing Comparability across Writing Tasks with Picture Prompts of Three Alternate Tests

Peer reviewed

Direct link

Li, Jiuliang – Language Assessment Quarterly, 2018

In language testing programs, different test forms are often used to administer the same test. Demonstrating the comparability of these forms is essential to avoid criticisms of potential test unfairness. However, studies with this objective are scarce. This study aims to investigate the extent to which the picture-prompt writing tasks of three…

Descriptors: Writing Tests, Language Tests, Check Lists, Culture Fair Tests

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Direct link

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

Comparing Human and Automated Essay Scoring for Prospective Graduate Students with Learning Disabilities and/or ADHD

Peer reviewed

Direct link

Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016

Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…

Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring

A Comparative Analysis of Face to Face Instruction vs. Telegram Mobile Instruction in Terms of Narrative Writing

Peer reviewed
PDF on ERIC

Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018

The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile Assisted Language Learning (MALL) and its advantages in teaching. This study aimed to compare the effectiveness of the face to face instruction with Telegram mobile instruction. Based on a TOEFL test, 60 English foreign language…

Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction

Automated Trait Scores for "TOEFL"® Writing Tasks. Research Report. ETS RR-15-14

Peer reviewed
PDF on ERIC

Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015

The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score. The four trait scores are word choice, grammatical…

Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed
PDF on ERIC

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests