Showing all 12 results
Peer reviewed
McNamara, Tim; Knoch, Ute – Language Testing, 2012
This paper examines the uptake of Rasch measurement in language testing through a consideration of research published in language testing research journals in the period 1984 to 2009. Following the publication of the first papers on this topic, exploring the potential of the simple Rasch model for the analysis of dichotomous language test data, a…
Descriptors: Language Tests, Testing, English (Second Language), Item Response Theory
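The simple Rasch model for dichotomous language test data mentioned in this abstract can be sketched as follows; this is an illustrative implementation for readers unfamiliar with the model, not code from the paper:

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response to a dichotomous item under the
    simple Rasch model: it depends only on the difference between person
    ability (theta) and item difficulty (b), both on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability of success is 0.5;
# higher ability relative to difficulty raises it toward 1.
print(rasch_probability(0.0, 0.0))  # 0.5
print(rasch_probability(2.0, 0.0) > rasch_probability(0.0, 0.0))  # True
```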
Kaplan, Randy M.; And Others – 1995
The increased use of constructed-response items, such as essays, creates a need for tools that can score these responses automatically, in part or as a whole. This study explores one approach to analyzing essay-length, natural-language constructed responses. A decision model for scoring essays was developed and evaluated. The decision model uses…
Descriptors: Computer Software, Constructed Response, Essay Tests, Grammar
Peer reviewed
Lee, Yong-Won – Language Testing, 2006
A multitask speaking measure consisting of both integrated and independent tasks is expected to be an important component of a new version of the TOEFL test. This study considered two critical issues concerning score dependability of the new speaking measure: How much would the score dependability be impacted by (1) combining scores on different…
Descriptors: Language Tests, Second Language Learning, English (Second Language), Generalizability Theory
McKinley, Robert L.; Way, Walter D. – 1992
An analysis of the skills necessary for performance on the Test of English as a Foreign Language (TOEFL) tends to support the view that there are important, although subtle, secondary dimensions present in the test. This research explored the feasibility of an item response theory (IRT) based method of modeling examinee performance on these…
Descriptors: Ability, Goodness of Fit, Identification, Item Response Theory
Yamamoto, Kentaro – 1995
The traditional indicator of test speededness, missing responses, clearly signals a lack of time to respond, but it is inadequate for evaluating speededness in a multiple-choice test scored as number correct, and it underestimates test speededness. Conventional item response theory (IRT) parameter…
Descriptors: Ability, Estimation (Mathematics), Item Response Theory, Multiple Choice Tests
Peer reviewed
Wainer, Howard; Lukhele, Robert – Educational and Psychological Measurement, 1997
The reliability of scores from four forms of the Test of English as a Foreign Language (TOEFL) was estimated using a hybrid item response theory model. It was found that there was very little difference between overall reliability when the testlet items were assumed to be independent and when their dependence was modeled. (Author/SLD)
Descriptors: English (Second Language), Item Response Theory, Scores, Second Language Learning
Hicks, Marilyn M. – 1988
Several exploratory analyses of the fifths data generated by Test of English as a Foreign Language (TOEFL) item analyses were developed in order to evaluate the effects of options on the discriminability of difficult items and to identify difficult items with low, unreliable biserials that had been rejected by test developers, but for which…
Descriptors: Difficulty Level, Estimation (Mathematics), Identification, Item Analysis
Way, Walter D.; Reese, Clyde M. – 1991
The use of two alternative item response theory (IRT) estimation models in the scaling and equating of the Test of English as a Foreign Language (TOEFL) was explored; and item scaling and test equating results based on these models were compared with results based on the three-parameter (3PL) model currently being used with the TOEFL. Models were…
Descriptors: Correlation, Equated Scores, Estimation (Mathematics), Goodness of Fit
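The three-parameter logistic (3PL) model referenced in this and several other TOEFL studies can be sketched as follows; this is an illustrative implementation of the standard 3PL item response function, not code from any of the studies listed:

```python
import math

def three_pl(theta, a, b, c):
    """Three-parameter logistic (3PL) IRT model: probability of a correct
    response given ability theta, with item discrimination a, difficulty b,
    and pseudo-guessing lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# With c = 0.2, even a very low-ability examinee retains roughly a 0.2
# chance of answering correctly (the guessing floor); at theta == b the
# probability is midway between c and 1, i.e. 0.6.
print(three_pl(0.0, 1.0, 0.0, 0.2))   # 0.6
print(three_pl(-6.0, 1.0, 0.0, 0.2))  # close to 0.2
```

Setting a = 1 and c = 0 reduces this function to the simple Rasch/one-parameter form, which is what makes model comparisons like the ones in these studies possible within a single framework.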
Tang, K. Linda; And Others – 1993
This study compared the performance of the LOGIST and BILOG computer programs on item response theory (IRT) based scaling and equating for the Test of English as a Foreign Language (TOEFL) using real and simulated data and two calibration structures. Applications of IRT for the TOEFL program are based on the three-parameter logistic (3PL) model.…
Descriptors: Comparative Analysis, Computer Simulation, Equated Scores, Estimation (Mathematics)
Boldt, R. F. – 1994
The comparison of item response theory models for the Test of English as a Foreign Language (TOEFL) was extended to an equating context as simulation trials were used to "equate the test to itself." Equating sample data were generated from administration of identical item sets. Equatings that used procedures based on each model (simple…
Descriptors: Comparative Analysis, Cutting Scores, English (Second Language), Equated Scores
Way, Walter D.; And Others – 1992
This study provided an exploratory investigation of item features that might contribute to a lack of invariance of item parameters for the Test of English as a Foreign Language (TOEFL). Data came from seven forms of the TOEFL administered in 1989. Subjective and quantitative measures developed for the study provided consistent information related…
Descriptors: Ability, English (Second Language), Goodness of Fit, Item Response Theory
Hicks, Marilyn M. – 1989
Methods of computerized adaptive testing using conventional scoring methods in order to develop a computerized placement test for the Test of English as a Foreign Language (TOEFL) were studied. As a consequence of simulation studies during the first phase of the study, the multilevel testing paradigm was adopted to produce three test levels…
Descriptors: Adaptive Testing, Adults, Algorithms, Computer Assisted Testing