Showing 76 to 90 of 111 results
Peer reviewed
Fitzpatrick, Anne R.; Yen, Wendy M. – Applied Measurement in Education, 2001
Examined the effects of test length and sample size on the alternate forms reliability and equating of simulated mathematics tests composed of constructed response items scaled using the two-parameter partial credit model. Results suggest that, to obtain acceptable reliabilities and accurate equated scores, tests should have at least 8 6-point…
Descriptors: Constructed Response, Equated Scores, Mathematics Tests, Reliability
Peer reviewed
Kane, Michael – Applied Measurement in Education, 1996
This overview of the role of error and tolerance for error in measurement asserts that the generic precision associated with a measurement procedure is defined as the root mean square error, or standard error, in some relevant population. This view of precision is explored in several applications of measurement. (SLD)
Descriptors: Error of Measurement, Error Patterns, Generalizability Theory, Measurement Techniques
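Kane treats a procedure's generic precision as the root mean square error in a relevant population; the idea can be sketched in a few lines (the scores below are invented for illustration, not taken from the article):

```python
import math

def rmse(observed, target):
    """Root mean square error of observed scores around target values."""
    sq_errors = [(o - t) ** 2 for o, t in zip(observed, target)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical observed scores vs. the values they estimate
observed = [52, 48, 61, 55, 50]
target = [50, 50, 60, 57, 49]
print(round(rmse(observed, target), 3))  # -> 1.673
```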
Peer reviewed
Sicoly, Fiore – Applied Measurement in Education, 2002
Calculated year-1 to year-2 stability of assessment data from 21 states and 2 Canadian provinces. The median stability coefficient was 0.78 in mathematics and reading, and lower in writing. A stability coefficient of 0.80 is recommended as the standard for large-scale assessments of student performance. (SLD)
Descriptors: Educational Testing, Elementary Secondary Education, Foreign Countries, Mathematics
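The stability coefficient Sicoly reports is, in essence, a year-over-year correlation of school-level scores; a minimal sketch, with invented data for five schools:

```python
def stability_coefficient(year1, year2):
    """Pearson correlation between year-1 and year-2 school-level scores."""
    n = len(year1)
    m1, m2 = sum(year1) / n, sum(year2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(year1, year2))
    s1 = sum((a - m1) ** 2 for a in year1) ** 0.5
    s2 = sum((b - m2) ** 2 for b in year2) ** 0.5
    return cov / (s1 * s2)

# Hypothetical mean scores for five schools in consecutive years
r = stability_coefficient([60, 65, 70, 75, 80], [62, 66, 69, 77, 79])
print(round(r, 2))  # comfortably above the recommended 0.80 standard
```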
Peer reviewed
Yen, Wendy M.; Candell, Gregory L. – Applied Measurement in Education, 1991
Empirical reliabilities of scores based on item-pattern scoring, using 3-parameter item-response theory and number-correct scoring, were compared within each of 5 score metrics for at least 900 elementary school students for 5 content areas. Average increases in reliability were produced by item-pattern scoring. (SLD)
Descriptors: Elementary Education, Elementary School Students, Grade Equivalent Scores, Item Response Theory
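The 3-parameter item response model underlying item-pattern scoring assigns each item a probability-of-correct curve; one conventional parameterization (an illustration, not the study's own code) is:

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic IRT model: probability of a correct
    response for ability theta, given discrimination a, difficulty b,
    and pseudo-guessing c (with the conventional D = 1.7 scaling)."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

# At theta = b the curve sits halfway between c and 1
print(round(p_3pl(0.0, a=1.0, b=0.0, c=0.2), 3))  # -> 0.6
```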
Peer reviewed
Lunz, Mary E.; And Others – Applied Measurement in Education, 1990
An extension of the Rasch model is used to obtain objective measurements for examinations graded by judges. The model calibrates elements of each facet of the examination on a common log-linear scale. Real examination data illustrate the way correcting for judge severity improves fairness of examinee measures. (SLD)
Descriptors: Certification, Difficulty Level, Interrater Reliability, Judges
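In the many-facet Rasch extension Lunz et al. describe, examinee ability, task difficulty, and judge severity all sit on a common logit scale. A dichotomous simplification (hypothetical parameters; the study itself uses a rating-scale formulation) shows why severity must be corrected for:

```python
import math

def p_success(ability, difficulty, severity):
    """Probability of success when ability, task difficulty, and judge
    severity are all expressed in logits on a common scale."""
    return 1 / (1 + math.exp(-(ability - difficulty - severity)))

# The same examinee fares worse under a more severe judge
lenient = p_success(1.0, 0.0, -0.5)
severe = p_success(1.0, 0.0, 0.5)
print(round(lenient, 2), round(severe, 2))
```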
Peer reviewed
Johnson, Robert L.; Penny, James; Gordon, Belita – Applied Measurement in Education, 2000
Studied four forms of score resolution used by testing agencies and investigated the effect that each has on the interrater reliability associated with the resulting operational scores. Results, based on 120 essays from the Georgia High School Writing Test, show some forms of resolution to be associated with higher reliability and some associated…
Descriptors: Essay Tests, High School Students, High Schools, Interrater Reliability
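Johnson, Penny, and Gordon compare several resolution rules; the abstract does not enumerate them, but one common pair — averaging close ratings and deferring discrepant ones to an adjudicator — can be sketched as follows (names and the threshold are illustrative assumptions):

```python
def resolve(rating1, rating2, adjudicator=None, max_gap=1):
    """Resolve a pair of essay ratings: average them when they are
    within max_gap points, otherwise defer to an adjudicator's score."""
    if abs(rating1 - rating2) <= max_gap:
        return (rating1 + rating2) / 2
    if adjudicator is None:
        raise ValueError("discrepant ratings require an adjudicator")
    return adjudicator

print(resolve(4, 5))                 # adjacent ratings: -> 4.5
print(resolve(2, 5, adjudicator=4))  # discrepant ratings: -> 4
```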
Peer reviewed
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
Peer reviewed
Fitzpatrick, Anne R.; Ercikan, Kadriye; Yen, Wendy M.; Ferrara, Steven – Applied Measurement in Education, 1998
The consistency between raters over three years of a high-stakes performance assessment was examined in two studies involving a total of approximately 3,000 students in grades three, five, and eight. Results show that raters in different years differ in severity, with raters in mathematics most consistent, and those in language arts least…
Descriptors: Elementary Education, Elementary School Students, High Stakes Tests, Interrater Reliability
Peer reviewed
Direct link
Wise, Steven L.; Kong, Xiaojing – Applied Measurement in Education, 2005
When low-stakes assessments are administered, the degree to which examinees give their best effort is often unclear, complicating the validity and interpretation of the resulting test scores. This study introduces a new method, based on item response time, for measuring examinee test-taking effort on computer-based test items. This measure, termed…
Descriptors: Psychometrics, Validity, Reaction Time, Test Items
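The abstract is truncated before naming Wise and Kong's measure, but the core idea — flag items answered too quickly to reflect genuine effort — can be sketched with an assumed per-item time threshold:

```python
def effort_proportion(response_times, thresholds):
    """Proportion of items whose response time meets or exceeds the
    item's threshold, i.e., solution behavior rather than rapid guessing."""
    flags = [t >= cutoff for t, cutoff in zip(response_times, thresholds)]
    return sum(flags) / len(flags)

# Hypothetical response times (seconds) and a uniform 3-second cutoff
times = [12.0, 1.5, 20.3, 0.9, 15.2]
print(effort_proportion(times, [3.0] * 5))  # -> 0.6
```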
Peer reviewed
Direct link
Wise, Steven L. – Applied Measurement in Education, 2006
In low-stakes testing, the motivation levels of examinees are often a matter of concern to test givers because a lack of examinee effort represents a direct threat to the validity of the test data. This study investigated the use of response time to assess the amount of examinee effort received by individual test items. In 2 studies, it was found…
Descriptors: Computer Assisted Testing, Motivation, Test Validity, Item Response Theory
Peer reviewed
Crone, Linda J.; And Others – Applied Measurement in Education, 1994
Scores from 324 Louisiana schools on the Louisiana Graduation Exit Examination and a within-school split sample of 255 schools indicate that a single subject or grade provides a less consistent and narrower perspective on school effectiveness than a subcomposite made up of 2 subject areas. (SLD)
Descriptors: Classification, Effective Schools Research, Elementary Secondary Education, Exit Examinations
Peer reviewed
Shapley, Kelly S.; Bush, M. Joan – Applied Measurement in Education, 1999
Examined the validity and reliability of the 1995-96 reading/language arts portfolio assessment developed in the Dallas (Texas) public schools for prekindergarten through second grade. Ratings by 42 teachers show that portfolio contents do not provide a valid sample of student work and the assessment reliability is low. (SLD)
Descriptors: Language Arts, Portfolio Assessment, Portfolios (Background Materials), Primary Education
Peer reviewed
Hambleton, Ronald K.; Plake, Barbara S. – Applied Measurement in Education, 1995
Several extensions to the Angoff method of standard setting are described that can accommodate characteristics of performance-based assessment. A study involving 12 panelists supported the effectiveness of the new approach but suggested that panelists preferred an approach that was at least partially conjunctive. (SLD)
Descriptors: Educational Assessment, Evaluation Methods, Evaluators, Interrater Reliability
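In the classic Angoff method these extensions build on, each panelist estimates the probability that a minimally competent examinee answers each item correctly, and the cut score is the sum of the item-level mean estimates. A minimal sketch with invented ratings:

```python
def angoff_cut_score(ratings):
    """ratings[p][i] is panelist p's estimated probability that a
    minimally competent examinee answers item i correctly; the cut
    score is the sum over items of the mean estimate."""
    n_panelists, n_items = len(ratings), len(ratings[0])
    item_means = [
        sum(panelist[i] for panelist in ratings) / n_panelists
        for i in range(n_items)
    ]
    return sum(item_means)

# Three hypothetical panelists rating a four-item test
ratings = [
    [0.8, 0.6, 0.7, 0.9],
    [0.7, 0.5, 0.6, 0.8],
    [0.9, 0.7, 0.8, 1.0],
]
print(angoff_cut_score(ratings))  # cut score out of 4 items
```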
Peer reviewed
Norcini, John; Shea, Judy – Applied Measurement in Education, 1992
Two studies involving a total of 99 experts examined the reproducibility of standards for 2 medical certifying examinations set under different conditions. Together, results of both studies provide evidence that a modified version of the Angoff method is quite reliable and produces stable results under varying conditions. (SLD)
Descriptors: Academic Standards, Evaluators, Groups, Higher Education
Peer reviewed
Frisbie, David A.; Becker, Douglas F. – Applied Measurement in Education, 1990
Seventeen educational measurement textbooks were reviewed to analyze current perceptions regarding true-false achievement testing. A synthesis of the rules for item writing is presented, and the purported advantages and disadvantages of the true-false format derived from those texts are reviewed. (TJH)
Descriptors: Achievement Tests, Higher Education, Methods Courses, Objective Tests