Publication Date
  In 2025: 0
  Since 2024: 2
  Since 2021 (last 5 years): 7
  Since 2016 (last 10 years): 23
  Since 2006 (last 20 years): 53

Descriptor
  Reliability: 50
  Test Reliability: 38
  Interrater Reliability: 28
  Scores: 28
  Test Items: 25
  Scoring: 21
  Test Construction: 21
  Item Response Theory: 20
  Validity: 20
  Error of Measurement: 15
  Correlation: 14

Source
  Applied Measurement in Education: 111

Publication Type
  Journal Articles: 111
  Reports - Research: 68
  Reports - Evaluative: 39
  Reports - Descriptive: 5
  Speeches/Meeting Papers: 5
  Information Analyses: 1

Education Level
  Higher Education: 7
  Grade 8: 6
  Elementary Education: 5
  Elementary Secondary Education: 5
  Grade 5: 5
  Grade 4: 4
  High Schools: 4
  Middle Schools: 4
  Postsecondary Education: 4
  Secondary Education: 4
  Grade 3: 3

Location
  California: 3
  Canada: 2
  Arizona: 1
  Australia: 1
  California (Los Angeles): 1
  Germany: 1
  Hawaii: 1
  Idaho: 1
  Indiana: 1
  Israel: 1
  Louisiana: 1

Laws, Policies, & Programs
  No Child Left Behind Act 2001: 1

Kettler, Ryan J.; Rodriguez, Michael C.; Bolt, Daniel M.; Elliott, Stephen N.; Beddow, Peter A.; Kurz, Alexander – Applied Measurement in Education, 2011
Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce…
Descriptors: Multiple Choice Tests, Test Items, Alternative Assessment, Test Reliability
Osborn Popp, Sharon E.; Ryan, Joseph M.; Thompson, Marilyn S. – Applied Measurement in Education, 2009
Scoring rubrics are routinely used to evaluate the quality of writing samples produced for writing performance assessments, with anchor papers chosen to represent score points defined in the rubric. Although the careful selection of anchor papers is associated with best practices for scoring, little research has been conducted on the role of…
Descriptors: Writing Evaluation, Scoring Rubrics, Selection, Scoring
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
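
As a point of reference for the IRT models discussed above, the dichotomous Rasch model can be sketched in a few lines. The function name and example values below are illustrative only, not taken from the study.

```python
import math

# Dichotomous Rasch model: the probability of a correct response depends
# only on the gap between person ability (theta) and item difficulty (b).
def rasch_p(theta: float, b: float) -> float:
    return 1 / (1 + math.exp(-(theta - b)))

print(round(rasch_p(0.0, 0.0), 2))  # 0.5 when ability equals difficulty
```

Polytomous extensions (e.g., the partial credit model) generalize this by chaining such step probabilities across adjacent score categories.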
Shumate, Steven R.; Surles, James; Johnson, Robert L.; Penny, Jim – Applied Measurement in Education, 2007
Increasingly, assessment practitioners use generalizability coefficients to estimate the reliability of scores from performance tasks. Little research, however, examines the relation between the estimation of generalizability coefficients and the number of rubric scale points and score distributions. The purpose of the present research is to…
Descriptors: Generalizability Theory, Monte Carlo Methods, Measures (Individuals), Program Effectiveness
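
For readers unfamiliar with the coefficient being estimated, the sketch below computes the relative generalizability coefficient for a fully crossed persons-by-raters design from mean squares. The function name and toy ratings are invented for illustration, not the study's data.

```python
# Hypothetical sketch: relative G coefficient (E-rho^2) for a p x r design.
def g_coefficient(scores):
    """scores: one list per person, one rating per rater (fully crossed)."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r  # person-by-rater interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_p = max((ms_p - ms_res) / n_r, 0.0)  # universe-score variance
    # Relative G coefficient for a mean over n_r raters
    return var_p / (var_p + ms_res / n_r)

ratings = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 2, 2], [4, 4, 5]]
print(round(g_coefficient(ratings), 3))
```

The simulation questions in the abstract (rubric scale points, score distributions) amount to varying the inputs to exactly this kind of estimator and watching how stable the coefficient remains.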
Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003
When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…
Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability
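
The resolution methods compared in this literature can be sketched concretely. The method names below (average, expert, parity) are common labels in rater-resolution studies, but the function and its defaults are an illustrative assumption, not the article's procedure.

```python
# Illustrative sketch: three ways to resolve two discrepant ratings
# into the single score reported to the examinee.
def resolve(r1, r2, expert=None, method="average", tolerance=1):
    """r1, r2: operational raters' scores; expert: adjudicator's score.

    average: always report the mean of the two operational ratings
    expert:  defer to the adjudicator whenever the raters disagree
    parity:  keep the mean if the raters are within tolerance, else defer
    """
    if method == "average":
        return (r1 + r2) / 2
    if method == "expert":
        return (r1 + r2) / 2 if r1 == r2 else expert
    if method == "parity":
        return (r1 + r2) / 2 if abs(r1 - r2) <= tolerance else expert
    raise ValueError(f"unknown method: {method}")

print(resolve(3, 5, expert=4, method="average"))  # 4.0
print(resolve(3, 4, expert=5, method="parity"))   # 3.5 (within tolerance)
```

Because the three methods report different scores for the same response pattern, pass/fail decisions near a cut score can flip with the choice of method, which is the issue the abstract raises.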

Myford, Carol M. – Applied Measurement in Education, 2002
Studied the use of descriptive graphic rating scales by 11 raters to evaluate students' work, exploring different design features. Used a Rasch-model based rating scale analysis to determine that all the continuous scales could be considered to have at least five points, and that defined midpoints did not result in higher student separation…
Descriptors: Evaluators, Rating Scales, Reliability, Test Construction
Hogan, Thomas P.; Murphy, Gavin – Applied Measurement in Education, 2007
We determined the recommendations for preparing and scoring constructed-response (CR) test items in 25 sources (textbooks and chapters) on educational and psychological measurement. The project was similar to Haladyna's (2004) analysis for multiple-choice items. We identified 12 recommendations for preparing CR items given by multiple sources,…
Descriptors: Test Items, Scoring, Test Construction, Educational Indicators
Webb, Norman L. – Applied Measurement in Education, 2007
A process for judging the alignment between curriculum standards and assessments developed by the author is presented. This process produces information on the relationship of standards and assessments on four alignment criteria: Categorical Concurrence, Depth of Knowledge Consistency, Range of Knowledge Correspondence, and Balance of…
Descriptors: Educational Assessment, Academic Standards, Item Analysis, Interrater Reliability

Krus, David J.; Blackman, Harold S. – Applied Measurement in Education, 1988
Test homogeneity and internal consistency reliability indices were developed on the basis of theoretical considerations of properties of hierarchical structures of data matrices. This reconceptualization, in terms of ordinal test theory, has potential for explication of the mutual relationship of test reliability and homogeneity. (TJH)
Descriptors: Equations (Mathematics), Statistics, Test Reliability, Test Theory

Feldt, Leonard S. – Applied Measurement in Education, 1990
Sampling theory for the intraclass reliability coefficient, a Spearman-Brown extrapolation of alpha to a single measurement for each examinee, is less recognized and less cited than that of coefficient alpha. Techniques for constructing confidence intervals and testing hypotheses for the intraclass coefficient are presented. (SLD)
Descriptors: Hypothesis Testing, Measurement Techniques, Reliability, Sampling
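
The alpha-to-single-measurement relationship the abstract describes is the Spearman-Brown formula run in reverse. A minimal numeric sketch, with invented item data:

```python
def coefficient_alpha(items):
    """items: list of k item-score lists, each of length n (examinees)."""
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance, n-1 denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

def intraclass_single(alpha, k):
    """Spearman-Brown step-down: rho_1 = alpha / (k - (k - 1) * alpha)."""
    return alpha / (k - (k - 1) * alpha)

items = [[1, 2, 3, 4], [2, 2, 4, 5], [1, 3, 3, 5]]  # 3 items, 4 examinees
a = coefficient_alpha(items)
r1 = intraclass_single(a, k=3)
print(round(a, 3), round(r1, 3))  # alpha for 3 items, then for 1 measurement
```

Stepping r1 back up with the usual Spearman-Brown prophecy, k*r1 / (1 + (k-1)*r1), recovers alpha exactly, which is the sense in which the intraclass coefficient is "alpha for a single measurement."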

Nichols, Paul; Kuehl, Barbara Jean – Applied Measurement in Education, 1999
An approach is presented that can predict the internal consistency of cognitively complex assessments along two dimensions: adding tasks with similar or different solution strategies, and adding test takers with different solution strategies. Data from the 1992 National Assessment of Educational Progress mathematics assessment are used to…
Descriptors: Cognitive Tests, Mathematics Tests, Prediction, Test Reliability

Bandalos, Deborah L.; Enders, Craig K. – Applied Measurement in Education, 1996
Computer simulation indicated that reliability increased with the degree of similarity between underlying and observed distributions when the observed categorical distribution was deliberately constructed to match the shape of the underlying distribution of the trait being measured. Reliability also increased with correlation among variables and…
Descriptors: Computer Simulation, Correlation, Likert Scales, Reliability
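
A toy Monte Carlo in the spirit of this abstract (the loading, cutpoints, sample size, and seed are all assumptions, not the study's design): continuous item responses are cut into five ordered categories, and coefficient alpha is compared when the cutpoints do versus do not match the latent normal distribution.

```python
import bisect
import random

def alpha(items):
    """Coefficient alpha for a list of item-score lists."""
    k, n = len(items), len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(it[i] for it in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

def simulate(cuts, n=2000, k=10, loading=0.7, seed=1):
    """Categorize continuous one-factor responses at the given cutpoints."""
    rng = random.Random(seed)
    items = [[] for _ in range(k)]
    for _ in range(n):
        theta = rng.gauss(0, 1)  # latent trait
        for j in range(k):
            y = loading * theta + (1 - loading ** 2) ** 0.5 * rng.gauss(0, 1)
            items[j].append(bisect.bisect(cuts, y))  # category 0..4
    return alpha(items)

matched = simulate(cuts=[-1.5, -0.5, 0.5, 1.5])   # roughly matches N(0, 1)
mismatched = simulate(cuts=[0.5, 1.0, 1.5, 2.0])  # floors most responses
print(round(matched, 3), round(mismatched, 3))
```

Under these invented settings the matched cutpoints yield the higher alpha, which is the qualitative pattern the abstract reports.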

Klein, Stephen P.; Stecher, Brian M.; Shavelson, Richard J.; McCaffrey, Daniel; Ormseth, Tor; Bell, Robert M.; Comfort, Kathy; Othman, Abdul R. – Applied Measurement in Education, 1998
Two studies involving 368 elementary and high school students and 29 readers were conducted to investigate reader consistency, score reliability, and reader time requirements of three hands-on science performance tasks. Holistic scores were as reliable as analytic scores, and there was a high correlation between them after they were disattenuated…
Descriptors: Elementary School Students, Elementary Secondary Education, Hands on Science, High School Students
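
The disattenuation mentioned at the end of the abstract is the classical correction for attenuation: divide the observed correlation by the geometric mean of the two score reliabilities. A one-line sketch with illustrative numbers:

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for measurement error in both scores."""
    return r_xy / math.sqrt(rel_x * rel_y)

# e.g., observed r = .70 between holistic and analytic scores with
# reliabilities .85 and .80 (illustrative values, not the study's)
print(round(disattenuate(0.70, 0.85, 0.80), 3))
```

A high disattenuated correlation, as reported here, suggests the two scoring approaches rank examinees similarly once rater and task error are set aside.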

Chinn, Roberta N.; Hertz, Norman R. – Applied Measurement in Education, 2002
Compared two Angoff standard-setting methods (percentage and yes-no) in the work of four groups of judges (n=24) given behavioral descriptors or incidents to use in making ratings. Results indicate that passing scores based on percentage estimates were stable from initial to final ratings, but those based on dichotomous (yes-no) ratings had…
Descriptors: Certification, Judges, Licensing Examinations (Professions), Reliability
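
The two Angoff variants differ only in the granularity of the judges' item ratings; the cut score is the same aggregation either way. The matrices below are invented illustrations, not the study's data.

```python
# Angoff cut score: for each item, average the judges' estimates of how a
# minimally competent candidate would perform, then sum across items.
def angoff_cut_score(ratings):
    """ratings: judges x items matrix.

    Percentage method: entries are probability estimates in [0, 1].
    Yes-no method: entries are 0/1 judgments; same formula, coarser input.
    """
    n_items = len(ratings[0])
    return sum(
        sum(judge[i] for judge in ratings) / len(ratings)
        for i in range(n_items)
    )

pct = [[0.7, 0.4, 0.9], [0.6, 0.5, 0.8]]  # two judges, three items
yn = [[1, 0, 1], [1, 1, 1]]
print(angoff_cut_score(pct))  # cut score out of 3 items
print(angoff_cut_score(yn))
```

Because yes-no ratings round each judgment to 0 or 1, small shifts in judges' opinions can move whole items across the threshold, one plausible reason the dichotomous variant was less stable.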
Kane, Michael; Case, Susan M. – Applied Measurement in Education, 2004
The scores on 2 distinct tests (e.g., essay and objective) are often combined to create a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to an external criterion. However, in cases where no criterion is available, the observed composite has generally been evaluated in…
Descriptors: Validity, Weighted Scores, Reliability, Student Evaluation
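
The reliability of an observed weighted composite can be written in closed form from the component reliabilities, standard deviations, weights, and intercorrelation (a Mosier-type formula, assuming uncorrelated errors). The two-test sketch below uses invented numbers, not values from the article.

```python
# Classical reliability of a two-component composite C = w1*X1 + w2*X2,
# assuming the components' measurement errors are uncorrelated.
def composite_reliability(w, sd, rel, r12):
    w1, w2 = w
    s1, s2 = sd
    cov = w1 * w2 * s1 * s2 * r12  # weighted observed covariance
    true_var = (w1 * s1) ** 2 * rel[0] + (w2 * s2) ** 2 * rel[1] + 2 * cov
    total_var = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * cov
    return true_var / total_var

# e.g., an essay section (reliability .75) and an objective section
# (reliability .90), equally weighted, SDs of 10, correlated .60
print(round(composite_reliability((0.5, 0.5), (10, 10), (0.75, 0.90), 0.6), 3))
```

Note the composite's reliability lands between the two components' values; when no external criterion exists, this internal quantity is often what gets evaluated in place of validity evidence, which is the tension the abstract points to.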