Publication Date
In 2025 | 0 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 23 |
Since 2006 (last 20 years) | 53 |
Descriptor
Reliability | 50 |
Test Reliability | 38 |
Interrater Reliability | 28 |
Scores | 28 |
Test Items | 25 |
Scoring | 21 |
Test Construction | 21 |
Item Response Theory | 20 |
Validity | 20 |
Error of Measurement | 15 |
Correlation | 14 |
Source
Applied Measurement in Education | 111 |
Publication Type
Journal Articles | 111 |
Reports - Research | 68 |
Reports - Evaluative | 39 |
Reports - Descriptive | 5 |
Speeches/Meeting Papers | 5 |
Information Analyses | 1 |
Education Level
Higher Education | 7 |
Grade 8 | 6 |
Elementary Education | 5 |
Elementary Secondary Education | 5 |
Grade 5 | 5 |
Grade 4 | 4 |
High Schools | 4 |
Middle Schools | 4 |
Postsecondary Education | 4 |
Secondary Education | 4 |
Grade 3 | 3 |
Location
California | 3 |
Canada | 2 |
Arizona | 1 |
Australia | 1 |
California (Los Angeles) | 1 |
Germany | 1 |
Hawaii | 1 |
Idaho | 1 |
Indiana | 1 |
Israel | 1 |
Louisiana | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Herman, Joan L.; Webb, Noreen M.; Zuniga, Stephen A. – Applied Measurement in Education, 2007
This study examined the impact of rater agreement on decisions concerning the alignment between the "Golden State Examination in High School Mathematics" (California Department of Education, 2001a) and the University of California (UC) "Statement on Competencies in Mathematics Expected of Entering College Students" (Academic…
Descriptors: Educational Assessment, Academic Standards, Item Analysis, Interrater Reliability

Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
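
As background for the testlet issue raised in this entry (Feldt's congeneric coefficient itself is not reproduced here), the sketch below contrasts coefficient alpha computed from individual items with alpha computed from testlet (passage) scores on simulated data; all quantities are hypothetical.

```python
import numpy as np

def coefficient_alpha(parts):
    """Cronbach's alpha over the columns of `parts` (examinees x parts)."""
    parts = np.asarray(parts, dtype=float)
    k = parts.shape[1]
    return (k / (k - 1)) * (1 - parts.var(axis=0, ddof=1).sum()
                            / parts.sum(axis=1).var(ddof=1))

# Hypothetical data: 200 examinees, 4 passages (testlets) of 5 items each,
# with a shared within-passage component that creates local dependence.
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
passage = rng.normal(scale=0.8, size=(200, 4))
items = np.concatenate(
    [ability + passage[:, [t]] + rng.normal(size=(200, 5)) for t in range(4)], axis=1)

alpha_item_level = coefficient_alpha(items)                                   # 20 items as parts
alpha_testlet_level = coefficient_alpha(items.reshape(200, 4, 5).sum(axis=2))  # 4 testlet scores
print(alpha_item_level, alpha_testlet_level)  # item-level alpha is inflated by the local dependence
```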

Lee, Guemin; Frisbie, David A. – Applied Measurement in Education, 1999
Studied the appropriateness and implications of using a generalizability theory approach to estimating the reliability of scores from tests composed of testlets. Analyses of data from two national standardization samples suggest that manipulating the number of passages is a more productive way to obtain efficient measurement than manipulating the…
Descriptors: Generalizability Theory, Models, National Surveys, Reliability
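
The passage-versus-item trade-off described in this entry can be illustrated with a D-study projection for a persons x (items : passages) design; the variance components below are invented for illustration and are not values from the study.

```python
# Hypothetical variance components: person, person x passage, person x (item:passage) + error.
var_p, var_pt, var_pi = 0.30, 0.10, 0.60

def g_coefficient(n_passages, n_items_per_passage):
    """Generalizability coefficient for relative decisions in a p x (i:t) D-study."""
    relative_error = var_pt / n_passages + var_pi / (n_passages * n_items_per_passage)
    return var_p / (var_p + relative_error)

# Same total number of items, allocated differently:
print(g_coefficient(n_passages=8, n_items_per_passage=3))  # more passages, fewer items each
print(g_coefficient(n_passages=4, n_items_per_passage=6))  # fewer passages, more items each
```

With these made-up components, spreading the same number of items over more passages yields the higher coefficient, which is the pattern the abstract reports.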

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1998
Two relatively simple methods for estimating the conditional standard error of measurement (SEM) for nonlinearly derived score scales are proposed. Applications indicate that these two procedures produce fairly consistent estimates that tend to peak near the high end of the scale and reach a minimum in the middle of the raw score scale. (SLD)
Descriptors: Error of Measurement, Estimation (Mathematics), Raw Scores, Reliability
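
For context, one of the most familiar conditional SEMs for a raw score of x correct out of n dichotomous items is Lord's binomial-error estimate; it is shown only as a reference point and is not necessarily either of the procedures proposed in the article:

$$\widehat{\mathrm{SEM}}(x) \;=\; \sqrt{\frac{x\,(n - x)}{n - 1}}.$$

On the raw-score metric this quantity is largest at mid-scale, which highlights why the different pattern reported above for nonlinearly derived score scales is worth estimating directly.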

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques

Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction
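
The "upper limit" assertion at issue usually rests on the correction for attenuation; a sketch of that classical argument, with X the test, Y the criterion, and rho_XX', rho_YY' their reliabilities:

$$\rho_{T_X T_Y} \;=\; \frac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}} \;\le\; 1
\quad\Longrightarrow\quad
\rho_{XY} \;\le\; \sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}.$$

Because the bound need not be attained, it does not by itself rule out a change that lowers rho_XX' while raising rho_XY, which is the possibility the article develops.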

Enders, Craig K.; Bandalos, Deborah L. – Applied Measurement in Education, 1999
Examined the degree to which coefficient alpha is affected by including items with different distribution shapes within a unidimensional scale. Computer simulation results indicate that reliability does not increase dramatically as a result of using differentially shaped items within a scale. Discusses implications for test construction. (SLD)
Descriptors: Computer Simulation, Reliability, Scaling, Statistical Distributions
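
A minimal version of this kind of simulation is sketched below, assuming a single-factor model whose item distributions are reshaped by item-specific cut points; the sample size, loading, and cut points are arbitrary choices, not the study's design.

```python
import numpy as np

def coefficient_alpha(items):
    """Cronbach's alpha over the columns of `items` (examinees x items)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
theta = rng.normal(size=(1000, 1))                        # one latent dimension
latent = 0.7 * theta + rng.normal(scale=0.5, size=(1000, 10))

same_shape = (latent > 0).astype(float)                   # all items cut at the same point
mixed_shape = (latent > np.linspace(-1.5, 1.5, 10)).astype(float)  # skewed in both directions

print(coefficient_alpha(same_shape), coefficient_alpha(mixed_shape))
```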

Frary, Robert B. – Applied Measurement in Education, 2000
Characterizes the circumstances under which validity changes may occur as a result of the deletion of a predictor test segment. Equations show that, for a positive outcome, one should seek a relatively large correlation between the scores from the deleted segment and the remaining items, with a relatively low correlation between scores from the…
Descriptors: Equations (Mathematics), Prediction, Reliability, Scores
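
The algebra behind results of this kind can be sketched directly (the notation here is mine, not necessarily the article's equations). With total score X, deleted segment P, remainder R = X - P, and criterion Y:

$$r_{RY} \;=\; \frac{\sigma_X\, r_{XY} \;-\; \sigma_P\, r_{PY}}{\sqrt{\sigma_X^2 + \sigma_P^2 - 2\,\sigma_X \sigma_P\, r_{XP}}}.$$

A high correlation between the deleted segment and the rest of the test shrinks the denominator, and a low segment-criterion correlation protects the numerator, which matches the conditions described in the abstract.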

Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…
Descriptors: Error of Measurement, Reliability, Scores, Test Construction
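
As background on why some assumption about the parts is unavoidable, consider the two-part case. The familiar Flanagan-Rulon/Guttman split-half estimate equals the total-score reliability only when the halves contribute equally to the true score (essential tau-equivalence); otherwise it is only a lower bound, which is the kind of ambiguity the abstract describes (this is context, not Feldt's derivation):

$$\hat{\rho}_{XX'} \;=\; 2\left(1 - \frac{s_1^2 + s_2^2}{s_X^2}\right),$$

where s_1^2 and s_2^2 are the part-score variances and s_X^2 is the total-score variance.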

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1999
Examined the stability of the standard error of measurement and the relationship between the reliability coefficient and the variance of both true scores and error scores for 170 school districts in a state. As expected, reliability coefficients varied as a function of group variability, but the variation in split-half coefficients from school to…
Descriptors: Elementary Secondary Education, Error of Measurement, Reliability, School Districts
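
The pattern just described follows from the classical identity linking the SEM, the score standard deviation, and the reliability coefficient:

$$\mathrm{SEM} \;=\; \sigma_X\,\sqrt{1 - \rho_{XX'}}.$$

If error variance is roughly constant across districts, a more homogeneous district (smaller sigma_X) must show a smaller rho_XX' to produce the same SEM, so reliability coefficients track group variability even while the SEM stays comparatively stable.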

Mehrens, William A. – Applied Measurement in Education, 2000
Presents conclusions of an independent measurement expert that the Texas Assessment of Academic Skills (TAAS) was constructed according to acceptable professional standards and tests curricular material considered by the Texas Board of Education important for graduates to have mastered. Also supports the validity and reliability of the TAAS and…
Descriptors: Curriculum, Psychometrics, Reliability, Standards

Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
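
A highly simplified sketch of the matching idea behind SIBTEST is given below: examinees are stratified on a matching-subtest score, and reference-focal differences on the studied item are averaged over strata. It deliberately omits SIBTEST's regression-based correction for unreliability of the matching score, and the function and variable names are hypothetical.

```python
import numpy as np

def simple_dif_statistic(item_scores, matching_scores, group_labels):
    """Focal-weighted mean difference (reference - focal) on the studied item,
    conditioning on the matching-subtest score. A crude stand-in for SIBTEST's
    beta-uni statistic; no true-score adjustment of the matching variable is made."""
    item = np.asarray(item_scores, dtype=float)
    match = np.asarray(matching_scores)
    group = np.asarray(group_labels)          # 'R' = reference, 'F' = focal
    weighted_diff, n_focal = 0.0, 0
    for k in np.unique(match):
        ref = item[(match == k) & (group == 'R')]
        foc = item[(match == k) & (group == 'F')]
        if ref.size and foc.size:
            weighted_diff += foc.size * (ref.mean() - foc.mean())
            n_focal += foc.size
    return weighted_diff / n_focal if n_focal else float('nan')
```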

Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
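
One commonly used composite estimate for tests assembled from several item-format strata is stratified alpha; whether it coincides with the estimate recommended in the article depends on which part-test model holds, so it is shown only as context:

$$\alpha_{\text{strat}} \;=\; 1 - \frac{\sum_{j} s_j^2\,\bigl(1 - \alpha_j\bigr)}{s_X^2},$$

where s_j^2 and alpha_j are the variance and internal-consistency estimate for format stratum j, and s_X^2 is the total-score variance.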

Gao, Xiaohong; Brennan, Robert L. – Applied Measurement in Education, 2001
Studied the sampling variability of estimated variance components using data collected over several years for a listening and writing performance assessment and evaluated the stability of estimated measurement precision. Results indicate that the estimated variance components varied from one year to another and suggest that the measurement…
Descriptors: Estimation (Mathematics), Generalizability Theory, Listening Comprehension Tests, Performance Based Assessment
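
A sketch of how variance components might be estimated from a single year's persons x tasks data, using the usual ANOVA expected-mean-square equations for a fully crossed p x t design. The study's actual design (raters, multiple years) is more involved; this is only the basic computation whose year-to-year instability is at issue.

```python
import numpy as np

def p_by_t_variance_components(scores):
    """ANOVA estimates of variance components for a fully crossed persons x tasks
    design with one observation per cell; `scores` is (n_persons, n_tasks)."""
    x = np.asarray(scores, dtype=float)
    n_p, n_t = x.shape
    grand = x.mean()
    ss_p = n_t * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_pt = ((x - grand) ** 2).sum() - ss_p - ss_t      # interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_pt = ss_pt / ((n_p - 1) * (n_t - 1))

    return {                                            # negative estimates truncated at zero
        "person": max((ms_p - ms_pt) / n_t, 0.0),
        "task": max((ms_t - ms_pt) / n_p, 0.0),
        "pt,error": ms_pt,
    }

# Estimating the components separately for each year's data matrix and comparing the
# results is one way to see the sampling variability the abstract describes.
```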

Webb, Noreen M.; Schlackman, Jonah; Sugrue, Brenda – Applied Measurement in Education, 2000
Studied the importance of occasion as a source of error variance in estimates of the dependability (generalizability) of science assessment scores and the interchangeability of science test formats. Junior high school students (n=662) took hands-on and paper-and-pencil tests twice. Results show that recognizing occasion as a facet of error alters…
Descriptors: Generalization, Junior High School Students, Junior High Schools, Reliability