Publication Date
In 2025 | 0 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 23 |
Since 2006 (last 20 years) | 53 |
Descriptor
Reliability | 50 |
Test Reliability | 38 |
Interrater Reliability | 28 |
Scores | 28 |
Test Items | 25 |
Scoring | 21 |
Test Construction | 21 |
Item Response Theory | 20 |
Validity | 20 |
Error of Measurement | 15 |
Correlation | 14 |
Source
Applied Measurement in Education | 111 |
Publication Type
Journal Articles | 111 |
Reports - Research | 68 |
Reports - Evaluative | 39 |
Reports - Descriptive | 5 |
Speeches/Meeting Papers | 5 |
Information Analyses | 1 |
Education Level
Higher Education | 7 |
Grade 8 | 6 |
Elementary Education | 5 |
Elementary Secondary Education | 5 |
Grade 5 | 5 |
Grade 4 | 4 |
High Schools | 4 |
Middle Schools | 4 |
Postsecondary Education | 4 |
Secondary Education | 4 |
Grade 3 | 3 |
Location
California | 3 |
Canada | 2 |
Arizona | 1 |
Australia | 1 |
California (Los Angeles) | 1 |
Germany | 1 |
Hawaii | 1 |
Idaho | 1 |
Indiana | 1 |
Israel | 1 |
Louisiana | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Herman, Joan L.; Webb, Noreen M.; Zuniga, Stephen A. – Applied Measurement in Education, 2007
This study examined the impact of rater agreement on decisions concerning the alignment between the "Golden State Examination in High School Mathematics" (California Department of Education, 2001a) and the University of California (UC) "Statement on Competencies in Mathematics Expected of Entering College Students" (Academic…
Descriptors: Educational Assessment, Academic Standards, Item Analysis, Interrater Reliability

Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
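
As background for the testlet issue raised in this entry (Feldt's congeneric coefficient itself is not reproduced here), the sketch below contrasts coefficient alpha computed from individual items with alpha computed from testlet (passage) scores on simulated data; all quantities are hypothetical.

```python
import numpy as np

def coefficient_alpha(parts):
    """Cronbach's alpha over the columns of `parts` (examinees x parts)."""
    parts = np.asarray(parts, dtype=float)
    k = parts.shape[1]
    return (k / (k - 1)) * (1 - parts.var(axis=0, ddof=1).sum()
                            / parts.sum(axis=1).var(ddof=1))

# Hypothetical data: 200 examinees, 4 passages (testlets) of 5 items each,
# with a shared within-passage component that creates local dependence.
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
passage = rng.normal(scale=0.8, size=(200, 4))
items = np.concatenate(
    [ability + passage[:, [t]] + rng.normal(size=(200, 5)) for t in range(4)], axis=1)

alpha_item_level = coefficient_alpha(items)                                   # 20 items as parts
alpha_testlet_level = coefficient_alpha(items.reshape(200, 4, 5).sum(axis=2))  # 4 testlet scores
print(alpha_item_level, alpha_testlet_level)  # item-level alpha is inflated by the local dependence
```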

Lee, Guemin; Frisbie, David A. – Applied Measurement in Education, 1999
Studied the appropriateness and implications of using a generalizability theory approach to estimating the reliability of scores from tests composed of testlets. Analyses of data from two national standardization samples suggest that manipulating the number of passages is a more productive way to obtain efficient measurement than manipulating the…
Descriptors: Generalizability Theory, Models, National Surveys, Reliability
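
The passage-versus-item trade-off described in this entry can be illustrated with a D-study projection for a persons x (items : passages) design; the variance components below are invented for illustration and are not values from the study.

```python
# Hypothetical variance components: person, person x passage, person x (item:passage) + error.
var_p, var_pt, var_pi = 0.30, 0.10, 0.60

def g_coefficient(n_passages, n_items_per_passage):
    """Generalizability coefficient for relative decisions in a p x (i:t) D-study."""
    relative_error = var_pt / n_passages + var_pi / (n_passages * n_items_per_passage)
    return var_p / (var_p + relative_error)

# Same total number of items, allocated differently:
print(g_coefficient(n_passages=8, n_items_per_passage=3))  # more passages, fewer items each
print(g_coefficient(n_passages=4, n_items_per_passage=6))  # fewer passages, more items each
```

With these made-up components, spreading the same number of items over more passages yields the higher coefficient, which is the pattern the abstract reports.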

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1998
Two relatively simple methods for estimating the conditional standard error of measurement (SEM) for nonlinearly derived score scales are proposed. Applications indicate that these two procedures produce fairly consistent estimates that tend to peak near the high end of the scale and reach a minimum in the middle of the raw score scale. (SLD)
Descriptors: Error of Measurement, Estimation (Mathematics), Raw Scores, Reliability
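
For context, one of the most familiar conditional SEMs for a raw score of x correct out of n dichotomous items is Lord's binomial-error estimate; it is shown only as a reference point and is not necessarily either of the procedures proposed in the article:

$$\widehat{\mathrm{SEM}}(x) \;=\; \sqrt{\frac{x\,(n - x)}{n - 1}}.$$

On the raw-score metric this quantity is largest at mid-scale, which highlights why the different pattern reported above for nonlinearly derived score scales is worth estimating directly.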

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques

Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction
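
The "upper limit" assertion at issue usually rests on the correction for attenuation; a sketch of that classical argument, with X the test, Y the criterion, and rho_XX', rho_YY' their reliabilities:

$$\rho_{T_X T_Y} \;=\; \frac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}} \;\le\; 1
\quad\Longrightarrow\quad
\rho_{XY} \;\le\; \sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}.$$

Because the bound need not be attained, it does not by itself rule out a change that lowers rho_XX' while raising rho_XY, which is the possibility the article develops.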

Enders, Craig K.; Bandalos, Deborah L. – Applied Measurement in Education, 1999
Examined the degree to which coefficient alpha is affected by including items with different distribution shapes within a unidimensional scale. Computer simulation results indicate that reliability does not increase dramatically as a result of using differentially shaped items within a scale. Discusses implications for test construction. (SLD)
Descriptors: Computer Simulation, Reliability, Scaling, Statistical Distributions
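
A minimal version of this kind of simulation is sketched below, assuming a single-factor model whose item distributions are reshaped by item-specific cut points; the sample size, loading, and cut points are arbitrary choices, not the study's design.

```python
import numpy as np

def coefficient_alpha(items):
    """Cronbach's alpha over the columns of `items` (examinees x items)."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
theta = rng.normal(size=(1000, 1))                        # one latent dimension
latent = 0.7 * theta + rng.normal(scale=0.5, size=(1000, 10))

same_shape = (latent > 0).astype(float)                   # all items cut at the same point
mixed_shape = (latent > np.linspace(-1.5, 1.5, 10)).astype(float)  # skewed in both directions

print(coefficient_alpha(same_shape), coefficient_alpha(mixed_shape))
```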

Frary, Robert B. – Applied Measurement in Education, 2000
Characterizes the circumstances under which validity changes may occur as a result of the deletion of a predictor test segment. Equations show that, for a positive outcome, one should seek a relatively large correlation between the scores from the deleted segment and the remaining items, with a relatively low correlation between scores from the…
Descriptors: Equations (Mathematics), Prediction, Reliability, Scores
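
The algebra behind results of this kind can be sketched directly (the notation here is mine, not necessarily the article's equations). With total score X, deleted segment P, remainder R = X - P, and criterion Y:

$$r_{RY} \;=\; \frac{\sigma_X\, r_{XY} \;-\; \sigma_P\, r_{PY}}{\sqrt{\sigma_X^2 + \sigma_P^2 - 2\,\sigma_X \sigma_P\, r_{XP}}}.$$

A high correlation between the deleted segment and the rest of the test shrinks the denominator, and a low segment-criterion correlation protects the numerator, which matches the conditions described in the abstract.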

Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…
Descriptors: Error of Measurement, Reliability, Scores, Test Construction
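
As background on why some assumption about the parts is unavoidable, consider the two-part case. The familiar Flanagan-Rulon/Guttman split-half estimate equals the total-score reliability only when the halves contribute equally to the true score (essential tau-equivalence); otherwise it is only a lower bound, which is the kind of ambiguity the abstract describes (this is context, not Feldt's derivation):

$$\hat{\rho}_{XX'} \;=\; 2\left(1 - \frac{s_1^2 + s_2^2}{s_X^2}\right),$$

where s_1^2 and s_2^2 are the part-score variances and s_X^2 is the total-score variance.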

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1999
Examined the stability of the standard error of measurement and the relationship between the reliability coefficient and the variance of both true scores and error scores for 170 school districts in a state. As expected, reliability coefficients varied as a function of group variability, but the variation in split-half coefficients from school to…
Descriptors: Elementary Secondary Education, Error of Measurement, Reliability, School Districts
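
The pattern just described follows from the classical identity linking the SEM, the score standard deviation, and the reliability coefficient:

$$\mathrm{SEM} \;=\; \sigma_X\,\sqrt{1 - \rho_{XX'}}.$$

If error variance is roughly constant across districts, a more homogeneous district (smaller sigma_X) must show a smaller rho_XX' to produce the same SEM, so reliability coefficients track group variability even while the SEM stays comparatively stable.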

Mehrens, William A. – Applied Measurement in Education, 2000
Presents conclusions of an independent measurement expert that the Texas Assessment of Academic Skills (TAAS) was constructed according to acceptable professional standards and tests curricular material considered by the Texas Board of Education important for graduates to have mastered. Also supports the validity and reliability of the TAAS and…
Descriptors: Curriculum, Psychometrics, Reliability, Standards

Gierl, Mark J.; Gotzmann, Andrea; Boughton, Keith A. – Applied Measurement in Education, 2004
Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, like test translation and…
Descriptors: True Scores, Simulation, Test Bias, Student Evaluation
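
A highly simplified sketch of the matching idea behind SIBTEST is given below: examinees are stratified on a matching-subtest score, and reference-focal differences on the studied item are averaged over strata. It deliberately omits SIBTEST's regression-based correction for unreliability of the matching score, and the function and variable names are hypothetical.

```python
import numpy as np

def simple_dif_statistic(item_scores, matching_scores, group_labels):
    """Focal-weighted mean difference (reference - focal) on the studied item,
    conditioning on the matching-subtest score. A crude stand-in for SIBTEST's
    beta-uni statistic; no true-score adjustment of the matching variable is made."""
    item = np.asarray(item_scores, dtype=float)
    match = np.asarray(matching_scores)
    group = np.asarray(group_labels)          # 'R' = reference, 'F' = focal
    weighted_diff, n_focal = 0.0, 0
    for k in np.unique(match):
        ref = item[(match == k) & (group == 'R')]
        foc = item[(match == k) & (group == 'F')]
        if ref.size and foc.size:
            weighted_diff += foc.size * (ref.mean() - foc.mean())
            n_focal += foc.size
    return weighted_diff / n_focal if n_focal else float('nan')
```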

Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
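
One commonly used composite estimate for tests assembled from several item-format strata is stratified alpha; whether it coincides with the estimate recommended in the article depends on which part-test model holds, so it is shown only as context:

$$\alpha_{\text{strat}} \;=\; 1 - \frac{\sum_{j} s_j^2\,\bigl(1 - \alpha_j\bigr)}{s_X^2},$$

where s_j^2 and alpha_j are the variance and internal-consistency estimate for format stratum j, and s_X^2 is the total-score variance.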

Gao, Xiaohong; Brennan, Robert L. – Applied Measurement in Education, 2001
Studied the sampling variability of estimated variance components using data collected over several years for a listening and writing performance assessment and evaluated the stability of estimated measurement precision. Results indicate that the estimated variance components varied from one year to another and suggest that the measurement…
Descriptors: Estimation (Mathematics), Generalizability Theory, Listening Comprehension Tests, Performance Based Assessment
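
A sketch of how variance components might be estimated from a single year's persons x tasks data, using the usual ANOVA expected-mean-square equations for a fully crossed p x t design. The study's actual design (raters, multiple years) is more involved; this is only the basic computation whose year-to-year instability is at issue.

```python
import numpy as np

def p_by_t_variance_components(scores):
    """ANOVA estimates of variance components for a fully crossed persons x tasks
    design with one observation per cell; `scores` is (n_persons, n_tasks)."""
    x = np.asarray(scores, dtype=float)
    n_p, n_t = x.shape
    grand = x.mean()
    ss_p = n_t * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_pt = ((x - grand) ** 2).sum() - ss_p - ss_t      # interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_pt = ss_pt / ((n_p - 1) * (n_t - 1))

    return {                                            # negative estimates truncated at zero
        "person": max((ms_p - ms_pt) / n_t, 0.0),
        "task": max((ms_t - ms_pt) / n_p, 0.0),
        "pt,error": ms_pt,
    }

# Estimating the components separately for each year's data matrix and comparing the
# results is one way to see the sampling variability the abstract describes.
```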

Webb, Noreen M.; Schlackman, Jonah; Sugrue, Brenda – Applied Measurement in Education, 2000
Studied the importance of occasion as a source of error variance in estimates of the dependability (generalizability) of science assessment scores and the interchangeability of science test formats. Junior high school students (n=662) took hands-on and paper-and-pencil tests twice. Results show that recognizing occasion as a facet of error alters…
Descriptors: Generalization, Junior High School Students, Junior High Schools, Reliability