Angoff, William H. – 1989
This study was undertaken to test the hypothesis that items of the Test of English as a Foreign Language (TOEFL) containing reference to American people, places, customs, etc., tend to favor examinees who have spent some time living in the United States. Two samples of examinees were drawn from the March 1987 TOEFL administration, one tested in…
Descriptors: Context Effect, English (Second Language), Evaluators, Foreign Nationals
Saez, Leilani; Park, Bitnara; Nese, Joseph F. T.; Jamgochian, Elisa; Lai, Cheng-Fei; Anderson, Daniel; Kamata, Akihito; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2010
In this series of studies, we investigated the technical adequacy of three curriculum-based measures used as benchmarks and for monitoring progress in three critical reading-related skills: fluency, reading comprehension, and vocabulary. In particular, we examined the following easyCBM measurement across grades 3-7 at fall, winter, and spring…
Descriptors: Elementary School Students, Middle School Students, Vocabulary, Reading Comprehension

Wood, Terry M.; Safrit, Margaret J. – Research Quarterly for Exercise and Sport, 1984
A proposed model for estimating psychomotor test battery reliability, based upon canonical correlation analysis, is described. (Author/JMK)
Descriptors: Evaluation Criteria, Multivariate Analysis, Physical Education, Psychomotor Skills
Griph, Gerald W. – New Mexico Public Education Department, 2006
The purpose of the NMSBA technical report is to provide users and other interested parties with a general overview and the technical characteristics of the 2006 NMSBA. The 2006 technical report contains the following information: (1) Test development; (2) Scoring procedures; (3) Calibration, scaling, and equating procedures; (4) Standard setting;…
Descriptors: Interrater Reliability, Standard Setting, Measures (Individuals), Scoring
Hoffman, R. Gene; Wise, Lauress L. – 2000
Classical test theory is based on the concept of a true score for each examinee, defined as the expected or average score across an infinite number of repeated parallel tests. In most cases, there is only a score from a single administration of the test in question. The difference between this single observed score and the underlying true score is…
Descriptors: Achievement, Classification, Observation, Probability
Yiu, Edwin M.-L.; Ng, Chi-Yan – Clinical Linguistics and Phonetics, 2004
One of the factors that affects the reliability of perceptual voice evaluation is the rating scale. Equal-appearing interval (EAI) and visual analogue (VA) scales are the two most common scales used and have attracted much attention in recent studies of perceptual voice evaluation. Available findings are contradictory, with one study finding the…
Descriptors: Test Reliability, Measurement Techniques, Rating Scales, Phonetics
Lee, Guemin; Frisbie, David A. – 1997
Previous studies have indicated that the reliability of test scores composed of testlets might be overestimated by conventional item-based reliability estimation methods (R. Thorndike, 1953; A. Anastasi, 1988; S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer and D. Thissen, 1996). This study used generalizability theory to investigate the…
Descriptors: Estimation (Mathematics), Generalizability Theory, Reliability, Scores
Subkoviak, Michael J. – 1985
Current methods of obtaining reliability coefficients for mastery tests are laborious from a practitioner's perspective. Some methods require two test administrations, while others require access to computer facilities and/or advanced measurement and statistical procedures. This report provides tables from which practitioners can read such…
Descriptors: Estimation (Mathematics), Mastery Tests, Statistical Studies, Tables (Data)
Capie, William; And Others – 1979
A study investigating the degree of agreement between teacher evaluators using a standard set of evaluation criteria and evaluation methods examines the test reliability of the Classroom Procedures Assessment Instrument, the Interpersonal Skills Assessment Instrument, and the Teaching Plans and Materials Assessment Instrument. Two statistical…
Descriptors: Data Collection, Evaluation Criteria, Evaluation Methods, Teacher Evaluation
Hendrickson, Amy B. – 2001
The purpose of the study was to compare reliability estimates for a test composed of stimulus-dependent testlets as derived from item scores, from testlet scores, and under univariate and multivariate generalizability theory designs, as well as to determine the influence of the number of testlets and the number of items per…
Descriptors: Comparative Analysis, Reliability, Scores, Standardized Tests
Reese, Lynda; McKinley, Robert – 1993
In item calibration using LOGIST (M. Wingersky, R. Patrick, and F. Lord, 1987), when the program determines that it cannot accurately estimate the c-parameter for a particular item due to insufficient information at the lower levels of ability, an estimate of the c-parameters, called COMC, is obtained by combining all such items. The purpose of…
Descriptors: College Entrance Examinations, Estimation (Mathematics), Law Schools, Law Students
Hintze, John M.; Matthews, William J. – School Psychology Review, 2004
This study examined the generalizability of systematic direct observation across setting and time. Participants included 14 students from an intact inclusionary fifth grade classroom. On-task/off-task behavior was directly observed using momentary time-sampling recording, twice a day, for 10 school days. Using Generalizability (G) theory, results…
Descriptors: Grade 5, Psychometrics, Classroom Observation Techniques, Interrater Reliability
Colby, Anne; And Others – 1979
Findings of this 20-year longitudinal study support Kohlberg's theoretical predictions of invariant structural consistency of subjects' responses to hypothetical moral dilemmas. Subjects were 58 American males chosen in 1955 according to age, social class and sociometric status. They were 10, 13, or 16 years of age at Time 1. Half of each age…
Descriptors: Developmental Stages, Longitudinal Studies, Moral Development, Reliability
Gearhart, Maryl; Novak, John R.; Herman, Joan L. – 1994
This study examined technical questions regarding the reliability and validity of large-scale portfolio assessment, focusing on: (1) whether raters can score collections of writing reliably with rubrics designed for single samples; (2) whether ratings derived from different frameworks differ in their capacities to support technically sound…
Descriptors: Educational Assessment, Elementary Education, Elementary School Students, Essay Tests
North Carolina State Dept. of Public Instruction, Raleigh. Div. of Accountability/Testing. – 2001
During the 1999-2000 school year, the North Carolina Alternate Assessment Portfolio was administered to eligible students with serious cognitive deficits statewide as a pilot program. This report provides state, regional, and local education agency results of that pilot program. The purpose of the pilot was to review the feasibility, validity, and…
Descriptors: Academic Achievement, American Indians, Cultural Differences, Elementary Secondary Education