Atteberry, Allison; Mangan, Daniel – Educational Researcher, 2020
Papay (2011) noticed that teacher value-added measures (VAMs) from a statistical model using the most common pre/post testing timeframe--current-year spring relative to previous spring (SS)--are essentially unrelated to those same teachers' VAMs when instead using next-fall relative to current-fall (FF). This is concerning since this choice--made…
Descriptors: Correlation, Value Added Models, Pretests Posttests, Decision Making
Guo, Hongwen; Liu, Jinghua; Dorans, Neil; Feigenbaum, Miriam – ETS Research Report Series, 2011
Maintaining score stability is crucial for an ongoing testing program that administers several tests per year over many years. One way to stall the drift of the score scale is to use an equating design with multiple links. In this study, we use the operational and experimental SAT® data collected from 44 administrations to investigate the effect…
Descriptors: Equated Scores, College Entrance Examinations, Reliability, Testing Programs
McBee, Matthew T.; Peters, Scott J.; Waterman, Craig – Gifted Child Quarterly, 2014
Best practice in gifted and talented identification procedures involves making decisions on the basis of multiple measures. However, very little research has investigated the impact of different methods of combining multiple measures. This article examines the consequences of the conjunctive ("and"), disjunctive/complementary…
Descriptors: Best Practices, Ability Identification, Academically Gifted, Correlation
Qi, Sen; Mitchell, Ross E. – Journal of Deaf Studies and Deaf Education, 2012
The first large-scale, nationwide academic achievement testing program using Stanford Achievement Test (Stanford) for deaf and hard-of-hearing children in the United States started in 1969. Over the past three decades, the Stanford has served as a benchmark in the field of deaf education for assessing student academic achievement. However, the…
Descriptors: Testing Programs, Educational Testing, Deafness, Academic Achievement
Zhu, Pei; Somers, Marie-Andree; Wong, Edmond – Society for Research on Educational Effectiveness, 2010
For this project, the authors use data from four IES-sponsored randomized studies to examine some of the key issues identified in May et al. (2009). The first set of questions focuses on issues related to using state tests: (1) Do studies meet the assumptions needed for combining impacts on state tests across grades and/or states?; (2) How…
Descriptors: Academic Achievement, Program Effectiveness, State Standards, Testing Programs
Lissitz, Robert W.; Wei, Hua – Educational Measurement: Issues and Practice, 2008
In this article we address the issue of consistency in standard setting in the context of an augmented state testing program. Information gained from the external NRT scores is used to help make an informed decision on the determination of cut scores on the state test. The consistency of cut scores on the CRT across grades is maintained by forcing…
Descriptors: Testing Programs, State Programs, Standard Setting, Reliability
Heckman, James J.; Humphries, John Eric; Mader, Nicholas S. – National Bureau of Economic Research, 2010
The General Educational Development (GED) credential is issued on the basis of an eight hour subject-based test. The test claims to establish equivalence between dropouts and traditional high school graduates, opening the door to college and positions in the labor market. In 2008 alone, almost 500,000 dropouts passed the test, amounting to 12% of…
Descriptors: Credentials, Testing Programs, Dropouts, Labor Market
Jamgochian, Elisa; Park, Bitnara Jasmine; Nese, Joseph F. T.; Lai, Cheng-Fei; Saez, Leilani; Anderson, Daniel; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2010
In this technical report, we provide reliability and validity evidence for the easyCBM® Reading measures for grade 2 (word and passage reading fluency and multiple choice reading comprehension). Evidence for reliability includes internal consistency and item invariance. Evidence for validity includes concurrent, predictive, and construct…
Descriptors: Grade 2, Reading Comprehension, Testing Programs, Reading Fluency
Saez, Leilani; Park, Bitnara; Nese, Joseph F. T.; Jamgochian, Elisa; Lai, Cheng-Fei; Anderson, Daniel; Kamata, Akihito; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2010
In this series of studies, we investigated the technical adequacy of three curriculum-based measures used as benchmarks and for monitoring progress in three critical reading-related skills: fluency, reading comprehension, and vocabulary. In particular, we examined the following easyCBM measurement across grades 3-7 at fall, winter, and spring…
Descriptors: Elementary School Students, Middle School Students, Vocabulary, Reading Comprehension

Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999
Studied the amount of measurement error in a state's performance-based writing task as it relates to high-stakes decision reproducibility. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences for the rates for the handwritten, but not the typed, essays. (SLD)
Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8
Koretz, Daniel; And Others – 1993
The 1992-93 school year was the second year of the implementation of the Vermont assessment program. Evaluation of the 1991-92 year yielded mixed results, with some evidence that the assessment program was having a strong impact on instruction, but other indications that the reliability of the portfolio scoring in both writing and mathematics was…
Descriptors: Educational Assessment, Elementary Secondary Education, Evaluation Methods, Evaluation Utilization

Mehrens, William A. – Applied Measurement in Education, 2000
Presents conclusions of an independent measurement expert that the Texas Assessment of Academic Skills (TAAS) was constructed according to acceptable professional standards and tests curricular material considered by the Texas Board of Education important for graduates to have mastered. Also supports the validity and reliability of the TAAS and…
Descriptors: Curriculum, Psychometrics, Reliability, Standards
Flowers, Claudia P.; Oshima, T. C. – 1994
This study was patterned after a previous study by Skaggs and Lissitz (1992) in which inconsistency of differential item functioning (DIF) was reported across test administrations. They suggested multidimensionality of test data as one possible reason for inconsistency. Therefore, in this study, DIF indices which were developed recently with a…
Descriptors: Ethnic Groups, Item Bias, Mathematics, Reliability
Crislip, Marian A.; Chin-Chance, Selvin – 2001
This paper discusses the use of two theories of item analysis and test construction, their strengths and weaknesses, and applications to the design of the Hawaii State Test of Essential Competencies (HSTEC). Traditional analyses of the data collected from the HSTEC field test were viewed from the perspectives of item difficulty levels and item…
Descriptors: Difficulty Level, Item Response Theory, Psychometrics, Reliability

Sicoly, Fiore – Applied Measurement in Education, 2002
Calculated year-1 to year-2 stability of assessment data from 21 states and 2 Canadian provinces. The median stability coefficient was 0.78 in mathematics and reading, and lower in writing. A stability coefficient of 0.80 is recommended as the standard for large-scale assessments of student performance. (SLD)
Descriptors: Educational Testing, Elementary Secondary Education, Foreign Countries, Mathematics