Isbell, Daniel R.; Son, Young-A – Studies in Second Language Acquisition, 2022
Elicited Imitation Tests (EITs) are commonly used in second language acquisition (SLA)/bilingualism research contexts to assess the general oral proficiency of study participants. While previous studies have provided valuable EIT construct-related validity evidence, some key gaps remain. This study uses an integrative data analysis to further…
Descriptors: Bilingualism, Imitation, Language Tests, Second Language Learning
Grisham, Jennifer; Waddell, Misti; Crawford, Rebecca; Toland, Michael – Journal of Early Intervention, 2021
The purpose of this article is to provide evidence of the technical adequacy of the Assessment, Evaluation, and Programming System--Third Edition (AEPS-3). The AEPS has long been identified as one of the most psychometrically sound early childhood curriculum-based assessments. In this article, results of three studies of technical adequacy are…
Descriptors: Infants, Young Children, Curriculum Based Assessment, Psychometrics
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Hatala, Rose; Cook, David A.; Brydges, Ryan; Hawkins, Richard – Advances in Health Sciences Education, 2015
In order to construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS), based on Kane's framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected…
Descriptors: Measures (Individuals), Test Validity, Surgery, Skills
New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation

Meier, Augustine; Boivin, Micheline – Journal of Consulting and Clinical Psychology, 1986
The Client Verbal Response Category System classifies client responses into Temporal, Directional and Experiential categories. The categories with their subcategories are defined, interjudge reliability data is presented, and the instrument's utility in psychotherapy process research is demonstrated. Initial results indicate that the instrument is…
Descriptors: Client Characteristics (Human Services), Interrater Reliability, Psychotherapy, Research Tools

Conroy, Maureen A.; And Others – Education and Training in Mental Retardation and Developmental Disabilities, 1996
This study assessed the intra-rater and inter-rater reliability of the Motivation Assessment Scale as used with 20 adults with mental retardation, expanding the results of previous research by evaluating across additional time and administrations. Results from 19 raters indicated variable moderate-to-low intra-rater and inter-rater reliability.…
Descriptors: Adults, Behavior Problems, Interrater Reliability, Measures (Individuals)
Reckase, Mark D. – 1997
This paper argues that special procedures for constructing assessment tools containing performance assessment tasks are unnecessary and that current test methodology can easily be generalized to complex performance assessment tasks without destroying the desirable characteristics of those tasks. Reasonable statistical requirements for sound…
Descriptors: Educational Assessment, Generalizability Theory, High Stakes Tests, Interrater Reliability

Meyer, Gregory J. – Psychological Assessment, 1997
In reply to criticism of the Rorschach Comprehensive System (CS) by J. Wood, M. Nezworski, and W. Stejskal (1996), this article presents a meta-analysis of published data indicating that the CS has excellent chance-corrected interrater reliability. It is noted that the erroneous assumptions of Wood et al. make their assertions about validity…
Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity

Wood, James M.; Nezworski, M. Teresa; Stejskal, William J. – Psychological Assessment, 1997
G. Meyer (1997) attempts to refute the present authors' criticisms of the interrater reliability of the Rorschach Comprehensive System (CS) but misrepresents their position and offers a flawed meta-analysis in support of his own. Rorschach proponents need to undertake high-quality replicated studies of CS reliability and validity. (SLD)
Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity

Meyer, Gregory J. – Psychological Assessment, 1997
Replies to Wood et al. and documents limitations of their conclusions about the Rorschach Comprehensive System (CS), supporting Meyer's own meta-analysis, which finds adequate interrater reliability for the CS. (SLD)
Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity

Smith, Richard Merrill – Academic Medicine, 1993
A University of Hawaii study compared objective and subjective assessments of the three-step triple jump examination which tests medical students' clinical problem-solving processes. Subjects were 58 first-year students. Results found the subjective assessments were more consistent across problems of varying difficulty level than were objective…
Descriptors: Case Studies, Difficulty Level, Higher Education, Interrater Reliability

Berning, Lisa C.; Weed, Nathan C.; Aloia, Mark S. – Assessment, 1998
To examine the interrater reliability of the Ruff Figural Fluency Test (RFFT) (R. Ruff, 1988), 124 college students completed the measure and scored RFFT test protocols. Results indicated substantial interscorer reliability on the RFFT, particularly for number of unique designs. Reliability was lower for scoring perseverative errors and error…
Descriptors: College Students, Higher Education, Interrater Reliability, Scoring
Alderson, J. Charles; And Others – 1995
The guide is intended for teachers who must construct language tests and for other professionals who may need to construct, evaluate, or use the results of language tests. Most examples are drawn from the field of English-as-a-Second-Language instruction in the United Kingdom, but the principles and practices described may be applied to the…
Descriptors: Educational Trends, English (Second Language), Interrater Reliability, Language Tests

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques