Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Siqi Huang – North American Chapter of the International Group for the Psychology of Mathematics Education, 2023
The goal of this paper is twofold. First, the paper clarifies and elaborates on an important theoretical construct called orientation with respect to understanding in mathematics, which denotes the degree to which students exhibit an inclination towards and demonstrate an earnest concern for understanding in mathematical learning. Second, the…
Descriptors: Mathematics Instruction, Teaching Methods, Problem Solving, Reliability
Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022
Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…
Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments
Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021
This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…
Descriptors: Music Education, Music Activities, Music, Performance
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022
Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…
Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Walker, Grant M.; Basilakos, Alexandra; Fridriksson, Julius; Hickok, Gregory – Journal of Speech, Language, and Hearing Research, 2022
Purpose: Meaningful changes in picture naming responses may be obscured when measuring accuracy instead of quality. A statistic that incorporates information about the severity and nature of impairments may be more sensitive to the effects of treatment. Method: We analyzed data from repeated administrations of a naming test to 72 participants with…
Descriptors: Naming, Change, Aphasia, Severity (of Disability)
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Charalambous, Charalambos Y.; Kyriakides, Ermis; Tsangaridou, Niki; Kyriakides, Leonidas – School Effectiveness and School Improvement, 2017
Heightened accountability pressures and an increased emphasis on teaching quality have directed scholarly attention to scrutinizing instruction, particularly with respect to issues of validity and reliability. However, these attempts have largely been directed toward "core" content areas and investigated generic or content-specific…
Descriptors: Physical Education, Instructional Effectiveness, Lesson Plans, Interrater Reliability
Castle, Courtney – ProQuest LLC, 2018
The Next Generation Science Standards propose a multidimensional model of science learning, comprised of Core Disciplinary Ideas, Science and Engineering Practices, and Crosscutting Concepts (NGSS Lead States, 2013). Accordingly, there is a need for student assessment aligned with the new standards. Creating assessments that validly and reliably…
Descriptors: Science Education, Student Evaluation, Science Tests, Test Construction
Holly Lee Allen – ProQuest LLC, 2020
Development of the Jordan Performance Appraisal System (JPAS) was completed in 1996. This study examined the factor structure of the classroom observation instrument used in the JPAS. Using observed classroom instructional quality ratings of 1220 elementary teachers of Grades 1-6 in the Jordan School District, this study estimated the factor…
Descriptors: Foreign Countries, Performance Based Assessment, Factor Structure, Classroom Observation Techniques
He, Peng; Liu, Xiufeng; Zheng, Changlong; Jia, Mengying – Chemistry Education Research and Practice, 2016
This study intends to develop a standardized instrument for measuring classroom teaching and learning in secondary chemistry lessons. Based on previous studies and interviews with expert teachers, the progression of five quality levels was constructed hypothetically to represent the quality of chemistry lessons in Chinese secondary schools. The…
Descriptors: Foreign Countries, Secondary School Science, Science Instruction, Chemistry
Ghadyani, Fariba; Tahririan, Mohammad Hassan; Afzali, Katayoon – Language Teaching Research Quarterly, 2022
There is a dearth of research on hope in studies of second or foreign (L2) language learning. Therefore, the present research contributes conceptually to a deep understanding of hope for learning English as a foreign language and the ways it may be developed. To do so, an exploratory mixed-methods design was employed. Using in-depth interviews,…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Psychological Patterns
Virtanen, T. E.; Pakarinen, E.; Lerkkanen, M.-K.; Poikkeus, A.-M.; Siekkinen, M.; Nurmi, J.-E. – Journal of Early Adolescence, 2018
This study examined the reliability and validity of the Classroom Assessment Scoring System-Secondary (CLASS-S) in Finnish classrooms. Trained observers coded classroom interactions based on video recordings of 46 Grade 6 classrooms (450 cycles). Concurrent associations were investigated with respect to teacher self-ratings (e.g., efficacy beliefs…
Descriptors: Factor Analysis, Classroom Observation Techniques, Foreign Countries, Factor Structure