Roberto Brazileio Paixão; Michael C. Rodriguez – Educational Research and Evaluation, 2023
The usefulness of evaluation is critical. Evaluation use occurs when, from its results or process, decisions are made about the program, it changes people's mindsets, or persuasive or legitimation actions happen (instrumental, conceptual, and symbolic uses respectively). Few quantitative evaluation use studies have been conducted in recent years.…
Descriptors: Measures (Individuals), College Faculty, Test Validity, Test Reliability
El Alaoui, Mohamed – IEEE Transactions on Learning Technologies, 2023
Classical evaluation methods, assessments, exams, and so forth accentuate the perception of one against all, professor versus learners. Including students in the assessment process allows transforming the professor from an opponent to a critical friend, with the role of helping students to recognize both their strengths and weaknesses. However,…
Descriptors: Peer Evaluation, Educational Improvement, Test Validity, Test Reliability
Matthew J. Madison; Seungwon Chung; Junok Kim; Laine P. Bradshaw – Grantee Submission, 2023
Recent developments have enabled the modeling of longitudinal assessment data in a diagnostic classification model (DCM) framework. These longitudinal DCMs were developed to provide measures of student growth on a discrete scale in the form of attribute mastery transitions, thereby supporting categorical and criterion-referenced interpretations of…
Descriptors: Models, Cognitive Measurement, Diagnostic Tests, Classification
Evans, Tanya; Mejía-Ramos, Juan Pablo; Inglis, Matthew – Educational Studies in Mathematics, 2022
Offering explanations is a central part of teaching mathematics, and understanding those explanations is a vital activity for learners. Given this, it is natural to ask what makes a good mathematical explanation. This question has received surprisingly little attention in the mathematics education literature, perhaps because the field has no…
Descriptors: Mathematics, Professional Personnel, Undergraduate Students, Mathematics Activities
Simner, Julia; Smees, Rebecca; Rinaldi, Louisa J.; Carmichael, Duncan A.; McDonald, Toby J. – Journal of Creative Behavior, 2022
Creative orientation is the extent to which different individuals are drawn toward creative activities (e.g., art, music). We know relatively little about child-level creative orientation given certain testing limitations. Adult tools often measure time spent engaged in creative pursuits, but this method is unsuitable for children because their…
Descriptors: Influences, Creativity, Creative Activities, Measures (Individuals)
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
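For readers unfamiliar with the statistic, the conventional summed-score form of coefficient alpha can be sketched as follows. This is a minimal illustration of the textbook formula only, not the alternative expressions the paper argues for; the function name `cronbach_alpha` and the sample scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for summed scores.

    scores: list of respondents, each a list of k item scores
    (hypothetical layout chosen for this sketch).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Transpose so each tuple holds one item's scores across respondents.
    items = list(zip(*scores))

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)

    # Sample variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in scores])

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up data: five respondents answering three items.
sample_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(sample_scores)
print(round(alpha, 3))
```

The sketch uses sample variances (n - 1 denominator) throughout; because alpha depends only on the ratio of variances, using population variances consistently would give the same value.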
Hull, Michael M.; Jansky, Alexandra; Hopf, Martin – Physical Review Physics Education Research, 2022
Our study investigates whether confidence correlates with consistency in reasoning, specifically about radioactive decay. In prior work, we developed and tested a survey designed to measure consistency of student reasoning about radioactive decay by comparing responses to three prompts that are isomorphic, meaning that, despite having different…
Descriptors: Students, Self Esteem, Responses, Accuracy
Eric Jones – ProQuest LLC, 2022
The assessment of human performance is not a new phenomenon. We have evidence that people have been required to prove their worth dating back at least to the Epic of Gilgamesh. What has changed, at least on a large scale, is the importance given to quantitative evidence in the evaluation process. For example, many employers have begun subjecting…
Descriptors: Performance Based Assessment, Evaluation Methods, Semiotics, Theories
Vonna L. Hemmler; Allison W. Kenney; Susan Dulong Langley; Carolyn M. Callahan; E. Jean Gubbins; Shannon Holder – Grantee Submission, 2022
Though qualitative research has become more prevalent in practice over the last 30 years, there is still considerable uncertainty among researchers regarding how to ensure inter-rater consistency when teams are tasked with coding qualitative data. In this article, we offer an explanation of a methodology our qualitative team used to achieve…
Descriptors: Interrater Reliability, Coding, Guides, Data Collection
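As background on how agreement between two coders is often quantified, Cohen's kappa is one common chance-corrected index. The sketch below is an assumption-laden illustration and not necessarily the procedure the article describes; the function name `cohen_kappa` and the example codes are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per unit."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")

    n = len(rater_a)

    # Observed proportion of units on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category: kappa is undefined.
        return float("nan")

    return (observed - expected) / (1 - expected)


# Made-up codes for four excerpts.
codes_a = ["x", "x", "y", "y"]
codes_b = ["x", "y", "y", "y"]
kappa = cohen_kappa(codes_a, codes_b)
print(kappa)
```

Kappa applies to exactly two raters and nominal codes; teams with more coders would typically need a different index, such as Fleiss' kappa or Krippendorff's alpha.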
Leighton, Jacqueline P.; Lehman, Blair – Educational Measurement: Issues and Practice, 2020
In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze…
Descriptors: Protocol Analysis, Interviews, Problem Solving, Cognitive Processes
Hancock, Gregory R.; An, Ji – Measurement: Interdisciplinary Research and Perspectives, 2020
As an alternative to Cronbach's α for estimating scale reliability, McDonald's ω has attracted increased attention within the methodological community for its less stringent measurement assumptions. Notwithstanding, ω is still seldom used by practitioners, likely due to its unavailability in popular software packages (e.g., SPSS)…
Descriptors: Evaluation, Alternative Assessment, Reliability, Test Reliability
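To make the contrast between the two coefficients concrete, their standard textbook forms are given below. This is a sketch under stated assumptions rather than the article's own derivation: a single-factor model with uncorrelated errors and factor variance fixed to 1 is assumed, and the symbols (k items, item variances \sigma_i^2, total-score variance \sigma_X^2, loadings \lambda_i, unique variances \theta_i) are notation introduced here.

```latex
% Cronbach's alpha for a k-item summed score
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)

% McDonald's omega under a single-factor model with uncorrelated errors
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k}\lambda_i\right)^{2} + \sum_{i=1}^{k}\theta_i}
```

Under these assumptions the two coefficients coincide when all loadings are equal (essential tau-equivalence); with unequal loadings, alpha falls below omega, which is the sense in which omega's measurement assumptions are less stringent.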
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Clain, Alex E.; Alkhuwaiter, Munirah; Davidson, Kate; Martin-Harris, Bonnie – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The purpose of this study was to extend the assessment of the psychometric properties of the Modified Barium Swallow Impairment Profile (MBSImP). Here, we re-examined structural validity and internal consistency using a large clinical-registry data set and formally examined rater reliability in a smaller data set. Method: This study…
Descriptors: Diagnostic Tests, Disability Identification, Physical Disabilities, Eating Disorders
Steinmann, Isa; Sánchez, Daniel; van Laar, Saskia; Braeken, Johan – Assessment in Education: Principles, Policy & Practice, 2022
Questionnaire scales that are mixed-worded, i.e. include both positively and negatively worded items, often suffer from issues like low reliability and more complex latent structures than intended. Part of the problem might be that some responders fail to respond consistently to the mixed-worded items. We investigated the prevalence and impact of…
Descriptors: Response Style (Tests), Test Items, Achievement Tests, Foreign Countries
Courtney Bell; Jessalynn James; Eric S. Taylor; James Wyckoff – Journal of Policy Analysis and Management, 2025
We study the returns to experience in teaching, estimated using supervisor ratings from classroom observations. We describe the assumptions required to interpret changes in observation ratings over time as the causal effect of experience on performance. We compare two difference-in-differences strategies: the two-way fixed effects estimator common…
Descriptors: Lesson Observation Criteria, Teaching Experience, Teacher Evaluation, Supervisors