Publication Date
In 2025 | 2 |
Since 2024 | 33 |
Since 2021 (last 5 years) | 134 |
Since 2016 (last 10 years) | 455 |
Since 2006 (last 20 years) | 1164 |
Descriptor
Comparative Analysis | 1930 |
Reliability | 873 |
Test Reliability | 787 |
Foreign Countries | 547 |
Test Validity | 442 |
Correlation | 345 |
Validity | 330 |
Interrater Reliability | 325 |
Statistical Analysis | 321 |
Scores | 274 |
Measures (Individuals) | 236 |
Author
Reckase, Mark D. | 6 |
Attali, Yigal | 5 |
Coniam, David | 5 |
Brennan, Robert L. | 4 |
Crehan, Kevin D. | 4 |
Feldt, Leonard S. | 4 |
Hakstian, A. Ralph | 4 |
Jones, Ian | 4 |
Kolen, Michael J. | 4 |
Lunz, Mary E. | 4 |
August, Diane | 3 |
Audience
Researchers | 35 |
Practitioners | 29 |
Teachers | 15 |
Administrators | 9 |
Policymakers | 6 |
Counselors | 2 |
Media Staff | 2 |
Parents | 1 |
Support Staff | 1 |
Location
Turkey | 59 |
United States | 47 |
Australia | 36 |
Canada | 32 |
United Kingdom (England) | 32 |
China | 31 |
United Kingdom | 28 |
Germany | 25 |
Netherlands | 24 |
Taiwan | 22 |
Hong Kong | 20 |
What Works Clearinghouse Rating
Meets WWC Standards with or without Reservations | 1 |
Does not meet standards | 1 |
Colvin, Kimberly F.; Gorgun, Guher – Practical Assessment, Research & Evaluation, 2020
This study compares a scale, the Rosenberg Self-Esteem Scale, that was administered with four response categories to versions of the same scale that were administered with six and eight response categories. Respondents were randomly assigned to take one of the three versions of RSES. A rating scale utility analysis was conducted on all three…
Descriptors: Psychometrics, Measures (Individuals), Self Concept Measures, Self Esteem
Bartelink, Cora; de Kwaadsteniet, Leontien; ten Berge, Ingrid J.; Witteman, Cilia L. M. – Child & Youth Care Forum, 2017
Background: The LIRIK, an instrument for the assessment of child safety and risk, is designed to improve assessments by guiding professionals through a structured evaluation of relevant signs, risk factors, and protective factors. Objective: We aimed to assess the interrater agreement and the predictive validity of professionals' judgments made…
Descriptors: Child Safety, Test Validity, Test Reliability, Risk
A Comparison of Procedures for Estimating Person Reliability Parameters in the Graded Response Model
LaHuis, David M.; Bryant-Lees, Kinsey B.; Hakoyama, Shotaro; Barnes, Tyler; Wiemann, Andrea – Journal of Educational Measurement, 2018
Person reliability parameters (PRPs) model temporary changes in individuals' attribute level perceptions when responding to self-report items (higher levels of PRPs represent less fluctuation). PRPs could be useful in measuring careless responding and traitedness. However, it is unclear how well current procedures for estimating PRPs can recover…
Descriptors: Comparative Analysis, Reliability, Error of Measurement, Measurement Techniques
Verhavert, San; Bouwer, Renske; Donche, Vincent; De Maeyer, Sven – Assessment in Education: Principles, Policy & Practice, 2019
Comparative Judgement (CJ) aims to improve the quality of performance-based assessments by letting multiple assessors judge pairs of performances. CJ is generally associated with high levels of reliability, but there is also a large variation in reliability between assessments. This study investigates which assessment characteristics influence the…
Descriptors: Meta Analysis, Reliability, Comparative Analysis, Value Judgment
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Daniel, Michael; Koshevoy, Alexey; Schurov, Ilya; Dobrushina, Nina – Field Methods, 2022
In this article, we address the issue of reliability of quantitative data on multilingualism of the past obtained as recall data. More specifically, we investigate whether the interviewees' assessments of the language repertoires of their late relatives (indirect data) provide results that are quantitatively similar to those obtained from the…
Descriptors: Recall (Psychology), Multilingualism, Artificial Intelligence, Second Languages
Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022
The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research data consists of…
Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics
Gomez, Christopher E.; Sztainberg, Marcelo O.; Trana, Rachel E. – International Journal of Bullying Prevention, 2022
Cyberbullying is the use of digital communication tools and spaces to inflict physical, mental, or emotional distress. This serious form of aggression is frequently targeted at, but not limited to, vulnerable populations. A common problem when creating machine learning models to identify cyberbullying is the availability of accurately annotated,…
Descriptors: Video Technology, Computer Software, Computer Mediated Communication, Bullying
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020
The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…
Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al (2020) describes a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Deribo, Tobias; Goldhammer, Frank; Kroehne, Ulf – Educational and Psychological Measurement, 2023
As researchers in the social sciences, we are often interested in studying not directly observable constructs through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed shortly but not read and engaged with in-depth. Hence, a…
Descriptors: Reaction Time, Guessing (Tests), Behavior Patterns, Bias
Zijlmans, Eva A. O.; Tijmstra, Jesper; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2018
Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ₆, and method CA. The item-score…
Descriptors: Test Items, Test Reliability, Correlation, Comparative Analysis
Yocarini, Iris E.; Bouwmeester, Samantha; Smeets, Guus; Arends, Lidia R. – Educational Measurement: Issues and Practice, 2018
This real-data-guided simulation study systematically evaluated the decision accuracy of complex decision rules combining multiple tests within different realistic curricula. Specifically, complex decision rules combining conjunctive aspects and compensatory aspects were evaluated. A conjunctive aspect requires a minimum level of performance,…
Descriptors: Comparative Analysis, Decision Making, Accuracy, Higher Education
Butti, Niccolò; Finisguerra, Alessandra; Urgesi, Cosimo – Developmental Psychology, 2022
There is inconsistent evidence that human bodies are processed through holistic processing as it has been widely reported for faces. To assess how configural and holistic processes may develop with age, we administered a visual body recognition task assessing the presence of body inversion and composite illusion effects to white adults (114…
Descriptors: Human Body, Whites, Adults, Holistic Approach