Di, Weiwei; Nie, Youyan; Chua, Bee Leng; Chye, Stefanie; Teo, Timothy – Journal of Psychoeducational Assessment, 2023
General self-efficacy represents the global sense of personal capability across various situations and tasks. The aim of the present study was to develop and validate a single-item general self-efficacy scale which balances practical demands and psychometric concerns. The psychometric properties of the proposed Single-Item General Self-Efficacy…
Descriptors: Self Efficacy, Self Concept Measures, Psychometrics, Adults
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Liu, Xiaowen; Rogers, H. Jane – Educational and Psychological Measurement, 2022
Test fairness is critical to the validity of group comparisons involving gender, ethnicities, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items that have been identified as showing DIF: deleting, ignoring,…
Descriptors: Item Analysis, Comparative Analysis, Culture Fair Tests, Test Validity
Matthews, Evan L.; Horvat, Fiona M.; Phillips, David A. – Measurement in Physical Education and Exercise Science, 2022
The YMCA step test uses a prescribed step height, which is difficult to provide in a telehealth setting. This study examined a modification of the YMCA step test that allows preexisting in-home objects of variable height to be used as the "step" in a virtual environment. Young healthy participants (n = 40) performed step tests with a small and large object of…
Descriptors: Physical Fitness, Metabolism, Tests, Measurement
De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019
Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…
Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
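For readers unfamiliar with the statistic named in the abstract above, the following is a minimal sketch of Cohen's kappa for two raters on a nominal scale, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. Missing ratings are handled here by dropping any unit that lacks one or both ratings (complete-case analysis). That choice, the function name `cohens_kappa`, and the sample data are assumptions made for illustration only; the sketch is not claimed to match any of the three variants studied in the article.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters, using complete cases only.

    A rating of None marks a missing value; units with one or both
    ratings missing are dropped (an illustrative assumption).
    """
    pairs = [(a, b) for a, b in zip(ratings_a, ratings_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no units were rated by both raters")

    # Observed agreement: share of units where the raters agree.
    p_o = sum(a == b for a, b in pairs) / n

    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings; None marks a missing rating.
rater_1 = ["yes", "yes", "no", "no", None, "yes"]
rater_2 = ["yes", "no", "no", "no", "yes", None]
kappa = cohens_kappa(rater_1, rater_2)
print(kappa)
```

With the sample data, four units have both ratings, three of which agree, so p_o = 0.75 and p_e = 0.5. Dropping incomplete units is the simplest treatment of missing ratings, but it discards information.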
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Jane Batamuliza; Gonzague Habinshuti; Jean Baptiste Nkurunziza – Journal of Technology and Science Education, 2024
This study presents the effects of interactive computer simulations on students' performance and concept retention in the unit of chemical reactions. Purposive sampling was used to select four schools with a sample population of 320. The Achievement test on chemical reactions was developed, validated, and checked for reliability. The…
Descriptors: Chemistry, Science Instruction, Teaching Methods, Comparative Analysis
Wang, Faming; Wang, Yehui; Liu, Yaping; Leung, Shing On – Scandinavian Journal of Educational Research, 2023
The importance of the opportunity to learn (OTL) for mathematics achievement has been extensively researched. However, there were still unanswered questions regarding OTL's measurement, analytical level, and relationship with motivational beliefs. To fill in the gaps, we aimed to (1) scrutinize the reliability and validity of OTL, (2) investigate…
Descriptors: International Assessment, Foreign Countries, Achievement Tests, Secondary School Students
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Benton, Tom – Research Matters, 2021
Computer adaptive testing is intended to make assessment more reliable by tailoring the difficulty of the questions a student has to answer to their level of ability. Most commonly, this benefit is used to justify the length of tests being shortened whilst retaining the reliability of a longer, non-adaptive test. Improvements due to adaptive…
Descriptors: Risk, Item Response Theory, Computer Assisted Testing, Difficulty Level
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
van der Scheer, Emmelien A.; Bijlsma, Hannah J. E.; Glas, Cees A. W. – School Effectiveness and School Improvement, 2019
A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire that was used to measure their opinions about the lessons of…
Descriptors: Student Attitudes, Validity, Interrater Reliability, Correlation
Purwanto; Hidayah, Niswatul; Wagistina, Satti – International Journal of Educational Methodology, 2023
Learning geography in Indonesia philosophically aims to develop spatial literacy. Students must improve spatial literacy to form reasoning skills and apply spatial concepts in real life. Applying Gersmehl's spatial learning can improve students' spatial literacy through syntax arranged based on spatial aspects. The use of Google Earth helps…
Descriptors: Spatial Ability, Natural Disasters, Geography Instruction, Teaching Methods
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of open-book extended-duration versus closed-book time-limited format on the reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30 × free-response short-answer questions, SAQ) and end-of-year paper (4 × SAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format