Publication Date
In 2025 | 103 |
Since 2024 | 950 |
Since 2021 (last 5 years) | 3486 |
Since 2016 (last 10 years) | 7671 |
Since 2006 (last 20 years) | 14844 |
Descriptor
Test Reliability | 14596 |
Test Validity | 9898 |
Reliability | 9570 |
Foreign Countries | 6774 |
Test Construction | 4627 |
Validity | 4130 |
Measures (Individuals) | 3759 |
Factor Analysis | 3728 |
Psychometrics | 3406 |
Interrater Reliability | 3068 |
Correlation | 3013 |
Audience
Researchers | 703 |
Practitioners | 447 |
Teachers | 204 |
Administrators | 121 |
Policymakers | 62 |
Counselors | 42 |
Students | 37 |
Parents | 11 |
Community | 7 |
Media Staff | 5 |
Support Staff | 5 |
Location
Turkey | 1249 |
Australia | 428 |
Canada | 371 |
China | 332 |
United States | 265 |
United Kingdom | 246 |
Taiwan | 222 |
Netherlands | 217 |
Indonesia | 215 |
California | 208 |
Spain | 204 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 8 |
Meets WWC Standards with or without Reservations | 9 |
Does not meet standards | 6 |
Baysal, Emine Akkas; Ocak, Gürbüz – International Journal of Progressive Education, 2022
This study aimed to develop a reliable and valid scale to reveal the cognitive biases of university students in the context of analytical thinking skills. During the scale development process, a 5-point Likert-type pre-trial form consisting of 60 items was first created. The pre-trial form was administered to 450 students at Afyon Kocatepe University.…
Descriptors: College Students, Thinking Skills, Test Reliability, Test Validity
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
Kogar, Hakan – International Journal of Assessment Tools in Education, 2022
The purpose of this study is to identify which scale short-form development method produces better findings in different factor structures. A simulation study was designed based on this purpose. Three different factor structures and three simulation conditions were selected. As the findings of this simulation study, the model-data fit and…
Descriptors: Test Construction, Measures (Individuals), Factor Structure, Test Reliability
Lambert, Richard G.; Holcomb, T. Scott; Bottoms, Bryndle – Center for Educational Measurement and Evaluation, 2022
The validity of the Kappa coefficient of chance-corrected agreement has been questioned when the prevalence of specific rating scale categories is low and agreement between raters is high. The researchers proposed the Lambda Coefficient of Rater-Mediated Agreement as an alternative to Kappa to address these concerns. Lambda corrects for chance…
Descriptors: Interrater Reliability, Evaluators, Rating Scales, Teacher Evaluation
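The concern raised in the abstract above (Kappa behaving poorly when a category is rare and raw agreement is high) can be illustrated with a small sketch. This is only the standard two-rater Cohen's kappa applied to made-up ratings; it does not implement the Lambda coefficient, whose definition is not given in the abstract. The function name `cohen_kappa` and the data are illustrative assumptions.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: 100 items, with the category "low" used once per rater.
rater_a = ["high"] * 98 + ["low", "high"]
rater_b = ["high"] * 98 + ["high", "low"]

observed = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa(rater_a, rater_b)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.3f}")
```

With these invented ratings the raters agree on 98 of 100 items, yet kappa comes out slightly below zero, because the expected chance agreement is almost as high as the observed agreement.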
Sone, Bailey J.; Kaat, Aaron J.; Roberts, Megan Y. – Autism: The International Journal of Research and Practice, 2021
Children with autism spectrum disorder benefit from early, intensive interventions to improve social communication, and parent-implemented interventions are a feasible, family-centered way to increase treatment dosage. The success of such interventions is dependent on a parent's ability to implement the strategies with fidelity. However,…
Descriptors: Autism, Pervasive Developmental Disorders, Early Intervention, Parent Participation
An, Mihee; Nord, Jayden; Koziol, Natalie A.; Dusing, Stacey C.; Kane, Audrey E.; Lobo, Michele A.; McCoy, Sarah W.; Harbourne, Regina T. – Grantee Submission, 2021
Aim: To describe the development of an intervention-specific fidelity measure and its utilization and to determine whether the newly developed Sitting Together and Reaching to Play (START-Play) intervention was implemented as intended. Also, to quantify differences between START-Play and usual early intervention (uEI) services. Method: A fidelity…
Descriptors: Test Construction, Measures (Individuals), Fidelity, Early Intervention
Steele, Catriona M.; Peladeau-Pigeon, Melanie; Nagy, Ahmed; Waito, Ashley A. – Journal of Speech, Language, and Hearing Research, 2020
Purpose: The field lacks consensus about preferred metrics for capturing pharyngeal residue on videofluoroscopy. We explored four different methods, namely, the visuoperceptual Eisenhuber scale and three pixel-based methods: (a) residue area divided by vallecular or pyriform sinus spatial housing ("%-Full"), (b) the Normalized Residue…
Descriptors: Human Body, Physiology, Speech Language Pathology, Measurement Techniques
Edwards, Ashley A.; Joyner, Keanan J.; Schatschneider, Christopher – Educational and Psychological Measurement, 2021
The accuracy of certain internal consistency estimators has been questioned in recent years. The present study tests the accuracy of six reliability estimators (Cronbach's alpha, omega, omega hierarchical, Revelle's omega, and greatest lower bound) in 140 simulated conditions of unidimensional continuous data with uncorrelated errors with varying…
Descriptors: Reliability, Computation, Accuracy, Sample Size
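For readers unfamiliar with the first estimator named in the abstract above, a minimal sketch of Cronbach's alpha follows. It uses the textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and is not the simulation code from the study. The function name `cronbach_alpha` and the score matrix are assumptions made for illustration.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` is a list of respondents, each a list of item scores."""
    n_respondents = len(scores)
    n_items = len(scores[0])
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent needs a score for every item")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical data: 5 respondents answering 3 items.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")
```

The sketch uses sample variances (n - 1 in the denominator) for both the items and the totals; the ratio is the same as long as both use the same denominator.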
McLeod, Justin W.H.; McCrimmon, Adam W. – Journal of Psychoeducational Assessment, 2021
The "Raven's 2 Progressive Matrices Clinical Edition" (Raven's 2; Raven, Rust, Chan, & Zhou, 2018), published by NCS Pearson, is an individually administered nonverbal assessment of general cognitive ability developed to measure "educative abilities," defined as the ability to think clearly and solve complex problems in…
Descriptors: Test Reviews, Intelligence Tests, Testing, Test Reliability
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some of which are expected while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
Maestrales, Sarah; Zhai, Xiaoming; Touitou, Israel; Baker, Quinton; Schneider, Barbara; Krajcik, Joseph – Journal of Science Education and Technology, 2021
In response to the call for promoting three-dimensional science learning (NRC, 2012), researchers argue for developing assessment items that go beyond rote memorization tasks to ones that require deeper understanding and the use of reasoning that can improve science literacy. Such assessment items are usually performance-based constructed…
Descriptors: Artificial Intelligence, Scoring, Evaluation Methods, Chemistry
Lenz, A. Stephen; Ho, Chia-Min; Rocha, Lauren; Aras, Yahyahan – Measurement and Evaluation in Counseling and Development, 2021
This study examined the degree that reliability coefficients for scores on the PTGI generalize across participant and study characteristics. Meta-analytic procedures resulted in observed and predicted mean alpha coefficients ranging from acceptable to excellent and appeared to be largely unrelated to the participant characteristics included in our…
Descriptors: Generalization, Test Reliability, Scores, Measures (Individuals)
Gwet, Kilem L. – Educational and Psychological Measurement, 2021
Cohen's kappa coefficient was originally proposed for two raters only, and it was later extended to an arbitrarily large number of raters to become what is known as Fleiss' generalized kappa. Fleiss' generalized kappa and its large-sample variance are still widely used by researchers and were implemented in several software packages, including, among…
Descriptors: Sample Size, Statistical Analysis, Interrater Reliability, Computation
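As a rough illustration of the multi-rater statistic named in the abstract above, the sketch below computes the Fleiss' kappa point estimate from a table of category counts. It assumes every subject is rated by the same number of raters and does not cover the large-sample variance the abstract refers to. The function name `fleiss_kappa` and the count table are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_subjects == 0 or n_raters < 2:
        raise ValueError("need at least one subject and two raters per subject")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    # Per-subject agreement: proportion of rater pairs that agree on the subject.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_subject) / n_subjects
    # Chance agreement from the overall share of ratings in each category.
    total_ratings = n_subjects * n_raters
    p_e = sum((sum(column) / total_ratings) ** 2 for column in zip(*counts))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical table: 4 subjects, 3 raters, 2 categories.
counts = [
    [3, 0],
    [0, 3],
    [2, 1],
    [3, 0],
]
kappa_fleiss = fleiss_kappa(counts)
print(f"Fleiss' kappa = {kappa_fleiss:.3f}")
```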
Pérez-Castilla, Alejandro; Fernandes, John F. T.; Rojas, F. Javier; García-Ramos, Amador – Measurement in Physical Education and Exercise Science, 2021
This study explored the influence of different take-off thresholds on the reliability and magnitude of countermovement jump (CMJ) performance variables. Twenty-three men were tested on two separate sessions. CMJ performance variables were obtained against three external loads (0.5-30-60 kg) using three take-off thresholds: 10 N (arbitrary value of…
Descriptors: Physical Activities, Performance Tests, Reliability, College Students
Shin, Wonho; Park, Jongwon – International Journal of Science and Mathematics Education, 2021
The objective of this study was to understand the behavioral characteristics of creative physicists during their growth period and to use any insight gained to help teachers and parents encourage students' creativity in their everyday life. To do this, the critical incident technique was utilized to extract behavioral traits from six…
Descriptors: Psychological Characteristics, Behavior, Physics, Creativity