Publication Date
In 2025 | 1 |
Since 2024 | 24 |
Since 2021 (last 5 years) | 93 |
Since 2016 (last 10 years) | 299 |
Since 2006 (last 20 years) | 850 |
Descriptor
Reliability | 1051 |
Scores | 1051 |
Validity | 419 |
Correlation | 277 |
Foreign Countries | 276 |
Measures (Individuals) | 267 |
Factor Analysis | 209 |
Psychometrics | 197 |
Statistical Analysis | 164 |
Comparative Analysis | 135 |
Questionnaires | 114 |
Author
Thompson, Bruce | 17 |
Henson, Robin K. | 11 |
Haberman, Shelby J. | 8 |
Lee, Yong-Won | 7 |
Vacha-Haase, Tammi | 7 |
Gill, Brian | 6 |
Petscher, Yaacov | 6 |
Sinharay, Sandip | 6 |
Worrell, Frank C. | 6 |
Kantor, Robert | 5 |
Kolen, Michael J. | 5 |
Location
Turkey | 40 |
United States | 25 |
China | 21 |
Canada | 20 |
Australia | 18 |
Florida | 16 |
Pennsylvania | 14 |
California | 12 |
South Korea | 12 |
Netherlands | 10 |
Spain | 10 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Hulteen, Ryan M.; True, Larissa; Kroc, Edward – Measurement in Physical Education and Exercise Science, 2023
The typical process for assessing inter-rater reliability is facilitated by training raters within a research team. What is lacking is an understanding of whether inter-rater reliability scores "between" research teams demonstrate adequate reliability. This study examined inter-rater reliability between 16 researchers who assessed fundamental motor…
Descriptors: Psychomotor Skills, Scores, Reliability, Interrater Reliability
Ian Jones; Ben Davies – International Journal of Research & Method in Education, 2024
Educational researchers often need to construct precise and reliable measurement scales of complex and varied representations such as participants' written work, videoed lesson segments and policy documents. Developing such scales can be resource-intensive and time-consuming, and the outcomes are not always reliable. Here we present…
Descriptors: Educational Research, Comparative Analysis, Educational Researchers, Measurement
Brogan L. Barr; Virginia V. W. McIntosh; Eileen F. Britt; Jennifer Jordan; Janet D. Carter – Measurement: Interdisciplinary Research and Perspectives, 2024
Even when raters demonstrate agreement in the use of a measure, limited score variability or violation of often-ignored statistical assumptions can result in lower reliability estimates than intuitively expected. This article uses data drawn from two randomized controlled trials of schema therapy and cognitive behavioral therapy for the treatment…
Descriptors: Evaluators, Interrater Reliability, Reliability, Measurement Techniques
Richard S. Balkin; Quentin Hunter; Bradley T. Erford – Measurement and Evaluation in Counseling and Development, 2024
We describe best practices in reporting reliability estimates in counseling research with consideration to precision, generalization, and diverse populations. We provide a historical context to reporting reliability estimates, the limitations of past practices, and new methods to address reliability generalization. We highlight best practices…
Descriptors: Best Practices, Reliability, Counseling, Research
Moore, C. Missy; Crawford, Carey C.; Tertichny, Alissa – Measurement and Evaluation in Counseling and Development, 2023
We examined dimensionality and temporal stability of the Interpersonal Stress Scale-Counselor (ISS-C) scores in a sample of professional counselors (n = 518). Confirmatory factor analyses provided support for a four-factor model previously identified through exploratory factor analysis and a bifactor model. Using a randomized test-retest, temporal…
Descriptors: Counselors, Interpersonal Relationship, Stress Variables, Measures (Individuals)
Bouwer, Renske; Koster, Monica; van den Bergh, Huub – Assessment in Education: Principles, Policy & Practice, 2023
Assessing students' writing performance is essential to adequately monitor and promote individual writing development, but it is also a challenge. The present research investigates a benchmark rating procedure for assessing texts written by upper-elementary students. In two studies we examined whether a benchmark rating procedure (1) leads to…
Descriptors: Benchmarking, Writing Evaluation, Evaluation Methods, Elementary School Students
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
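For readers unfamiliar with the summed-score expression mentioned in the abstract above, the following is a minimal sketch of the standard textbook formula for coefficient alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score). It is not taken from the cited paper and does not represent the alternative expressions that paper argues for; the function name `coefficient_alpha` and the toy data are assumptions made for illustration only.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Coefficient alpha for summed scores (standard textbook formula).

    item_scores: list of respondents, each a list of k item scores.
    Hypothetical helper for illustration; not from the cited paper.
    """
    n_respondents = len(item_scores)
    k = len(item_scores[0])
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least 2 items and 2 respondents")

    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]

    # Sample variance of the summed (total) score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data (made up): three respondents, two items that agree exactly.
parallel_items = [[1, 1], [2, 2], [3, 3]]
print(coefficient_alpha(parallel_items))
```

With items that agree exactly, the variance of the summed score is k times the sum of the item variances, so the formula reaches its upper bound of 1; with uncorrelated items the two quantities are equal and the estimate falls to 0.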
Steinmann, Isa; Sánchez, Daniel; van Laar, Saskia; Braeken, Johan – Assessment in Education: Principles, Policy & Practice, 2022
Questionnaire scales that are mixed-worded, i.e. include both positively and negatively worded items, often suffer from issues like low reliability and more complex latent structures than intended. Part of the problem might be that some responders fail to respond consistently to the mixed-worded items. We investigated the prevalence and impact of…
Descriptors: Response Style (Tests), Test Items, Achievement Tests, Foreign Countries
Lucy Chambers; Emma Walland; Jo Ireland – Research Matters, 2024
Comparative Judgement (CJ) is traditionally and primarily used to compare written texts. In this study we explored whether we could extend its use to comparing audio files. We used GCSE Music portfolios which contained a mix of audio recordings, musical scores and text documents. Fifteen judges completed two exercises: one comparing musical…
Descriptors: Evaluative Thinking, Judges, Comparative Analysis, Reliability
Liu, Yilan; Lee, Sue Ann S.; Chen, Wenjun – Journal of Speech, Language, and Hearing Research, 2022
Introduction: Assessment of resonance characteristics is essential in research and clinical practice in individuals with velopharyngeal impairment. The purpose of this study was to systematically review correlations between auditory perceptual ratings and nasalance scores obtained by a nasometer in individuals with resonance disorders and to…
Descriptors: Correlation, Auditory Perception, Meta Analysis, Guidelines
Tenko Raykov – Educational and Psychological Measurement, 2024
This note is concerned with the benefits that can result from the use of the maximal reliability and optimal linear combination concepts in educational and psychological research. Within the widely used framework of unidimensional multi-component measuring instruments, it is demonstrated that the linear combination of their components that…
Descriptors: Educational Research, Behavioral Science Research, Reliability, Error of Measurement
Franco-Martínez, Alicia; Alvarado, Jesús M.; Sorrel, Miguel A. – Educational and Psychological Measurement, 2023
A sample suffers range restriction (RR) when its variance is reduced compared with the population variance and, in turn, it fails to represent that population. If the RR occurs over the latent factor, not directly over the observed variable, the researcher deals with an indirect RR, which is common when using convenience samples. This work explores how…
Descriptors: Factor Analysis, Factor Structure, Scores, Sampling
Sanja Tatalovic Vorkapic; Christopher J. Anthony; Stephen N. Elliott; Ilaria Grazzani; Valeria Cavioni – International Journal of Emotional Education, 2024
Although there is increased interest in social and emotional competence and mental health in Croatia, there are currently limited measurement options available for early childhood settings. Thus, the SSIS SEL Brief Scales (SSIS SELb), an efficient measure of social and emotional learning competencies developed in the United States, was translated…
Descriptors: Social Emotional Learning, Preschool Children, Foreign Countries, Resilience (Psychology)
Vispoel, Walter P.; Lee, Hyeryung; Xu, Guanlan; Hong, Hyeri – Journal of Experimental Education, 2023
Although generalizability theory (GT) designs have traditionally been analyzed within an ANOVA framework, identical results can be obtained with structural equation models (SEMs) but extended to represent multiple sources of both systematic and measurement error variance, include estimation methods less likely to produce negative variance…
Descriptors: Generalizability Theory, Structural Equation Models, Programming Languages, Scores