Donoghue, John R.; Eckerly, Carol – Applied Measurement in Education, 2024
Trend scoring constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Donegan, Sarah; Dias, Sofia; Welton, Nicky J. – Research Synthesis Methods, 2019
When numerous treatments exist for a disease (Treatments 1, 2, 3, etc), network meta-regression (NMR) examines whether each relative treatment effect (eg, mean difference for 2 vs 1, 3 vs 1, and 3 vs 2) differs according to a covariate (eg, disease severity). Two consistency assumptions underlie NMR: consistency of the treatment effects at the…
Descriptors: Reliability, Regression (Statistics), Outcomes of Treatment, Statistical Analysis
Raykov, Tenko; Marcoulides, George A.; Harrison, Michael; Menold, Natalja – Educational and Psychological Measurement, 2019
This note confronts the common use of a single coefficient alpha as an index informing about reliability of a multicomponent measurement instrument in a heterogeneous population. Two or more alpha coefficients could instead be meaningfully associated with a given instrument in finite mixture settings, and this may be increasingly more likely the…
Descriptors: Statistical Analysis, Test Reliability, Measures (Individuals), Computation
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017
Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…
Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy
Kane, Michael T. – Assessment in Education: Principles, Policy & Practice, 2017
In response to an argument by Baird, Andrich, Hopfenbeck and Stobart (2017), Michael Kane states that there needs to be a better fit between educational assessment and learning theory. In line with this goal, Kane will examine how psychometric constraints might be loosened by relaxing some psychometric "rules" in some assessment…
Descriptors: Educational Assessment, Psychometrics, Standards, Test Reliability
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2015
Existing tests of interrater agreements have high statistical power; however, they lack specificity. If the ratings of the two raters do not show agreement but are not random, the current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of…
Descriptors: Interrater Reliability, Monte Carlo Methods, Measurement Techniques, Accuracy
Benton, Tom; Elliott, Gill – Research Papers in Education, 2016
In recent years the use of expert judgement to set and maintain examination standards has been increasingly criticised in favour of approaches based on statistical modelling. This paper reviews existing research on this controversy and attempts to unify the evidence within a framework where expertise is utilised in the form of comparative…
Descriptors: Reliability, Expertise, Mathematical Models, Standard Setting (Scoring)
Walstad, William B.; Rebeck, Ken – Journal of Economic Education, 2017
The "Test of Financial Literacy" (TFL) was created to measure the financial knowledge of high school students. Its content is based on the standards and benchmarks stated in the "National Standards for Financial Literacy" (Council for Economic Education 2013). The test development process involved extensive item writing and…
Descriptors: Tests, Money Management, Literacy, High School Students
Raykov, Tenko; Dimitrov, Dimiter M.; von Eye, Alexander; Marcoulides, George A. – Educational and Psychological Measurement, 2013
A latent variable modeling method for evaluation of interrater agreement is outlined. The procedure is useful for point and interval estimation of the degree of agreement among a given set of judges evaluating a group of targets. In addition, the approach allows one to test for identity in underlying thresholds across raters as well as to identify…
Descriptors: Interrater Reliability, Models, Statistical Analysis, Computation
Reisman, Fredricka; Keiser, Larry; Otti, Obinna – Creativity Research Journal, 2016
The Reisman Diagnostic Creativity Assessment (RDCA) is a free online self-report creativity assessment that provides immediate feedback to the user and is diagnostic, rather than predictive, with the focus on making the user aware of creative strengths and weaknesses. Several engineering and teacher education studies have included the RDCA over a…
Descriptors: Creativity, Creativity Tests, Creative Development, Computer Oriented Programs
Mahfoud, Ziyad; Ghandour, Lilian; Ghandour, Blanche; Mokdad, Ali H.; Sibai, Abla M. – Field Methods, 2015
Findings on the reliability and cost-effectiveness of the use of cellular phones vis-à-vis face-to-face interviews in investigating health behaviors and conditions are presented for a national epidemiological sample from Lebanon. Using self-reported responses on identical questions, percentage agreement, κ statistics, and McNemar's test were used…
Descriptors: Telephone Surveys, Interviews, Surveys, Responses
Willoughby, Michael T. – Grantee Submission, 2014
The focus article (Willoughby et al., 2014) (1) introduced the distinction between formative and reflective measurement and (2) proposed that performance-based executive function tasks may be better conceptualized from the perspective of formative rather than reflective measurement. This proposal stands in sharp contrast to conventional…
Descriptors: Executive Function, Formative Evaluation, Cognitive Measurement, Factor Analysis
Gleason, Daniel – Educational Forum, 2014
The essay considers two analogies that help to reveal the limitations of value-added modeling: the first, a comparison with batting averages, shows that the model's reliability is quite limited even though year-to-year correlation figures may seem impressive; the second, a comparison between medical malpractice and so-called educational…
Descriptors: Models, Evaluation Methods, Reliability, Correlation
Thomas, D. Roland; Zumbo, Bruno D. – Educational and Psychological Measurement, 2012
There is such doubt in research practice about the reliability of difference scores that granting agencies, journal editors, reviewers, and committees of graduate students' theses have been known to deplore their use. This most maligned index can be used in studies of change, growth, or perhaps discrepancy between two measures taken on the same…
Descriptors: Statistical Analysis, Reliability, Scores, Change
Hsieh, Wen-Min; Tsai, Chin-Chung – International Journal of Science Education, 2017
This cross-sectional study explored students' conceptions of science learning via drawing analysis. A total of 906 Taiwanese students in 4th, 6th, 8th, 10th, and 12th grade were asked to use drawing to illustrate how they conceptualise science learning. Students' drawings were analysed using a coding checklist to determine the presence or absence…
Descriptors: Science Instruction, Teaching Methods, Case Studies, Positive Attitudes