Showing 1 to 15 of 25 results
Peer reviewed
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product-multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Kelvin Terrell Pompey – ProQuest LLC, 2021
Many methods are used to measure interrater reliability for studies where each target receives ratings by a different set of judges. The purpose of this study is to explore the use of hierarchical modeling for estimating interrater reliability using the intraclass correlation coefficient. This study provides a description of how the ICC can be…
Descriptors: Interrater Reliability, Evaluation Methods, Test Reliability, Correlation
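The entry above centers on the intraclass correlation coefficient (ICC) as a measure of interrater reliability. As a minimal sketch only — assuming a complete targets-by-raters matrix and the one-way random-effects ICC(1), not the hierarchical-modeling approach the study explores — the coefficient can be computed from one-way ANOVA mean squares:

```python
import numpy as np

def icc1(ratings):
    """One-way random-effects ICC(1) for an n-targets x k-raters matrix.

    ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are
    the between-target and within-target mean squares from one-way ANOVA.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    # Between-target mean square: variability of target means around the grand mean.
    msb = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
    # Within-target mean square: rater disagreement within each target.
    msw = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Note this sketch assumes every target is rated by the same set of judges; the study's motivating case — each target rated by a different set of judges — is exactly where this simple crossed-design formula breaks down and hierarchical modeling becomes attractive.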
Peer reviewed
Solano-Flores, Guillermo – Educational Measurement: Issues and Practice, 2021
This article proposes a Boolean approach to representing and analyzing interobserver agreement in dichotomous coding. Building on the notion that observations are samples of a universe of observations, it submits that coding can be viewed as a process in which observers sample pieces of evidence on constructs. It distinguishes between formal and…
Descriptors: Online Searching, Coding, Interrater Reliability, Evidence
Peer reviewed
PDF on ERIC
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Peer reviewed
PDF on ERIC
Moeller, Julia; Viljaranta, Jaana; Kracke, Bärbel; Dietrich, Julia – Frontline Learning Research, 2020
This article proposes a study design developed to disentangle the objective characteristics of a learning situation from individuals' subjective perceptions of that situation. The term objective characteristics refers to the agreement across students, whereas subjective perceptions refers to inter-individual heterogeneity. We describe a novel…
Descriptors: Student Attitudes, College Students, Lecture Method, Student Interests
Peer reviewed
Johnson, Austin H.; Chafouleas, Sandra M.; Briesch, Amy M. – School Psychology Quarterly, 2017
In this study, generalizability theory was used to examine the extent to which (a) time-sampling methodology, (b) number of simultaneous behavior targets, and (c) individual raters influenced variance in ratings of academic engagement for an elementary-aged student. Ten graduate-student raters, with an average of 7.20 hr of previous training in…
Descriptors: Generalizability Theory, Sampling, Elementary School Students, Learner Engagement
Peer reviewed
Becraft, Jessica L.; Borrero, John C.; Davis, Barbara J.; Mendres-Smith, Amber E. – Education and Treatment of Children, 2016
The current study was designed to evaluate a rotating momentary time sampling (MTS) data collection system. A rotating MTS system has been used to measure activity preferences of preschoolers but not to collect data on responses that vary in duration and frequency (e.g., talking). We collected data on talking for 10 preschoolers using a 5-s MTS…
Descriptors: Sampling, Time, Interrater Reliability, Data Collection
Yvette R. Harris – Sage Research Methods Cases, 2016
The goal of this case study was to introduce students to ways to conduct research on parent-child cognitive learning interactions. To this end, the case study begins with an overview of the theoretical and empirical work supporting the development of my research program on parent-child cognitive learning interaction research and continues with a…
Descriptors: Student Research, Parent Child Relationship, Interaction, Sampling
Peer reviewed
Rapp, John T.; Carroll, Regina A.; Stangeland, Lindsay; Swanson, Greg; Higgins, William J. – Behavior Modification, 2011
The authors evaluated the extent to which interobserver agreement (IOA) scores, using the block-by-block method for events scored with continuous duration recording (CDR), were higher when the data from the same sessions were converted to discontinuous methods. Sessions with IOA scores of 89% or less with CDR were rescored using 10-s partial…
Descriptors: Intervals, Sampling, Comparative Analysis, Measures (Individuals)
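The abstract above compares interobserver agreement (IOA) under continuous duration recording with discontinuous methods such as 10-s partial-interval recording. As an illustrative sketch — the interval width, event representation, and interval-by-interval agreement index here are assumptions for demonstration, not the authors' block-by-block method — converting timed events to partial-interval records and scoring agreement might look like:

```python
import math

def partial_interval(events, session_len, width=10.0):
    """Partial-interval recording: mark an interval 1 if any event
    (onset, offset) in seconds overlaps it, else 0."""
    n = math.ceil(session_len / width)
    scored = [0] * n
    for onset, offset in events:
        first = int(onset // width)
        # Subtract a tiny epsilon so an offset on a boundary does not
        # spill into the next interval.
        last = min(n - 1, int(max(onset, offset - 1e-9) // width))
        for i in range(first, last + 1):
            scored[i] = 1
    return scored

def interval_ioa(a, b):
    """Interval-by-interval IOA: proportion of intervals on which
    two observers' records match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For example, over a 30-s session, one observer recording talking during (0, 5) and another recording (0, 5) and (25, 30) agree on two of three 10-s intervals, giving an IOA of about 0.67. Coarser intervals tend to inflate agreement, which is the kind of artifact the study examines.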
Peer reviewed
PDF on ERIC
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Peer reviewed
PDF on ERIC
Boller, Kimberly; Kisker, Ellen Eliason – Regional Educational Laboratory, 2014
This guide is designed to help researchers make sure that their research reports include enough information about study measures so that readers can assess the quality of the study's methods and results. The guide also provides examples of write-ups about measures and suggests resources for learning more about these topics. The guide assumes…
Descriptors: Research Reports, Research Methodology, Educational Research, Check Lists
Peer reviewed
Gugiu, Mihaiela R.; Gugiu, Paul C.; Baldus, Robert – Journal of MultiDisciplinary Evaluation, 2012
Background: Educational researchers have long espoused the virtues of writing with regard to student cognitive skills. However, research on the reliability of the grades assigned to written papers reveals a high degree of contradiction, with some researchers concluding that the grades assigned are very reliable whereas others suggesting that they…
Descriptors: Grades (Scholastic), Grading, Scoring Rubrics, Research Design
Peer reviewed
Soslau, Elizabeth; Lewis, Kandia – Action in Teacher Education, 2014
For accreditation and programmatic decision making, education school administrators use inter-rater reliability analyses to judge credibility of student-teacher assessments. Although weak levels of agreement between university-appointed supervisors and cooperating teachers are usually interpreted to indicate that the process is not being…
Descriptors: Interrater Reliability, Accreditation (Institutions), Student Teacher Evaluation, Focus Groups
Peer reviewed
Ong, Justina; Zhang, Lawrence Jun – TESOL Quarterly: A Journal for Teachers of English to Speakers of Other Languages and of Standard English as a Second Dialect, 2013
Little is known about the effects of various planning and revising conditions on composition quality in experimental or TESOL education research. This study examined the effects of planning conditions (planning, prolonged planning, free writing, and control), subplanning conditions (task-given, task-content-given, and…
Descriptors: English (Second Language), Second Language Learning, Cognitive Processes, Writing (Composition)
Peer reviewed
Gwet, Kilem Li – Psychometrika, 2008
Most inter-rater reliability studies using nominal scales suggest the existence of two populations of inference: the population of subjects (collection of objects or persons to be rated) and that of raters. Consequently, the sampling variance of the inter-rater reliability coefficient can be seen as a result of the combined effect of the sampling…
Descriptors: Interrater Reliability, Computation, Statistical Inference, Sampling
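The Gwet (2008) abstract concerns the sampling variance of inter-rater reliability coefficients under joint sampling of subjects and raters. As a related sketch — showing Gwet's own AC1 agreement statistic from his broader line of work, which is not necessarily the specific coefficient whose variance this paper derives — the two-rater, dichotomous-scale case is:

```python
def gwet_ac1(r1, r2):
    """Gwet's AC1 chance-corrected agreement for two raters using
    dichotomous codes (0/1).

    AC1 = (pa - pe) / (1 - pe), with chance agreement
    pe = 2 * q * (1 - q), where q is the mean prevalence of code 1
    across the two raters.
    """
    n = len(r1)
    pa = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    q = (sum(r1) + sum(r2)) / (2 * n)             # mean prevalence of code 1
    pe = 2 * q * (1 - q)                          # chance agreement under AC1
    return (pa - pe) / (1 - pe)
```

Unlike Cohen's kappa, AC1 behaves stably when one code is very rare, which is one reason it is often preferred for skewed dichotomous coding tasks.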