Showing 1 to 15 of 61 results
Peer reviewed
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Pennell, Adam – ProQuest LLC, 2019
This dissertation consists of three studies which examined multidimensional balance in youth (≤ 21 years; Individuals with Disabilities Education Act, 2004) with visual impairments (VIs) using the Brief-Balance Evaluation Systems Test (Brief-BESTest). These studies have the potential to inform (adapted) physical education curricula and…
Descriptors: Psychomotor Skills, Youth, Visual Impairments, Human Posture
Peer reviewed
Lee, Keng-Lin; Tsai, Shih-Li; Chiu, Yu-Ting; Ho, Ming-Jung – Advances in Health Sciences Education, 2016
Measurement invariance is a prerequisite for comparing measurement scores from different groups. In medical education, multi-source feedback (MSF) is utilized to assess core competencies, including professionalism. However, little attention has been paid to the measurement invariance of assessment instruments; that is, whether an instrument…
Descriptors: Measurement, Scores, Medical Education, Competence
Peer reviewed
Full text PDF available on ERIC
Wedman, Jonathan; Lyrén, Per-Erik – Practical Assessment, Research & Evaluation, 2015
When subscores on a test are reported to the test taker, the appropriateness of reporting them depends on whether they provide useful information above what is provided by the total score. Subscores that fail to do so lack adequate psychometric quality and should not be reported. There are several methods for examining the quality of subscores,…
Descriptors: Evaluation Methods, Psychometrics, Scores, Tests
Peer reviewed
Hermans, Heidi; van der Pas, Femke H.; Evenhuis, Heleen M. – Research in Developmental Disabilities: A Multidisciplinary Journal, 2011
Background: In recent decades, several instruments measuring anxiety in adults with intellectual disabilities have been developed. Aim: To give an overview of the characteristics and psychometric properties of self-report and informant-report instruments measuring anxiety in this group. Method: Systematic review of the literature. Results:…
Descriptors: Mental Retardation, Learning Disabilities, Interrater Reliability, Measures (Individuals)
Feinberg, Richard A. – ProQuest LLC, 2012
Subscores, also known as domain scores, diagnostic scores, or trait scores, can help determine test-takers' relative strengths and weaknesses and appropriately focus remediation. However, subscores often have poor psychometric properties, particularly reliability and distinctiveness (Folske, Gessaroli, & Swanson, 1999; Monaghan, 2006;…
Descriptors: Simulation, Tests, Testing, Scores
Smith, Julie M. – ProQuest LLC, 2011
This study examines the proposed Reliability Generalization (RG) method for studying reliability. RG applies meta-analytic techniques similar to those used in validity generalization studies to examine reliability coefficients. This study explains why RG does not provide a proper research method for the study of reliability,…
Descriptors: Reliability, Generalization, Sampling, Research Methodology
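For context on what applying meta-analytic techniques to reliability coefficients can involve, the sketch below aggregates a set of published coefficients into a sample-size-weighted mean and a weighted variance. The coefficients, sample sizes, and the simple weighting scheme are illustrative assumptions only; they are not data from, nor the method evaluated in, the dissertation above.

    import numpy as np

    # Hypothetical reliability coefficients (e.g., coefficient alpha) and the
    # sample sizes of the studies that reported them.
    coefficients = np.array([0.78, 0.84, 0.71, 0.88, 0.80])
    sample_sizes = np.array([120, 250, 95, 310, 180])

    # Sample-size-weighted mean: one simple way a reliability-generalization
    # style synthesis summarizes a distribution of coefficients across studies.
    weights = sample_sizes / sample_sizes.sum()
    mean_rel = float(np.sum(weights * coefficients))

    # Weighted variability around that mean, examined to see how much
    # reliability fluctuates across samples and study characteristics.
    var_rel = float(np.sum(weights * (coefficients - mean_rel) ** 2))

    print(f"weighted mean reliability = {mean_rel:.3f}, weighted variance = {var_rel:.4f}")

Fuller RG analyses typically transform the coefficients before pooling and model study characteristics as moderators; the flat weighting here is only meant to make the idea of "meta-analysis of reliability coefficients" concrete.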
Peer reviewed
Pan, Tianshu; Yin, Yue – Psychological Methods, 2012
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)² and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…
Descriptors: Error of Measurement, Geometric Concepts, Tests, Structural Equation Models
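As background for the quantities compared in this exchange, a minimal classical test theory sketch (a generic illustration, not a reproduction of either paper's argument) runs as follows: if two strictly parallel forms share a true score, with X_1 = T + E_1 and X_2 = T + E_2 and uncorrelated errors of equal variance SEM², then

    \mathrm{MSD} = E\big[(X_1 - X_2)^2\big] = E\big[(E_1 - E_2)^2\big] = \sigma^2_{E_1} + \sigma^2_{E_2} = 2\,\mathrm{SEM}^2

so the expected MSD equals 2·SEM² only under these parallel-forms assumptions; how the comparison behaves when the two tests are not parallel is precisely what is at issue in the abstract above.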
Peer reviewed
Kim, Sunae; Kalish, Charles W.; Harris, Paul L. – Cognitive Development, 2012
Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…
Descriptors: Logical Thinking, Inferences, Classification, Young Children
Peer reviewed
Reynolds, Jesse S.; Treu, Judith A.; Njike, Valentine; Walker, Jennifer; Smith, Erica; Katz, Catherine S.; Katz, David L. – Journal of Nutrition Education and Behavior, 2012
Objective: To determine the reliability and validity of a 10-item questionnaire, the Food Label Literacy for Applied Nutrition Knowledge questionnaire. Methods: Participants were elementary school children exposed to a 90-minute school-based nutrition program. Reliability was assessed via Cronbach alpha and intraclass correlation coefficient…
Descriptors: Elementary School Students, Age, Nutrition, Validity
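The abstract names Cronbach alpha as its internal-consistency statistic. As a minimal, generic illustration of that computation (the item-response matrix and variable names below are hypothetical and not taken from the study), a sketch in Python might look like:

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                         # number of items
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical responses: 5 children answering 4 dichotomously scored items
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 0],
    ])
    print(round(cronbach_alpha(responses), 3))

The intraclass correlation coefficient mentioned alongside alpha is estimated differently, from variance components across repeated administrations, and is not shown here.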
Peer reviewed
Shin, Yongyun; Raudenbush, Stephen W. – Psychometrika, 2012
Social scientists are frequently interested in assessing the qualities of social settings such as classrooms, schools, neighborhoods, or day care centers. The most common procedure requires observers to rate social interactions within these settings on multiple items and then to combine the item responses to obtain a summary measure of setting…
Descriptors: Generalizability Theory, Neighborhoods, Intervals, Child Care Centers
Peer reviewed
Sandell, Rolf; Kimber, Birgitta; Andersson, Marie; Elg, Mattias; Fharm, Linus; Gustafsson, Niklas; Soderbaum, Wendela – Educational Psychology in Practice, 2012
This is a psychometric analysis of an instrument to assess the socio-emotional development of school students, How I Feel (HIF), developed as a situational judgment test, with scoring based on expert judgments. The HIF test was administered in grades 4-9, 1999-2005. Internal consistency, retest reliability, and year-to-year stability were…
Descriptors: Evaluation Methods, Emotional Development, Psychometrics, Construct Validity
Peer reviewed
Suto, Irenka; Nadas, Rita; Bell, John – Research Papers in Education, 2011
Accurate marking is crucial to the reliability and validity of public examinations, in England and internationally. Factors contributing to accuracy have been conceptualised as affecting either marking task demands or markers' personal expertise. The aim of this empirical study was to develop this conceptualisation through investigating the…
Descriptors: Academic Achievement, Examiners, Biology, Foreign Countries
Peer reviewed
Touchie, Claire; Humphrey-Murto, Susan; Ainslie, Martha; Myers, Kathryn; Wood, Timothy J. – Advances in Health Sciences Education, 2010
Oral examinations have become more standardized over recent years. Traditionally a small number of raters were used for this type of examination. Past studies suggested that more raters should improve reliability. We compared the results of a multi-station structured oral examination using two different rater models, those based in a station,…
Descriptors: Interrater Reliability, Internal Medicine, Evaluation Methods, Tests
Peer reviewed
Dory, Valerie; Gagnon, Robert; Charlin, Bernard – Advances in Health Sciences Education, 2010
Case-specificity, i.e., variability of a subject's performance across cases, has been a consistent finding in medical education. It has important implications for assessment validity and reliability. Its root causes remain a matter of discussion. One hypothesis, content-specificity, links variability of performance to variable levels of relevant…
Descriptors: Medical Education, Trainees, English (Second Language), Error of Measurement