Ian Jones; Ben Davies – International Journal of Research & Method in Education, 2024
Educational researchers often need to construct precise and reliable measurement scales of complex and varied representations such as participants' written work, videoed lesson segments and policy documents. Developing such scales can be resource-intensive and time-consuming, and the outcomes are not always reliable. Here we present…
Descriptors: Educational Research, Comparative Analysis, Educational Researchers, Measurement
Richard S. Balkin; Quentin Hunter; Bradley T. Erford – Measurement and Evaluation in Counseling and Development, 2024
We describe best practices in reporting reliability estimates in counseling research with consideration to precision, generalization, and diverse populations. We provide a historical context to reporting reliability estimates, the limitations of past practices, and new methods to address reliability generalization. We highlight best practices…
Descriptors: Best Practices, Reliability, Counseling, Research
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Reliability is the consistency of a set of scores that are designed to measure the same thing. Reliability is a statistical property of scores that must be demonstrated rather than assumed.
Descriptors: Scores, Measurement, Test Reliability, Error Patterns
Lenz, A. Stephen; Ault, Haley; Balkin, Richard S.; Barrio Minton, Casey; Erford, Bradley T.; Hays, Danica G.; Kim, Bryan S. K.; Li, Chi – Measurement and Evaluation in Counseling and Development, 2022
In April 2021, The Association for Assessment and Research in Counseling Executive Council commissioned a time-referenced task group to revise the Responsibilities of Users of Standardized Tests (RUST) Statement (3rd edition) published by the Association for Assessment in Counseling (AAC) in 2003. The task group developed a work plan to implement…
Descriptors: Responsibility, Standardized Tests, Counselor Training, Ethics
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Validity is broadly defined as how well something measures what it's supposed to measure. The reliability and validity of scores from assessments are two concepts that are closely knit together and feed into each other.
Descriptors: Screening Tests, Scores, Test Validity, Test Reliability
Nicewander, W. Alan – Educational and Psychological Measurement, 2019
This inquiry is focused on three indicators of the precision of measurement, conditional on fixed values of θ, the latent variable of item response theory (IRT). The indicators that are compared are (1) the traditional, conditional standard errors, σ(eX|θ) = CSEM; (2) the IRT-based conditional standard errors, σ[subscript irt](eX|θ)=C[subscript…
Descriptors: Measurement, Accuracy, Scores, Error of Measurement
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
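Two of the internal-consistency estimates named in the abstract above, coefficient α and KR-20, can be illustrated with a minimal sketch. This is not code from the article: the function names and the small response matrix are hypothetical, and population variances are assumed throughout so that the two formulas agree on dichotomous items.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for rows of item scores (one row per examinee)."""
    k = len(scores[0])                      # number of items
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def kr20(scores):
    """KR-20 for dichotomous (0/1) items, using p * (1 - p) per item."""
    k = len(scores[0])
    n = len(scores)
    pq_sum = 0.0
    for col in zip(*scores):
        p = sum(col) / n                    # proportion answering correctly
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical data: 5 examinees, 4 items scored 0/1 (illustration only).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
kr = kr20(responses)
```

For 0/1 items the population variance of an item equals p(1 - p), so the two functions return the same value on this data; with sample variances they would differ slightly.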
Center on Standards and Assessments Implementation, 2018
Reliability is a measure of consistency. It is the degree to which student results are the same when they take the same test on different occasions, when different scorers score the same item or task, and when different but equivalent tests are taken at the same time or at different times. Reliability is about making sure that different test forms…
Descriptors: Test Reliability, Test Validity, Student Evaluation, Test Bias
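The scorer-consistency case described in the entry above (different scorers scoring the same item or task) is often summarized with a chance-corrected agreement index. The sketch below uses Cohen's kappa as one common choice; the source does not name a specific statistic, so the choice of kappa, the function name, and the ratings are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same tasks."""
    n = len(rater_a)
    # Observed proportion of tasks on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical pass (1) / fail (0) ratings of four tasks by two scorers.
scorer_1 = [1, 1, 0, 0]
scorer_2 = [1, 0, 0, 0]

kappa = cohen_kappa(scorer_1, scorer_2)
```

Here the scorers agree on three of four tasks (0.75), chance agreement is 0.5, and kappa is therefore 0.5. The function is undefined when chance agreement equals 1, for example when both scorers use a single category for every task.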
Fitzgerald, Jill; Shanahan, Timothy E. – International Literacy Association, 2020
Reading scores exist for a continuum of purposes, from informal assessment to formal standardized tests. This brief aims to answer the question: What matters most for elementary-grade teachers when thinking about reading scores, and what could policymakers do to help teachers? Three positions worth pursuing in this regard are shared: (1) every…
Descriptors: Reading Achievement, Scores, Elementary School Students, Elementary School Teachers
Dumas, Denis; McNeish, Daniel; Schreiber-Gregory, Deanna; Durning, Steven J.; Torre, Dario – AERA Online Paper Repository, 2020
Dynamic Measurement Modeling (DMM) is a psychometric paradigm that uses longitudinal data to estimate students' capacity to learn over the course of an educational program (i.e., growth scores). Here, we provide justification for this approach in health professions education and demonstrate its proof of concept with three time-points of USMLE Step…
Descriptors: Allied Health Occupations Education, Measurement Techniques, Psychometrics, Longitudinal Studies
Semsar, Katharine; Brownell, Sara; Couch, Brian A.; Crowe, Alison J.; Smith, Michelle K.; Summers, Mindi M.; Wright, Christian D.; Knight, Jennifer K. – Advances in Physiology Education, 2019
We describe the development of a new, freely available, online, programmatic-level assessment tool, Measuring Achievement and Progress in Science in Physiology, or Phys-MAPS (http://cperl.lassp.cornell.edu/bio-maps). Aligned with the conceptual frameworks of Core Principles of Physiology, and Vision and Change Core Concepts, Phys-MAPS can be used…
Descriptors: Physiology, Science Instruction, Science Tests, Computer Assisted Testing
Feinberg, Richard A.; Wainer, Howard – Educational Measurement: Issues and Practice, 2014
Subscores are often used to indicate test-takers' relative strengths and weaknesses and so help focus remediation. But a subscore is not worth reporting if it is too unreliable to believe or if it contains no information that is not already contained in the total score. It is possible, through the use of a simple linear equation provided in…
Descriptors: Scores, Equations (Mathematics), Prediction, Reliability
Regional Educational Laboratory Mid-Atlantic, 2023
This Snapshot highlights key findings from a study that used Bayesian stabilization to improve the reliability (long-term stability) of subgroup proficiency measures that the Pennsylvania Department of Education (PDE) uses to identify schools for Targeted Support and Improvement (TSI) or Additional Targeted Support and Improvement (ATSI). The…
Descriptors: At Risk Students, Low Achievement, Error of Measurement, Measurement Techniques