Publication Date
In 2025 | 0 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 12 |
Since 2016 (last 10 years) | 28 |
Since 2006 (last 20 years) | 69 |
Descriptor
Scoring | 116 |
Student Evaluation | 116 |
Interrater Reliability | 51 |
Test Reliability | 43 |
Reliability | 37 |
Test Validity | 33 |
Evaluation Methods | 31 |
Writing Evaluation | 31 |
Test Construction | 22 |
Foreign Countries | 21 |
Educational Assessment | 20 |
More ▼ |
Source
Author
Publication Type
Education Level
Higher Education | 26 |
Postsecondary Education | 23 |
Elementary Secondary Education | 15 |
Elementary Education | 12 |
Secondary Education | 10 |
Middle Schools | 8 |
High Schools | 7 |
Intermediate Grades | 5 |
Grade 4 | 3 |
Grade 5 | 3 |
Grade 6 | 3 |
More ▼ |
Audience
Teachers | 6 |
Practitioners | 4 |
Policymakers | 1 |
Researchers | 1 |
Students | 1 |
Location
Australia | 4 |
New York | 4 |
United Kingdom (England) | 4 |
Vermont | 4 |
California | 2 |
Connecticut | 2 |
Japan | 2 |
New Hampshire | 2 |
New Mexico | 2 |
Rhode Island | 2 |
Turkey | 2 |
More ▼ |
Laws, Policies, & Programs
Individuals with Disabilities… | 3 |
Every Student Succeeds Act… | 2 |
No Child Left Behind Act 2001 | 1 |
Race to the Top | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Saenz, David Arron – Online Submission, 2023
There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability as research indicates several factors including frequency and timing play crucial roles towards ensuring inter-rater reliability. Additionally, increasing amounts research indicate possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Marcos Jiménez; María Zapata-Cáceres; Marcos Román-González; Gregorio Robles; Jesús Moreno-León; Estefanía Martín-Barroso – Journal of Science Education and Technology, 2024
Computational thinking (CT) is a multidimensional term that encompasses a wide variety of problem-solving skills related to the field of computer science. Unfortunately, standardized, valid, and reliable methods to assess CT skills in preschool children are lacking, compromising the reliability of the results reported in CT interventions. To…
Descriptors: Computation, Thinking Skills, Student Evaluation, Preschool Children
Susan K. Johnsen – Gifted Child Today, 2024
The author provides a checklist for educators who are selecting technically adequate tests for identifying and referring students for gifted education services and programs. The checklist includes questions related to how the test was normed, reliability and validity studies as well as questions related to types of scores, administration, and…
Descriptors: Test Selection, Academically Gifted, Gifted Education, Test Validity
Cristina Menescardi; Aida Carballo-Fazanes; Núria Ortega-Benavent; Isaac Estevan – Journal of Motor Learning and Development, 2024
The Canadian Agility and Movement Skill Assessment (CAMSA) is a valid and reliable circuit-based test of motor competence which can be used to assess children's skills in a live or recorded performance and then coded. We aimed to analyze the intrarater reliability of the CAMSA scores (total, time, and skill score) and time measured, by comparing…
Descriptors: Interrater Reliability, Evaluators, Scoring, Psychomotor Skills
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool scores each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Gayle Geschwind; Michael Vignal; Marcos D. Caballero; H.? J. Lewandowski – Physical Review Physics Education Research, 2024
The Survey of Physics Reasoning on Uncertainty Concepts in Experiments (SPRUCE) was designed to measure students' proficiency with measurement uncertainty concepts and practices across ten different assessment objectives to help facilitate the improvement of laboratory instruction focused on this important topic. To ensure the reliability and…
Descriptors: Measurement, Ambiguity (Context), Scientific Concepts, Physics
Kumar, Vivekanandan S.; Boulanger, David – International Journal of Artificial Intelligence in Education, 2021
This article investigates the feasibility of using automated scoring methods to evaluate the quality of student-written essays. In 2012, Kaggle hosted an Automated Student Assessment Prize contest to find effective solutions to automated testing and grading. This article: a) analyzes the datasets from the contest -- which contained hand-graded…
Descriptors: Automation, Scoring, Essays, Writing Evaluation
Shiroda, Megan; Uhl, Juli D.; Urban-Lurain, Mark; Haudek, Kevin C. – Journal of Science Education and Technology, 2022
Constructed response (CR) assessments allow students to demonstrate understanding of complex topics and provide teachers with deeper insight into student thinking. Computer scoring models (CSMs) remove the barrier of increased time and effort, making CR more accessible. As CSMs are commonly created using responses from research-intensive colleges…
Descriptors: Responses, Student Evaluation, Scoring, Models
Kocakulah, Aysel – Participatory Educational Research, 2022
The aim of this study is to develop and apply a rubric to evaluate the solutions proposed for questions about electromagnetic induction belonging to university second year pre-service teachers. In this study which has pretest-posttest quasi-experimental design with control group, teaching of the topic of electromagnetic induction was applied to…
Descriptors: Scoring Rubrics, Student Evaluation, Undergraduate Students, Problem Solving
Slepkov, Aaron D.; Godfrey, Alan T. K. – Applied Measurement in Education, 2019
The answer-until-correct (AUC) method of multiple-choice (MC) testing involves test respondents making selections until the keyed answer is identified. Despite attendant benefits that include improved learning, broad student adoption, and facile administration of partial credit, the use of AUC methods for classroom testing has been extremely…
Descriptors: Multiple Choice Tests, Test Items, Test Reliability, Scores
Koçak, Duygu – International Electronic Journal of Elementary Education, 2020
One of the most commonly used methods for measuring higher-order thinking skills such as problem-solving or written expression is open-ended items. Three main approaches are used to evaluate responses to open-ended items: general evaluation, rating scales, and rubrics. In order to measure and improve problem-solving skills of students, firstly, an…
Descriptors: Interrater Reliability, Item Response Theory, Test Items, Rating Scales
Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021
Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could involve in the process. Different grading designs in which raters score a student's whole speaking skills or a specific dimension of the speaking performance could be settled to control and minimize the amount of the error…
Descriptors: Language Tests, Scoring, Speech Communication, State Universities
Popham, W. James – ASCD, 2018
What is assessment literacy? It is a handful of fundamental understandings about the testing concepts and procedures that influence educational decisions. And it just might be the most cost-effective means of real school improvement. With characteristic humor and aplomb, assessment expert W. James Popham strips away the psychometrician-speak and…
Descriptors: Student Evaluation, Educational Testing, Test Validity, Test Reliability
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
Tavares, Walter; Brydges, Ryan; Myre, Paul; Prpic, Jason; Turner, Linda; Yelle, Richard; Huiskamp, Maud – Advances in Health Sciences Education, 2018
Assessment of clinical competence is complex and inference based. Trustworthy and defensible assessment processes must have favourable evidence of validity, particularly where decisions are considered high stakes. We aimed to organize, collect and interpret validity evidence for a high stakes simulation based assessment strategy for certifying…
Descriptors: Competence, Simulation, Allied Health Personnel, Certification