Huan Liu – ProQuest LLC, 2024
In many large-scale testing programs, examinees are frequently categorized into different performance levels. These classifications are then used to make high-stakes decisions about examinees in contexts such as in licensure, certification, and educational assessments. Numerous approaches to estimating the consistency and accuracy of this…
Descriptors: Classification, Accuracy, Item Response Theory, Decision Making
Baumgartner, Michael; Ambühl, Mathias – Sociological Methods & Research, 2023
Consistency and coverage are two core parameters of model fit used by configurational comparative methods (CCMs) of causal inference. Among causal models that perform equally well in other respects (e.g., robustness or compliance with background theories), those with higher consistency and coverage are typically considered preferable. Finding the…
Descriptors: Causal Models, Evaluation Methods, Goodness of Fit, Scores
Jing Chen; Bei Fang; Hao Zhang; Xia Xue – Interactive Learning Environments, 2024
High dropout rate exists universally in massive open online courses (MOOCs) due to the separation of teachers and learners in space and time. Dropout prediction using the machine learning method is an extremely important prerequisite to identify potential at-risk learners to improve learning. It has attracted much attention and there have emerged…
Descriptors: MOOCs, Potential Dropouts, Prediction, Artificial Intelligence
Margarita Pivovarova; Audrey Amrein-Beardsley – Educational Assessment, Evaluation and Accountability, 2024
In this study, we estimated the relationship between two popular measures of teacher effectiveness--teachers' value-added model (VAM) estimates, represented in this study via median growth percentiles (MGPs), and teachers' observational scores, derived from the TAP System for Teacher and Student Advancement. We examined the relationship between…
Descriptors: Teacher Evaluation, Evaluation Methods, Value Added Models, Correlation
Markus T. Jansen; Ralf Schulze – Educational and Psychological Measurement, 2024
Thurstonian forced-choice modeling is considered to be a powerful new tool to estimate item and person parameters while simultaneously testing the model fit. This assessment approach is associated with the aim of reducing faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent…
Descriptors: Factor Analysis, Models, Item Analysis, Evaluation Methods
Hyemin Yoon; HyunJin Kim; Sangjin Kim – Measurement: Interdisciplinary Research and Perspectives, 2024
We have maintained the customer grade system that is being implemented to customers with excellent performance through customer segmentation for years. Currently, financial institutions that operate the customer grade system provide similar services based on the score calculation criteria, but the score calculation criteria vary from the financial…
Descriptors: Classification, Artificial Intelligence, Prediction, Decision Making
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
Roosa, Tyler; Mischen, Pamela – International Journal of Sustainability in Higher Education, 2022
Purpose: The purpose of this study is to determine how organizational characteristics at higher education institutions (HEI) influence their sustainability performance as measured by the advancement of sustainability in higher education's sustainability tracking, assessment and rating system (STARS). Design/methodology/approach: This analysis…
Descriptors: Institutional Characteristics, Sustainability, Higher Education, Evaluation Methods
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
De Los Reyes, Andres; Makol, Bridget A. – Grantee Submission, 2021
Clients display considerable variations in functioning across the contexts that encompass their social environments (e.g., home, school/workplace, peer interactions). No single measurement method can fully capture these variations. Yet, assessors must balance the need to accurately capture clients' clinical presentations, and at the same time…
Descriptors: Self Evaluation (Individuals), Mental Health, Scores, Rating Scales
Broumi, Said, Ed. – IGI Global, 2023
Fuzzy sets have experienced multiple expansions since their conception to enhance their capacity to convey complex information. Intuitionistic fuzzy sets, image fuzzy sets, q-rung orthopair fuzzy sets, and neutrosophic sets are a few of these extensions. Researchers and academics have acquired a lot of information about their theories and methods…
Descriptors: Theories, Mathematical Logic, Intuition, Decision Making
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Pang, Francine; Skehan, Peter – Modern Language Journal, 2021
This study uses a complexity-accuracy-lexis-fluency (CALF) framework to explore performance on 2 speaking tasks: a narrative picture-based task and an interactive decision-making task. A preliminary aim is to compare performance on the 2 tasks, using a wide range of CALF measures to explore where scores are similar and where they are different.…
Descriptors: Task Analysis, Speech Communication, Second Language Learning, Accuracy