Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 9
Since 2016 (last 10 years): 27
Since 2006 (last 20 years): 84
Descriptor
Models: 103
Scores: 103
Reliability: 63
Test Reliability: 27
Validity: 23
Correlation: 21
Statistical Analysis: 21
Comparative Analysis: 20
Interrater Reliability: 19
Academic Achievement: 15
Test Items: 15
Author
Haberman, Shelby J.: 3
Lee, Guemin: 3
Chiang, Hanley: 2
DeMars, Christine E.: 2
Gill, Brian: 2
Lipscomb, Stephen: 2
Louie, Josephine: 2
O'Dwyer, Laura: 2
Parker, Caroline E.: 2
Petscher, Yaacov: 2
Raykov, Tenko: 2
Education Level
Elementary Education: 12
Higher Education: 12
Postsecondary Education: 10
Elementary Secondary Education: 8
Grade 4: 5
Grade 5: 5
Grade 8: 5
Secondary Education: 5
High Schools: 4
Grade 3: 3
Grade 6: 3
Audience
Policymakers: 1
Practitioners: 1
Researchers: 1
Teachers: 1
Location
Pennsylvania: 4
Egypt: 2
Florida: 2
Minnesota: 2
Netherlands: 2
New Hampshire: 2
Norway: 2
Portugal: 2
Rhode Island: 2
United States: 2
Vermont: 2
Laws, Policies, & Programs
No Child Left Behind Act 2001: 1
What Works Clearinghouse Rating
Does not meet standards: 1
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Nnamdi Chika Ezike – ProQuest LLC, 2022
Fitting wrongly specified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates…
Descriptors: Prediction, Models, Interrater Reliability, Item Response Theory
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch models and 2PL and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Tim Jacobbe; Bob delMas; Brad Hartlaub; Jeff Haberstroh; Catherine Case; Steven Foti; Douglas Whitaker – Numeracy, 2023
The development of assessments as part of the funded LOCUS project is described. The assessments measure students' conceptual understanding of statistics as outlined in the GAISE PreK-12 Framework. Results are reported from a large-scale administration to 3,430 students in grades 6 through 12 in the United States. Items were designed to assess…
Descriptors: Statistics Education, Common Core State Standards, Student Evaluation, Elementary School Students
Shaharim, Saidatul Ainoor; Ishak, Nor Asniza; Zaharudin, Rozniza – Journal of Science and Mathematics Education in Southeast Asia, 2021
Purpose: This study aims to assess the validity and reliability of the Psycho-B'GREAT Module, developed according to the ASSURE Module Development Model. Method: The content validity of the Psycho-B'GREAT Module was assessed by ten experts in fields related to teaching and learning (T&L) and biology education. In testing the…
Descriptors: Content Validity, Models, Reliability, Scores
Rubright, Jonathan D. – Educational Measurement: Issues and Practice, 2018
Performance assessments, scenario-based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this…
Descriptors: Performance Based Assessment, Item Response Theory, Models, Test Reliability
Al-Jarf, Reima – Online Submission, 2023
This article aims to give a comprehensive guide to planning and designing vocabulary tests, which includes identifying the skills to be covered by the test; outlining the course content covered; preparing a table of specifications that shows the skills, content topics, and number of questions allocated to each; and preparing the test instructions. The…
Descriptors: Vocabulary Development, Learning Processes, Test Construction, Course Content
Botarleanu, Robert-Mihai; Dascalu, Mihai; Watanabe, Micah; Crossley, Scott Andrew; McNamara, Danielle S. – Grantee Submission, 2022
Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase…
Descriptors: Age Differences, Vocabulary Development, Correlation, Reading Comprehension
Yang, Hongwei; Su, Jian – International Review of Research in Open and Distributed Learning, 2021
The study revisited the community of inquiry (CoI) instrument for construct revalidation. To that end, the study used confirmatory factor analysis (CFA) to examine four competing models (unidimensional, correlated-factor, second-order factor, and bifactor models) on model fit statistics computed using parameter estimates from a statistical…
Descriptors: Inquiry, Communities of Practice, Factor Analysis, Correlation
Blondin, Carolyn A.; Voils, Kyle; Galyon, Charles E.; Williams, Robert L. – Journal on Excellence in College Teaching, 2015
Concepts from the Response-to-Intervention (RTI) Model were used to promote a successful course outcome for students at risk for making low grades in an entry-level college course. The first exam served as a universal screener to identify students who could potentially benefit from RTI assistance. The researchers developed a tiered coaching…
Descriptors: Response to Intervention, Models, At Risk Students, Coaching (Performance)
Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M. – Journal of Psychoeducational Assessment, 2018
We investigated the classification accuracy of learning disability (LD) identification methods premised on the identification of an intraindividual pattern of processing strengths and weaknesses (PSW), using multiple indicators for all latent constructs. Known LD status was derived from latent scores; values at the observed level identified…
Descriptors: Accuracy, Learning Disabilities, Classification, Identification
Smogorzewska, Joanna; Szumski, Grzegorz; Grygiel, Pawel – Developmental Psychology, 2019
The main aims of this study were to further validate the Children's Social Understanding Scale (CSUS), which is a parent-report measure developed by Tahiroglu and colleagues, and to fill in some gaps in the existing research. Our study included more than 700 Polish parents from diverse educational backgrounds who had children with disabilities,…
Descriptors: Foreign Countries, Children, Theory of Mind, Disabilities
Jayashankar, Shailaja; Sridaran, R. – Education and Information Technologies, 2017
Teachers face an abundance of free-text answers that are daunting to read and evaluate. Automatic assessment of open-ended answers has been attempted in the past, but no approach guarantees 100% accuracy. To deal with the overload involved in this manual evaluation, a new tool becomes necessary. The unique superlative model…
Descriptors: Word Frequency, Models, Electronic Learning, Student Evaluation
Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen – Advances in Health Sciences Education, 2017
The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…
Descriptors: Interviews, Scores, Generalizability Theory, Monte Carlo Methods