Publication Date
In 2025 | 0 |
Since 2024 | 6 |
Since 2021 (last 5 years) | 11 |
Since 2016 (last 10 years) | 31 |
Since 2006 (last 20 years) | 105 |
Descriptor
Evaluation Methods | 187 |
Models | 187 |
Reliability | 88 |
Test Reliability | 72 |
Validity | 60 |
Test Validity | 56 |
Foreign Countries | 34 |
Measurement Techniques | 32 |
Interrater Reliability | 30 |
Student Evaluation | 30 |
Statistical Analysis | 27 |
Audience
Researchers | 15 |
Practitioners | 4 |
Administrators | 1 |
Location
Florida | 4 |
United Kingdom | 4 |
United Kingdom (England) | 4 |
Australia | 3 |
China | 3 |
Connecticut | 3 |
Japan | 3 |
New York | 3 |
Pennsylvania | 3 |
Spain | 3 |
Thailand | 3 |
Laws, Policies, & Programs
Every Student Succeeds Act… | 2 |
Elementary and Secondary… | 1 |
Madeline A. Schellman; Matthew J. Madison – Grantee Submission, 2024
Diagnostic classification models (DCMs) have grown in popularity as stakeholders increasingly desire actionable information related to students' skill competencies. Longitudinal DCMs offer a psychometric framework for providing estimates of students' proficiency status transitions over time. For both cross-sectional and longitudinal DCMs, it is…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Deborah Oluwadele; Yashik Singh; Timothy Adeliyi – Electronic Journal of e-Learning, 2024
Validation is needed for any newly developed model or framework because it requires several real-life applications. The investment made into e-learning in medical education is daunting, as is the expectation for a positive return on investment. The medical education domain requires data-wise implementation of e-learning as the debate continues…
Descriptors: Electronic Learning, Evaluation Methods, Medical Education, Sustainability
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023
The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…
Descriptors: Measurement, Validity, Reliability, Models
Theodore Tarnanidis; John Tarnanidis – European Journal of Educational Management, 2024
The research's aim is to assess the services offered by Greek public secondary education schools, with the intention of identifying any discrepancies between students' expectations and their perceptions of the final services provided. The gaps discovered indicate that the school's educational services are not meeting student expectations in the…
Descriptors: Foreign Countries, Educational Quality, Public Schools, High Schools
Price, Heather E.; Smith, Christian – Field Methods, 2021
To identify the dominant cultural models among parents transmitting faith to their children, we find few methodological guidelines to guide coding and analysis of semi-structured interviews. We thus developed a three-phase procedure for our research team. Phase-one follows Campbell et al. by unitizing on meanings rather than words/pages, including…
Descriptors: Semi Structured Interviews, Parents, Religion, Reliability
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Keshavarz, Hamid; Norouzi, Yaghoub – New Review of Academic Librarianship, 2022
Owing to the extreme importance of evaluating the credibility of existing scientific websites, the present study sets out to measure a proposed model concerning the views and preferences of university students in Iran, when evaluating information. Data were collected by administering a highly validated questionnaire among 487 students in ten top…
Descriptors: Credibility, Evaluation Methods, Scientific and Technical Information, Web Sites
Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024
This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…
Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction
Sujiyani Kassiavera; A. Suparmi; C. Cari; Sukarmin Sukarmin – Journal of Baltic Science Education, 2024
The challenge of accurately assessing critical thinking in physics education, particularly on topics like work and energy, remains a key issue for educators. The current study aims to address this challenge by exploring students' critical thinking abilities using two-tier test data analyzed through the Rasch model. Data were collected from…
Descriptors: Critical Thinking, Physics, Science Instruction, Foreign Countries
Radu Bogdan Toma – Journal of Early Adolescence, 2024
The Expectancy-Value model has been extensively used to understand students' achievement motivation. However, recent studies propose the inclusion of cost as a separate construct from values, leading to the development of the Expectancy-Value-Cost model. This study aimed to adapt Kosovich et al.'s ("The Journal of Early Adolescence", 35,…
Descriptors: Student Motivation, Student Attitudes, Academic Achievement, Mathematics Achievement
Shaharim, Saidatul Ainoor; Ishak, Nor Asniza; Zaharudin, Rozniza – Journal of Science and Mathematics Education in Southeast Asia, 2021
Purpose: This study aims to assess the validity and reliability of the Psycho-B'GREAT Module developed according to the ASSURE's Module Development Model. Method: The content validity of the Psycho-B'GREAT Module was assessed by ten experts in the fields related in teaching and learning (T&L) and biology education. In testing the…
Descriptors: Content Validity, Models, Reliability, Scores
Zheng, Boyang; Sun, Guiping; Wang, Hourong – SAGE Open, 2019
Traditional Chinese medicine (TCM) is an important component of China's medical system. How to educate TCM practitioners in China, therefore, has become a crucial issue. To contribute to this issue, the current research identified the competency model of TCM practitioners in China and developed an evaluation for TCM students. We combined Bloom's…
Descriptors: Medical Students, Correlation, Foreign Countries, Test Reliability
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
Raj, Gaurav; Mahajan, Manish; Singh, Dheerendra – International Journal of Web-Based Learning and Teaching Technologies, 2020
In secure web application development, the role of web services will not continue if it is not trustworthy. Retaining customers with applications is one of the major challenges if the services are not reliable and trustworthy. This article proposes a trust evaluation and decision model where the authors have defined indirect attribute, trust,…
Descriptors: Trust (Psychology), Models, Decision Making, Computer Software