ERIC - Search Results

Showing 1 to 15 of 131 results

Generating Social and Emotional Skill Items: Humans vs. ChatGPT. ACT Research. Issue Brief

Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024

Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular to assist with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui & Yee, 2023) estimate that by implementing LLMs,…

Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

The Concurrent Validity of Comparative Judgement Outcomes Compared with Marks

Gill, Tim – Research Matters, 2022

In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…

Descriptors: Comparative Analysis, Decision Making, Scripts, Standards

Treatments of Differential Item Functioning: A Comparison of Four Methods

Peer reviewed

Liu, Xiaowen; Rogers, H. Jane – Educational and Psychological Measurement, 2022

Test fairness is critical to the validity of group comparisons involving gender, ethnicities, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items that have been identified as showing DIF: deleting, ignoring,…

Descriptors: Item Analysis, Comparative Analysis, Culture Fair Tests, Test Validity

Reliability and Validity of Methods to Assess Undergraduate Healthcare Student Performance in Pharmacology: Comparison of Open Book versus Time-Limited Closed Book Examinations

Peer reviewed

David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023

We compared the influence of open-book extended duration versus closed book time-limited format on reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30 x free-response short answer to a question, SAQ) and end-of-year paper (4 x SAQ,…

Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al (2020) describes a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Changes in the Speed-Ability Relation through Different Treatments of Rapid Guessing

Peer reviewed

Deribo, Tobias; Goldhammer, Frank; Kroehne, Ulf – Educational and Psychological Measurement, 2023

As researchers in the social sciences, we are often interested in studying not directly observable constructs through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed shortly but not read and engaged with in-depth. Hence, a…

Descriptors: Reaction Time, Guessing (Tests), Behavior Patterns, Bias

Item-Score Reliability in Empirical-Data Sets and Its Relationship with Other Item Indices

Peer reviewed

Zijlmans, Eva A. O.; Tijmstra, Jesper; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2018

Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful to assess the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ₆, and method CA. The item-score…

Descriptors: Test Items, Test Reliability, Correlation, Comparative Analysis

Developing the Diagnostic Test of Misconceptions of Fractions

Peer reviewed

Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023

This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…

Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions

A Comparative Analysis of the "Early Childhood Environment Rating Scale--Revised" and "Early Childhood Environment Rating Scale, Third Edition"

Peer reviewed

Neitzel, Jennifer; Early, Diane; Sideris, John; LaForrett, Doré; Abel, Michael B.; Soli, Margaret; Davidson, Dawn L.; Haboush-Deloye, Amanda; Hestenes, Linda L.; Jenson, Denise; Johnson, Cindy; Kalas, Jennifer; Mamrak, Angela; Masterson, Marie L.; Mims, Sharon U.; Oya, Patti; Philson, Bobbi; Showalter, Megan; Warner-Richter, Mallory; Kortright Wood, Jill – Journal of Early Childhood Research, 2019

The Early Childhood Environment Rating Scales, including the "Early Childhood Environment Rating Scale--Revised" (Harms et al., 2005) and the "Early Childhood Environment Rating Scale, Third Edition" (Harms et al., 2015), are the most widely used observational assessments in early childhood learning environments. The most recent…

Descriptors: Rating Scales, Early Childhood Education, Educational Quality, Scoring

A Design for Comparing CTT and IRT in Test Assembly, Scoring and Argumentation: Differences among Reliability, Information and Validation

Peer reviewed

Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019

This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…

Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring

Calibrated Parsing Items Evaluation: A Step towards Objectifying the Translation Assessment

Peer reviewed

Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019

The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…

Descriptors: Test Items, Translation, Computer Software, Evaluators

Benthik Android Physics Comic Effectiveness for Vector Representation and Critical Thinking Students' Improvement

Peer reviewed

Maghfiroh, Anissa; Kuswanto, Heru – International Journal of Instruction, 2022

This research aims to reveal the effectiveness of the use of Kofie GeBoL media in improving (1) vector representation ability and (2) critical thinking ability in physics instruction. It is a descriptive quantitative study with the quasi-experiment design. It was conducted in two stages: empirical try out and implementation of Kofie GeboL to see…

Descriptors: Physics, Instructional Effectiveness, Critical Thinking, Thinking Skills

When near Means Related: Evidence from Three Web Survey Experiments on Inter-Item Correlations in Grid Questions

Peer reviewed

Silber, Henning; Roßmann, Joss; Gummer, Tobias – International Journal of Social Research Methodology, 2018

In this article, we present the results of three question design experiments on inter-item correlations, which tested a grid design against a single-item design. The first and second experiments examined the inter-item correlations of a set with five and seven items, respectively, and the third experiment examined the impact of the question design…

Descriptors: Foreign Countries, Online Surveys, Experiments, Correlation
