ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	24
Since 2006 (last 20 years)	58

Descriptor

Comparative Analysis	159
Test Construction	159
Test Reliability	131
Test Validity	92
Test Items	31
Foreign Countries	27
Statistical Analysis	25
Higher Education	23
Item Analysis	20
Language Tests	20
Psychometrics	20
Reliability	20
Factor Analysis	19
Multiple Choice Tests	18
Scores	18
College Students	17
Correlation	17
Scoring	16
Testing	16
Achievement Tests	15
Interrater Reliability	15
Measurement Techniques	15
Test Format	15
Computer Assisted Testing	14
Criterion Referenced Tests	14

Publication Type

Reports - Research	91
Journal Articles	63
Speeches/Meeting Papers	33
Reports - Evaluative	26
Reports - Descriptive	9
Collected Works - General	4
Numerical/Quantitative Data	4
Tests/Questionnaires	4
Books	3
Information Analyses	3
Dissertations/Theses -…	2
Dissertations/Theses -…	1
Guides - General	1
Guides - Non-Classroom	1
Non-Print Media	1
Opinion Papers	1
Reference Materials - General	1
Reports - General	1

Education Level

Higher Education	17
Postsecondary Education	15
Secondary Education	10
Elementary Secondary Education	8
Elementary Education	6
High Schools	5
Intermediate Grades	3
Middle Schools	3
Grade 6	2
Grade 7	2
Grade 8	2
Junior High Schools	2
Grade 10	1
Grade 11	1
Grade 12	1
Grade 4	1
Grade 9	1
Preschool Education	1

Audience

Practitioners	5
Teachers	4
Administrators	3
Policymakers	2
Researchers	2
Counselors	1
Parents	1
Support Staff	1

Location

Washington	3
Belgium	2
France	2
Greece	2
Illinois	2
New York	2
Portugal	2
Spain	2
Australia	1
Austria	1
California	1
Colorado	1
Cyprus	1
Denmark	1
District of Columbia	1
Europe	1
Florida	1
Georgia	1
Germany	1
Hong Kong	1
Hungary	1
Idaho	1
Indonesia	1
Iran	1
Ireland	1

Showing 1 to 15 of 159 results

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Generating Social and Emotional Skill Items: Humans vs. ChatGPT. ACT Research. Issue Brief

Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024

Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular to assist with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui & Yee, 2023) estimate that by implementing LLMs,…

Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction

German, Portuguese and Spanish Versions of the Revised Short Form of the Physical Self-Inventory (PSI-S-"R")

Peer reviewed

Maïano, Christophe; Morin, Alexandre J. S.; Tietjens, Maike; Bastos, Tânia; Luiggi, Maxime; Corredeira, Rui; Griffet, Jean; Sánchez-Oliva, David – Measurement in Physical Education and Exercise Science, 2023

The present study sought to examine the psychometric properties of new German, Portuguese, and Spanish versions of the Revised Short Form of the Physical Self-Inventory (PSI-S-"R"), and to contrast these properties against those from the original French version of this instrument. Participants (n = 1802) were 288 French youth, 177 German…

Descriptors: German, Portuguese, Spanish, Test Construction

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

Developing the Diagnostic Test of Misconceptions of Fractions

Peer reviewed

Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023

This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…

Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions

A Design for Comparing CTT and IRT in Test Assembly, Scoring and Argumentation: Differences among Reliability, Information and Validation

Peer reviewed

Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019

This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…

Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring

The Impact of Q-Matrix Designs on Diagnostic Classification Accuracy in the Presence of Attribute Hierarchies

Peer reviewed

Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017

There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…

Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests

Theoretical Model and Quantitative Assessment of Scientific Thinking and Reasoning

Peer reviewed

Bao, Lei; Koenig, Kathleen; Xiao, Yang; Fritchman, Joseph; Zhou, Shaona; Chen, Cheng – Physical Review Physics Education Research, 2022

Abilities in scientific thinking and reasoning have been emphasized as core areas of initiatives, such as the Next Generation Science Standards or the College Board Standards for College Success in Science, which focus on the skills the future will demand of today's students. Although there is rich literature on studies of how these abilities…

Descriptors: Physics, Science Instruction, Teaching Methods, Thinking Skills

Development of the English Listening and Reading Computerized Revised Token Test into Cantonese: Validity, Reliability, and Sensitivity/Specificity in People with Aphasia and Healthy Controls

Peer reviewed

Bakhtiar, Mehdi; Wong, Min Ney; Tsui, Emily Ka Yin; McNeil, Malcolm R. – Journal of Speech, Language, and Hearing Research, 2020

Purpose: This study reports the psychometric development of the Cantonese versions of the English Computerized Revised Token Test (CRTT) for persons with aphasia (PWAs) and healthy controls (HCs). Method: The English CRTT was translated into standard Chinese for the Reading--Word Fade version (CRTT-R-[subscript WF]-Cantonese) and into formal…

Descriptors: Psychometrics, Sino Tibetan Languages, Computer Assisted Testing, Aphasia

Improving STEM Program Quality in Out-of-School-Time: Tool Development and Validation

Peer reviewed

Shah, Ashima Mathur; Wylie, Caroline; Gitomer, Drew; Noam, Gil – Science Education, 2018

In and out-of-school time (OST) experiences are viewed as complementary in contributing to students' interest, engagement, and performance in science, technology, engineering, and mathematics (STEM). While tools exist to measure quality in general afterschool settings and others to measure structured science classroom experiences, there is a need…

Descriptors: STEM Education, Educational Improvement, Educational Quality, After School Programs

Development of the International Ocean Literacy Survey: Measuring Knowledge across the World

Peer reviewed

Fauville, Géraldine; Strang, Craig; Cannady, Matthew A.; Chen, Ying-Fang – Environmental Education Research, 2019

The Ocean Literacy movement began in the U.S. in the early 2000s, and has recently become an international effort. The focus on marine environmental issues and marine education is increasing, and yet it has been difficult to show progress of the ocean literacy movement, in part, because no widely adopted measurement tool exists. The International…

Descriptors: Marine Education, Environmental Education, Comparative Analysis, Factor Structure

Strengthening the Assessment Trajectory: Engaging Educators in the Policy Process. Making a Measurable Difference: Case Studies from the High-Quality Assessment Project

McClellan, Catherine; Snyder, Rebecca; Woods-Murphy, Maryann; Basset, Katherine – National Network of State Teachers of the Year, 2018

Great teachers recognize great assessments. As policy and education leaders work to make sure state tests are measuring the problem-solving, writing, and critical-thinking skills students need for success, they should convene and rely on teachers to review test quality and help answer the question: Do the questions on our state test reflect…

Descriptors: Student Evaluation, Educational Quality, Standardized Tests, Test Items

Psychosexual Functioning of Cognitively-Able Adolescents with Autism Spectrum Disorder Compared to Typically Developing Peers: The Development and Testing of the Teen Transition Inventory--A Self- and Parent Report Questionnaire on Psychosexual Functioning

Peer reviewed

Dekker, Linda P.; van der Vegt, Esther J. M.; van der Ende, Jan; Tick, Nouchka; Louwerse, Anneke; Maras, Athanasios; Verhulst, Frank C.; Greaves-Lord, Kirstin – Journal of Autism and Developmental Disorders, 2017

To gain further insight into psychosexual functioning, including behaviors, intrapersonal and interpersonal aspects, in adolescents with Autism Spectrum Disorder (ASD), comprehensive, multi-informant measures are needed. This study describes (1) the development of a new measure of psychosexual functioning in both parent- and self-reports (Teen…

Descriptors: Adolescents, Adolescent Development, Autism, Pervasive Developmental Disorders

English Usage in Daily Life by Turkish Students between 15-19 Years of Age: A Scale Development Study

Peer reviewed

Deha Dogan, C.; Canan Karababa, Z.; Fulya Soguksu, A. – Educational Studies, 2017

The purpose of this study is to develop a valid and reliable scale to assess the level of English usage in daily life by students between 15 and 19 years of age, and to compare these students' scale scores according to their achievement levels in an English course. Five hundred and ninety-five participants were randomly selected from a universe.…

Descriptors: Language Usage, English (Second Language), Test Construction, Adolescents

A Comparison of Raw-to-Scale Conversion Consistency between Single- and Multiple-Linking Using a Nonequivalent Groups Anchor Test Design. Research Report. ETS RR-14-13

Peer reviewed

Liu, Jinghua; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2014

Maintaining score interchangeability and scale consistency is crucial for any testing programs that administer multiple forms across years. The use of a multiple linking design, which involves equating a new form to multiple old forms and averaging the conversions, has been proposed to control scale drift. However, the use of multiple linking…

Descriptors: Comparative Analysis, Reliability, Test Construction, Equated Scores

Source

Educational and Psychological…	6
Language Testing	3
Assessment & Evaluation in…	2
International Journal of…	2
Journal of Applied Testing…	2
Journal of Autism and…	2
Journal of Consulting and…	2
Journal of Counseling…	2
Measurement and Evaluation in…	2
Measurement in Physical…	2
Online Submission	2
ProQuest LLC	2
ACT, Inc.	1
Acta Didactica Napocensia	1
Advances in Health Sciences…	1
American Journal of…	1
Applied Measurement in…	1
Child Study Journal	1
College Board	1
College Student Journal	1
College Teaching	1
Distance Education	1
ETS Research Report Series	1
Early Education and…	1
Edinburgh Working Papers in…	1

Author

Benson, Jeri	3
Ebel, Robert L.	3
Brown, James Dean	2
Chavez, Oscar	2
Crehan, Kevin D.	2
Frisbie, David A.	2
Grouws, Douglas A.	2
Haladyna, Tom	2
Mott, Michael S.	2
Papick, Ira	2
Pollack, Judith M.	2
Reckase, Mark D.	2
Weiss, David J.	2
AL-Jawaldeh, Fuad	1
AL-Taj, Heyam	1
Abe, Yu	1
Ahmadi, Behrokh	1
Ahn, Jieun Irene	1
Al-Tal, Suhair	1
Albion, Peter R.	1
Aleyna Altan	1
Alqarni, Abdulelah Mohammed	1
Alsawalmeh, Yousef M.	1
Anastasio, Phyllis A.	1

Assessments and Surveys

SAT (College Admission Test)	3
Trends in International…	3
Strong Campbell Interest…	2
California Critical Thinking…	1
Coopersmith Self Esteem…	1
Early Childhood Longitudinal…	1
Group Embedded Figures Test	1
International Association for…	1
Kaufman Assessment Battery…	1
Learning Style Inventory	1
Michigan Test of English…	1
Minnesota Multiphasic…	1
Motivated Strategies for…	1
Myers Briggs Type Indicator	1
National Teacher Examinations	1
Productivity Environmental…	1
Progress in International…	1
Questionnaire on Teacher…	1
School and College Ability…	1
Self Directed Learning…	1
Test of English as a Foreign…	1
Torrance Tests of Creative…	1