NotesFAQContact Us
Collection
Advanced
Search Tips
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 159 results Save | Export
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024
Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular to assist with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui, & Yee, 2023) estimate that by implementing LLMs,…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Maïano, Christophe; Morin, Alexandre J. S.; Tietjens, Maike; Bastos, Tânia; Luiggi, Maxime; Corredeira, Rui; Griffet, Jean; Sánchez-Oliva, David – Measurement in Physical Education and Exercise Science, 2023
The present study sought to examine the psychometric properties of new German, Portuguese, and Spanish versions of the Revised Short Form of the Physical Self-Inventory (PSI-S-"R"), and to contrast these properties against those from the original French version of this instrument. Participants (n = 1802) were 288 French youth, 177 German…
Descriptors: German, Portuguese, Spanish, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Aleyna Altan; Zehra Taspinar Sener – Online Submission, 2023
This research aimed to develop a valid and reliable test to be used to detect sixth grade students' misconceptions and errors regarding the subject of fractions. A misconception diagnostic test has been developed that includes the concept of fractions, different representations of fractions, ordering and comparing fractions, equivalence of…
Descriptors: Diagnostic Tests, Mathematics Tests, Fractions, Misconceptions
Peer reviewed Peer reviewed
Direct linkDirect link
Alqarni, Abdulelah Mohammed – Journal on Educational Psychology, 2019
This study compares the psychometric properties of reliability in Classical Test Theory (CTT), item information in Item Response Theory (IRT), and validation from the perspective of modern validity theory for the purpose of bringing attention to potential issues that might exist when testing organizations use both test theories in the same testing…
Descriptors: Test Theory, Item Response Theory, Test Construction, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017
There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…
Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Bao, Lei; Koenig, Kathleen; Xiao, Yang; Fritchman, Joseph; Zhou, Shaona; Chen, Cheng – Physical Review Physics Education Research, 2022
Abilities in scientific thinking and reasoning have been emphasized as core areas of initiatives, such as the Next Generation Science Standards or the College Board Standards for College Success in Science, which focus on the skills the future will demand of today's students. Although there is rich literature on studies of how these abilities…
Descriptors: Physics, Science Instruction, Teaching Methods, Thinking Skills
Peer reviewed Peer reviewed
Direct linkDirect link
Bakhtiar, Mehdi; Wong, Min Ney; Tsui, Emily Ka Yin; McNeil, Malcolm R. – Journal of Speech, Language, and Hearing Research, 2020
Purpose: This study reports the psychometric development of the Cantonese versions of the English Computerized Revised Token Test (CRTT) for persons with aphasia (PWAs) and healthy controls (HCs). Method: The English CRTT was translated into standard Chinese for the Reading--Word Fade version (CRTT-R-[subscript WF]-Cantonese) and into formal…
Descriptors: Psychometrics, Sino Tibetan Languages, Computer Assisted Testing, Aphasia
Peer reviewed Peer reviewed
Direct linkDirect link
Shah, Ashima Mathur; Wylie, Caroline; Gitomer, Drew; Noam, Gil – Science Education, 2018
In and out-of-school time (OST) experiences are viewed as complementary in contributing to students' interest, engagement, and performance in science, technology, engineering, and mathematics (STEM). While tools exist to measure quality in general afterschool settings and others to measure structured science classroom experiences, there is a need…
Descriptors: STEM Education, Educational Improvement, Educational Quality, After School Programs
Peer reviewed Peer reviewed
Direct linkDirect link
Fauville, Géraldine; Strang, Craig; Cannady, Matthew A.; Chen, Ying-Fang – Environmental Education Research, 2019
The Ocean Literacy movement began in the U.S. in the early 2000s, and has recently become an international effort. The focus on marine environmental issues and marine education is increasing, and yet it has been difficult to show progress of the ocean literacy movement, in part, because no widely adopted measurement tool exists. The International…
Descriptors: Marine Education, Environmental Education, Comparative Analysis, Factor Structure
McClellan, Catherine; Snyder, Rebecca; Woods-Murphy, Maryann; Basset, Katherine – National Network of State Teachers of the Year, 2018
Great teachers recognize great assessments. As policy and education leaders work to make sure state tests are measuring the problem-solving, writing, and critical-thinking skills students need for success, they should convene and rely on teachers to review test quality and help answer the question: Do the questions on our state test reflect…
Descriptors: Student Evaluation, Educational Quality, Standardized Tests, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Dekker, Linda P.; van der Vegt, Esther J. M.; van der Ende, Jan; Tick, Nouchka; Louwerse, Anneke; Maras, Athanasios; Verhulst, Frank C.; Greaves-Lord, Kirstin – Journal of Autism and Developmental Disorders, 2017
To gain further insight into psychosexual functioning, including behaviors, intrapersonal and interpersonal aspects, in adolescents with Autism Spectrum Disorder (ASD), comprehensive, multi-informant measures are needed. This study describes (1) the development of a new measure of psychosexual functioning in both parent- and self-reports (Teen…
Descriptors: Adolescents, Adolescent Development, Autism, Pervasive Developmental Disorders
Peer reviewed Peer reviewed
Direct linkDirect link
Deha Dogan, C.; Canan Karababa, Z.; Fulya Soguksu, A. – Educational Studies, 2017
The purpose of this study is to develop a valid and reliable scale to assess the level of English usage in daily life by students between 15 and 19 years of age, and to compare these students' scale scores according to their achievement levels in an English course. Five hundred and ninety-five participants were randomly selected from a universe.…
Descriptors: Language Usage, English (Second Language), Test Construction, Adolescents
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Liu, Jinghua; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2014
Maintaining score interchangeability and scale consistency is crucial for any testing programs that administer multiple forms across years. The use of a multiple linking design, which involves equating a new form to multiple old forms and averaging the conversions, has been proposed to control scale drift. However, the use of multiple linking…
Descriptors: Comparative Analysis, Reliability, Test Construction, Equated Scores
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11