Publication Date
In 2025 | 1 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 17 |
Since 2016 (last 10 years) | 75 |
Since 2006 (last 20 years) | 148 |
Descriptor
Item Response Theory | 174 |
Scores | 174 |
Test Reliability | 98 |
Reliability | 64 |
Test Items | 57 |
Test Validity | 46 |
Psychometrics | 41 |
Foreign Countries | 39 |
Correlation | 38 |
Comparative Analysis | 32 |
Test Construction | 30 |
More ▼ |
Source
Author
Foorman, Barbara R. | 5 |
Petscher, Yaacov | 5 |
Kolen, Michael J. | 4 |
Haberman, Shelby J. | 3 |
Lee, Yi-Hsuan | 3 |
Schatschneider, Chris | 3 |
Wainer, Howard | 3 |
Beaujean, A. Alexander | 2 |
Blaker, Lisa | 2 |
Cai, Li | 2 |
Candell, Gregory L. | 2 |
More ▼ |
Publication Type
Education Level
Higher Education | 36 |
Postsecondary Education | 26 |
Elementary Education | 22 |
Secondary Education | 16 |
High Schools | 11 |
Grade 4 | 10 |
Middle Schools | 8 |
Grade 3 | 7 |
Grade 6 | 7 |
Intermediate Grades | 7 |
Junior High Schools | 7 |
More ▼ |
Audience
Teachers | 1 |
Location
Florida | 6 |
Indonesia | 5 |
China | 4 |
United Kingdom (England) | 4 |
Germany | 3 |
Netherlands | 3 |
Alabama | 2 |
Arizona | 2 |
California | 2 |
Canada | 2 |
Colombia | 2 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Grace C. Tetschner; Sachin Nedungadi – Chemistry Education Research and Practice, 2025
Many undergraduate chemistry students hold alternate conceptions related to resonance--an important and fundamental topic of organic chemistry. To help address these alternate conceptions, an organic chemistry instructor could administer the resonance concept inventory (RCI), which is a multiple-choice assessment that was designed to identify…
Descriptors: Scientific Concepts, Concept Formation, Item Response Theory, Scores
Safak, Pinar; Cakmak, Salih; Karakoc, Tamer; Aydin O'Dwyer, Pinar – European Journal of Educational Research, 2021
This study aimed to develop a valid and reliable instrument that measures the functional vision of students with low vision. Thus, an assessment tool and performance activities were developed for three vision skill groups (near vision skills, distance vision skills, and visual field) that include functional vision skills. The universe was 1485…
Descriptors: Foreign Countries, Vision Tests, Diagnostic Tests, Vision
Nnamdi Chika Ezike – ProQuest LLC, 2022
Fitting wrongly specified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates…
Descriptors: Prediction, Models, Interrater Reliability, Item Response Theory
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch models and 2PL and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
Yvette Jackson – ProQuest LLC, 2023
Rater-mediated activities in educational research occur when an expert judge or rater utilizes an instrument to judge persons or items and generates scale scores. Scale scores are from a subjective judgment and must undergo a quality control measure called rating quality. Rating quality in this study is broadly defined as the extent to which…
Descriptors: Educational Research, Evaluators, Test Theory, Item Response Theory
Thompson, Kathryn N. – ProQuest LLC, 2023
It is imperative to collect validity evidence prior to interpreting and using test scores. During the process of collecting validity evidence, test developers should consider whether test scores are contaminated by sources of extraneous information. This is referred to as construct irrelevant variance, or the "degree to which test scores are…
Descriptors: Test Wiseness, Test Items, Item Response Theory, Scores
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Forthmann, Boris; Paek, Sue Hyeon; Dumas, Denis; Barbot, Baptiste; Holling, Heinz – British Journal of Educational Psychology, 2020
Background: The originality of divergent thinking (DT) production is one of the most critical indicators of creative potential. It is commonly scored using the statistical infrequency of responses relative to all responses provided in a given sample. Aims: Response frequency estimates vary in terms of measurement precision. This issue has been…
Descriptors: Creative Thinking, Creativity Tests, Item Response Theory, Scores
Rodriguez, Rebekah M.; Silvia, Paul J.; Kaufman, James C.; Reiter-Palmon, Roni; Puryear, Jeb S. – Creativity Research Journal, 2023
The original 90-item Creative Behavior Inventory (CBI) was a landmark self-report scale in creativity research, and the 28-item brief form developed nearly 20 years ago continues to be a popular measure of everyday creativity. Relatively little is known, however, about the psychometric properties of this widely used scale. In the current research,…
Descriptors: Creativity Tests, Creativity, Creative Thinking, Psychometrics
Myszkowski, Nils – Journal of Intelligence, 2020
Raven's Standard Progressive Matrices (Raven 1941) is a widely used 60-item long measure of general mental ability. It was recently suggested that, for situations where taking this test is too time consuming, a shorter version, comprised of only the last series of the Standard Progressive Matrices (Myszkowski and Storme 2018) could be used, while…
Descriptors: Intelligence Tests, Psychometrics, Nonparametric Statistics, Item Response Theory
Sukri, Akhmad; Rizka, Muhammad Arief; Purwanti, Elly; Ramdiah, Siti; Lukitasari, Marheny – European Journal of Educational Research, 2022
Many researchers have separately developed instruments to measure environmental characteristics such as attitudes, values, and knowledge. However, there is no instrument used to measure all these aspects in one comprehensive instrument. This study is meant to develop and validate a green character instrument which reveals student behavior and…
Descriptors: Foreign Countries, College Students, Student Attitudes, Conservation (Environment)
van der Scheer, Emmelien A.; Bijlsma, Hannah J. E.; Glas, Cees A. W. – School Effectiveness and School Improvement, 2019
A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire that was used to measure their opinions about the lessons of…
Descriptors: Student Attitudes, Validity, Interrater Reliability, Correlation