Publication Date
In 2025 | 5 |
Since 2024 | 49 |
Since 2021 (last 5 years) | 235 |
Descriptor
Test Items | 235 |
Test Reliability | 207 |
Test Validity | 130 |
Test Construction | 118 |
Foreign Countries | 108 |
Difficulty Level | 61 |
Item Response Theory | 53 |
Psychometrics | 50 |
Factor Analysis | 39 |
Item Analysis | 39 |
Scores | 38 |
More ▼ |
Source
Author
Sachin Nedungadi | 3 |
Al-Jarf, Reima | 2 |
Almehrizi, Rashid S. | 2 |
Boone, William J. | 2 |
Braeken, Johan | 2 |
Che Lah, Noor Hidayah | 2 |
Chew, Cheng Meng | 2 |
Chin, Huan | 2 |
Ji-young Shin | 2 |
Jumaat, Nurul Farhana | 2 |
Metsämuuronen, Jari | 2 |
More ▼ |
Publication Type
Education Level
Audience
Counselors | 1 |
Practitioners | 1 |
Location
Turkey | 23 |
Indonesia | 14 |
Malaysia | 7 |
China | 6 |
Germany | 6 |
Turkey (Istanbul) | 5 |
Thailand | 4 |
India | 3 |
Philippines | 3 |
South Korea | 3 |
Canada | 2 |
More ▼ |
Laws, Policies, & Programs
Head Start | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Parker, Mark A. J.; Hedgeland, Holly; Jordan, Sally E.; Braithwaite, Nicholas St. J. – European Journal of Science and Mathematics Education, 2023
The study covers the development and testing of the alternative mechanics survey (AMS), a modified force concept inventory (FCI), which used automatically marked free-response questions. Data were collected over a period of three academic years from 611 participants who were taking physics classes at high school and university level. A total of…
Descriptors: Test Construction, Scientific Concepts, Physics, Test Reliability
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to give radical underestimates of reliability for the tests common when testing educational achievement. These tests are often structured by widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
Ntumi, Simon; Agbenyo, Sheilla; Bulala, Tapela – Shanlax International Journal of Education, 2023
There is no need or point to testing of knowledge, attributes, traits, behaviours or abilities of an individual if information obtained from the test is inaccurate. However, by and large, it seems the estimation of psychometric properties of test items in classroomshas been completely ignored otherwise dying slowly in most testing environments. In…
Descriptors: Psychometrics, Accuracy, Test Validity, Factor Analysis
Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022
Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well-understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…
Descriptors: Reliability, Scores, Scaling, Statistical Analysis
Steinmann, Isa; Sánchez, Daniel; van Laar, Saskia; Braeken, Johan – Assessment in Education: Principles, Policy & Practice, 2022
Questionnaire scales that are mixed-worded, i.e. include both positively and negatively worded items, often suffer from issues like low reliability and more complex latent structures than intended. Part of the problem might be that some responders fail to respond consistently to the mixed-worded items. We investigated the prevalence and impact of…
Descriptors: Response Style (Tests), Test Items, Achievement Tests, Foreign Countries
Stephanie M. Bell; R. Philip Chalmers; David B. Flora – Educational and Psychological Measurement, 2024
Coefficient omega indices are model-based composite reliability estimates that have become increasingly popular. A coefficient omega index estimates how reliably an observed composite score measures a target construct as represented by a factor in a factor-analysis model; as such, the accuracy of omega estimates is likely to depend on correct…
Descriptors: Influences, Models, Measurement Techniques, Reliability
Mustafa Ilhan; Nese Güler; Gülsen Tasdelen Teker; Ömer Ergenekon – International Journal of Assessment Tools in Education, 2024
This study aimed to examine the effects of reverse items created with different strategies on psychometric properties and respondents' scale scores. To this end, three versions of a 10-item scale in the research were developed: 10 positive items were integrated in the first form (Form-P) and five positive and five reverse items in the other two…
Descriptors: Test Items, Psychometrics, Scores, Measures (Individuals)
Jyun-Hong Chen; Hsiu-Yi Chao – Journal of Educational and Behavioral Statistics, 2024
To solve the attenuation paradox in computerized adaptive testing (CAT), this study proposes an item selection method, the integer programming approach based on real-time test data (IPRD), to improve test efficiency. The IPRD method turns information regarding the ability distribution of the population from real-time test data into feasible test…
Descriptors: Data Use, Computer Assisted Testing, Adaptive Testing, Design
Hojung Kim; Changkyung Song; Jiyoung Kim; Hyeyun Jeong; Jisoo Park – Language Testing in Asia, 2024
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner,…
Descriptors: Korean, Test Validity, Test Reliability, Imitation
Joseph A. Rios; Jiayi Deng – Educational and Psychological Measurement, 2025
To mitigate the potential damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these approaches, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e.,…
Descriptors: Scoring, Guessing (Tests), Reaction Time, Item Response Theory
Aditya Shah; Ajay Devmane; Mehul Ranka; Prathamesh Churi – Education and Information Technologies, 2024
Online learning has grown due to the advancement of technology and flexibility. Online examinations measure students' knowledge and skills. Traditional question papers include inconsistent difficulty levels, arbitrary question allocations, and poor grading. The suggested model calibrates question paper difficulty based on student performance to…
Descriptors: Computer Assisted Testing, Difficulty Level, Grading, Test Construction
Melissa Whatley; Dominique Foster; Stephen Paul – Journal of Studies in International Education, 2024
The purpose of this study was to develop a measurement instrument that scholars and practitioners in international education can use as a means of exploring whether and how individuals who come into contact with international education programs develop a greater sense of cultural humility. Specifically, the study described here outlines the four…
Descriptors: Foreign Students, Cultural Awareness, Consciousness Raising, Test Construction
Emily A. Holt; Jessica Duke; Ryan Dunk; Krystal Hinerman – Environmental Education Research, 2024
Student understanding of climate change is an active and growing area of research, but little research has documented undergraduate students' knowledge about the biotic impacts of climate change. Here, we address this literature gap by presenting the Inventory of Biotic Climate Literacy (IBCL), a concept inventory developed to assess undergraduate…
Descriptors: Climate, Undergraduate Students, Knowledge Level, Test Construction
Zyluk, Natalia; Karpe, Karolina; Urbanski, Mariusz – SAGE Open, 2022
The aim of this paper is to describe the process of modification of the research tool designed for measuring the development of personal epistemology--"Standardized Epistemological Understanding Assessment" (SEUA). SEUA was constructed as an improved version of the instrument initially proposed by Kuhn et al. SEUA was proved to be a more…
Descriptors: Epistemology, Research Tools, Beliefs, Test Items
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated and the deflation may be profound, 0.40 - 0.60 units of reliability or 46 - 71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation