Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Lundqvist, Lars-Olov; Lindner, Helen – Journal of Autism and Developmental Disorders, 2017
The Autism-Spectrum Quotient (AQ) is among the most widely used scales assessing autistic traits in the general population. However, some aspects of the AQ are questionable. To test its scale properties, the AQ was translated into Swedish, and data were collected from 349 adults, 130 with autism spectrum disorder (ASD) and 219 without ASD, and…
Descriptors: Autism, Pervasive Developmental Disorders, Adults, Comparative Analysis
Jin, Ying; Eason, Hershel – Journal of Educational Issues, 2016
The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling…
Descriptors: Test Bias, Simulation, Hierarchical Linear Modeling, Comparative Analysis
Turkan, Azmi; Cetin, Bayram – Journal of Education and Practice, 2017
Validity and reliability are among the most crucial characteristics of a test. One of the steps to make sure that a test is valid and reliable is to examine the bias in test items. The purpose of this study was to examine the bias in 2012 Placement Test items in terms of gender variable using Rasch Model in Turkey. The sample of this study was…
Descriptors: Item Response Theory, Gender Differences, Test Bias, Test Items
Demir, Ergul – Eurasian Journal of Educational Research, 2018
Purpose: The answer-copying tendency has the potential to detect suspicious answer patterns for prior distributions of statistical detection techniques. The aim of this study is to develop a valid and reliable measurement tool as a scale in order to observe the tendency of university students' copying of answers. Also, it is aimed to provide…
Descriptors: College Students, Cheating, Test Construction, Student Behavior
Teker, Gulsen Tasdelen; Dogan, Nuri – Educational Sciences: Theory and Practice, 2015
Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items included in six testlets given in English Proficiency Exam by the School of Foreign Languages of a state…
Descriptors: Foreign Countries, Test Items, Test Bias, Item Response Theory
Alhaythami, Hassan; Karpinski, Aryn; Kirschner, Paul; Bolden, Edward – Journal of Interactive Learning Research, 2017
This study examined the psychometric properties of a social-networking site (SNS) activities scale (SNSAS) using Rasch Analysis. Items were also examined with Rasch Principal Components Analysis (PCA) and Differential Item Functioning (DIF) across groups of university students (i.e., males and females from the United States [US] and Europe; N =…
Descriptors: Social Media, Social Networks, Likert Scales, Psychometrics
Cater, Melissa; Ferstel, Sarah D.; O'Neil, Carol E. – Journal of General Education, 2016
Student participation in undergraduate research (UGR) may be influenced by interest in research, future career and educational plans, perceived value of undergraduate research experiences, or perceived competence in research skills. The purpose of this study was to develop a questionnaire that could be used to validly and reliably assess students'…
Descriptors: Undergraduate Students, Student Experience, Questionnaires, Test Construction
Rios, Joseph A.; Sireci, Stephen G. – International Journal of Testing, 2014
The International Test Commission's "Guidelines for Translating and Adapting Tests" (2010) provide important guidance on developing and evaluating tests for use across languages. These guidelines are widely applauded, but the degree to which they are followed in practice is unknown. The objective of this study was to perform a…
Descriptors: Guidelines, Translation, Adaptive Testing, Second Languages
McFarland, Jenny L.; Price, Rebecca M.; Wenderoth, Mary Pat; Martinková, Patrícia; Cliff, William; Michael, Joel; Modell, Harold; Wright, Ann – CBE - Life Sciences Education, 2017
We present the Homeostasis Concept Inventory (HCI), a 20-item multiple-choice instrument that assesses how well undergraduates understand this critical physiological concept. We used an iterative process to develop a set of questions based on elements in the Homeostasis Concept Framework. This process involved faculty experts and undergraduate…
Descriptors: Scientific Concepts, Multiple Choice Tests, Science Tests, Test Construction
Bashkov, Bozhidar M.; Finney, Sara J. – Measurement and Evaluation in Counseling and Development, 2013
Traditional methods of assessing construct stability are reviewed and longitudinal mean and covariance structures (LMACS) analysis, a modern approach, is didactically illustrated using psychological entitlement data. Measurement invariance and latent variable stability results are interpreted, emphasizing substantive implications for educators and…
Descriptors: Statistical Analysis, Longitudinal Studies, Reliability, Psychological Patterns
Gill, Brian; Shoji, Megan; Coen, Thomas; Place, Kate – Regional Educational Laboratory Mid-Atlantic, 2016
School districts and states across the Regional Educational Laboratory Mid-Atlantic Region and the country as a whole have been modifying their teacher evaluation systems to identify more effective and less effective teachers and provide better feedback to improve instructional practice. The new systems typically include components related to…
Descriptors: Predictive Validity, Test Bias, Test Content, School Districts
Kreiner, Svend; Christensen, Karl Bang – Psychometrika, 2011
In behavioural sciences, local dependence and DIF are common, and purification procedures that eliminate items with these weaknesses often result in short scales with poor reliability. Graphical loglinear Rasch models (Kreiner & Christensen, in "Statistical Methods for Quality of Life Studies," ed. by M. Mesbah, F.C. Cole & M.T.…
Descriptors: Evidence, Markov Processes, Quality of Life, Item Analysis
Styles, Irene; Wildy, Helen; Pepper, Vivienne; Faulkner, Joanne; Berman, Ye'Elah – International Research in Early Childhood Education, 2014
The assessment of literacy and numeracy skills of students as they enter school for the first time is not yet established nation-wide in Australia. However, a large proportion of primary schools have chosen to assess their starting students on the Performance Indicators in Primary Schools-Baseline Assessment (PIPS-BLA). This series of three…
Descriptors: Foreign Countries, Indigenous Knowledge, Performance Based Assessment, Test Bias