Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 37 |
Descriptor
Reliability | 50 |
Test Bias | 50 |
Validity | 16 |
Scores | 14 |
Test Items | 11 |
Item Response Theory | 10 |
Student Evaluation | 9 |
Test Construction | 8 |
Factor Analysis | 7 |
Foreign Countries | 7 |
Measurement | 7 |
Author
Herman, Joan L. | 4 |
Dietel, Ronald | 2 |
Ercikan, Kadriye | 2 |
Gallagher, Carole | 2 |
Goldschmidt, Pete | 2 |
Heritage, Margaret | 2 |
Lagunoff, Rachel | 2 |
Osmundson, Ellen | 2 |
Sato, Edynn | 2 |
Steinberg, Jonathan | 2 |
Worth, Peter | 2 |
Publication Type
Journal Articles | 32 |
Reports - Research | 28 |
Reports - Evaluative | 8 |
Reports - Descriptive | 7 |
Speeches/Meeting Papers | 4 |
Guides - Non-Classroom | 3 |
Information Analyses | 2 |
Dissertations/Theses -… | 1 |
Opinion Papers | 1 |
Education Level
Elementary Secondary Education | 9 |
Higher Education | 6 |
Postsecondary Education | 5 |
Grade 8 | 3 |
Elementary Education | 2 |
Grade 5 | 2 |
Secondary Education | 2 |
Grade 4 | 1 |
Grade 7 | 1 |
Grade 9 | 1 |
High Schools | 1 |
Audience
Researchers | 2 |
Administrators | 1 |
Practitioners | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 6 |
Belzak, William C. M.; Bauer, Daniel J. – Journal of Educational and Behavioral Statistics, 2024
Testing for differential item functioning (DIF) has undergone rapid statistical developments recently. Moderated nonlinear factor analysis (MNLFA) allows for simultaneous testing of DIF among multiple categorical and continuous covariates (e.g., sex, age, ethnicity), and regularization has shown promising results for identifying DIF among…
Descriptors: Test Bias, Algorithms, Factor Analysis, Error of Measurement
Leighton, Jacqueline P.; Lehman, Blair – Educational Measurement: Issues and Practice, 2020
In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze…
Descriptors: Protocol Analysis, Interviews, Problem Solving, Cognitive Processes
Plieninger, Hansjörg – Educational and Psychological Measurement, 2017
Even though there is an increasing interest in response styles, the field lacks a systematic investigation of the bias that response styles potentially cause. Therefore, a simulation was carried out to study this phenomenon with a focus on applied settings (reliability, validity, scale scores). The influence of acquiescence and extreme response…
Descriptors: Response Style (Tests), Test Bias, Item Response Theory, Correlation
Maher, Sara Faye – ProQuest LLC, 2020
To meet the needs of complex and/or underserved patient populations, health care professionals must possess diverse backgrounds, qualities, and skill sets. Holistic review has been used to diversify student admissions through examination of non-cognitive attributes of health care applicants. The objective of this study was to develop a novel…
Descriptors: Computer Assisted Testing, Pilot Projects, Measures (Individuals), Reliability
Yan, Weili; Siegert, Richard J.; Zhou, Hao; Zou, Xiaobing; Wu, Lijie; Luo, Xuerong; Li, Tingyu; Huang, Yi; Guan, Hongyan; Chen, Xiang; Mao, Meng; Xia, Kun; Zhang, Lan; Li, Erzhen; Li, Chunpei; Zhang, Xudong; Zhou, Yuanfeng; Shih, Andy; Fombonne, Eric; Zheng, Yi; Han, Jisheng; Sun, Zhongsheng; Jiang, Yong-hui; Wang, Yi – Autism: The International Journal of Research and Practice, 2021
The recent adaptation of a Chinese parent version of the Autism Spectrum Rating Scale showed the Modified Chinese Autism Spectrum Rating Scale to be reliable and valid for use in China. The aim of this study was to test the Modified Chinese Autism Spectrum Rating Scale for fit to the Rasch model. We analysed data from a previous study of the…
Descriptors: Psychometrics, Parent Attitudes, Autism, Pervasive Developmental Disorders
Geiger, Tray J.; Amrein-Beardsley, Audrey – AASA Journal of Scholarship & Practice, 2017
In this commentary, we discuss three types of data manipulations that can occur within teacher evaluation methods: artificial inflation, artificial deflation, and artificial conflation. These types of manipulation are more popularly known in the education profession as instances of Campbell's Law (1976), which states that the higher the…
Descriptors: Teacher Evaluation, Evaluation Methods, Data Analysis, Personnel Policy
DeMars, Christine – Applied Measurement in Education, 2015
In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…
Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory
Steiner, Peter M.; Kim, Yongnam – Society for Research on Educational Effectiveness, 2014
In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
Descriptors: Observation, Research Methodology, Test Bias, Regression (Statistics)
Carleton, R. Nicholas; Thibodeau, Michel A.; Osborne, Jason W.; Asmundson, Gordon J. G. – Practical Assessment, Research & Evaluation, 2012
The present study was designed to test for item order effects by measuring four distinct constructs that contribute substantively to anxiety-related psychopathology (i.e., anxiety sensitivity, fear of negative evaluation, injury/illness sensitivity, and intolerance of uncertainty). Participants (n = 999; 71% women) were randomly assigned to…
Descriptors: Anxiety, Test Items, Serial Ordering, Measures (Individuals)
Bashkov, Bozhidar M.; Finney, Sara J. – Measurement and Evaluation in Counseling and Development, 2013
Traditional methods of assessing construct stability are reviewed and longitudinal mean and covariance structures (LMACS) analysis, a modern approach, is didactically illustrated using psychological entitlement data. Measurement invariance and latent variable stability results are interpreted, emphasizing substantive implications for educators and…
Descriptors: Statistical Analysis, Longitudinal Studies, Reliability, Psychological Patterns
Attali, Yigal – Educational and Psychological Measurement, 2011
Contrary to previous research on sequential ratings of student performance, this study found that professional essay raters of a large-scale standardized testing program produced ratings that were drawn toward previous ratings, creating an assimilation effect. Longer intervals between the two adjacent ratings and higher degree of agreement with…
Descriptors: Essay Tests, Standardized Tests, Sequential Approach, Test Bias
Oliveri, Maria E.; Ercikan, Kadriye – Applied Measurement in Education, 2011
In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of…
Descriptors: Foreign Countries, English, French, Tests
Lakin, Joni M.; Elliott, Diane Cardenas; Liu, Ou Lydia – Educational and Psychological Measurement, 2012
Outcomes assessments are gaining great attention in higher education because of increased demand for accountability. These assessments are widely used by U.S. higher education institutions to measure students' college-level knowledge and skills, including students who speak English as a second language (ESL). For the past decade, the increasing…
Descriptors: College Outcomes Assessment, Achievement Tests, English Language Learners, College Students
Kreiner, Svend; Christensen, Karl Bang – Psychometrika, 2011
In behavioural sciences, local dependence and DIF are common, and purification procedures that eliminate items with these weaknesses often result in short scales with poor reliability. Graphical loglinear Rasch models (Kreiner & Christensen, in "Statistical Methods for Quality of Life Studies," ed. by M. Mesbah, F.C. Cole & M.T.…
Descriptors: Evidence, Markov Processes, Quality of Life, Item Analysis
Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D. – International Journal of Testing, 2012
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…
Descriptors: Foreign Students, Test Bias, Speech Communication, Effect Size