Showing all 10 results
Peer reviewed
Direct link
Lim, Hwanggyu; Zhu, Danqi; Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
Peer reviewed
Direct link
Li, Dongmei – Journal of Educational Measurement, 2022
Equating error is usually small relative to the magnitude of measurement error, but it can be a major source of error in the mean scores of large groups in educational measurement, as in year-to-year fluctuations of state mean scores. Though testing programs may routinely calculate the standard error of equating (SEE), the…
Descriptors: Error Patterns, Educational Testing, Group Testing, Statistical Analysis
Peer reviewed
Direct link
Bengs, Daniel; Kroehne, Ulf; Brefeld, Ulf – Journal of Educational Measurement, 2021
By tailoring test forms to the test-taker's proficiency, Computerized Adaptive Testing (CAT) enables substantial increases in testing efficiency over fixed-form testing. When used for formative assessment, the alignment of task difficulty with proficiency increases the chance that teachers can derive useful feedback from assessment data. The…
Descriptors: Computer Assisted Testing, Formative Evaluation, Group Testing, Program Effectiveness
Peer reviewed
Direct link
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Peer reviewed
Direct link
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Peer reviewed
Baglin, Roger F. – Journal of Educational Measurement, 1988
G. Burket's criticisms regarding the calculation and interpretation of group scores on norm-referenced tests are discussed. Burket and Baglin seem to agree that such a problem exists but disagree about its causes and solutions. (TJH)
Descriptors: Group Testing, Norm Referenced Tests, Scores, Testing Problems
Peer reviewed
Menne, John W.; Tolsma, Robert J. – Journal of Educational Measurement, 1971
Descriptors: Discriminant Analysis, Group Testing, Item Analysis, Psychometrics
Peer reviewed
Trentham, Landa L. – Journal of Educational Measurement, 1975
Descriptors: Comparative Testing, Educational Testing, Elementary Education, Grade 6
Peer reviewed
Neidermeyer, Fred C.; Sullivan, Howard J. – Journal of Educational Measurement, 1972
Results of the study indicate that the three-choice, selected-response tests often used in programs of this type do not provide an accurate indication of end-of-year achievement for many children, and their continued use is not recommended. (Authors)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Grade 1, Group Testing
Peer reviewed
De Avila, Edward; Pulos, Steven – Journal of Educational Measurement, 1979
Six studies investigated the use of pictorial material in a group setting to measure the cognitive level of children (ages 7 to 11) on eight Piagetian concepts. Children were of low to middle socioeconomic status, and 60 percent had Spanish surnames. Factor analysis and clinical administration supported test validity. (Author/CTM)
Descriptors: Cartoons, Cognitive Development, Cognitive Measurement, Conservation (Concept)