Agus Santoso; Heri Retnawati; Timbul Pardede; Ibnu Rafi; Munaya Nikma Rosyada; Gulzhaina K. Kassymova; Xu Wenxin – Practical Assessment, Research & Evaluation, 2024
The test blueprint is important in test development: it guides the test item writer in creating items that meet the desired objectives and specifications (so-called a priori item characteristics), such as the difficulty level of items within each category and the distribution of items by difficulty.…
Descriptors: Foreign Countries, Undergraduate Students, Business English, Test Construction
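The blueprint audit the abstract describes can be pictured as a simple count check: the blueprint fixes how many items each difficulty category should receive, and the written item bank is compared against it. A minimal sketch, with hypothetical category names and counts:

```python
# Toy sketch of an a-priori blueprint check (not the authors' procedure):
# the blueprint fixes how many items each difficulty category should get,
# and the item writers' output is audited against those targets.
from collections import Counter

def audit(blueprint, items):
    """Return {category: (required, actual)} for every category that
    deviates from the blueprint. items is a list of (item_id, category)."""
    actual = Counter(cat for _, cat in items)
    return {c: (blueprint[c], actual.get(c, 0))
            for c in blueprint if actual.get(c, 0) != blueprint[c]}

blueprint = {"easy": 2, "medium": 2, "hard": 1}
items = [("q1", "easy"), ("q2", "easy"), ("q3", "medium"), ("q4", "hard")]
print(audit(blueprint, items))   # "medium" is short one item
```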
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Wallach, P. M.; Crespo, L. M.; Holtzman, K. Z.; Galbraith, R. M.; Swanson, D. B. – Advances in Health Sciences Education, 2006
Purpose: In conjunction with curricular changes, a process to develop integrated examinations was implemented. Pre-established guidelines were provided favoring vignettes, clinically relevant material, and application of knowledge rather than simple recall. Questions were read aloud in a committee including all course directors, and a reviewer…
Descriptors: Test Items, Rating Scales, Examiners, Guidelines

Melnick, Steven A.; Gable, Robert K. – Educational Research Quarterly, 1990
By administering an attitude survey to 3,328 parents of elementary school students, the use of positive and negative Likert item stems was analyzed. Respondents who consistently answered positive/negative item pairs that were parallel in meaning were compared with those who answered inconsistently. Implications for construction of affective measures…
Descriptors: Affective Measures, Comparative Testing, Elementary Education, Likert Scales
Clauser, Brian E.; And Others – 1991
Item bias has been a major concern for test developers in recent years, and the Mantel-Haenszel statistic has been among the preferred methods for identifying biased items. The statistic's performance in identifying uniform bias in simulated data, modeled by producing various levels of difference in the b-parameter (item difficulty) for reference…
Descriptors: Comparative Testing, Difficulty Level, Item Bias, Item Response Theory
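The Mantel-Haenszel procedure this entry refers to pools 2x2 tables (group x correct/incorrect) across strata of examinees matched on total score. A minimal sketch of the common odds-ratio estimator, using made-up counts (the counts and stratum labels are illustrative only):

```python
# Hedged sketch of the Mantel-Haenszel common odds-ratio estimator used
# in DIF studies such as the one abstracted above. One 2x2 table per
# matched-score stratum: (ref_correct, ref_wrong, focal_correct, focal_wrong).

def mh_odds_ratio(tables):
    """Mantel-Haenszel estimate of the common odds ratio across strata.
    Values near 1.0 indicate no uniform DIF on the item."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n   # reference-correct x focal-incorrect
        den += b * c / n   # reference-incorrect x focal-correct
    return num / den

tables = [
    (30, 10, 28, 12),   # low-score stratum (hypothetical counts)
    (50, 5, 48, 7),     # high-score stratum (hypothetical counts)
]
print(round(mh_odds_ratio(tables), 3))   # 1.352
```

In practice the estimate is usually reported on the delta scale with an accompanying chi-square significance test; this sketch shows only the core pooling step.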

Stocking, Martha L.; And Others – Applied Psychological Measurement, 1993
A method of automatically selecting items for inclusion in a test with constraints on item content and statistical properties was applied to real data. Tests constructed manually from the same data and constraints were compared to tests constructed automatically. Results show areas in which automated assembly can improve test construction. (SLD)
Descriptors: Algorithms, Automation, Comparative Testing, Computer Assisted Testing
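Automated assembly of the kind Stocking et al. study selects items under joint content and statistical constraints. Their method used weighted-deviations optimization; the toy sketch below substitutes a simple greedy rule (all item fields and targets are hypothetical) just to show the shape of the problem:

```python
# Illustrative greedy automated test assembly, NOT Stocking et al.'s
# algorithm: fill a per-content-area quota, preferring items whose
# difficulty (IRT b-parameter) lies closest to a target value.
# Item tuples are hypothetical: (item_id, content_area, difficulty_b).

def assemble(pool, per_area, target_b):
    """Pick per_area[area] items from each content area, choosing the
    items whose b-parameter is nearest to target_b."""
    chosen = []
    for area, count in per_area.items():
        candidates = sorted(
            (it for it in pool if it[1] == area),
            key=lambda it: abs(it[2] - target_b),
        )
        chosen.extend(candidates[:count])
    return chosen

pool = [
    ("i1", "algebra", -0.5), ("i2", "algebra", 0.1), ("i3", "algebra", 1.2),
    ("i4", "geometry", 0.0), ("i5", "geometry", 0.9), ("i6", "geometry", -1.1),
]
form = assemble(pool, {"algebra": 2, "geometry": 1}, target_b=0.0)
print([it[0] for it in form])   # ['i2', 'i1', 'i4']
```

Real assembly engines treat this as constrained integer programming so that many constraints can be satisfied simultaneously; the greedy version here handles each area independently.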

Frary, Robert B. – Applied Measurement in Education, 1991
The use of the "none-of-the-above" option (NOTA) in 20 college-level multiple-choice tests was evaluated for classes with 100 or more students. Eight academic disciplines were represented, and 295 NOTA and 724 regular test items were used. It appears that the NOTA can be compatible with good classroom measurement. (TJH)
Descriptors: College Students, Comparative Testing, Difficulty Level, Discriminant Analysis

Wainer, Howard; And Others – Journal of Educational Measurement, 1992
Computer simulations were run to measure the relationship between testlet validity and factors of item pool size and testlet length for both adaptive and linearly constructed testlets. Making a testlet adaptive yields only modest increases in aggregate validity because of the peakedness of the typical proficiency distribution. (Author/SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Computer Simulation

Crehan, Kevin D.; And Others – Educational and Psychological Measurement, 1993
Studies with 220 college students found that multiple-choice test items with three options are more difficult than those with four options, and that items with a none-of-these option are more difficult than those without it. Neither format manipulation affected item discrimination. Implications for test construction are discussed. (SLD)
Descriptors: College Students, Comparative Testing, Difficulty Level, Distractors (Tests)
Nandakumar, Ratna – 1992
The performance of the following four methodologies for assessing unidimensionality was examined: (1) DIMTEST; (2) the approach of P. W. Holland and P. R. Rosenbaum; (3) linear factor analysis; and (4) non-linear factor analysis. Each method is examined and compared with the others using simulated and real data sets. Seven data sets,…
Descriptors: Ability, Comparative Testing, Correlation, Equations (Mathematics)
Cizek, Gregory J. – 1991
A commonly accepted rule for developing equated examinations using the common-items non-equivalent groups (CINEG) design is that items common to the two examinations being equated should be identical. The CINEG design calls for two groups of examinees to respond to a set of common items that is included in two examinations. In practice, this rule…
Descriptors: Certification, Comparative Testing, Difficulty Level, Higher Education
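The CINEG design in this entry links two examinee groups through a shared block of common items. The simplest version of the resulting adjustment is mean equating, sketched below with made-up scores (Cizek's study concerns when the rule of identical common items can be relaxed, not this particular method):

```python
# Minimal sketch of common-item mean equating under the CINEG design:
# the gap between the two groups' common-item means estimates the group
# ability difference, which shifts new-form scores onto the old scale.

def mean_equate(new_form_scores, common_new_group, common_old_group):
    """Shift new-form scores by the difference in common-item means."""
    shift = (sum(common_old_group) / len(common_old_group)
             - sum(common_new_group) / len(common_new_group))
    return [s + shift for s in new_form_scores]

old_common = [3.0, 3.5, 4.0]   # common-item scores, old-form group (hypothetical)
new_common = [2.5, 3.0, 3.5]   # common-item scores, new-form group (hypothetical)
print(mean_equate([20, 25], new_common, old_common))   # shift of +0.5
```

Operational programs use stronger methods (Tucker, chained equipercentile, IRT true-score equating), but all of them rest on this same common-item link between the groups.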

Schriesheim, Chester A.; And Others – Educational and Psychological Measurement, 1991
Effects of item wording on questionnaire reliability and validity were studied, using 280 undergraduate business students who completed a questionnaire comprising 4 item types: (1) regular; (2) polar opposite; (3) negated polar opposite; and (4) negated regular. Implications of results favoring regular and negated regular items are discussed. (SLD)
Descriptors: Business Education, Comparative Testing, Higher Education, Negative Forms (Language)

Wainer, Howard; And Others – Journal of Educational Measurement, 1991
Hierarchical (adaptive) and linear methods of testlet construction were compared. The performance of 2,080 ninth and tenth graders on a 4-item testlet was used to predict performance on the entire test. The adaptive test was slightly superior as a predictor, but the cost of obtaining that superiority was considerable. (SLD)
Descriptors: Adaptive Testing, Algebra, Comparative Testing, High School Students
Assessing the Effects of Computer Administration on Scores and Parameter Estimates Using IRT Models.
Sykes, Robert C.; And Others – 1991
To investigate the psychometric feasibility of replacing a paper-and-pencil licensing examination with a computer-administered test, a validity study was conducted. The computer-administered test (Cadm) was a common set of items for all test takers, distinct from computerized adaptive testing, in which test takers receive items appropriate to…
Descriptors: Adults, Certification, Comparative Testing, Computer Assisted Testing
Shavelson, Richard J.; And Others – 1988
This study investigated the relationships among the symbolic representation of problems given to students to solve, the mental representations they use to solve the problems, and the accuracy of their solutions. Twenty eleventh-grade science students were asked to think aloud as they solved problems on the ideal gas laws. The problems were…
Descriptors: Chemistry, Comparative Testing, Problem Solving, Response Style (Tests)