Showing 1 to 15 of 27 results
Peer reviewed
Cornesse, Carina; Blom, Annelies G. – Sociological Methods & Research, 2023
Recent years have seen a growing number of studies investigating the accuracy of nonprobability online panels; however, response quality in nonprobability online panels has not yet received much attention. To fill this gap, we investigate response quality in a comprehensive study of seven nonprobability online panels and three probability-based…
Descriptors: Probability, Sampling, Social Science Research, Research Methodology
Peer reviewed
Full text available on ERIC (PDF)
Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua – ETS Research Report Series, 2015
In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
Descriptors: Test Construction, Equated Scores, Test Items, Sampling
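The jackknife approach described above can be sketched in a few lines: drop one anchor item at a time, re-estimate the linking constant, and inspect how much the estimate moves. This is a minimal illustration with invented anchor-item difficulties and a simple mean-mean linking constant, not the authors' actual equating procedure; all data and the 0.3 scale shift are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anchor-item difficulty estimates on two forms (e.g., Rasch b-values).
b_old = rng.normal(0.0, 1.0, size=12)                # anchor difficulties, old form
b_new = b_old + 0.3 + rng.normal(0, 0.05, size=12)   # same anchors on new form, scale shifted

def mean_shift(new, old):
    """Mean-mean linking constant: how far the new scale sits above the old one."""
    return np.mean(new) - np.mean(old)

full = mean_shift(b_new, b_old)

# Jackknife: delete one anchor at a time and re-estimate the linking constant.
jack = np.array([
    mean_shift(np.delete(b_new, i), np.delete(b_old, i))
    for i in range(len(b_old))
])

# A wide spread across the jackknife estimates flags anchors that dominate equating.
n = len(jack)
jack_se = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))
print(f"full-sample shift: {full:.3f}")
print(f"jackknife range:   {jack.max() - jack.min():.3f}")
print(f"jackknife SE:      {jack_se:.3f}")
```

In the ideal case the study describes, the jackknife range would be near zero: no single anchor's removal would materially change the equating.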
Peer reviewed
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
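The Angoff procedure the abstract refers to has a simple computational core: each expert judges the probability that a minimally qualified examinee answers each item correctly, and the cut score is the sum of the per-item mean ratings. The sketch below uses invented ratings and a random 20-item subset to illustrate the subsetting question the study asks; it is not the authors' G-theory analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

n_items, n_raters = 60, 8
# Hypothetical Angoff ratings: each expert's judged probability that a minimally
# qualified examinee answers each item correctly.
ratings = np.clip(rng.normal(0.6, 0.12, size=(n_raters, n_items)), 0.05, 0.95)

# Classic Angoff cut score: sum of the mean rating for each item.
cut_full = ratings.mean(axis=0).sum()

# The study's question, in miniature: does a random subset of items yield a
# comparable per-item cut score to the full set?
subset = rng.choice(n_items, size=20, replace=False)
per_item_full = cut_full / n_items
per_item_sub = ratings[:, subset].mean(axis=0).mean()

print(f"cut score (60 items):         {cut_full:.1f}")
print(f"per-item cut, all items:      {per_item_full:.3f}")
print(f"per-item cut, 20-item subset: {per_item_sub:.3f}")
```

If the per-item cut from the subset tracks the full-set value closely across replications, the time savings of rating fewer items come at little cost in the recommended standard.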
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although international large-scale assessment (ILSA) of education, pioneered by the International Association for the Evaluation of Educational Achievement, is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis
Peer reviewed
Padilla, Miguel A.; Divers, Jasmin – Educational and Psychological Measurement, 2013
The performance of the normal theory bootstrap (NTB), the percentile bootstrap (PB), and the bias-corrected and accelerated (BCa) bootstrap confidence intervals (CIs) for coefficient omega was assessed through a Monte Carlo simulation under conditions not previously investigated. Of particular interests were nonnormal Likert-type and binary items.…
Descriptors: Sampling, Statistical Inference, Computation, Statistical Analysis
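The percentile bootstrap (PB) examined in this study is straightforward to sketch: resample examinees with replacement, recompute coefficient omega on each resample, and take percentiles of the bootstrap distribution as the confidence limits. The code below uses simulated Likert-type data and a crude one-factor fit (first principal component of the covariance matrix standing in for maximum-likelihood loadings); both the data and that simplification are assumptions for illustration, not the study's method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-item Likert-type data generated from a one-factor model.
n, k = 300, 5
factor = rng.normal(size=(n, 1))
loadings = np.array([0.7, 0.6, 0.8, 0.5, 0.65])
latent = factor * loadings + rng.normal(0, 0.6, size=(n, k))
items = np.clip(np.round(latent * 1.5 + 3), 1, 5)  # coarsen to a 1-5 scale

def omega(x):
    """Coefficient omega from a crude one-factor fit: the first principal
    component of the covariance matrix stands in for ML factor loadings."""
    s = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(s)                 # eigenvalues ascending
    lam = np.sqrt(evals[-1]) * np.abs(evecs[:, -1])  # loadings
    psi = np.diag(s) - lam ** 2                      # uniquenesses
    return lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())

# Percentile bootstrap: resample rows, recompute omega, and take the 2.5th and
# 97.5th percentiles of the bootstrap distribution as the 95% CI.
boots = np.array([omega(items[rng.integers(0, n, n)]) for _ in range(1000)])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"omega = {omega(items):.3f}, 95% PB CI = [{lo:.3f}, {hi:.3f}]")
```

The NTB and BCa intervals the study compares differ only in how the interval endpoints are computed from (or around) this same bootstrap distribution.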
Peer reviewed
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole-cohort Key Stage 2 (KS2) National Curriculum science test in England has been replaced with a sampling test taken annually by pupils aged 11 from a nationally representative sample of schools. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Peer reviewed
Schönborn, K. J.; Höst, G. E.; Lundin Palmerius, K. E. – Chemistry Education Research and Practice, 2015
As the application of nanotechnology in everyday life impacts society, it becomes critical for citizens to have a scientific basis upon which to judge their perceived hopes and fears of 'nano'. Although multiple instruments have been designed for assessing attitudinal and affective aspects of nano, surprisingly little work has focused on…
Descriptors: Molecular Structure, Technology, Test Construction, Test Validity
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
OECD Publishing, 2013
The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…
Descriptors: International Assessment, Adults, Skills, Test Construction
Peer reviewed
Lee, Won-Chan; Brennan, Robert L.; Wan, Lei – Applied Psychological Measurement, 2009
For a test that consists of dichotomously scored items, several approaches have been reported in the literature for estimating classification consistency and accuracy indices based on a single administration of a test. Classification consistency and accuracy have not been studied much, however, for "complex" assessments--for example,…
Descriptors: Classification, Reliability, Test Items, Scoring
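The two indices this entry names are easy to see in a simulation, even though the study itself derives them from a single administration. Below, invented true scores generate two parallel administrations of a 40-item dichotomously scored test: consistency is the proportion of examinees receiving the same pass/fail decision on both, and accuracy is the proportion whose observed decision matches their true status. The cut score, item count, and beta-distributed true scores are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

n_examinees, n_items, cut = 5000, 40, 24

# Hypothetical true proportion-correct scores.
true_p = rng.beta(6, 4, size=n_examinees)

# Two parallel administrations: observed scores are binomial given true_p.
x1 = rng.binomial(n_items, true_p)
x2 = rng.binomial(n_items, true_p)

pass1, pass2 = x1 >= cut, x2 >= cut
true_pass = true_p * n_items >= cut

consistency = np.mean(pass1 == pass2)   # same decision on both administrations
accuracy = np.mean(pass1 == true_pass)  # observed decision matches true status

print(f"classification consistency: {consistency:.3f}")
print(f"classification accuracy:    {accuracy:.3f}")
```

Single-administration methods estimate the same quantities by positing a model for the true-score and error distributions rather than actually administering the test twice; the "complex assessment" problem the authors address arises when items are not all dichotomous, so the simple binomial error model no longer applies.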
OECD Publishing, 2014
The "PISA 2012 Technical Report" describes the methodology underlying the PISA 2012 survey, which tested 15-year-olds' competencies in mathematics, reading and science and, in some countries, problem solving and financial literacy. It examines the design and implementation of the project at a level of detail that allows researchers to…
Descriptors: International Assessment, Secondary School Students, Foreign Countries, Achievement Tests
Peer reviewed
Huang, Chiungjung – Educational and Psychological Measurement, 2009
This study examined the percentage of task-sampling variability in performance assessment via a meta-analysis. In total, 50 studies containing 130 independent data sets were analyzed. Overall results indicate that the percentage of variance for (a) differential difficulty of task was roughly 12% and (b) examinee's differential performance of the…
Descriptors: Test Bias, Research Design, Performance Based Assessment, Performance Tests
OECD Publishing (NJ1), 2009
The Organisation for Economic Cooperation and Development's (OECD's) Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries. PISA examines how well students are prepared to meet the challenges of the future,…
Descriptors: Policy Formation, Scaling, Academic Achievement, Interrater Reliability
Childs, Ruth A.; Jaciw, Andrew P. – 2003
Matrix sampling of test items, the division of a set of items into different versions of a test form, is used by several large-scale testing programs. This Digest discusses nine categories of costs associated with matrix sampling. These categories are: (1) development costs; (2) materials costs; (3) administration costs; (4) educational costs; (5)…
Descriptors: Costs, Matrices, Reliability, Sampling
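Matrix sampling as defined in this Digest — dividing one item pool into several shorter forms so that each examinee takes only a fraction of the items while the program covers the whole pool — can be sketched directly. The 90-item pool, three forms, and round-robin student assignment below are all hypothetical.

```python
import random

random.seed(4)

# Hypothetical 90-item pool divided into three non-overlapping 30-item forms:
# each student takes one form, but together the forms cover the whole pool.
items = list(range(90))
random.shuffle(items)
forms = [sorted(items[i::3]) for i in range(3)]

# Spiral the forms across students so each form is administered equally often
# within a classroom.
students = [f"s{i:03d}" for i in range(12)]
assignment = {s: forms[i % 3] for i, s in enumerate(students)}

form_counts = [sum(1 for f in assignment.values() if f is forms[j]) for j in range(3)]
print(f"items per form: {[len(f) for f in forms]}")
print(f"students per form: {form_counts}")
```

The cost trade-offs the Digest catalogs follow from this design: each student answers fewer items (lower administration burden), but scoring, reporting, and score reliability at the individual level become more complicated because no one takes the full pool.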
Peer reviewed
Cizek, Gregory J.; Robinson, K. Lynne; O'Day, Denis M. – Educational and Psychological Measurement, 1998
The effect of removing nonfunctioning items from multiple-choice tests was studied by examining change in difficulty, discrimination, and dimensionality. Results provide additional support for the benefits of eliminating nonfunctioning options, such as enhanced score reliability, reduced testing time, potential for broader domain sampling, and…
Descriptors: Difficulty Level, Multiple Choice Tests, Sampling, Scores
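A common operational screen for the "nonfunctioning options" this study removes is a selection-frequency rule: a distractor chosen by fewer than some small percentage of examinees (5% is a typical threshold) is doing no discriminating work and is a candidate for deletion. The response counts and threshold below are invented for illustration; the study's own criteria may differ.

```python
import numpy as np

# Hypothetical response counts for 6 four-option items; column 0 is the key,
# columns 1-3 are distractors. Each row sums to 200 examinees.
counts = np.array([
    [120, 40, 35,  5],
    [ 90, 50, 40, 20],
    [150, 25, 22,  3],
    [ 80, 60, 55,  5],
    [110, 45, 40,  5],
    [100, 48, 46,  6],
])

# Flag a distractor as nonfunctioning if fewer than 5% of examinees select it.
props = counts / counts.sum(axis=1, keepdims=True)
nonfunctioning = props[:, 1:] < 0.05  # distractor columns only

for i, row in enumerate(nonfunctioning):
    flagged = [f"option {j + 1}" for j, bad in enumerate(row) if bad]
    if flagged:
        print(f"item {i}: drop {', '.join(flagged)}")
```

Removing flagged options and rescoring lets one check the effects the study reports: changes in item difficulty, discrimination, and testing time.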