Showing 1 to 15 of 27 results
Peer reviewed
Cornesse, Carina; Blom, Annelies G. – Sociological Methods & Research, 2023
Recent years have seen a growing number of studies investigating the accuracy of nonprobability online panels; however, response quality in nonprobability online panels has not yet received much attention. To fill this gap, we investigate response quality in a comprehensive study of seven nonprobability online panels and three probability-based…
Descriptors: Probability, Sampling, Social Science Research, Research Methodology
Peer reviewed
Full text available on ERIC (PDF)
Lu, Ru; Haberman, Shelby; Guo, Hongwen; Liu, Jinghua – ETS Research Report Series, 2015
In this study, we apply jackknifing to anchor items to evaluate the impact of anchor selection on equating stability. In an ideal world, the choice of anchor items should have little impact on equating results. When this ideal does not correspond to reality, selection of anchor items can strongly influence equating results. This influence does not…
Descriptors: Test Construction, Equated Scores, Test Items, Sampling
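The jackknife approach described above can be sketched in a few lines: drop one anchor item at a time, re-estimate the linking constant, and inspect how much the estimate moves. This is a minimal illustration with invented anchor-item difficulties and a simple mean-mean linking constant, not the authors' actual equating procedure; all data and the 0.3 scale shift are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anchor-item difficulty estimates on two forms (e.g., Rasch b-values).
b_old = rng.normal(0.0, 1.0, size=12)                # anchor difficulties, old form
b_new = b_old + 0.3 + rng.normal(0, 0.05, size=12)   # same anchors on new form, scale shifted

def mean_shift(new, old):
    """Mean-mean linking constant: how far the new scale sits above the old one."""
    return np.mean(new) - np.mean(old)

full = mean_shift(b_new, b_old)

# Jackknife: delete one anchor at a time and re-estimate the linking constant.
jack = np.array([
    mean_shift(np.delete(b_new, i), np.delete(b_old, i))
    for i in range(len(b_old))
])

# A wide spread across the jackknife estimates flags anchors that dominate equating.
n = len(jack)
jack_se = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))
print(f"full-sample shift: {full:.3f}")
print(f"jackknife range:   {jack.max() - jack.min():.3f}")
print(f"jackknife SE:      {jack_se:.3f}")
```

In the ideal case the study describes, the jackknife range would be near zero: no single anchor's removal would materially change the equating.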
Peer reviewed
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
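The Angoff procedure the abstract refers to has a simple computational core: each expert judges the probability that a minimally qualified examinee answers each item correctly, and the cut score is the sum of the per-item mean ratings. The sketch below uses invented ratings and a random 20-item subset to illustrate the subsetting question the study asks; it is not the authors' G-theory analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

n_items, n_raters = 60, 8
# Hypothetical Angoff ratings: each expert's judged probability that a minimally
# qualified examinee answers each item correctly.
ratings = np.clip(rng.normal(0.6, 0.12, size=(n_raters, n_items)), 0.05, 0.95)

# Classic Angoff cut score: sum of the mean rating for each item.
cut_full = ratings.mean(axis=0).sum()

# The study's question, in miniature: does a random subset of items yield a
# comparable per-item cut score to the full set?
subset = rng.choice(n_items, size=20, replace=False)
per_item_full = cut_full / n_items
per_item_sub = ratings[:, subset].mean(axis=0).mean()

print(f"cut score (60 items):         {cut_full:.1f}")
print(f"per-item cut, all items:      {per_item_full:.3f}")
print(f"per-item cut, 20-item subset: {per_item_sub:.3f}")
```

If the per-item cut from the subset tracks the full-set value closely across replications, the time savings of rating fewer items come at little cost in the recommended standard.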
Wagemaker, Hans, Ed. – International Association for the Evaluation of Educational Achievement, 2020
Although international large-scale assessment (ILSA) of education, pioneered by the International Association for the Evaluation of Educational Achievement, is now a well-established science, non-practitioners and many users often substantially misunderstand how large-scale assessments are conducted, what questions and challenges they are designed to…
Descriptors: International Assessment, Achievement Tests, Educational Assessment, Comparative Analysis
Peer reviewed
Padilla, Miguel A.; Divers, Jasmin – Educational and Psychological Measurement, 2013
The performance of the normal theory bootstrap (NTB), the percentile bootstrap (PB), and the bias-corrected and accelerated (BCa) bootstrap confidence intervals (CIs) for coefficient omega was assessed through a Monte Carlo simulation under conditions not previously investigated. Of particular interests were nonnormal Likert-type and binary items.…
Descriptors: Sampling, Statistical Inference, Computation, Statistical Analysis
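The percentile bootstrap (PB) examined in this study is straightforward to sketch: resample examinees with replacement, recompute coefficient omega on each resample, and take percentiles of the bootstrap distribution as the confidence limits. The code below uses simulated Likert-type data and a crude one-factor fit (first principal component of the covariance matrix standing in for maximum-likelihood loadings); both the data and that simplification are assumptions for illustration, not the study's method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5-item Likert-type data generated from a one-factor model.
n, k = 300, 5
factor = rng.normal(size=(n, 1))
loadings = np.array([0.7, 0.6, 0.8, 0.5, 0.65])
latent = factor * loadings + rng.normal(0, 0.6, size=(n, k))
items = np.clip(np.round(latent * 1.5 + 3), 1, 5)  # coarsen to a 1-5 scale

def omega(x):
    """Coefficient omega from a crude one-factor fit: the first principal
    component of the covariance matrix stands in for ML factor loadings."""
    s = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(s)                 # eigenvalues ascending
    lam = np.sqrt(evals[-1]) * np.abs(evecs[:, -1])  # loadings
    psi = np.diag(s) - lam ** 2                      # uniquenesses
    return lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())

# Percentile bootstrap: resample rows, recompute omega, and take the 2.5th and
# 97.5th percentiles of the bootstrap distribution as the 95% CI.
boots = np.array([omega(items[rng.integers(0, n, n)]) for _ in range(1000)])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"omega = {omega(items):.3f}, 95% PB CI = [{lo:.3f}, {hi:.3f}]")
```

The NTB and BCa intervals the study compares differ only in how the interval endpoints are computed from (or around) this same bootstrap distribution.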
Peer reviewed
He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014
Since 2010, the whole-cohort Key Stage 2 (KS2) National Curriculum science test in England has been replaced with a sampling test taken annually by pupils aged 11 from a nationally representative sample of schools. The study reported in this paper compares the performance of different subgroups of the samples (classified by…
Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis
Peer reviewed
Schönborn, K. J.; Höst, G. E.; Lundin Palmerius, K. E. – Chemistry Education Research and Practice, 2015
As the application of nanotechnology in everyday life impacts society, it becomes critical for citizens to have a scientific basis upon which to judge their perceived hopes and fears of 'nano'. Although multiple instruments have been designed for assessing attitudinal and affective aspects of nano, surprisingly little work has focused on…
Descriptors: Molecular Structure, Technology, Test Construction, Test Validity
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
OECD Publishing, 2013
The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…
Descriptors: International Assessment, Adults, Skills, Test Construction
Peer reviewed
Lee, Won-Chan; Brennan, Robert L.; Wan, Lei – Applied Psychological Measurement, 2009
For a test that consists of dichotomously scored items, several approaches have been reported in the literature for estimating classification consistency and accuracy indices based on a single administration of a test. Classification consistency and accuracy have not been studied much, however, for "complex" assessments--for example,…
Descriptors: Classification, Reliability, Test Items, Scoring
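The two indices this entry names are easy to see in a simulation, even though the study itself derives them from a single administration. Below, invented true scores generate two parallel administrations of a 40-item dichotomously scored test: consistency is the proportion of examinees receiving the same pass/fail decision on both, and accuracy is the proportion whose observed decision matches their true status. The cut score, item count, and beta-distributed true scores are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

n_examinees, n_items, cut = 5000, 40, 24

# Hypothetical true proportion-correct scores.
true_p = rng.beta(6, 4, size=n_examinees)

# Two parallel administrations: observed scores are binomial given true_p.
x1 = rng.binomial(n_items, true_p)
x2 = rng.binomial(n_items, true_p)

pass1, pass2 = x1 >= cut, x2 >= cut
true_pass = true_p * n_items >= cut

consistency = np.mean(pass1 == pass2)   # same decision on both administrations
accuracy = np.mean(pass1 == true_pass)  # observed decision matches true status

print(f"classification consistency: {consistency:.3f}")
print(f"classification accuracy:    {accuracy:.3f}")
```

Single-administration methods estimate the same quantities by positing a model for the true-score and error distributions rather than actually administering the test twice; the "complex assessment" problem the authors address arises when items are not all dichotomous, so the simple binomial error model no longer applies.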
OECD Publishing, 2014
The "PISA 2012 Technical Report" describes the methodology underlying the PISA 2012 survey, which tested 15-year-olds' competencies in mathematics, reading and science and, in some countries, problem solving and financial literacy. It examines the design and implementation of the project at a level of detail that allows researchers to…
Descriptors: International Assessment, Secondary School Students, Foreign Countries, Achievement Tests
Peer reviewed
Huang, Chiungjung – Educational and Psychological Measurement, 2009
This study examined the percentage of task-sampling variability in performance assessment via a meta-analysis. In total, 50 studies containing 130 independent data sets were analyzed. Overall results indicate that the percentage of variance for (a) differential difficulty of task was roughly 12% and (b) examinee's differential performance of the…
Descriptors: Test Bias, Research Design, Performance Based Assessment, Performance Tests
OECD Publishing (NJ1), 2009
The Organisation for Economic Cooperation and Development's (OECD's) Programme for International Student Assessment (PISA) surveys, which take place every three years, have been designed to collect information about 15-year-old students in participating countries. PISA examines how well students are prepared to meet the challenges of the future,…
Descriptors: Policy Formation, Scaling, Academic Achievement, Interrater Reliability
Childs, Ruth A.; Jaciw, Andrew P. – 2003
Matrix sampling of test items, the division of a set of items into different versions of a test form, is used by several large-scale testing programs. This Digest discusses nine categories of costs associated with matrix sampling. These categories are: (1) development costs; (2) materials costs; (3) administration costs; (4) educational costs; (5)…
Descriptors: Costs, Matrices, Reliability, Sampling
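Matrix sampling as defined in this Digest — dividing one item pool into several shorter forms so that each examinee takes only a fraction of the items while the program covers the whole pool — can be sketched directly. The 90-item pool, three forms, and round-robin student assignment below are all hypothetical.

```python
import random

random.seed(4)

# Hypothetical 90-item pool divided into three non-overlapping 30-item forms:
# each student takes one form, but together the forms cover the whole pool.
items = list(range(90))
random.shuffle(items)
forms = [sorted(items[i::3]) for i in range(3)]

# Spiral the forms across students so each form is administered equally often
# within a classroom.
students = [f"s{i:03d}" for i in range(12)]
assignment = {s: forms[i % 3] for i, s in enumerate(students)}

form_counts = [sum(1 for f in assignment.values() if f is forms[j]) for j in range(3)]
print(f"items per form: {[len(f) for f in forms]}")
print(f"students per form: {form_counts}")
```

The cost trade-offs the Digest catalogs follow from this design: each student answers fewer items (lower administration burden), but scoring, reporting, and score reliability at the individual level become more complicated because no one takes the full pool.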
Peer reviewed
Cizek, Gregory J.; Robinson, K. Lynne; O'Day, Denis M. – Educational and Psychological Measurement, 1998
The effect of removing nonfunctioning items from multiple-choice tests was studied by examining change in difficulty, discrimination, and dimensionality. Results provide additional support for the benefits of eliminating nonfunctioning options, such as enhanced score reliability, reduced testing time, potential for broader domain sampling, and…
Descriptors: Difficulty Level, Multiple Choice Tests, Sampling, Scores
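A common operational screen for the "nonfunctioning options" this study removes is a selection-frequency rule: a distractor chosen by fewer than some small percentage of examinees (5% is a typical threshold) is doing no discriminating work and is a candidate for deletion. The response counts and threshold below are invented for illustration; the study's own criteria may differ.

```python
import numpy as np

# Hypothetical response counts for 6 four-option items; column 0 is the key,
# columns 1-3 are distractors. Each row sums to 200 examinees.
counts = np.array([
    [120, 40, 35,  5],
    [ 90, 50, 40, 20],
    [150, 25, 22,  3],
    [ 80, 60, 55,  5],
    [110, 45, 40,  5],
    [100, 48, 46,  6],
])

# Flag a distractor as nonfunctioning if fewer than 5% of examinees select it.
props = counts / counts.sum(axis=1, keepdims=True)
nonfunctioning = props[:, 1:] < 0.05  # distractor columns only

for i, row in enumerate(nonfunctioning):
    flagged = [f"option {j + 1}" for j, bad in enumerate(row) if bad]
    if flagged:
        print(f"item {i}: drop {', '.join(flagged)}")
```

Removing flagged options and rescoring lets one check the effects the study reports: changes in item difficulty, discrimination, and testing time.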