Showing 91 to 105 of 111 results
Peer reviewed
Jaeger, Richard M. – Applied Measurement in Education, 1995
A performance-standard setting procedure termed judgmental policy capturing (JPC) and its application are described. A study involving 12 panelists demonstrated the feasibility of the JPC method for setting performance standards for classroom teachers seeking certification from the National Board for Professional Teaching Standards. (SLD)
Descriptors: Decision Making, Educational Assessment, Evaluation Methods, Evaluators
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models
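Feldt's point lends itself to a quick simulation. The sketch below is my own illustration, not code from the article: each 0/1 item score is treated as a dichotomization of a normally distributed ability, and KR-20 reliability is compared for a test whose item difficulties are concentrated near 0.50 versus spread out. All parameter values (40 items, 2,000 examinees, unit noise) are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def kr20(scores):
    """Kuder-Richardson 20 reliability for a 0/1 item-score matrix."""
    k = scores.shape[1]
    p = scores.mean(axis=0)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def simulate(thresholds, n_examinees=2000, noise=1.0):
    """Dichotomize (ability + noise) at each item's difficulty threshold."""
    ability = rng.standard_normal((n_examinees, 1))
    latent = ability + noise * rng.standard_normal((n_examinees, len(thresholds)))
    return (latent > thresholds).astype(int)

mid = simulate(np.zeros(40))              # all items near p = .50
spread = simulate(np.linspace(-2, 2, 40)) # difficulties spread widely

print(kr20(mid), kr20(spread))  # concentrated difficulties give the higher KR-20
```

Under this setup the concentrated-difficulty test shows higher reliability, consistent with the recommendation the abstract describes.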
Peer reviewed
Sykes, Robert C.; Hou, Liling – Applied Measurement in Education, 2003
Weighting responses to Constructed-Response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effect of various ways of weighting the items of an IRT-based mixed-format writing examination was investigated.…
Descriptors: Item Response Theory, Weighted Scores, Responses, Scores
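The weighting effect Sykes and Hou study can be illustrated with a toy composite (an assumed setup for illustration, not their examination or their IRT analysis): raising the weight on a short CR section increases that section's share of composite-score variance, much as administering additional CR items would.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
ability = rng.standard_normal(n)

# 40 multiple-choice items (0/1 each), summed to a number-correct score
mc = (ability[:, None] + rng.standard_normal((n, 40)) > 0).sum(axis=1)
# two constructed-response items scored 0-6, driven by the same ability
cr = sum(
    np.clip(np.round(3 + 1.5 * ability + rng.normal(0, 1, n)), 0, 6)
    for _ in range(2)
)

def cr_share(weight):
    """CR section's share of composite-score variance at a given weight."""
    composite = mc + weight * cr
    return np.cov(weight * cr, composite)[0, 1] / composite.var(ddof=1)

print(cr_share(1.0), cr_share(3.0))  # the CR share rises with the weight
```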
Peer reviewed
Wainer, Howard; Thissen, David – Applied Measurement in Education, 1993
Because assessment instruments of the future may well be composed of a combination of types of questions, a way to combine those scores effectively is discussed. Two new graphic tools are presented that show that it may not be practical to equalize the reliability of different components. (SLD)
Descriptors: Constructed Response, Educational Assessment, Graphs, Item Response Theory
Peer reviewed
Holland, Paul W.; Wainer, Howard – Applied Measurement in Education, 1990
Two attempts to adjust state mean Scholastic Aptitude Test (SAT) scores for differential participation rates are examined. Both attempts are rejected, and five rules for performing adjustments are outlined to foster follow-up checks on untested assumptions. National Assessment of Educational Progress state data are determined to be more accurate.…
Descriptors: College Applicants, College Entrance Examinations, Estimation (Mathematics), Item Bias
Peer reviewed
Loyd, Brenda H. – Applied Measurement in Education, 1990
Four mathematics test-item types that may perform differently when calculators are used were assessed using data from 160 high school students attending a summer enrichment program. The effects of testing with and without calculators on testing time, test reliability, item difficulty, and item discrimination were also assessed. (TJH)
Descriptors: Calculators, Difficulty Level, High School Students, High Schools
Peer reviewed
Angoff, William H. – Applied Measurement in Education, 1988
Suggestions are provided for future research in item bias detection, reduction of essay-reader variation in setting cut-score levels, and limitations of equating theory. (TJH)
Descriptors: College Entrance Examinations, Cutting Scores, Equated Scores, Essay Tests
Peer reviewed
Klein, Stephen P.; And Others – Applied Measurement in Education, 1995
Portfolios are the centerpiece of Vermont's statewide assessment program in mathematics. Portfolio scores in the first two years were not reliable enough to permit the reporting of student-level results, but increasing the number of readers or the number of portfolio pieces is not operationally feasible. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Mathematics Tests, Performance Based Assessment
Peer reviewed
Linn, Robert L.; Kiplinger, Vonda L. – Applied Measurement in Education, 1995
The adequacy of linking statewide standardized test results to the National Assessment of Educational Progress by using equipercentile equating procedures was investigated using statewide mathematics data from four states. Results suggest that the linkings are not sufficiently trustworthy to make comparisons based on the tails of the distribution.…
Descriptors: Comparative Analysis, Educational Assessment, Equated Scores, Mathematics Tests
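For readers unfamiliar with the linking method named above, a minimal equipercentile link can be sketched as follows (hypothetical score distributions; this is not the Linn and Kiplinger procedure itself): each score on one test is mapped to the score on the other test with the same percentile rank. The tails, where the abstract notes the linkage is least trustworthy, rest on the fewest observations.

```python
import numpy as np

def equipercentile_link(x_scores, y_scores):
    """Return a function mapping an X score to its equipercentile Y equivalent."""
    x_sorted = np.sort(x_scores)
    y_sorted = np.sort(y_scores)
    def link(x):
        # percentile rank of x within the X distribution
        pr = np.searchsorted(x_sorted, x, side="right") / len(x_sorted)
        # Y score at the same percentile rank
        return float(np.quantile(y_sorted, pr))
    return link

rng = np.random.default_rng(1)
state = rng.normal(250, 40, 5000)  # hypothetical state-test scores
naep = rng.normal(150, 30, 5000)   # hypothetical NAEP-like scores
to_naep = equipercentile_link(state, naep)

print(to_naep(250))  # a median state score maps near the NAEP-like median
```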
Peer reviewed
Schiel, Jeffrey L.; Shaw, Dale G. – Applied Measurement in Education, 1992
Changes in information retention resulting from changes in reliability and number of intervals in scale construction were studied to provide quantitative information to help in decisions about choosing intervals. Information retention reached a maximum when the number of intervals was about 8 or more and reliability was near 1.0. (SLD)
Descriptors: Decision Making, Knowledge Level, Mathematical Models, Monte Carlo Methods
Peer reviewed
McBee, Maridyth M.; Barnes, Laura L. B. – Applied Measurement in Education, 1998
The temporal stability and intertask consistency of an eighth-grade mathematics performance assessment and how task similarity affects the ability to generalize results of the assessments were studied with results from 101 eighth graders. Results support the suggestion that large-scale performance assessments be used with considerable caution…
Descriptors: Academic Achievement, Grade 8, Junior High School Students, Junior High Schools
Peer reviewed
Hambleton, Ronald K.; Slater, Sharon C. – Applied Measurement in Education, 1997
A brief history of developments in the assessment of the reliability of credentialing examinations is presented, and some new results are outlined that highlight the interactions among scoring, standard setting, and the reliability and validity of pass-fail decisions. Decision consistency is an important concept in evaluating credentialing…
Descriptors: Certification, Credentials, Decision Making, Interaction
Peer reviewed
Stone, Clement A.; Lane, Suzanne – Applied Measurement in Education, 1991
A model-testing approach for evaluating the stability of item response theory item parameter estimates (IPEs) in a pretest-posttest design is illustrated. Nineteen items from the Head Start Measures Battery were used. A moderately high degree of stability in the IPEs for 5,510 children assessed on 2 occasions was found. (TJH)
Descriptors: Comparative Testing, Compensatory Education, Computer Assisted Testing, Early Childhood Education
Peer reviewed
Gao, Xiaohong; And Others – Applied Measurement in Education, 1994
This study provides empirical evidence about the sampling variability and generalizability (reliability) of a statewide performance assessment for grade six. Results for 600 students at individual and school levels indicate that task-sampling variability was the major source of measurement error. Rater-sampling variability was negligible. (SLD)
Descriptors: Achievement Tests, Educational Assessment, Elementary School Students, Error of Measurement
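The variance decomposition behind such generalizability findings can be sketched with a persons-by-tasks random-effects ANOVA. The data below are simulated with an assumed large task effect (they are not Gao et al.'s data), so the task component dominates the residual, mirroring the abstract's conclusion that task sampling is the major error source.

```python
import numpy as np

def variance_components(scores):
    """Person, task, and residual variance components from a fully
    crossed persons x tasks score matrix (random-effects ANOVA)."""
    n_p, n_t = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    task_means = scores.mean(axis=0)
    ms_p = n_t * ((person_means - grand) ** 2).sum() / (n_p - 1)
    ms_t = n_p * ((task_means - grand) ** 2).sum() / (n_t - 1)
    resid = scores - person_means[:, None] - task_means[None, :] + grand
    ms_e = (resid ** 2).sum() / ((n_p - 1) * (n_t - 1))
    return (ms_p - ms_e) / n_t, (ms_t - ms_e) / n_p, ms_e

rng = np.random.default_rng(2)
n_persons, n_tasks = 600, 10
person = rng.normal(0, 1.0, (n_persons, 1))       # true person (ability) effects
task = rng.normal(0, 1.5, (1, n_tasks))           # large task effects (assumed)
noise = rng.normal(0, 0.5, (n_persons, n_tasks))  # residual noise

vp, vt, ve = variance_components(person + task + noise)
print(vp, vt, ve)  # task variance exceeds the residual component here
```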
Peer reviewed
Millman, Jason – Applied Measurement in Education, 1991
Alternatives to multiple-choice tests for teacher licensing examinations are described, and their advantages are cited. Concerns are expressed in the areas of cost and practicality, reliability, corruptibility, and validity. A suggestion for reducing costs using multiple-choice responses calibrated to constructed-response tasks is proposed. (SLD)
Descriptors: Beginning Teachers, Constructed Response, Cost Effectiveness, Educational Assessment