Beutner, Marc; Rüscher, Frederike Anna – International Association for Development of the Information Society, 2017
This paper provides insights into the development of a skill-matching test that addresses soft skills and integrates videos as the medium for presenting the situations to be rated. The design of the skill testing and matching tool is situated in the educational ERASMUS+ project SMART, which is also presented. With a specific view on team work…
Descriptors: Foreign Countries, Test Construction, Testing, Video Technology
Ferrara, F. Felicia – 1995
Cut scores, quartile ranking, sample size, and overall classification scheme were studied as personnel selection procedures in two samples. The first was 120 simulated observations of employee scores based on actual selection procedures for applicants for administrative assistant positions. The other sample was composed of test results for 73…
Descriptors: Classification, Cutting Scores, Job Applicants, Personnel Selection
Thayer, Jerome D. – 1991
Combining student scores to form subtotals and finally a total score to determine a grade is discussed. The composite score reached by combining measures or subtotals is only valid when the scores are combined so that the actual weight of each measure or subtotal in the total score is the same as the intended weight. Three types of variability…
Descriptors: Academic Achievement, Elementary Secondary Education, Grading, Mathematical Models
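As a rough illustration of the weighting issue raised in the entry above, the sketch below shows one way intended and actual weights can diverge: when components have very different spreads, the component with the larger standard deviation dominates the composite. The function names, the sample scores, and the choice to standardize each component (z-scores) before applying the intended weights are assumptions for illustration only and are not taken from the paper.

```python
from statistics import mean

def pcov(x, y):
    """Population covariance of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def z_scores(scores):
    """Rescale scores to mean 0 and (population) standard deviation 1."""
    m = mean(scores)
    s = pcov(scores, scores) ** 0.5
    if s == 0:
        raise ValueError("component has no variability")
    return [(x - m) / s for x in scores]

def effective_weights(components, weights, standardize=True):
    """Share of composite variance contributed by each component.

    For composite C = sum_j w_j * X_j, the share of component j is
    w_j * cov(X_j, C) / var(C); the shares sum to 1.
    """
    cols = [z_scores(c) if standardize else list(c) for c in components]
    n = len(cols[0])
    composite = [sum(w * col[i] for w, col in zip(weights, cols)) for i in range(n)]
    var_c = pcov(composite, composite)
    return [w * pcov(col, composite) / var_c for w, col in zip(weights, cols)]

# Hypothetical data: a 10-point quiz and a 100-point exam, intended weights 50/50.
quiz = [6, 7, 8, 9, 10]
exam = [30, 50, 90, 60, 70]
raw_shares = effective_weights([quiz, exam], [0.5, 0.5], standardize=False)
std_shares = effective_weights([quiz, exam], [0.5, 0.5], standardize=True)
```

With these made-up numbers the raw composite is driven almost entirely by the exam, while the standardized composite gives each component an equal share. Other definitions of "actual weight" exist, and the paper may use a different one.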
O'Neill, Thomas R.; Lunz, Mary E. – 1997
This paper illustrates a method to study rater severity across exam administrations. A multi-facet Rasch model defined the ratings as being dominated by four facets: examinee ability, rater severity, project difficulty, and task difficulty. Ten years of data from administrations of a histotechnology performance assessment were pooled and analyzed…
Descriptors: Ability, Comparative Analysis, Equated Scores, Interrater Reliability
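For orientation, a commonly used form of the many-facet Rasch model with the four facets named in the entry above can be written as follows. The symbols, and the inclusion of a rating-scale step term $F_k$, are assumptions for illustration; the paper's exact parameterization is not given in the abstract.

```latex
\log\!\left(\frac{P_{njimk}}{P_{njim(k-1)}}\right) = B_n - C_j - D_i - T_m - F_k
```

Here $B_n$ is the ability of examinee $n$, $C_j$ the severity of rater $j$, $D_i$ the difficulty of project $i$, $T_m$ the difficulty of task $m$, and $F_k$ the difficulty of moving from rating category $k-1$ to category $k$.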
Taylor, Catherine S. – 1996
This study investigated the impact of task directions on the mathematical performance of high school students from six classes. Students analyzed data regarding school dropout by answering six short-answer questions and writing a letter discussing the trends and their predictions about school dropout. Tasks were scored using two methods: (1) trait…
Descriptors: Dropouts, High School Students, High Schools, Mathematics Tests
Page, Ellis B.; Poggio, John P.; Keith, Timothy Z. – 1997
Most human gradings of essays are holistic, or "overall." Therefore, Project Essay Grade (PEG), an attempt to develop computerized grading of essays, has concentrated most of its research on overall grading. It has successfully simulated human judges. However, since computer grading is less expensive than human grading, PEG has also…
Descriptors: Computer Assisted Testing, Elementary Secondary Education, Essays, Evaluators

Linn, Robert L.; Burton, Elizabeth – Educational Measurement: Issues and Practice, 1994
Generalizability of performance-based assessment scores across raters and tasks is examined, focusing on implications of generalizability analyses for specific uses and interpretations of assessment results. Although it seems probable that assessment conditions, task characteristics, and interactions with instructional experiences affect the…
Descriptors: Educational Assessment, Educational Experience, Generalizability Theory, Interaction
Bradley, Raymond T.; And Others – 1991
Test data from a large licensure examination were used to investigate the impact of the number of score groups and score group size on the Mantel-Haenszel (MH) alpha statistic. Two kinds of quasi-experimental studies were conducted: (1) baseline MH analyses in which the number of score groups and the ethnicity of each randomly equivalent…
Descriptors: Comparative Analysis, Ethnicity, Groups, Licensing Examinations (Professions)
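As a minimal sketch of the statistic named in the entry above, the Mantel-Haenszel common odds ratio (alpha) pools one 2x2 table per score group: alpha = sum(A_k * D_k / N_k) / sum(B_k * C_k / N_k), where A and B are the reference group's correct and incorrect counts, C and D are the focal group's, and N is the table total. The function name and the counts below are hypothetical; the study's actual data and grouping scheme are not reproduced here.

```python
def mantel_haenszel_alpha(tables):
    """Mantel-Haenszel common odds ratio across score groups.

    tables: iterable of (a, b, c, d) counts per score group, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score group contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("alpha is undefined: denominator is zero")
    return numerator / denominator

# Hypothetical single score group: the reference group's odds of a correct
# answer are 3 times the focal group's.
alpha_one_group = mantel_haenszel_alpha([(30, 10, 20, 20)])  # 3.0

# Hypothetical two score groups with no group difference within either one.
alpha_no_dif = mantel_haenszel_alpha([(10, 10, 10, 10), (20, 5, 20, 5)])  # 1.0
```

Because the estimate is a sum over score groups, merging or splitting groups changes which examinees are compared with one another, which is the kind of sensitivity the entry says it investigates.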
Slater, Sharon C.; Schaeffer, Gary A. – 1996
The General Computer Adaptive Test (CAT) of the Graduate Record Examinations (GRE) includes three operational sections that are separately timed and scored. A "no score" is reported if the examinee answers fewer than 80% of the items or if the examinee does not answer all of the items and leaves the section before time expires. The 80%…
Descriptors: Adaptive Testing, College Students, Computer Assisted Testing, Equal Education
Bay, Luz; Loomis, Susan Cooper; Wang, Tianyou – 1995
This study examines the effects on the National Assessment of Educational Progress (NAEP) achievement levels of using item response theory (IRT) models that have nominal missing-response parameters. It compares cutpoints based on item parameters that were fitted using two different models. The first set of cutpoints was based on parameters for…
Descriptors: Academic Achievement, Comparative Analysis, Cutting Scores, High School Seniors
Nolet, Victor; Tindal, Gerald – 1990
Valid interpretation of test scores is the shared responsibility of the test designer and the test user. Test publishers must provide evidence of the validity of the decisions their tests are intended to support, while test users are responsible for analyzing this evidence and subsequently using the test in the manner indicated by the publisher.…
Descriptors: Achievement Tests, Construct Validity, Elementary Secondary Education, Norm Referenced Tests
O'Neill, Thomas R.; Lunz, Mary E. – 1996
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level
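The independence property described in the entry above can be illustrated with the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between ability and item difficulty (both in logits). The sketch below is a generic textbook form with hypothetical values, not the authors' analysis: it checks that the log-odds gap between two items is the same for examinees of different ability.

```python
import math

def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model; both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def item_gap(ability, difficulty_1, difficulty_2):
    """Difference in log-odds of success between two items for one examinee."""
    return (log_odds(rasch_probability(ability, difficulty_1))
            - log_odds(rasch_probability(ability, difficulty_2)))

# Hypothetical item difficulties and two examinees of very different ability.
gap_low_ability = item_gap(-1.0, 0.5, -1.0)
gap_high_ability = item_gap(2.0, 0.5, -1.0)
```

Both gaps equal the difference in item difficulties, -1.5 logits, up to floating-point error, so the comparison between items does not depend on who attempted them.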
Fenton, Ray; Straugh, Tom; Stofflet, Fred – 1997
Writing assessment began in Alaska in the 1970s, and the Alaska Writing Assessment (AWA) that was piloted in 1997 built on previous efforts. The 1997 AWA involved more than 20,000 students in grades 5, 7, and 10 from 43 school districts, and the mandatory assessment planned for 1998 will include approximately 28,000 students. This review of the…
Descriptors: Elementary Secondary Education, Evaluation Methods, Program Implementation, Resource Allocation
Goldberg, Gail Lynn; Michaels, Hillary – 1995
Preliminary data was gathered to guide subsequent research that will shape training procedures and scoring practice for performance assessment activities that integrate multiple content areas. Content area integration is a key feature of many of the tasks in the Maryland School Performance Assessment Program (MSPAP), a large-scale assessment of…
Descriptors: Elementary Education, Evaluators, Grade 3, Grade 8
Kaiser, Paul D.; Brull, Harry – 1994
The design, administration, scoring, and results of the 1993 New York State Correctional Captain Examination are described. The examination was administered to 405 candidates. As in previous Sergeant and Lieutenant examinations, candidates also completed latent image written simulation problems and open/closed book multiple choice test components.…
Descriptors: Competitive Selection, Correctional Rehabilitation, Decision Making, Educational Innovation