Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Razmi, Mohammad Hasan; Khabir, Masoud; Tilwani, Shouket Ahmad – International Journal of Language Testing, 2021
Since its inception in 1949, over 1,500 studies have investigated the validity of the GRE General Test to predict its performance criteria in higher education (Klieger, Bridgeman, Tannenbaum, & Cline, 2016). The present review paper sought to examine the predictive validity of the GRE General Test. Factors affecting the predictive validity…
Descriptors: Meta Analysis, Predictive Validity, College Entrance Examinations, Graduate Study
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the GRE® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
Liu, Yuming; Robin, Frédéric; Yoo, Hanwook; Manna, Venessa – ETS Research Report Series, 2018
The GRE® Psychology test is an achievement test that measures core knowledge in 12 content domains that represent the courses commonly offered at the undergraduate level. Currently, a total score and 2 subscores, experimental and social, are reported to test takers as well as graduate institutions. However, the American Psychological…
Descriptors: College Entrance Examinations, Graduate Study, Psychological Testing, Scores
Klieger, David M.; Bridgeman, Brent; Tannenbaum, Richard J.; Cline, Frederick A.; Olivera-Aguilar, Margarita – ETS Research Report Series, 2018
Educational Testing Service (ETS), working with 21 U.S. law schools, evaluated the predictive validity of the GRE® General Test using a sample of 1,587 current and graduated law students. Results indicated that the GRE is a strong, generalizably valid predictor of first-year law school grades and that it provides useful information even when…
Descriptors: College Entrance Examinations, Graduate Study, Test Validity, Scores
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Ricker-Pedley, Kathryn L. – Educational Testing Service, 2011
A pseudo-experimental study was conducted to examine the link between rater accuracy calibration performances and subsequent accuracy during operational scoring. The study asked 45 raters to score a 75-response calibration set and then a 100-response (operational) set of responses from a retired Graduate Record Examinations® (GRE®) writing…
Descriptors: Scoring, Accuracy, College Entrance Examinations, Writing Tests
Bejar, Isaac I.; VanWinkle, Waverely; Madnani, Nitin; Lewis, William; Steier, Michael – ETS Research Report Series, 2013
The paper applies a natural language computational tool to study a potential construct-irrelevant response strategy, namely the use of "shell language." Although the study is motivated by the impending increase in the volume of scoring of students' responses from assessments to be developed in response to the Race to the Top initiative,…
Descriptors: Responses, Language Usage, Natural Language Processing, Computational Linguistics
Pacheco, Wendy I.; Noel, Richard J., Jr.; Porter, James T.; Appleyard, Caroline B. – CBE - Life Sciences Education, 2015
The use and validity of the Graduate Record Examination General Test (GRE) to predict the success of graduate school applicants is heavily debated, especially for its possible impact on the selection of underrepresented minorities into science, technology, engineering, and math fields. To better identify candidates who would succeed in our program…
Descriptors: Puerto Ricans, Predictor Variables, Success, Doctoral Programs
Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015
The e-rater® automated essay scoring system is used operationally in the scoring of the argument and issue tasks that form the Analytical Writing measure of the GRE® General Test. For each of these tasks, this study explored the value added of reporting 4 trait scores for each of these 2 tasks over the total e-rater score.…
Descriptors: Scores, Computer Assisted Testing, Computer Software, Grammar
Davison, Mark L.; Semmes, Robert; Huang, Lan; Close, Catherine N. – Educational and Psychological Measurement, 2012
Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy…
Descriptors: Computer Assisted Testing, Reaction Time, Reliability, Validity
Semmes, Robert; Davison, Mark L.; Close, Catherine – Applied Psychological Measurement, 2011
If numerical reasoning items are administered under time limits, will two dimensions be required to account for the responses, a numerical ability dimension and a speed dimension? A total of 182 college students answered 74 numerical reasoning items. Every item was taken with and without time limits by half the students. Three psychometric models…
Descriptors: Individual Differences, Logical Thinking, Timed Tests, College Students
Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012
Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…
Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability
Huang, Hung-Yu; Wang, Wen-Chung – Educational and Psychological Measurement, 2013
Both testlet design and hierarchical latent traits are fairly common in educational and psychological measurements. This study aimed to develop a new class of higher order testlet response models that consider both local item dependence within testlets and a hierarchy of latent traits. Due to high dimensionality, the authors adopted the Bayesian…
Descriptors: Item Response Theory, Models, Bayesian Statistics, Computation