Showing 1 to 15 of 41 results
Peer reviewed
PDF on ERIC Download full text
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
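The Wendler, Glazer, and Cline abstract above turns on monitoring rater drift against established scoring criteria. As a purely illustrative sketch (not ETS's calibration procedure), one could flag drift by tracking each rater's mean deviation from committee-assigned scores on embedded validity responses across scoring windows; all data, names, and thresholds below are invented:

```python
# Hypothetical sketch: flagging rater drift with embedded validity responses.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated scoring log: each row is one rating of a pre-scored validity essay.
log = pd.DataFrame({
    "rater": rng.choice(["r1", "r2", "r3"], size=300),
    "window": rng.choice([1, 2, 3], size=300),        # scoring window
    "true_score": rng.integers(1, 7, size=300),        # committee-assigned score
})
# Rater r3 drifts harsh in window 3, purely for illustration.
noise = rng.integers(-1, 2, size=300)
log["assigned"] = (log["true_score"] + noise
                   - ((log["rater"] == "r3") & (log["window"] == 3)).astype(int))

# Mean signed deviation from the true score, per rater per window.
drift = (log.assign(dev=log["assigned"] - log["true_score"])
            .groupby(["rater", "window"])["dev"].mean())

# Flag rater-window cells whose mean deviation exceeds a tolerance.
print(drift[drift.abs() > 0.5])
```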
Peer reviewed
PDF on ERIC Download full text
Razmi, Mohammad Hasan; Khabir, Masoud; Tilwani, Shouket Ahmad – International Journal of Language Testing, 2021
Since the test's inception in 1949, more than 1,500 studies have investigated the validity of the GRE General Test for predicting performance criteria in higher education (Klieger, Bridgeman, Tannenbaum, & Cline, 2016). The present review paper sought to examine the predictive validity of the GRE General Test. Factors affecting the predictive validity…
Descriptors: Meta Analysis, Predictive Validity, College Entrance Examinations, Graduate Study
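Because the Razmi, Khabir, and Tilwani paper synthesizes predictive-validity evidence, a standard way to pool validity correlations across studies is a fixed-effect meta-analysis on Fisher-z-transformed coefficients. The sketch below uses invented correlations and sample sizes and is not drawn from the paper:

```python
import numpy as np

# Hypothetical study results: predictive-validity correlations and sample sizes.
r = np.array([0.30, 0.25, 0.41, 0.18, 0.34])
n = np.array([120, 85, 200, 60, 150])

# Fisher z-transform; the sampling variance of z is 1/(n - 3).
z = np.arctanh(r)
w = n - 3                                # inverse-variance weights

z_bar = np.sum(w * z) / np.sum(w)        # fixed-effect pooled estimate
se = np.sqrt(1 / np.sum(w))
ci = np.tanh([z_bar - 1.96 * se, z_bar + 1.96 * se])

print(f"pooled r = {np.tanh(z_bar):.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```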
Peer reviewed
PDF on ERIC Download full text
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the GRE® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
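The Finn et al. abstract names binomial linear mixed-effect models of scoring accuracy. A minimal sketch of that model class, with an invented rater random effect and a fixed effect for the gap between sessions, could use statsmodels' Bayesian binomial mixed GLM; nothing here reproduces the report's actual specification:

```python
# A minimal sketch, not the paper's model: binomial mixed-effects regression
# of scoring accuracy on the gap between sessions, with a rater random effect.
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "rater": rng.choice([f"r{i}" for i in range(10)], size=n),
    "days_between": rng.integers(0, 30, size=n),   # gap between sessions
})
# Accuracy (1 = agrees with the reference score) declines with the gap here
# purely for illustration.
p = 1 / (1 + np.exp(-(1.5 - 0.03 * df["days_between"])))
df["accurate"] = rng.binomial(1, p)

model = BinomialBayesMixedGLM.from_formula(
    "accurate ~ days_between",            # fixed effect of the session gap
    {"rater": "0 + C(rater)"},            # random intercept per rater
    df,
)
result = model.fit_vb()                   # variational Bayes fit
print(result.summary())
```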
Peer reviewed
PDF on ERIC Download full text
Liu, Yuming; Robin, Frédéric; Yoo, Hanwook; Manna, Venessa – ETS Research Report Series, 2018
The "GRE"® Psychology test is an achievement test that measures core knowledge in 12 content domains that represent the courses commonly offered at the undergraduate level. Currently, a total score and 2 subscores, experimental and social, are reported to test takers as well as graduate institutions. However, the American Psychological…
Descriptors: College Entrance Examinations, Graduate Study, Psychological Testing, Scores
Peer reviewed
PDF on ERIC Download full text
Klieger, David M.; Bridgeman, Brent; Tannenbaum, Richard J.; Cline, Frederick A.; Olivera-Aguilar, Margarita – ETS Research Report Series, 2018
Educational Testing Service (ETS), working with 21 U.S. law schools, evaluated the predictive validity of the GRE® General Test using a sample of 1,587 current and graduated law students. Results indicated that the GRE is a strong, generalizably valid predictor of first-year law school grades and that it provides useful information even when…
Descriptors: College Entrance Examinations, Graduate Study, Test Validity, Scores
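Predictive-validity studies like Klieger et al.'s typically face range restriction, because grades are observed only for admitted students. A textbook correction for direct range restriction on the predictor is Thorndike's Case II formula; the numbers below are invented for illustration:

```python
# Illustrative only: correcting an observed validity coefficient for direct
# range restriction on the predictor (Thorndike Case II).
import math

def correct_range_restriction(r_obs: float, sd_unrestricted: float,
                              sd_restricted: float) -> float:
    """Thorndike Case II correction for direct range restriction."""
    u = sd_unrestricted / sd_restricted
    return (r_obs * u) / math.sqrt(1 + r_obs**2 * (u**2 - 1))

# Observed GRE-grades correlation of .35 in an admitted (restricted) sample,
# where the applicant-pool SD is 1.5x the admitted-sample SD:
print(round(correct_range_restriction(0.35, 1.5, 1.0), 3))  # ~0.489
```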
Peer reviewed
Direct link
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
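Human-machine score agreement of the kind Attali, Lewis, and Steier discuss is commonly summarized with quadratically weighted kappa alongside the Pearson correlation. A toy computation with made-up scores, not the study's data:

```python
# Agreement between human and automated essay scores on a 1-6 scale.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

human =   [4, 3, 5, 2, 4, 3, 6, 4, 3, 5]
machine = [4, 3, 4, 2, 5, 3, 6, 4, 2, 5]

print("QWK:", cohen_kappa_score(human, machine, weights="quadratic"))
print("Pearson r:", pearsonr(human, machine)[0])
```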
Peer reviewed
PDF on ERIC Download full text
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
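One simple operationalization of the population invariance Zhang examines is the standardized difference between automated and human mean scores within each subgroup, which should be near zero everywhere if scores are invariant. The sketch below simulates its own data and is not the study's method:

```python
# A minimal population-invariance check: standardized machine-human score
# differences computed within each (invented) subgroup.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=600),
    "human": rng.normal(4.0, 1.0, size=600),
})
# The automated score tracks the human score, with a small bias for group C
# added purely for illustration.
df["machine"] = (df["human"] + rng.normal(0, 0.5, size=600)
                 + np.where(df["group"] == "C", 0.3, 0.0))

def smd(g: pd.DataFrame) -> float:
    """Mean machine-human difference, standardized by the human-score SD."""
    diff = g["machine"] - g["human"]
    return diff.mean() / g["human"].std()

# Invariance would imply near-zero values in every subgroup.
print(df.groupby("group").apply(smd))
```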
Ricker-Pedley, Kathryn L. – Educational Testing Service, 2011
A pseudo-experimental study was conducted to examine the link between rater accuracy calibration performances and subsequent accuracy during operational scoring. The study asked 45 raters to score a 75-response calibration set and then a 100-response (operational) set of responses from a retired Graduate Record Examinations® (GRE®) writing…
Descriptors: Scoring, Accuracy, College Entrance Examinations, Writing Tests
Peer reviewed
PDF on ERIC Download full text
Bejar, Isaac I.; VanWinkle, Waverely; Madnani, Nitin; Lewis, William; Steier, Michael – ETS Research Report Series, 2013
The paper applies a natural language computational tool to study a potential construct-irrelevant response strategy, namely the use of "shell language." Although the study is motivated by the impending increase in the volume of scoring of students' responses from assessments to be developed in response to the Race to the Top initiative,…
Descriptors: Responses, Language Usage, Natural Language Processing, Computational Linguistics
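Bejar et al. study "shell language" with a natural language computational tool; as a loose illustration of the general idea (not their tool), one can score an essay's density of formulaic argumentative phrases against a seed list. The phrase list and essay below are invented:

```python
# Illustrative sketch: density of generic, content-free "shell" phrases.
import re

SHELL_PHRASES = [
    "the argument is flawed",
    "the author assumes",
    "fails to provide evidence",
    "in conclusion",
    "this weakens the argument",
]

def shell_density(essay: str) -> float:
    """Fraction of essay words covered by matched shell phrases."""
    text = essay.lower()
    covered = sum(len(p.split()) * len(re.findall(re.escape(p), text))
                  for p in SHELL_PHRASES)
    n_words = max(len(text.split()), 1)
    return covered / n_words

essay = ("The author assumes that sales will rise. "
         "The argument is flawed because it fails to provide evidence. "
         "In conclusion, this weakens the argument.")
print(round(shell_density(essay), 3))
```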
Peer reviewed
Direct link
Pacheco, Wendy I.; Noel, Richard J., Jr.; Porter, James T.; Appleyard, Caroline B. – CBE - Life Sciences Education, 2015
The use and validity of the Graduate Record Examination General Test (GRE) to predict the success of graduate school applicants is heavily debated, especially for its possible impact on the selection of underrepresented minorities into science, technology, engineering, and math fields. To better identify candidates who would succeed in our program…
Descriptors: Puerto Ricans, Predictor Variables, Success, Doctoral Programs
Peer reviewed
PDF on ERIC Download full text
Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015
The "e-rater"® automated essay scoring system is used operationally in the scoring of the argument and issue tasks that form the Analytical Writing measure of the "GRE"® General Test. For each of these tasks, this study explored the value added of reporting 4 trait scores for each of these 2 tasks over the total e-rater score.…
Descriptors: Scores, Computer Assisted Testing, Computer Software, Grammar
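A simple way to frame the "value added" question Attali and Sinharay ask is incremental prediction: does adding trait scores to the total automated score improve prediction of the human score? The sketch below simulates its own data and uses plain regression R², which is not the report's methodology:

```python
# Incremental prediction sketch: human score regressed on the total automated
# score alone vs. total plus hypothetical trait scores.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 400
traits = rng.normal(size=(n, 4))          # 4 hypothetical trait scores
total = traits.mean(axis=1)               # total automated score
# Human score loads a bit extra on the first trait, for illustration.
human = total + 0.3 * traits[:, 0] + rng.normal(0, 0.5, size=n)

X_total = total.reshape(-1, 1)
r2_total = LinearRegression().fit(X_total, human).score(X_total, human)

X_both = np.column_stack([X_total, traits])
r2_both = LinearRegression().fit(X_both, human).score(X_both, human)

print(f"R^2 total only: {r2_total:.3f}; with traits: {r2_both:.3f}")
```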
Peer reviewed
Direct link
Davison, Mark L.; Semmes, Robert; Huang, Lan; Close, Catherine N. – Educational and Psychological Measurement, 2012
Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy…
Descriptors: Computer Assisted Testing, Reaction Time, Reliability, Validity
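The alternate-forms reliability of .85 that Davison et al. report is, computationally, just the correlation between scores on two parallel forms. A toy version with simulated speed scores:

```python
# Alternate-forms reliability as the correlation between two parallel forms.
import numpy as np

rng = np.random.default_rng(4)
true_speed = rng.normal(size=200)                   # latent speed
form_a = true_speed + rng.normal(0, 0.4, size=200)  # form A score
form_b = true_speed + rng.normal(0, 0.4, size=200)  # form B score

print(np.corrcoef(form_a, form_b)[0, 1])  # ~0.86 with this error level
```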
Peer reviewed
Direct link
Semmes, Robert; Davison, Mark L.; Close, Catherine – Applied Psychological Measurement, 2011
If numerical reasoning items are administered under time limits, will two dimensions be required to account for the responses, a numerical ability dimension and a speed dimension? A total of 182 college students answered 74 numerical reasoning items. Every item was taken with and without time limits by half the students. Three psychometric models…
Descriptors: Individual Differences, Logical Thinking, Timed Tests, College Students
Peer reviewed
Direct link
Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012
Views on testing (its purpose and uses, and how its data are analyzed) are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…
Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability
Peer reviewed
Direct link
Huang, Hung-Yu; Wang, Wen-Chung – Educational and Psychological Measurement, 2013
Both testlet design and hierarchical latent traits are fairly common in educational and psychological measurements. This study aimed to develop a new class of higher order testlet response models that consider both local item dependence within testlets and a hierarchy of latent traits. Due to high dimensionality, the authors adopted the Bayesian…
Descriptors: Item Response Theory, Models, Bayesian Statistics, Computation
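Huang and Wang's models combine testlet effects with hierarchical latent traits under Bayesian estimation. A much-reduced sketch of the testlet idea alone (a Rasch-style model where a person-by-testlet effect absorbs local dependence among items sharing a passage) can be written in PyMC; everything below, including the data, is simulated and is not the authors' model:

```python
# A minimal Rasch-style testlet model in PyMC, fit to simulated data.
import numpy as np
import pymc as pm

rng = np.random.default_rng(5)
n_person, n_item, n_testlet = 100, 12, 3
testlet_of = np.repeat(np.arange(n_testlet), n_item // n_testlet)

# Simulate responses: ability - difficulty + person-by-testlet effect.
theta = rng.normal(size=n_person)
b = rng.normal(size=n_item)
gamma = rng.normal(0, 0.5, size=(n_person, n_testlet))
logit = theta[:, None] - b[None, :] + gamma[:, testlet_of]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

with pm.Model():
    theta_ = pm.Normal("theta", 0, 1, shape=n_person)     # person ability
    b_ = pm.Normal("b", 0, 1, shape=n_item)               # item difficulty
    sd_g = pm.HalfNormal("sd_g", 1.0)                     # testlet-effect SD
    gamma_ = pm.Normal("gamma", 0, sd_g,
                       shape=(n_person, n_testlet))       # local dependence
    eta = theta_[:, None] - b_[None, :] + gamma_[:, testlet_of]
    pm.Bernoulli("y", logit_p=eta, observed=y)
    idata = pm.sample(500, tune=500, chains=2)
```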