Peer reviewed
ERIC Number: ED608067
Record Type: Non-Journal
Publication Date: 2020-Jul
Pages: 8
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Whose Truth Is the "Ground Truth"? College Admissions Essays and Bias in Word Vector Evaluation Methods
Arthurs, Noah; Alvero, A. J.
International Educational Data Mining Society, Paper presented at the International Conference on Educational Data Mining (EDM) (13th, Online, Jul 10-13, 2020)
Word vectors are widely used as input features in natural language processing (NLP) tasks. Researchers have found that word vectors often encode the biases of society, and steps have been taken towards debiasing the vectors themselves. However, little has been said about the fairness of the methods used to evaluate the quality of vectors. Analogical and word similarity tasks are commonplace, but both rely on purportedly ground truth statements about the semantic relationships between words (e.g. "man is to woman as king is to queen"). These analogies look reasonable when only taking into account the literal meanings of words, but two issues arise: (1) people don't always use words in a literal sense, and (2) the same word may be used differently by different groups of people. In this paper, we split a dataset of over 800,000 college admissions essays into quartiles based on reported household income (RHI) and train sets of word vectors on each quartile. We then test these sets of vectors on common intrinsic evaluation tasks. We find that vectors trained on the essays of higher income students encode more of each task's target semantic relationships than vectors trained on the essays of lower income students. These results hold even when controlling for word frequency. We conclude that the tasks themselves are biased towards the writing of higher income students, and we challenge the notion that there exist ground truth semantic relationships that word vectors must encode in order to be useful. [For the full proceedings, see ED607784.]
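To make the analogical evaluation task mentioned in the abstract concrete, here is a minimal sketch of how an analogy such as "man is to woman as king is to queen" is commonly scored: take the vector offset b - a + c and check whether the nearest remaining word is the expected answer. This is not the authors' code. The function names, the toy three-dimensional vectors, and the single-question test set are all hypothetical and chosen only for illustration; the paper's vectors would be trained separately on each income quartile of the essay corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by vector offset.

    Returns the vocabulary word (excluding a, b, c) whose vector is
    closest, by cosine similarity, to b - a + c.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly.

    Questions containing out-of-vocabulary words are skipped.
    """
    answerable = [q for q in questions if all(w in vectors for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(vectors, a, b, c) == d
                  for a, b, c, d in answerable)
    return correct / len(answerable)

# Hypothetical toy vectors, NOT taken from the paper.
toy_vectors = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}
questions = [("man", "woman", "king", "queen")]

print(solve_analogy(toy_vectors, "man", "woman", "king"))
print(analogy_accuracy(toy_vectors, questions))
```

Under this kind of scoring, a set of vectors is penalized whenever its training text uses a word in a way that differs from the relationship the question assumes, which is the sense in which the abstract argues the "ground truth" can favor one group's writing over another's.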
International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: http://www.educationaldatamining.org
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A