ERIC Number: ED663569
Record Type: Non-Journal
Publication Date: 2024
Pages: 186
Abstractor: As Provided
ISBN: 979-8-3844-6218-7
ISSN: N/A
EISSN: N/A
Empirical Shortcomings of Transformer-Based Large Language Models as Expectation-Based Models of Human Sentence Processing
Byung-Doh Oh
ProQuest LLC, Ph.D. Dissertation, The Ohio State University
Decades of psycholinguistic research have shown that human sentence processing is highly incremental and predictive. This has provided evidence for expectation-based theories of sentence processing, which posit that the processing difficulty of linguistic material is modulated by its probability in context. However, these theories do not make strong claims about the latent probability distribution of the human comprehender, which raises key research questions about its characteristics. Computational language models that define a conditional probability distribution are helpful for answering these questions, as they can be trained to embody different predictive processes and yield concrete surprisal predictors (i.e., negative log probabilities) that can be evaluated against measures of processing difficulty collected from human subjects. This dissertation evaluates Transformer-based large language models, which are artificial neural networks trained to predict the next word on massive amounts of text, as expectation-based models of sentence processing. Experiments reveal a robust and systematic divergence between the predictive processing of large language models and that of human subjects, the degree of which increases reliably with their number of parameters and amount of training data. This strong effect indicates that human sentence processing is not driven by the predictions made by these large-scale machine learning models, and highlights a fundamental shortcoming of large language models as models of human cognition.

This dissertation additionally elucidates this discrepancy between humans and large language models. A series of analyses shows that large language models generally underpredict human-like processing difficulty by making 'superhumanly accurate' predictions of upcoming words, which may be a manifestation of the extensive real-world knowledge gleaned from large sets of training examples that are not available to humans. The learning trajectories of large language models further reveal that word frequency interacts strongly with the next-word probabilities learned by these models, which explains the adverse effects of both model size and amount of training data on their fit to measures of processing difficulty. Moreover, this dissertation presents a neural network interpretability framework for analyzing how each word in context influences such 'superhumanly accurate' next-word predictions, which is used to demonstrate the importance of collocational association in the predictive mechanism of large language models. Finally, this dissertation concludes by exploring how large language models might be used to study memory-based effects in human sentence processing, such as the processing cost incurred by retrieving previous words or reallocating attentional focus. The ability of attention-based predictors that operationalize this cost to predict measures of processing difficulty over and above surprisal demonstrates the potential of studying large language models from this different perspective.

Taken together, these results indicate that as large language models become more accurate and powerful, they become less appropriate as expectation-based models of human sentence processing. Nonetheless, the shortcomings of large language models identified in this dissertation provide directions for refining such cognitive models. Additionally, the analyses conducted in this dissertation demonstrate the importance of studying the learning trajectories and the resulting representations of large language models to inform theories of human sentence processing.
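The surprisal predictor referenced in the abstract is the negative log probability a language model assigns to each word given its left context. As a rough illustration of how such predictors are obtained in practice (not the dissertation's own code), the following minimal sketch assumes the Hugging Face `transformers` library and the off-the-shelf GPT-2 model as illustrative choices; in an evaluation like the one described above, per-word surprisal values of this kind would then be regressed against human reading-time or other processing measures.

```python
# Minimal sketch (illustrative, not the dissertation's code): per-token
# surprisal, i.e. -log2 P(w_t | w_<t), from a pretrained Transformer
# language model, assuming Hugging Face `transformers` and GPT-2.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The old man the boats.", return_tensors="pt").input_ids

with torch.no_grad():
    # Logits at position t-1 define the distribution over the token at t.
    log_probs = torch.log_softmax(model(ids).logits, dim=-1)

for t in range(1, ids.size(1)):
    token = tokenizer.decode(ids[0, t].item())
    # Convert nats to bits; higher surprisal predicts greater difficulty
    # under expectation-based theories of sentence processing.
    surprisal = -log_probs[0, t - 1, ids[0, t]].item() / math.log(2)
    print(f"{token!r}: {surprisal:.2f} bits")
```

Note that subword tokenizers split some words into multiple tokens, so in practice subword surprisals are typically summed within each word before being compared against word-level measures of processing difficulty.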
[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: National Science Foundation (NSF)
Authoring Institution: N/A
Grant or Contract Numbers: 1816891