Neural Recall Network: A Neural Network Solution to Low Recall Problem in Regex-Based Qualitative Coding.

Cai, Zhiqiang; Marquart, Cody; Shaffer, David W.

Peer reviewed

ERIC Number: ED624125

Record Type: Non-Journal

Publication Date: 2022

Pages: 11

Abstractor: As Provided

ISBN: N/A

ISSN: N/A

EISSN: N/A

Available Date: N/A

International Educational Data Mining Society, Paper presented at the International Conference on Educational Data Mining (EDM) (15th, Durham, United Kingdom, Jul 24-27, 2022)

Regular expression (regex) coding has advantages for text analysis. Humans are often able to quickly construct intelligible coding rules with high precision. That is, researchers can identify words and word patterns that correctly classify examples of a particular concept. It is also often easy to identify false positives and improve the regex classifier so that the positive items are accurately captured. However, ensuring that a regex list is complete is a bigger challenge, because the concepts to be identified in data are often sparsely distributed, which makes it difficult to find examples of false negatives. For this reason, regex-based classifiers suffer from low recall. That is, they often miss items that should be classified as positive. In this paper, we provide a neural network solution to this problem by identifying a "negative reversion set," in which false negative items occur much more frequently than in the data set as a whole. The regex classifier can thus be improved more quickly by adding missing regexes based on the false negatives found in the negative reversion set. This study used an existing data set collected from a simulation-based learning environment for which researchers had previously defined six codes and developed classifiers with validated regex lists. We randomly constructed incomplete (partial) regex lists and used neural network models to identify negative reversion sets in which the frequency of false negatives increased from a range of 3%-8% in the full data set to a range of 12%-52% in the negative reversion set. Based on this finding, we propose an interactive coding mechanism in which human-developed regex classifiers provide input for training machine learning algorithms, and machine learning algorithms "smartly" select highly suspected false negative items so that humans can develop regex classifiers more quickly. [For the full proceedings, see ED623995.]
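
The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: a bag-of-words logistic regression (a single-layer network) stands in for their neural network, the sentences and the "budget" pattern are invented toy data, and all function names are hypothetical. The sketch assumes that a negative reversion set can be approximated by taking items the partial regex list codes as negative and ranking them by how strongly a model trained on the regex labels scores them as positive.

```python
import math
import re

def regex_label(text, patterns):
    """Return 1 if any regex in the (possibly incomplete) list matches, else 0."""
    return int(any(re.search(p, text, re.IGNORECASE) for p in patterns))

def tokenize(text):
    """Lowercased set of word tokens (bag-of-words presence features)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train(texts, labels, epochs=50, lr=0.5):
    """Fit a logistic regression by per-item gradient steps.
    Assumption: this simple model replaces the paper's neural network."""
    weights, bias = {}, 0.0
    feats = [tokenize(t) for t in texts]
    for _ in range(epochs):
        for words, y in zip(feats, labels):
            p = sigmoid(bias + sum(weights.get(w, 0.0) for w in words))
            step = lr * (y - p)
            bias += step
            for w in words:
                weights[w] = weights.get(w, 0.0) + step
    return weights, bias

def score(text, weights, bias):
    """Model's estimated probability that the text is a positive item."""
    return sigmoid(bias + sum(weights.get(w, 0.0) for w in tokenize(text)))

def rank_suspected_false_negatives(pool, patterns, weights, bias):
    """Keep regex-negative items and sort them so that the items the model
    most suspects of being positive come first."""
    negatives = [t for t in pool if regex_label(t, patterns) == 0]
    return sorted(negatives, key=lambda t: score(t, weights, bias), reverse=True)

# Toy data (invented): the partial regex list only knows the word "budget".
partial_patterns = [r"\bbudget\b"]
train_texts = [
    "our budget limits the cost",
    "budget and price are tight",
    "cost exceeds budget",
    "meeting starts at noon",
    "please send me that file",
    "prototype color looks nice",
]
train_labels = [regex_label(t, partial_patterns) for t in train_texts]
weights, bias = train(train_texts, train_labels)

# Unreviewed items that the partial regex list codes as negative.
pool = ["send that file at noon", "price and cost went up", "nice prototype"]
ranked = rank_suspected_false_negatives(pool, partial_patterns, weights, bias)
print(ranked[0])
```

In this toy run, the item about price and cost is ranked first because it shares vocabulary with the regex-positive training items. A human coder reviewing the top of the list could then add a pattern such as `\bcost\b` to the regex list and repeat the cycle, which is the kind of interactive loop the abstract proposes.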

Descriptors: Coding, Classification, Artificial Intelligence, Engineering Education, Design, Models, Error Patterns

International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: https://educationaldatamining.org/conferences/

Publication Type: Speeches/Meeting Papers; Reports - Research

Education Level: N/A

Audience: N/A

Language: English

Sponsor: National Science Foundation (NSF)

Authoring Institution: N/A

Grant or Contract Numbers: DRL1713110; DRL2100320; LDI1934745

Author Affiliations: N/A