ERIC Number: ED628484
Record Type: Non-Journal
Publication Date: 2022-Sep-28
Pages: 15
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Automated Paragraph Detection Using Cohesion Network Analysis
Robert-Mihai Botarleanu; Mihai Dascalu; Scott Andrew Crossley; Danielle S. McNamara
Grantee Submission, Paper presented at the Conference on Smart Learning Ecosystems and Regional Development: Polyphonic Construction of Smart Learning Ecosystems (Bucharest, Romania, Sep 28, 2022)
The ability to express yourself concisely and coherently is a crucial skill, both for academic purposes and professional careers. An important aspect to consider in writing is an adequate segmentation of ideas, which in turn requires a proper understanding of where to place paragraph breaks. However, these decisions are often made intuitively, with little systematicity in sequencing ideas. Thus, an automated method of detecting the optimal hierarchical structure of texts using quantifiable features could be a valuable tool for learners. Here, we aim to define a framework grounded in Cohesion Network Analysis to establish the structure of a text by modeling paragraphs as clusters of sentences. The analogy to clustering enables us to identify paragraph breaks that maximize inter-paragraph separation while ensuring high intra-paragraph cohesion. Our approach consists of two steps applied to texts without paragraph breaks. First, the number of paragraphs is automatically inferred with an absolute error of 1.02 using a Recurrent Neural Network, which relies on text features and cohesion flow. Second, paragraph splits are detected using two algorithms: "top k", which selects the largest cohesion gaps between adjacent utterances, and "divisive clustering", which iteratively splits the text into paragraphs. Silhouette scores are used to assess performance, and the obtained values denote adequately inferred structures.
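The "top k" step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (top_k_breaks, split_paragraphs) are hypothetical, the cohesion scores between adjacent sentences are assumed to be given as plain floats rather than computed with Cohesion Network Analysis, and the number of paragraphs is passed in directly instead of being predicted by a Recurrent Neural Network. A "cohesion gap" is read here as a low cohesion score between two adjacent sentences.

```python
from typing import List


def top_k_breaks(adjacent_cohesion: List[float], num_paragraphs: int) -> List[int]:
    """Return the sorted sentence indices at which a new paragraph starts.

    adjacent_cohesion[i] is an assumed cohesion score between sentence i
    and sentence i + 1 (higher means more cohesive).
    """
    num_breaks = num_paragraphs - 1
    if num_breaks < 0 or num_breaks > len(adjacent_cohesion):
        raise ValueError("num_paragraphs is incompatible with the number of sentences")
    # Rank boundaries from least to most cohesive: the least cohesive
    # boundaries are treated as the largest cohesion gaps.
    ranked = sorted(range(len(adjacent_cohesion)), key=lambda i: adjacent_cohesion[i])
    # Boundary i lies between sentences i and i + 1, so the new paragraph
    # starts at sentence i + 1.
    return sorted(i + 1 for i in ranked[:num_breaks])


def split_paragraphs(sentences: List[str], breaks: List[int]) -> List[List[str]]:
    """Group sentences into paragraphs at the given start indices."""
    paragraphs, start = [], 0
    for end in breaks + [len(sentences)]:
        paragraphs.append(sentences[start:end])
        start = end
    return paragraphs


# Illustrative, made-up cohesion scores for six sentences (five boundaries).
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
cohesion = [0.9, 0.2, 0.8, 0.1, 0.7]
breaks = top_k_breaks(cohesion, num_paragraphs=3)
paragraphs = split_paragraphs(sentences, breaks)
```

With these made-up scores the two weakest boundaries fall after the second and fourth sentences, so the text is split into three paragraphs of two sentences each. The divisive clustering variant and the silhouette-based evaluation mentioned in the abstract are not covered by this sketch.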
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: Institute of Education Sciences (ED); Office of Naval Research (ONR) (DOD)
Authoring Institution: N/A
IES Funded: Yes
Grant or Contract Numbers: R305A180144; R305A180261; N000141712300; N000142012623