ERIC Number: EJ1109580
Record Type: Journal
Publication Date: 2016-Sep
Pages: 24
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1360-2357
EISSN: N/A
A Study of Readability of Texts in Bangla through Machine Learning Approaches
Sinha, Manjira; Basu, Anupam
Education and Information Technologies, v21 n5 p1071-1094 Sep 2016
In this work, we have investigated text readability in Bangla language. Text readability is an indicator of the suitability of a given document with respect to a target reader group. Therefore, text readability has huge impact on educational content preparation. The advances in the field of natural language processing have enabled the automatic identification of reading difficulty of texts and contributed in the design and development of suitable educational materials. In spite of the fact that, Bangla is one of the major languages in India and the official language of Bangladesh, the research of text readability in Bangla is still in its nascent stage. In this paper, we have presented computational models to determine the readability of Bangla text documents based on syntactic properties. Since Bangla is a digital resource poor language, therefore, we were required to develop a novel dataset suitable for automatic identification of text properties. Our initial experiments have shown that existing English readability metrics are inapplicable for Bangla. Accordingly, we have proceeded towards new models for analyzing text readability in Bangla. We have considered language specific syntactic features of Bangla text in this work. We have identified major structural contributors responsible for text comprehensibility and subsequently developed readability models for Bangla texts. We have used different machine-learning methods such as regression, support vector machines (SVM) and support vector regression (SVR) to achieve our aim. The performance of the individual models has been compared against one another. We have conducted detailed user survey for data preparation, identification of important structural parameters of texts and validation of our proposed models. The work posses further implications in the field of educational research and in matching text to readers.
Descriptors: Indo European Languages, Readability, Foreign Countries, Computation, Models, Syntax, Artificial Intelligence, Regression (Statistics), Surveys, Users (Information)
Springer. 233 Spring Street, New York, NY 10013. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-348-4505; e-mail: service-ny@springer.com; Web site: http://bibliotheek.ehb.be:2189
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Bangladesh; India
Grant or Contract Numbers: N/A