ERIC Number: EJ1420889
Record Type: Journal
Publication Date: 2024-Apr
Pages: 21
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-0267-6583
EISSN: EISSN-1477-0326
The CELI Corpus: Design and Linguistic Annotation of a New Online Learner Corpus
Second Language Research, v40 n2 p457-477 2024
This article introduces the CELI corpus, a new learner corpus of written Italian consisting of ca. 600,000 tokens, evenly distributed among CEFR (Common European Framework of Reference for Languages) proficiency levels B1, B2, C1 and C2. The collected texts derive from the language certification exams administered by the University for Foreigners of Perugia all around the world. The corpus contains rich metadata pertaining to text-related and learner-related variables. It expands the domain of learner corpora by, among other things, being freely available online to the research community and focusing on a target language other than English. The article also presents and evaluates the POS-tagging procedure, thus contributing to best practices in learner corpus annotation.
Descriptors: Computational Linguistics, Second Language Instruction, Second Language Learning, Best Practices, Written Language, Italian, Foreign Students, Foreign Countries, Language Tests, Language Proficiency, Guidelines, Rating Scales, Certification, College Students, Student Characteristics, Metadata, Information Retrieval, Databases
SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://bibliotheek.ehb.be:2993
Publication Type: Journal Articles; Information Analyses; Reports - Descriptive
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Italy
Grant or Contract Numbers: N/A