ERIC Number: ED519270
Record Type: Non-Journal
Publication Date: 2010
Pages: 180
Abstractor: As Provided
ISBN: 978-1-1240-7512-9
ISSN: N/A
EISSN: N/A
Features and Methods for Automatic Dialect Identification
Rojas, David Michael
ProQuest LLC, Ph.D. Dissertation, Indiana University
Systematic differences among regional varieties of U.S. English speech are recognizable to native speakers to varying degrees. This has been demonstrated by researchers in perceptual dialectology who ask listeners to match a speaker to his or her dialect region. Machines have also been able to identify the regional origin of a speaker to some degree, although attempts to this end have typically not been as successful as efforts to identify a speaker's language. Inspired by research in automatic language identification, musical artist classification, dialectometry, and pattern recognition more broadly, this thesis seeks to measure the degree to which a machine can accurately identify the region of origin of a speaker of U.S. English and to determine whether knowledge of speaker gender contributes positively to correct classification. We first examine the relationships between the distributional properties of phonemes and dialect, constructing a language model representing phone sequences to identify the dialect of a speaker. We next draw upon linguistic notions that vowels contribute more heavily than consonants to regional differences and that sub-phonemic acoustic variation and prosodic features provide information useful in automatic dialect identification (ADI). Our classification schemes are able to correctly label the dialect of a speaker at rates comparable to, if not slightly better than, naive native-speaker listeners. To enrich our study, a dialectometric analysis is then conducted, again using phone sequences alone, from which high-level dialect relationships emerge that reflect research findings from both perceptual dialectology and more traditional sociolinguistic methods. The dissertation contributes a substantiated methodology and proof of concept of an ADI framework that complements existing research in the areas of sociolinguistics and perceptual dialectology. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 800-521-0600. Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
Descriptors: Dialects, Sociolinguistics, Phonemes, Identification, Suprasegmentals, Pattern Recognition, Classification, Native Speakers, Geographic Regions, Musicians, Pronunciation, North American English, Gender Differences, Acoustics, Auditory Perception, Computational Linguistics, Artificial Intelligence
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A