ERIC Number: ED578116
Record Type: Non-Journal
Publication Date: 2017
Pages: 200
Abstractor: As Provided
ISBN: 978-0-3553-0265-3
ISSN: N/A
EISSN: N/A
Frequency and Proximity Clustering Analyses for Georeferencing Toponyms and Points-of-Interest Names from a Travel Journal
McDermott, Scott D.
ProQuest LLC, Ph.D. Dissertation, George Mason University
This research study uses geographic information retrieval (GIR) to georeference toponyms and points-of-interest (POI) names from a travel journal. Travel journals are an ideal data source for this study because they are significant accounts specific to the author's experience, and they contain geographic instances tied to a specific time and location along the route of a trip. Using a travel journal, toponyms and POI names are georeferenced to locate where the author visited or what the author observed along a travel path. GIR relies on algorithms to maximize the georeferencing of spatially sensitive data while minimizing issues related to semantic ambiguities, which can incorrectly place geographic content because its name is shared with other geographic or non-geographic content. Frequency analysis and proximity clustering are used to minimize semantic ambiguities and georeference the toponyms and POI names to their correct locations. Frequency analysis identifies the primary and adjacent state names for each chapter of the travel journal, which act as containers for the subsequent toponyms and POI names. Proximity clustering groups the toponyms and POI names based on the distance to the cluster group's centroid. A cluster group with a significant number of toponyms and POI names contains the placenames that are more relevant to the travel journal. The use of frequency and proximity clustering analyses narrows the geographic scope to select states and identifies the toponyms and POI names that exist along the travel path. The reliability measurements for this dissertation yield a precision rate of 88 percent and a recall rate of 30 percent. The precision rate is comparable to similar peer-reviewed studies and shows that this dissertation can assist in the GIR process.
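The proximity clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the greedy assignment rule, the 100 km threshold, the mean latitude/longitude centroid, the function names, and the sample placenames with approximate coordinates are all assumptions made for illustration. Each candidate location joins the first cluster whose centroid lies within the threshold, and the cluster holding the most candidates is treated as the one most relevant to the travel path.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def centroid(points):
    """Mean latitude and longitude; a rough centroid, adequate for small regions."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def proximity_clusters(candidates, max_km=100.0):
    """Greedily group (name, (lat, lon)) candidates by distance to cluster centroids."""
    clusters = []
    for name, coord in candidates:
        for cluster in clusters:
            if haversine_km(coord, centroid([c for _, c in cluster])) <= max_km:
                cluster.append((name, coord))
                break
        else:
            # No existing centroid is close enough, so start a new cluster.
            clusters.append([(name, coord)])
    return clusters

# Hypothetical gazetteer candidates; "Springfield" is ambiguous between two states.
candidates = [
    ("Springfield", (39.80, -89.64)),  # Illinois (approximate)
    ("Springfield", (42.10, -72.59)),  # Massachusetts (approximate)
    ("Decatur", (39.84, -88.95)),      # Illinois (approximate)
    ("Bloomington", (40.48, -88.99)),  # Illinois (approximate)
]
clusters = proximity_clusters(candidates)
largest = max(clusters, key=len)  # the cluster assumed to lie along the travel path
```

In this sketch the Illinois candidate for "Springfield" ends up in the largest cluster together with its neighbors, while the Massachusetts candidate is left isolated, which is the kind of disambiguation the abstract attributes to clustering.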
Obstacles and issues in this research study include name matching errors between the travel journal, geoparser, and gazetteers; temporal disassociations between the time the journal was written and the time this dissertation was conducted; omissions of POI names from the gazetteers; and incorrect tagging by the geoparser. Future studies are needed to provide better name matching between the travel journal, geoparser, and gazetteers and to manage POI names so that they become integral to the GIR process. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone: 1-800-521-0600. Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
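For readers less familiar with the two reliability measures quoted in the abstract, the following minimal sketch shows how precision and recall are conventionally computed from match counts. The counts are hypothetical, chosen only so that the arithmetic reproduces the reported 88 percent and 30 percent; they are not taken from the dissertation. Under the usual definitions, a placename missing from a gazetteer would count as a false negative (lowering recall), and an incorrectly tagged name would count as a false positive (lowering precision).

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw match counts."""
    retrieved = true_positives + false_positives  # everything the system georeferenced
    relevant = true_positives + false_negatives   # everything it should have georeferenced
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall

# Hypothetical counts, not figures from the dissertation.
precision, recall = precision_recall(true_positives=66,
                                     false_positives=9,
                                     false_negatives=154)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall, as reported, means that most of the placenames the process does georeference are placed correctly, while many placenames in the journal are not georeferenced at all.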
Descriptors: Travel, Diaries, Proximity, Geographic Information Systems, Information Retrieval, Geographic Location, Computer Software, Ambiguity (Semantics), Multivariate Analysis, Identification
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A