Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01q237hv27r
Title: Sequence Alignment and Parsimony Analysis for Constructing Language Phylogenies
Authors: Shen, Qinlan
Advisors: Fellbaum, Christiane
Department: Computer Science
Class Year: 2015
Abstract: The standard approach for determining whether languages are related is to manually detect regular sound correspondences in words between languages. This procedure, however, has been criticized for being subjective and does not take full advantage of available computational resources and techniques. To address these concerns, this thesis presents a method of extending sequence alignment techniques for constructing phylogenies in computational biology to language data by encoding phonetic transcriptions of words across different languages as three-character sequences reflecting the phonetic features of sounds. In addition, we propose a technique for using parsimony to measure the accuracy of generated trees against well-attested linguistic subgroups. Additional experiments tested whether the mean column score of an alignment is correlated with tree parsimony, and therefore whether alignment scores could be used to estimate the accuracy of a generated phylogeny. Preliminary experiments applying our alignment-based phylogeny construction method to the Indo-European and Austronesian language families suggest that the method can accurately reconstruct well-known subgroups for Indo-European but performs poorly on Austronesian languages. Preliminary results also suggest that mean column score is not correlated with tree parsimony, making it a poor estimator of the accuracy of a generated language tree.
Extent: 59 pages
URI: http://arks.princeton.edu/ark:/88435/dsp01q237hv27r
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Computer Science, 1988-2016
