Title: Sequence Alignment and Parsimony Analysis for Constructing Language Phylogenies
Authors: Shen, Qinlan
Advisors: Fellbaum, Christiane
Department: Computer Science
Class Year: 2015
Abstract: The standard approach for determining whether languages are related is to manually detect regular sound correspondences between words in different languages. This procedure, however, has been criticized as subjective, and it does not take full advantage of available computational resources and techniques. To address these concerns, this thesis presents a method that extends sequence alignment techniques used for constructing phylogenies in computational biology to language data, by encoding phonetic transcriptions of words across different languages as three-character sequences reflecting the phonetic features of sounds. In addition, we propose a technique for using parsimony to measure the accuracy of generated trees against well-attested linguistic subgroups. Additional experiments tested whether the mean column score of an alignment is correlated with tree parsimony, and thus whether alignment scores could be used to estimate the accuracy of a generated phylogeny. Preliminary experiments applying our alignment-based phylogeny construction method to the Indo-European and Austronesian language families suggest that it can accurately reconstruct well-known subgroups for Indo-European but performs poorly on Austronesian languages. Preliminary results also suggest that mean column score is not correlated with tree parsimony, making it a poor estimator of the accuracy of a generated language tree.
Extent: 59 pages
Type of Material: Princeton University Senior Theses
Language: en_US
Appears in Collections: Computer Science, 1988-2017

Files in This Item:
File | Size | Format
PUTheses2015-Shen_Qinlan.pdf | 1.83 MB | Adobe PDF

Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.