Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01vh53wv779
Title: Understanding Resource Usage and Performance in Wide-Area Distributed Systems
Authors: Kim, Wonho
Advisors: Pai, Vivek S
Contributors: Computer Science Department
Keywords: Computer Networking; Distributed Systems; Wide-Area Systems
Subjects: Computer science
Issue Date: 2012
Publisher: Princeton, NJ : Princeton University
Abstract: Many Internet services employ wide-area frameworks to deliver exponentially growing network traffic to end users with low response time. These systems typically leverage a large number of remote nodes at the edge of the Internet, which makes them difficult to develop and test. Federated testbeds are therefore essential infrastructure for developing wide-area systems, because they allow researchers to deploy new services under realistic network conditions. In this dissertation, we study resource usage in PlanetLab to understand and characterize user behavior in federated testbeds. We also present Lsync, a low-latency file transfer system for coordinating remote nodes in wide-area platforms, including testbeds.
To support the development of new network services on a global scale, the next generation of federated testbeds is under active development, but very little is known about resource usage in these shared infrastructures. We conduct an extensive study of PlanetLab usage profiles that we collected over six years by running CoMon, a PlanetLab monitoring service. We examine both node-level and experiment-centric behavior, and describe their implications for resource management in federated testbeds. We find that PlanetLab usage differs markedly from that of shared compute clusters, that conventional wisdom does not hold for PlanetLab, and that several properties of PlanetLab as a network testbed are largely responsible for this difference.
We also present Lsync, a low-latency file transfer system that can serve as a synchronization building block for wide-area distributed systems where latency matters. While many distributed systems depend on fast data synchronization to coordinate remote nodes, current data dissemination systems optimize for efficiency across open client populations rather than for completion latency across a known set of nodes.
In examining this problem, we find that optimizing for latency produces strategies radically different from those of existing distribution tools and can dramatically reduce latency across a wide range of scenarios. Lsync performs novel node selection, scheduling, and adaptive policy switching, dynamically choosing the best synchronization method using information available at runtime. Our evaluation shows that Lsync reduces latency by more than a factor of 14 compared to a widely used synchronization tool, and keeps most remote nodes fully synchronized even under frequent file updates.
URI: http://arks.princeton.edu/ark:/88435/dsp01vh53wv779
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
Kim_princeton_0181D_10415.pdf | | 1.3 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.