Skip navigation
Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01mc87pq32q
Full metadata record
DC FieldValueLanguage
dc.contributor.advisorFreedman, Michael J.en_US
dc.contributor.advisorTarjan, Robert E.en_US
dc.contributor.authorSen, Siddharthaen_US
dc.contributor.otherComputer Science Departmenten_US
dc.date.accessioned2013-05-21T13:34:07Z-
dc.date.available2013-05-21T13:34:07Z-
dc.date.issued2013en_US
dc.identifier.urihttp://arks.princeton.edu/ark:/88435/dsp01mc87pq32q-
dc.description.abstractUsers of Internet services are increasingly intolerant of delays and outages, while demanding a consistent online experience. A website that is down or misbehaving is reported within seconds, often with an embarrassing screenshot that spreads through the news like wildfire. Among these failures, the most notorious are the ones that manifest arbitrary behavior, such as returning the wrong content to users or accidentally deleting their data. Unfortunately, protecting against such failures---whether due to misconfigurations, bugs, or even malice---is prohibitively expensive, because most existing solutions do not scale beyond a single server's performance. As a result, these solutions are not used for customer-facing services, where scalability is required to cope with large user populations. This thesis describes new systems and algorithms for tolerating arbitrary failures in Internet services, inspired by real-world debacles. Unlike prior work, our solutions are highly scalable. Our approach integrates theoretical innovations into the later stages of system design, giving robust guarantees that are also practical. We begin with a real failure that occurred in the indexing technique used by a certain database provider, and explain theoretically why the technique failed. We remedy the technique by introducing a new class of tree data structures, called relaxed trees, with provably good properties. Our analysis of relaxed trees makes use of exponential potential functions. Then, we describe a general system for tolerating arbitrary failures, called Prophecy, that delivers scalable performance on read-mostly workloads. With a modest trust assumption, Prophecy is practical for modern Internet services, as our evaluation confirms. Finally, we devise two techniques to scale this fault tolerance to very large-scale systems and general workloads. The first is an algorithm for securely composing many small replica groups, subject to an adversary that can coordinate faulty nodes across the groups dynamically. The second is a technique for improving the fault tolerance within each replica group, by adding small, trusted broadcast channels that mitigate the impact of faulty nodes.en_US
dc.language.isoenen_US
dc.publisherPrinceton, NJ : Princeton Universityen_US
dc.relation.isformatofThe Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the <a href=http://catalog.princeton.edu> library's main catalog </a>en_US
dc.subjectbalanced treesen_US
dc.subjectByzantine fault toleranceen_US
dc.subjectdatabase access methodsen_US
dc.subjectexpander graphsen_US
dc.subjectjoin-leave attacksen_US
dc.subjectpartial broadcasten_US
dc.subject.classificationComputer scienceen_US
dc.subject.classificationApplied mathematicsen_US
dc.titleNew Systems and Algorithms for Scalable Fault Toleranceen_US
dc.typeAcademic dissertations (Ph.D.)en_US
pu.projectgrantnumber690-2143en_US
Appears in Collections:Computer Science

Files in This Item:
File Description SizeFormat 
Sen_princeton_0181D_10607.pdf1.33 MBAdobe PDFView/Download


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.