Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01p2676z68q
DC FieldValueLanguage
dc.contributor.authorNgo, Quang Minh Khiem
dc.contributor.otherComputer Science Department
dc.date.accessioned2021-10-04T13:49:10Z-
dc.date.available2021-10-04T13:49:10Z-
dc.date.created2021-01-01
dc.date.issued2021
dc.identifier.urihttp://arks.princeton.edu/ark:/88435/dsp01p2676z68q-
dc.description.abstractReplicated state machines (RSMs) are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. RSMs are used throughout large-scale systems, such as distributed databases, cloud storage, and service managers. At such scale, it is common for some machines to be slow. In this dissertation, we address the problem of developing slowdown-tolerant RSMs that continue to operate as fast RSMs despite the presence of slow replicas. We define s-slowdown-tolerance and identify why existing consensus protocols are not slowdown-tolerant. As the first and pragmatic step toward slowdown-tolerant RSMs, we design, implement, and evaluate two 1-slowdown-tolerant consensus protocols, Copilot and Latent Copilot.Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any 1 replica. Copilot uses two distinguished replicas—the pilot and copilot—to proactively add redundancy to all stages of processing a client’s command. Copilot uses dependencies and deduplication to resolve potentially differing orderings proposed by the pilots. To avoid dependencies leading to either pilot being able to slow down the group, Copilot uses fast takeovers that allow a fast pilot to complete the ongoing work of a slow pilot. Copilot includes two optimizations—pingpong batching and null dependency elimination—that improve its performance when there are 0 and 1 slow pilots respectively. Our evaluation of Copilot shows its performance is lower but competitive with Multi-Paxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies. Latent Copilot, a variant of Copilot, is another design and implementation of a 1-slowdown-tolerant consensus protocol. Latent Copilot operates with one active pilot, which actively proposes commands, and one latent pilot, which proposes commands only when necessary. In this way, Latent Copilot achieves an intermediate tradeoff between Multi-Paxos and Copilot in terms of throughput and slowdown tolerance. Our evaluation of Latent Copilot shows that it achieves 1-slowdown tolerance and that its performance with no slowdowns is comparable to Multi-Paxos. Finally, we generalize the design of Copilot replication to handle more than one slowdown. We provide the design of an s-slowdown-tolerant protocol that achieves slowdown tolerance despite s slow replicas.
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.publisherPrinceton, NJ : Princeton University
dc.relation.isformatofThe Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: <a href=http://catalog.princeton.edu>catalog.princeton.edu</a>
dc.subject.classificationComputer science
dc.titleTOLERATING SLOWDOWNS IN REPLICATED STATE MACHINES