May 09, 2020

That's interesting, because the recent arXiv paper[1] by Heidi Howard indicated that Raft just has a clearer presentation, rather than any technical superiority from being simpler (discussion here [2]). So perhaps people who believe in Paxos need to come up with similarly easy explanations for the community, learning from the presentation that Raft focused on?

[1] https://arxiv.org/abs/2004.05074 [2] https://news.ycombinator.com/item?id=22994420

May 09, 2020

I have to admit that over time I've ended up with mixed feelings about this paper. This is mainly because people read it without knowing much about consensus and draw conclusions like "Raft is better than Paxos" or "Raft is the best consensus algorithm". Some thoughts (please elaborate if you think I'm simplifying it too much or if you disagree with me!):

First of all remember that Paxos is a family of protocols for solving consensus. When doing research it's useful to reduce a problem into smaller and smaller parts. The "standard" Paxos algorithm is a very simple consensus algorithm which can only decide a single value once. It's not practical at all, but provides a good framework for thinking about consensus.
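To make "can only decide a single value once" concrete, here is a minimal in-memory sketch of single-decree Paxos. It assumes no message loss, no crashes and no concurrency, and the names (`Acceptor`, `propose`) are made up for illustration, so treat it as a toy rather than an implementation:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot this acceptor has promised
    accepted_ballot: int = -1             # ballot of the value it last accepted
    accepted_value: Optional[str] = None  # the value it last accepted, if any

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return (True, self.accepted_ballot, self.accepted_value)
        return (False, -1, None)

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot has been promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p[0]]
    if len(promises) < majority:
        return None  # lost phase 1; a real proposer would retry with a higher ballot
    # If any acceptor already accepted a value, the proposer must adopt the one
    # with the highest ballot instead of its own.
    best = max(promises, key=lambda p: p[1])
    if best[1] >= 0:
        value = best[2]
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "a"))  # "a" is chosen
print(propose(acceptors, 2, "b"))  # still "a": a later proposer has to adopt it
print(propose(acceptors, 1, "c"))  # None: ballot 1 is stale by now
```

Once a value is chosen, later proposals can only re-decide the same value. That's why you need something on top (e.g. one instance per log slot, as in MultiPaxos) to do anything practical.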

When this article proposes "Raft vs Paxos", it is actually comparing Raft against a standard way of configuring Paxos with a leader (MultiPaxos). Note that MultiPaxos allows a lot of variation between implementations while still being called "MultiPaxos". MultiPaxos is not a spec you implement; it's a set of ideas.

Raft on the other hand is a concrete protocol with well-defined, specified behavior. In fact, Raft is essentially an implementation of MultiPaxos[1]. This is a very good thing! Paxos provides a framework for thinking about consensus, while Raft puts some of these ideas into a concrete specification which is easy to implement. And it is a good point that we should make the knowledge in the field of consensus available for a wider audience. Yay, Raft is good!

And here comes the problem: A lot of people have read the Raft paper and drawn the conclusion that "Raft is the best way of solving consensus". Raft is (relatively) easy to implement and get started with, and it gives you a very simple model to program against (a log of commands), but it's far from a panacea.

The most important thing to know about Raft is that it's neither performant (every command has to be sent to a single leader, which becomes a bottleneck) nor scalable (every command needs to be processed by all nodes). Etcd supports "1000s of writes" and recommends up to "7 nodes".
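A toy illustration of why adding nodes doesn't add write capacity in this model. This is not Raft (no elections, terms, acknowledgements or failures), just the "log of commands" shape, and `Node` and `replicate` are hypothetical names:

```python
class Node:
    def __init__(self):
        self.log = []      # the replicated log of commands
        self.state = {}    # the state machine: a simple key/value map
        self.applied = 0   # how many commands this node had to process

    def append_and_apply(self, command):
        key, value = command
        self.log.append(command)
        self.state[key] = value
        self.applied += 1


def replicate(nodes, commands):
    # Every command enters through one place (the leader, here just this loop)
    # and is then appended and applied by every node in the cluster.
    for command in commands:
        for node in nodes:
            node.append_and_apply(command)


nodes = [Node() for _ in range(5)]
replicate(nodes, [("x", 1), ("y", 2), ("x", 3)])
print([n.applied for n in nodes])  # every node did the work for all 3 commands
print(nodes[0].state == nodes[4].state)
```

Going from 3 to 5 to 7 nodes buys you fault tolerance, not throughput: per-node work stays the same and the leader has more followers to talk to.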

This doesn't mean that Raft is bad; it's just a trade-off you need to be aware of. Simplicity vs performance. If you're integrating Raft into your stack and aim for scalability/performance, you must always be very wary of when you use it. You should minimize writes at all costs. Unfortunately many developers get the impression that you can just plug Raft into an existing system and suddenly have a performant and scalable distributed system.

A good example is CockroachDB: They're using plain Raft for writes, but use "leader leases" for scaling reads. Suddenly things become a lot more complicated (for instance, see this issue about how leader leases are implemented in HashiCorp's Go library for Raft: https://github.com/hashicorp/raft/issues/108). I'm sorry, but you're going to have to get your hands dirty if you want something that's both fast and correct.

The end result is that you have two choices: (1) You can use a library which provides a simple model (a log of commands) but doesn't scale well, or (2) you can use a more complicated consensus algorithm and then deal with all of the Hard Problems™ that come with it. If you're going for the second option, you might as well take advantage of all of the research published in the last few years (see https://vadosware.io/post/paxosmon-gotta-concensus-them-all/).

It should also be noted that even though the consensus algorithm doesn't scale, that doesn't mean your system can't scale. Scalog (https://www.usenix.org/system/files/nsdi20-paper-ding.pdf) is an example of a system which uses a consensus algorithm in a constant way (i.e. regardless of the load of the system). Once again: Focus on how you can avoid using a consensus algorithm due to the way your system works.
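A rough sketch of that idea, loosely inspired by Scalog's approach but not its actual protocol: shards append records locally, and the only thing that would go through consensus is a periodic "cut" (how far each shard's log has gotten). `order_by_cuts` and the data below are made up for illustration:

```python
def order_by_cuts(shards, previous_cut, new_cut):
    """Deterministically order the records that fall between two agreed cuts."""
    ordered = []
    for shard_id in sorted(shards):
        start = previous_cut.get(shard_id, 0)
        end = new_cut[shard_id]
        ordered.extend(shards[shard_id][start:end])
    return ordered


# Records are appended to shards without any global coordination.
shards = {"s1": ["a", "b", "c"], "s2": ["x", "y"]}

# Only these small per-shard length vectors would be decided by consensus,
# once per interval, no matter how many records were appended in between.
cut0 = {}
cut1 = {"s1": 2, "s2": 1}
cut2 = {"s1": 3, "s2": 2}

print(order_by_cuts(shards, cut0, cut1))  # ['a', 'b', 'x']
print(order_by_cuts(shards, cut1, cut2))  # ['c', 'y']
```

The number of consensus decisions here depends on how often you cut, not on how many records are written, which is the sense in which the consensus usage stays constant.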

TL;DR: Hard problems are still hard. Don't think that Raft is a magical piece of software which solves all of your problems.

[1]: There are some nuances between Raft and "standard" MultiPaxos as mentioned in https://arxiv.org/abs/2004.05074. I would still consider Raft to be in the same class as MultiPaxos (compared to other solutions of consensus).