Strong Consistency: A more detailed view of the consistency model

Just what is Strong Consistency?

Mo Zongran
8 min read · May 4, 2021
Image by Emma Harper on Unsplash

In many system design papers, the authors bring up the term “strong consistency”. While some of them are explicit about what the term entails, typically Linearizability, the term itself actually refers to a category of consistency models. It sits at the opposite end of the consistency spectrum from eventual consistency, which is itself also a broad category. However, these two categories are not a dichotomy: many consistency models sit in between, or simply offer a different set of guarantees from both. In this article, I wish to discuss the term strong consistency and what it typically entails.

Why Consistency

Statefulness

A system is said to be stateful if it has to maintain a certain context on which future execution depends. A banking system is stateful: a user’s balance has to be saved, and each withdrawal results in a new state of the system that determines whether the next withdrawal succeeds. For example, once user A with a balance of $500 has withdrawn $300, he must not be able to withdraw another $300.
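The banking example above can be sketched in a few lines of Python. The `Account` class and its method names are hypothetical, purely for illustration; the point is that each withdrawal's outcome depends on the state left behind by the previous one.

```python
class Account:
    """A toy stateful bank account: the outcome of each withdrawal
    depends on the state (balance) left by earlier operations."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            return False  # insufficient funds: the request is rejected
        self.balance -= amount
        return True

a = Account(500)
first = a.withdraw(300)   # succeeds; balance drops to 200
second = a.withdraw(300)  # must fail: the first withdrawal's state persists
```

Because the system is stateful, replicating it naively means every replica must agree on this evolving balance, which is exactly where consistency models come in.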

Replication

In modern computing, software systems usually perform Horizontal Scaling, whereby the software consists of many running instances serving as one single service (the alternative is Vertical Scaling, where the software runs on better hardware).

This type of scaling, also known as replication, brings about an inconsistent view of data in a stateful system if implemented naively. With replication, a user request that changes the state of the system is directed to one of the instances and is not reflected on the others; the same user reading from another instance would get a stale record, contradicting what the first instance promised.

Intuitively, we wish to see our own writes once the system has accepted them. A simple way to ensure this is to propagate each write to all the instances (replicas), and make it visible on every one of them, before replying to the user that the write succeeded. This comes at a great cost in latency, as the write must wait for all replicas to acknowledge it, and it gets far worse when even a small subset of the instances experiences long delays.
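The write-to-all-replicas approach can be sketched as follows. This is a minimal, hypothetical model (the `Replica` class and `write_all` function are invented for illustration): the write is applied to every replica before the user receives an acknowledgement, so any subsequent read from any replica sees it.

```python
class Replica:
    """A hypothetical replica holding a copy of the key-value state."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

def write_all(replicas, key, value):
    # Propagate the write to every replica before replying "success".
    # In a real system, each apply() would be a network round trip, so
    # the overall latency is bounded by the slowest replica.
    for r in replicas:
        r.apply(key, value)
    return "success"

replicas = [Replica() for _ in range(3)]
ack = write_all(replicas, "balance", 200)
# Every replica now serves the same value, so no read can be stale.
reads = [r.data["balance"] for r in replicas]
```

The sketch makes the trade-off visible: the acknowledgement cannot be returned until the loop over all replicas finishes, which is why relaxed models that skip some of that waiting are attractive for performance.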

Therefore, much software adopts a more relaxed consistency model, e.g. causal consistency or eventual consistency, mainly for performance reasons. However, this is not an option for software whose very purpose is to provide a consistent view throughout its lifecycle. For example, a MySQL cluster consisting of multiple database instances still has to guarantee the ACID properties, where the C stands for consistency. Some categories of use cases, e.g. banking systems and payment services, also mandate a consistent view of the stored data at any point in time.

Therefore, some software is designed around a particular consistency model to suit the needs of such use cases.

From here on, we refer to a software instance as a process (as it typically should be).

What is Strong Consistency?

Strong Consistency is an all-encompassing term, generally referring to a protocol in which all accesses are seen by all parallel processes[1]. However, there is no single, commonly agreed-upon model of strong consistency in the industry. For each piece of software that claims to provide strong consistency, the vendor typically defines the specific flavor implemented, which consists of a set of guarantees[2][3].

Despite the many vendor-defined “strong consistency” models present in the industry, there are academically agreed-upon building blocks for describing how strong is strong enough. We will discuss a few of the classic ones below.

Sequential Consistency

Sequential consistency is one of the oldest formally defined consistency models, introduced by Leslie Lamport back in 1979. In his paper[4], sequential consistency is described as follows:

… the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

The implications of this definition include:

  • For a given sequence of executions in a system with a set of processes, you can rearrange them into some sequence that looks like the processes execute one after another
  • No reordering of executions within the same process, e.g. a stack pop cannot be reordered to before the corresponding push
  • The reordering imposes no restriction across processes: operations from different processes may be placed in any relative order

The third implication is what sometimes causes problems: an execution that depends on another execution performed on a different process earlier in real time can be reordered before it. Trivially, all the data reads can be reordered before any data writes and return the default values while still fulfilling Sequential Consistency. In a distributed systems setting, where this arises more often, it yields incorrect results from the executions.
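This "reads reordered before writes" loophole can be made concrete with a tiny checker. The helper below is a hypothetical sketch, not a full consistency checker: it only verifies the one thing sequential consistency demands, namely that a candidate total order preserves each process's own program order.

```python
def preserves_program_order(total_order, programs):
    """Return True if every process's operations appear in total_order
    in the same relative order as in that process's program."""
    for ops in programs.values():
        positions = [total_order.index(op) for op in ops]
        if positions != sorted(positions):
            return False
    return True

# P1 issues two writes to x; P2 issues a single read of x.
programs = {"P1": ["W(x,1)", "W(x,2)"], "P2": ["R(x)"]}

# The read is ordered before both writes, so it returns the initial
# value — yet this order is still sequentially consistent, because
# each process's own program order is intact.
stale = ["R(x)", "W(x,1)", "W(x,2)"]

# Swapping P1's two writes, however, violates program order, so this
# order is NOT sequentially consistent.
bad = ["W(x,2)", "W(x,1)", "R(x)"]
```

Note that nothing in the check refers to real time: even if P2's read was issued long after both writes completed, the stale order above is still admissible under sequential consistency.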

Linearizability

Linearizability is said to be the strongest single-object consistency model so far. The definition is as follows[5]:

A history σ is linearizable if there is a linear order of the completed operations such that:

  1. For every completed operation in σ, the operation returns the same result in the execution as the operation would return if every operation was completed one by one in order σ.
  2. If an operation op1 completes (gets a response) before op2 begins (invokes), then op1 precedes op2 in σ.

This might look arcane. Put more simply, it first breaks each operation down into its invocation and its response. It then says that a sequence of executions from multiple processes can be reordered into a sequential history of their invocations and responses, in other words as if a single process executed them, and that the relative order of the invocations and responses in the original sequence must be preserved.

The last part is what sets it apart from sequential consistency: linearizability preserves the real-time ordering of executions across different processes, while sequential consistency does not. As such, linearizability is more commonly used as the strongest consistency model.
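The real-time constraint can also be sketched as a small check. As before, this is a hypothetical illustration, not a general linearizability checker: given invocation and response timestamps, it verifies only rule 2 above, that an operation which completed before another began must come first in the chosen order.

```python
def respects_real_time(order, invoke, response):
    """Return True if the sequential order never places an operation
    after one that was invoked only once the first had responded."""
    for i, earlier in enumerate(order):
        for later in order[i + 1:]:
            # If 'later' responded before 'earlier' was even invoked,
            # this order violates real-time precedence.
            if response[later] < invoke[earlier]:
                return False
    return True

# The write responds at t=2; the read is invoked at t=3, i.e. strictly
# after the write completed, so they are not concurrent.
invoke = {"W(x,1)": 0, "R(x)": 3}
response = {"W(x,1)": 2, "R(x)": 5}

ok = respects_real_time(["W(x,1)", "R(x)"], invoke, response)
bad = respects_real_time(["R(x)", "W(x,1)"], invoke, response)
```

The second order, read before write, was perfectly acceptable under sequential consistency, but here it is rejected: under linearizability the read must be ordered after the completed write, and therefore must return 1, not the initial value.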

Serializability

As opposed to the single-object consistency models discussed so far, Serializability applies to operations on multiple objects, typically in the form of transactions. A transaction involves multiple atomic operations on a set of objects by a single process. Serializability requires that the net effect of executing the transactions across all processes is the same as some sequential execution of those transactions.

This means that the individual operations within a transaction may be reordered as long as the end result could have arisen under the guarantee above. This suffices to fulfill the Isolation part of the ACID properties in most database systems, which requires that each transaction completes as if it were executed alone.

However, much like sequential consistency, Serializability does not preserve the original execution order of concurrently executing transactions. Even under the restriction imposed, a reordering in which all reads happen before all writes, and thus return initial values, can still occur, as such a reordering still forms a sequential execution of the transactions.
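A minimal sketch makes this concrete. The transactions and helper below are hypothetical: one transaction deposits into `x`, the other only reads `x`, and we execute them in both serial orders. Serializability accepts either outcome, regardless of which transaction was actually issued first.

```python
def run_serially(order, state):
    """Execute transactions one by one against a copy of the state,
    collecting each transaction's result."""
    state = dict(state)  # copy so each serial order starts fresh
    return [txn(state) for txn in order]

def t_deposit(state):
    # A write-only transaction.
    state["x"] += 100
    return None

def t_read(state):
    # A read-only transaction.
    return state["x"]

initial = {"x": 0}
# Both serial orders are legal under serializability, even if the
# deposit was issued first in real time:
r1 = run_serially([t_deposit, t_read], initial)  # read observes 100
r2 = run_serially([t_read, t_deposit], initial)  # read observes 0
```

The second order is exactly the "all reads before all writes" reordering described above: the read-only transaction returns the initial value, yet the history still corresponds to a valid sequential execution.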

Strict Serializability

Strict Serializability is the strictest multi-object consistency model, mixing in the strictness of Linearizability. In short, Strict Serializability treats the whole multi-object system as a single object and applies Linearizability to it.

In detail, Strict Serializability poses one more restriction than Serializability: if transaction X ends before transaction Y starts, then X must also precede Y in the reordered execution. Here we are talking about two transactions that have a serial order in the original execution and do not overlap at any point in time, i.e. are not concurrent. For concurrent transactions, i.e. any two transactions of which neither completes before the other starts, it gives the same guarantee as Serializability.

What we have seen so far

Sequential Consistency was first defined by Leslie Lamport, a pioneer of the Distributed Systems field, and was followed by the stricter model of Linearizability, the de facto strongest single-object consistency model. “Single-object” was emphasized many times because multi-object consistency demands a different stream of models, in which Strict Serializability is instead the strongest, with guarantees that differ from Linearizability’s.

Serializability is used in a more granular setting where multi-object operations are performed. It does not provide a strictly consistent view of the system at all times, but guarantees a consistent view of the data at the system’s consistent points, typically when no transactions are ongoing. The stricter model, Strict Serializability, preserves the original ordering of any two transactions (before, after, or concurrent). It is the strictest and is commonly used, although it often goes by the name “Serializability”, since a database system often runs on a single machine.

When we run a database instance on a single machine, where inter-process communication is rather reliable and the machine clock is consistent across processes (still depending on the application runtime), Sequential Consistency (for a single object) or Serializability (for multi-object transactions) would suffice. In a distributed systems setting, however, Linearizability or Strict Serializability provides a stronger guarantee that guards against many odd cases caused by network delay and node failure, though at a much greater cost in terms of system complexity, development, and maintenance.

Lesson Learnt

Knowing the specific standards of strong consistency helps us understand how strong a consistency model must be to count as strong in the distributed setting. When designing a distributed system with data persistence, we should know what kind of guarantees to aim for in order to give the user a strongly consistent view of the data.

The consistency model also serves as a language for communicating concisely the kind of consistency guarantees a software system offers. When a storage service claims to offer Linearizability, we know that operations on its objects will be applied sequentially in the real-time order of user input, regardless of the actual implementation details.

In practice, Strict Serializability is much harder to implement in a distributed setting than Serializability, as the former requires consistent reads, which are hard to synchronize for reasons such as clock skew and lossy networks. One should weigh the complexity of implementing Strict Serializability, and the synchronization overhead of consistent reads, against the benefit of the real-time ordering it grants on top of Serializability.

All in all, knowing the upper bound of the requirement for strong consistency lets us know which guarantees are enough for building a sufficiently consistent system.

Resources

[1] Wikipedia, Strong Consistency

[2] Google Cloud Spanner

[3] Amazon S3

[4] How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs by Leslie Lamport, 1979

[5] Linearizability: A Correctness Condition for Concurrent Objects by Maurice P. Herlihy and Jeannette M. Wing
