On Finality and POW

In Kaspa we are implementing a finality window of ~16 hours. In this post I want to justify this decision and to clarify two prevailing misconceptions around finality: (i) that weak subjectivity is a liability specific to proof-of-stake protocols, and (ii) that proof-of-work inherently cannot have a finality feature. Before refuting these, some definitions of relevant concepts:

Terminology

Finality is the guarantee that a certain decision will not later change, e.g., that the identity of the leader at time t will not be reversed, or that the set of transactions accepted anytime before step t will not change. Bitcoin’s manually decided checkpoints is an example of a finality rule—a chain of blocks which forks Bitcoin’s chain and which does not include the checkpoint blocks would be ignored and discarded even if it is the longest or heaviest; in other words, the decision to include the checkpoint blocks in the main chain is finalized.

Probabilistic finality is a guarantee of the form “the decision will not be reversed with a probability of at least 1-\epsilon ”.

A self-stabilizing protocol is one which leads to agreement between all honest nodes regardless of the initial configuration. In particular, even if some catastrophic event such as a long network split happened, the protocol would still converge and reach consensus after some time.

Objectivity is the guarantee that if two nodes view the same set of blocks (or more generally: messages) they must also hold the same state; if there exist (potentially rare and peculiar) situations where this does not hold, the protocol is said to be subjective.

Finality Tradeoff

Roughly speaking, a proof-of-work consensus protocol with finality rules violates objectivity and the self-stabilizing condition, and on the other hand enjoys resilience to 51% attackers. Indeed, if for some catastrophic reason the network was split for a long timespan, the split would be reconciled once the network regains connectivity, and if the protocol does not enforce a finality rule, nodes will switch over to the longest chain (or heaviest branch of proof-of-work). However, once a finality window is defined, the network won’t be able to reconcile from splits longer than that window, as nodes would reject blocks behind their (subjective) finality window, even if they represent a larger portion of the proof-of-work. On the positive side, this very fact also implies that an attacker which gained hold of 51% of the hashrate won’t be able to reverse large chunks of the ledger, and the damage it may cause is scoped by the finality window.

As with many tradeoffs in consensus, this tradeoff is a direct result of the FLP result, which states that achieving fault tolerant agreement in an asynchronous system is impossible. It implies that we have to assume something (e.g.) about the synchrony of the system, and the protocol fails under extreme circumstances where our assumption no longer holds.

Eclipse Attacks

An important special case of a network split is an eclipse attack, whereby an attacker eclipses an individual node and separates it from the rest of the network. If during this attack the attacker feeds the victim with its own blocks, then the victim won’t be able to recover (without manual intervention) as blocks of the network will violate its (subjective) finality window. Especially vulnerable are new nodes that join the network and are synced by a malicious node which feeds them blocks in an order which causes them, at the end of the syncing process, to hold a different state than the rest of the network.

Note that as long as we are dealing with non-mining full nodes, the safest approach is probably to just turn on a flag, in case a non-negligible fraction of blocks violate finality, and let the node operator decide manually how to proceed. However, this is not an acceptable approach when dealing with the network of miners, as it will shut down the mining.

It is important to emphasize that, with proof-of-work, exploiting finality is very costly to the attacker. This is in contrast to proof-of-stake protocols which suffer from the “nothing-at-stake” problem.

The Case for Finality

The main argument in favour of finality is a social-consensus one: at a certain depth of a reorg, the community behind the corresponding blockchain is likely to refuse to accept it, and to manually intervene and reject the newly arrived heavier chain. Assuming the social consensus won’t accept deep reorgs, we might as well choose the easier path of setting the finality window explicitly, which simplifies many operations (such as pruning, syncing new nodes, etc.).

Our choice in Kaspa of 16 hours is inspired by Bitcoin’s coinbase maturity period, which is a waiting time of 100 blocks before a miner can spend its new coins, and which corresponds to – and, in fact, sets – the social consensus in Bitcoin around the depth of a reorg that will be accepted. Compare it to this poll where 59.2% voted for deviating from the Bitcoin protocol in face of a reorg of one year, and 47.8% would do so in face of a reorg post the maturity period.

Forgot to mention that this discussion refers to mining full-nodes. With non-mining full-nodes, it’s probably better to implement a flag which – in case a valid block causes a reorg deeper than the finality window – raises an alert and halts txn acceptance by the node until its operator manually intervenes. This mechanism allows more control to the users to decide on their action in case of such a split / huge attack