Atomic Composability and other considerations for L1/L2 support

This post is based on extensive conversations and brainstorming with Sutton, Roma, and Ilia@Starkware.

Background

We are putting effort into designing Kaspa's L1 support for L2. Our design follows the principles of zk-based rollups (ZK RUs): all smart contract activity is recorded in L1 as payloads (blobs in Eth jargon), and the state of all logic zones (smart contracts or rollups; I deliberately use the terms interchangeably, for reasons I'll explain later or in a separate post) is committed to in block headers.

The term rollup originally implied on-chain sequencing, but nowadays on-chain sequenced rollups are the exception, not the rule, and are referred to as based rollups. To anchor a new state of a certain logic zone to the base layer, after one or multiple state transitions, some prover node must send the base layer a transaction that provides a ZKP. The latter is verified in the base consensus through a new op_code, and the (commitment to the) state of the logic zone is thence updated. Crucially, the proofs may be submitted with non-negligible delays; nevertheless, users can get instant (~100 ms) confirmation from any node that follows this logic zone, executes its (L1-sequenced) transactions, and parses its state. In short: proof latency does not affect finality latency; it affects the state-sync time of new L2 nodes, the L1-pruning interval, and the computational load of cross-logic zone transactions.
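To make this flow concrete, here is a minimal Python sketch of the anchoring step. Everything in it is an illustrative assumption (the names `verify_proof` and `apply_anchor_txn`, the op_code semantics), not a spec; a toy hash stands in for a real ZKP:

```python
import hashlib
from typing import Dict, List

def toy_proof(old_root: bytes, new_root: bytes, payloads: List[bytes]) -> bytes:
    # Stand-in for a real ZKP: an actual prover outputs a succinct proof that
    # executing the L1-sequenced payloads on the state behind old_root yields new_root.
    return hashlib.sha256(old_root + new_root + b"".join(payloads)).digest()

def verify_proof(proof: bytes, old_root: bytes, new_root: bytes,
                 payloads: List[bytes]) -> bool:
    # Stand-in for the verifier the hypothetical new op_code would run in base consensus.
    return proof == toy_proof(old_root, new_root, payloads)

def apply_anchor_txn(commitments: Dict[str, bytes], zone_id: str,
                     new_root: bytes, proof: bytes, payloads: List[bytes]) -> None:
    # Consensus-side handling of a prover's anchoring transaction: verify the
    # state-transition proof, then advance the header-committed state root.
    old_root = commitments[zone_id]
    if not verify_proof(proof, old_root, new_root, payloads):
        raise ValueError("invalid state-transition proof")
    commitments[zone_id] = new_root  # the header commitment advances only here
```

Note that nothing in this flow is latency-critical: the anchoring transaction may arrive minutes after the payloads it proves were sequenced.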

Atomic Composability

Sync vs Async Composability

The best smart contract layers are designed for atomic or sync composability, where a smart contract can allow functions from other smart contracts to call it and interact with its variables (read and write) on the spot, atomically, in the scope of the triggering transaction. This cross-smart contract functionality characterizes the on-chain activity of Ethereum's early years and arguably facilitated the incredible growth of its ecosystem and dev community. Unfortunately, Ethereum's rollup-centric roadmap is working against this atomic composability feature and settles for async composability (which is still much better than TradFi UX, which requires manual composition).

In async composability, smart contracts can still interact with and send messages to one another, yet this is done through some layer that carries or bridges these messages (typically the base layer) and thus suffers some latency. Consequently, read/write actions are not treated in the scope of the originating transaction, atomicity is not guaranteed, and the effect of a composable transaction (the state of the contracts' variables and of the issuer's account after the transaction) cannot be guaranteed in advance.
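To illustrate the difference, here is a toy contrast in Python; the interfaces are hypothetical and deliberately simplistic:

```python
# Sync (atomic) composability: zone B's variables are read/written in the scope
# of the triggering transaction; if any step fails, the whole transaction reverts.
def atomic_transfer(zone_a: dict, zone_b: dict, user: str, amount: int) -> None:
    snapshot = (dict(zone_a), dict(zone_b))
    try:
        zone_a[user] = zone_a.get(user, 0) - amount
        if zone_a[user] < 0:
            raise ValueError("insufficient funds")
        zone_b[user] = zone_b.get(user, 0) + amount
    except Exception:
        zone_a.clear(); zone_a.update(snapshot[0])  # all-or-nothing rollback
        zone_b.clear(); zone_b.update(snapshot[1])
        raise

# Async composability: zone A only enqueues a message; zone B applies it later,
# when the bridging layer delivers it, so the joint effect is not atomic and
# cannot be fully enforced by the issuer in advance.
bridge_queue: list = []

def async_transfer_part_one(zone_a: dict, user: str, amount: int) -> None:
    zone_a[user] = zone_a.get(user, 0) - amount
    bridge_queue.append(("credit", user, amount))  # delivered with latency

def deliver_bridge_messages(zone_b: dict) -> None:
    while bridge_queue:
        _, user, amount = bridge_queue.pop(0)
        zone_b[user] = zone_b.get(user, 0) + amount
```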

Note: The unavoidable lack of predictability due to multiple users acting simultaneously is a separate issue; transaction issuers can enforce behaviour through several techniques, e.g., slippage specification, explicit conditions in the transaction, or intent specification. The issue with async composability is that only part of the transaction's effect can be fully enforced.

Based rollups are atomically composable

There are arguments for why async composability is good enough, but I will not get into them here. Let's just assume we don't want to settle for that. The good news is that we don't need to, since we are going full zk-based mode: all data is on-chain, so the state of each logic zone is fully reconstructable from on-chain data (up to the pruning point). Consequently, the effect of a multi-logic zone transaction occurs at its very sequencing, with no latency, and conditions on its effect across all logic zones can be enforced simultaneously. Contrast this dynamic with the non-based rollup-centric Ethereum, wherein the semi-off-chain sequencing of logic zone I's transactions may be inaccessible to L1 (and to logic zone II's provers), hence not provable to it. I reiterate that the sync atomicity of transactions is independent of prover latency: proof arrival frequency does not affect confirmation times in L1 or L2.

Logic zone interactions and dependencies

Now, consider a composable transaction txn that not only acts on two logic zones but also triggers an interaction between them; e.g., the transaction calls a function inside logic zone I and uses the output of this call as an argument for a function call to logic zone II. Observe that to verify the correct execution of logic zone II, the base layer must see a proof of the correct state transition of logic zone II and of logic zone I, since the output or intermediate state of the latter served as input to the state transition of the former. Similarly, the operators (read: provers) of logic zone II that wish to follow and parse their zone's state must follow and execute logic zone I as well.
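Schematically (with toy transition functions standing in for the zones' actual programs):

```python
# Hypothetical stand-ins for the two zones' transition functions.
def zone_i_call(state_i: int, arg: int) -> tuple[int, int]:
    new_state = state_i + arg
    return new_state * 2, new_state   # (output consumed by zone II, new state)

def zone_ii_call(state_ii: int, zone_i_output: int) -> int:
    return state_ii + zone_i_output

def execute_cross_zone_txn(state_i: int, state_ii: int, arg: int) -> tuple[int, int]:
    # Zone II's prover must EXECUTE zone I's call (cheap) to learn the output
    # feeding zone II, but it does not re-PROVE zone I (the expensive part);
    # the base layer still needs zone I's own proof to verify the full chain.
    output, state_i = zone_i_call(state_i, arg)
    state_ii = zone_ii_call(state_ii, output)
    return state_i, state_ii
```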

This dependency seems problematic at first sight: if the existence of composable transactions implies that all provers need to execute all transactions of all logic zones, then the architecture supposedly collapses back to one grand logic zone that suffers from the same scalability hindrances as a computation-oriented L1, where every transaction consumes computation from the same pool that serves all other transactions. But this is not really the case, because:

  1. Executing transactions of other logic zones needs to occur only when logic zones interact.
  2. Logic zones can define permissions (in their program code) for specific logic zones to interact with them in sync mode and require other logic zones to interact in async mode through the base layer’s messaging protocol (see the sketch after this list).
  3. Transaction execution is cheaper than proof generation by 2 or 3 orders of magnitude. Observe that provers of logic zone II need to execute, but not to prove, the (intermediate) state of logic zone I; proving is the computationally intensive part.
  4. Running a prover needs to be permissionless but not necessarily decentralized in the sense of optimizing for commodity hardware being able to run system-wide provers.
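For point 2, a zone's composability policy could be as simple as a whitelist baked into its program code; the sketch below is purely illustrative:

```python
# Hypothetical per-zone policy: peers on the whitelist may interact in sync
# (atomic) mode; all other zones must use the base layer's async messaging.
SYNC_WHITELIST = {
    "dex_zone":     {"stablecoin_zone", "oracle_zone"},
    "lending_zone": {"oracle_zone"},
}

def interaction_mode(caller_zone: str, callee_zone: str) -> str:
    return "sync" if caller_zone in SYNC_WHITELIST.get(callee_zone, set()) else "async"

assert interaction_mode("stablecoin_zone", "dex_zone") == "sync"
assert interaction_mode("dex_zone", "lending_zone") == "async"
```

A zone that whitelists no one remains fully isolated, imposing minimal execution burden on other zones' provers.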

Minimizing cross-logic zone dependencies

These considerations imply that an ideal ecosystem would minimize the scope of logic zones, which as a byproduct would minimize cross-logic zone dependencies as well as the implications of such dependencies (e.g., the aforementioned execution burden). I strongly encourage L2/SC projects building on Kaspa to follow this design principle and avoid aggregating many separable logic zones (smart contracts) under one overarching logic zone (rollup).

Latency in async composability mode

It is important to note that when logic zones do not support atomic composability and instead use async composability through L1’s messaging feature, they suffer the latency of provers and not (only) the latency of the base layer. Thus, even when Kaspa implements 10 (Sutton, read: 100) BPS, if the prover of logic zone I provides a proof every 10 minutes, then that is the latency that async-composable transactions will suffer; and for many applications, 10 minutes = infinity (which is why Bitcoin can’t realistically serve as a base layer for zk-based rollups). This is why I think we should insist on atomic composability.

Ensuring State Commitments

A final comment on this construction: Recall that the composable txn described above forces provers of logic zone II to execute logic zone I up to its state after txn’s first part; let’s denote this state by state_I_pre. Now, a proof by zone II’s provers can appear on-chain only after zone I’s provers have submitted one. However, zone I’s provers might submit a proof that batches a series of state transitions of I, of which state_I_pre is only an intermediate member. To allow II’s provers to build their proof utilizing the (chunk) proof of I, we must ensure L1 has access to the intermediate states that have been proven by I’s provers. In other words, we need all proofs to commit to all intermediate states in an accumulator (e.g., a Merkle tree), and then II’s provers can add a witness to that commitment alongside the proof of their execution.
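A minimal sketch of this requirement, using a plain Merkle tree over the batch's intermediate state roots (the tree layout and names are illustrative):

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves: list) -> bytes:
    # The chunk proof commits to ALL intermediate states, not only the final one.
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_witness(leaves: list, index: int) -> list:
    # Authentication path for one intermediate state (e.g., state_I_pre), which
    # zone II's provers attach alongside the proof of their own execution.
    path, level, idx = [], [h(leaf) for leaf in leaves], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append(level[idx ^ 1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return path

def verify_witness(root: bytes, leaf: bytes, index: int, path: list) -> bool:
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# state_I_pre is only an intermediate member of the batch, yet L1 can authenticate it:
batch = [b"state_0", b"state_I_pre", b"state_2", b"state_final"]
root = merkle_root(batch)
assert verify_witness(root, b"state_I_pre", 1, merkle_witness(batch, 1))
```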

Reflection

At the risk of overselling the feature discussed in this post, I think that the construction I’m proposing here extracts the best of all architectures (Bitcoin, Ethereum’s rollup-centric roadmap, Solana): an internet-speed version of a Nakamoto base layer (verification-oriented), a zk-based computation layer, and a Solana-like unified, defragmented state.

L2 throughput regulation and transaction fee market

Controlling L2 Throughput

How should we control the throughput of L2? Let’s denominate computation with the familiar “gas” unit. How should the gas limit be enforced? Since L1’s sequencers (miners, in Kaspa) are the only entities capable of selecting and prioritizing transactions entering the system, the gas regulation mechanism, too, must be employed at the L1 layer. The simplest design is to convert gas units to mass units; since the latter are capped per block, so will gas per block be.

Such a unidimensional restriction, however, pools together resources of different natures, L1 mass (script computation and storage) and L2 proving loads, and this is economically inefficient: it implies, for instance, that a user issuing a gas-heavy, storage-light transaction may be outcompeted by users issuing storage-heavy transactions, despite the fact that she imposes no externality on them and can be co-approved without consuming the same resource. We should thus keep the mass and gas throughput constraints decoupled, namely, provide a two-dimensional throughput restriction on blocks in the form: mass(block) < mass_limit AND gas(block) < gas_limit. This proposal implies miners will face a two-dimensional knapsack problem when selecting transactions from the mempool.
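For intuition, here is a naive greedy heuristic for that two-dimensional selection problem (real block building would likely be more sophisticated; the limits and field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Txn:
    fee: int    # fee offered
    mass: int   # L1 resource: script computation and storage
    gas: int    # L2 resource: proving load

MASS_LIMIT, GAS_LIMIT = 500_000, 1_000_000  # illustrative per-block caps

def build_block(mempool: list) -> list:
    # Greedy heuristic for the 2D knapsack: rank each txn by fee per unit of
    # its scarcer (normalized) resource, then fill while both caps hold.
    # Assumes every txn consumes some mass or some gas.
    def density(t: Txn) -> float:
        return t.fee / max(t.mass / MASS_LIMIT, t.gas / GAS_LIMIT)
    block, mass, gas = [], 0, 0
    for t in sorted(mempool, key=density, reverse=True):
        if mass + t.mass <= MASS_LIMIT and gas + t.gas <= GAS_LIMIT:
            block.append(t)
            mass += t.mass
            gas += t.gas
    return block
```

Under this decoupling, a gas-heavy, storage-light transaction competes only on the gas dimension and no longer bids against storage-heavy traffic.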

Note: The very same discussion applies to our coupling of L1 script computation mass and KIP-9 storage mass. We opted to couple them nonetheless under the same mass function and accept the theoretical economic inefficiency.

Notice that a surge in demand for L2 operations will not translate to higher revenue for provers; miners, the selectors of the L1 club, will collect the profits. This profit flow seems okay, since the gas limit implies that provers’ load does not increase (beyond the limit) in times of higher demand; we need to think about this more.

Another proposal that came up is to run a variant of the EIP-1559 mechanism on the gas units, which (i) would flow base-fee profits to provers, and (ii) would remove the complexity of running a dual knapsack, as it provides a sufficiently precise in-consensus price per gas unit.
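A sketch of such a controller; the update rule mirrors Ethereum's EIP-1559 constants, but applying it to L2 gas, and routing the base fee to provers rather than burning it, are this proposal's assumptions:

```python
def update_gas_base_fee(base_fee: int, gas_used: int, gas_target: int,
                        max_change_denominator: int = 8) -> int:
    # The base fee drifts toward the level at which blocks use exactly
    # gas_target gas; full blocks raise it by up to 1/8 (12.5%), empty
    # blocks lower it by the same factor.
    delta = base_fee * (gas_used - gas_target) // (gas_target * max_change_denominator)
    return max(base_fee + delta, 1)  # keep a floor so the price can recover

fee = 1_000
fee = update_gas_base_fee(fee, gas_used=1_000_000, gas_target=500_000)  # -> 1_125
```

With an in-consensus gas price, a miner can simply require each transaction to pay the going base fee and collapse selection back to a one-dimensional (mass) problem.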

Avg vs Peak Gas Limit

Either way, when setting gas_limit, one consideration to keep in mind is the gap between the average and the peak load on a prover: while logic zone I may typically comprise 10% of the gas capacity in blocks, its relative demand can reach 100% at peak, when users want to interact with it more than anything else. Here too, an economically efficient design would restrict the block’s gas per logic zone. This, however, would result in an n-dimensional knapsack problem for constructing a block out of the mempool, so we are currently opting for a simpler design with one gas_limit per block, acknowledging the economic suboptimality.

Funding Schemes for Provers

With either of the above mechanisms, L2 projects may conceive of additional funding schemes for provers. From an ecosystem vantage point, it is imperative that these fees be given in KAS, contribute to the flywheel of Kaspa, and align all players.

Again, I emphasize that I’ve summarized here ideas and considerations raised by the aforementioned co-contributors. Some of the topics discussed here are still WIP and open for discussion, hence the research-post format.
