Data Availability Concerns

Introduction

Kaspa is pursuing an ambitious architectural goal: moving its entire execution layer off-chain into enshrined roll-ups secured by zk-proofs.

Rather than persisting state and calldata on-chain indefinitely, Kaspa’s Layer 1 (L1) is designed to prune historical data and serve primarily as a data dissemination and anchoring layer.

In this system, roll-ups periodically submit zk-commitments that:

  • Prove the correctness of off-chain execution.

  • Enable indirect communication between independent roll-ups through verifiable state transitions.

This reimagining of L1 functionality creates a separation of concerns that brings both promising benefits and challenging implications.

The good

From an L1 perspective, Kaspa’s approach is elegant and efficient:

  • Avoids state bloat: By not storing all execution data on-chain, the protocol avoids the ever-growing state size that burdens full nodes in many other smart contract platforms.

  • Lightweight infrastructure: Users and nodes not interested in specific roll-ups are not forced to store or process their data.

  • Correctness without replication: Thanks to zk-proofs, correctness can be independently verified without everyone re-executing everything.

  • Selective participation: Only those interested in a particular roll-up need to follow and replicate it, reducing unnecessary overhead for the rest of the network.

In essence, the system aligns computational effort with actual interest, while still preserving security and verifiability through cryptographic proofs.

The bad

However, these benefits come with non-trivial trade-offs:

  • No full reconstruction from L1: Since the L1 prunes state, it cannot serve as a canonical archive. Reconstructing a roll-up’s latest state requires cooperation from actors who have preserved it.

  • Withholding risks: If those who hold or mirror roll-up state become inactive or malicious, users may lose access to their funds or be unable to prove ownership/state transitions.

  • Fragmented DA assumptions: With many independent roll-ups, each potentially operated by different entities, users cannot easily assess the data availability guarantees of the roll-up they’re interacting with.

This introduces a form of informational asymmetry - users may trust a roll-up without realizing that their ability to access their funds depends on the unstated behavior of off-chain actors.

For instance, a user interacting with Rollup A may assume it’s as robustly available as Rollup B, not realizing that the latter is backed by a commercial DA service while the former depends on a small, volunteer-run mirror without much community participation.

And the ugly

At the heart of the data availability (DA) issue lies a game-theoretic dilemma, not just a technical one:

  • In most traditional blockchains, shared smart contract state is treated as a common good - all nodes replicate it by default, ensuring broad availability.

  • In Kaspa’s model, state replication is voluntary. Users choose which roll-ups to follow, and by extension, which data to retain. This makes the system highly flexible but also fragile.

Even if a roll-up has sufficient replication today, this could deteriorate over time if interest wanes, or actors exit the network.

This leads us into a classic tragedy-of-the-commons scenario:

Everyone benefits from someone maintaining data, but no one is individually incentivized to do so for the collective good - especially if they are not directly impacted.

Note: Unlike traditional commons problems, this isn’t just free-riding - it’s structural. Actors may act perfectly rationally by not storing what doesn’t affect them, yet the cumulative result is fragility.

Because there is no global consensus on what data matters or how long it should persist, availability becomes subject to social consensus and economic incentives, not protocol guarantees.

Conclusion and open questions

Kaspa introduces a fascinating shift in blockchain design - from a model of forced consensus and replication to one of voluntary association and market-driven state tracking.

But this raises critical open questions:

  • How can users trust that state will remain available without mandatory replication?

  • What incentives (or penalties) can ensure long-term DA without undermining Kaspa’s lean L1 goals?

  • How will users evaluate the reliability of roll-ups without transparent visibility into their DA infrastructure?

These are non-trivial coordination problems that extend beyond code into social behavior, governance, and incentive design, and solving them will (at least in my opinion) be key to Kaspa’s long-term success as a zk-secured, off-chain smart contract platform.


PS: I am going to propose a concrete solution to this problem but since the research post I am writing about this covers a lot of ground and is still expanding in scope, I thought that it makes sense to separate the problem statement from the proposal (and post it already) so they can be discussed independently - maybe somebody has elegant answers that are completely unrelated to my line of thought.


Welcome aboard ser!

In the common-good setup, state replication is voluntary too. Perhaps you mean that users will opt in to state replication rather than opt out, as in the default setup? If so, notice that this too is a design choice, and the default L2 client can/should be set up in the same manner we set up an L1 node – to store the available state.

I agree pruning provides a new flavour to the state-availability challenge, I disagree that it is a newly introduced challenge, or that the reliance on social consensus is a new assumption that Kaspa introduces.

Cryptographic proofs-of-replication can be baked into the protocol, alleviating the reliance on social consensus. While this does not guarantee real time retrievability (replicas can still refuse to share the state on demand), this problem appears everywhere in crypto (eg L1 miners refusing to share the UTXO set with new nodes).
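
As a minimal sketch of the kind of challenge game such proofs build on: assume data chunks are committed via a Merkle root anchored on L1 and challenge indices are derived from fresh chain randomness. This is only an illustration; a real proof-of-replication additionally binds responses to a unique replica encoding so the prover cannot recompute chunks on demand. All names below are made up:

```python
# Storage-challenge sketch: prover commits to chunks via a Merkle root on L1;
# answering a random challenge requires actually holding the challenged chunk.
import hashlib, os, random

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [h(l) for l in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, idx):
    level, path = [h(l) for l in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = idx ^ 1
        path.append((level[sib], sib < idx))  # (sibling hash, sibling-is-left?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return path

def verify(root, leaf, path):
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

chunks = [os.urandom(32) for _ in range(8)]    # replica held by the prover
root = merkle_root(chunks)                     # commitment anchored on L1
challenge = random.randrange(len(chunks))      # stand-in for chain randomness
proof = merkle_path(chunks, challenge)         # prover must still hold the chunk
assert verify(root, chunks[challenge], proof)  # verifier's check
```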

P.S.
Ideally the platform would be rollup-unfriendly, so maybe we should use another term. In the past we used logic-zones as a placeholder, and now I propose vApps from Succinct’s white paper. I mean, the entire design effort is aimed at countering rollups, defined (hereby) as logic zones that optimize for more vApps joining under their own state, state commitment / proving; as opposed to vApps, which are apps with defined logic that will naturally optimize for interoping with other vApps. E.g. Arbitrum vs Aave. Perhaps we should elaborate more on the inherent L1-rollup misalignment; for now, referring to this quick comment https://x.com/hashdag/status/1886191148533944366


In other networks, opting out of replicating historical data is usually supported, but replicating the entire smart contract state is actually mandatory for nodes to be able to efficiently validate blocks.

Making data retention the default mode of operation also goes a long way as it is not unreasonable to assume that at least some actors are lazy or altruistic enough to just follow best practices and retain data even if they theoretically don’t have to (especially if the underlying protocol limits state-growth to support this mode of operation).

If so, notice that this too is a design choice, and the default L2 client can/should be set up in the same manner we set an L1 node – to save the available state.

I am not saying that this is a show-stopper, and you are right that just defaulting to the rule that everybody tracks everything forever, like all other networks do, would be an easy and straightforward solution.

But this “solution” also means that you inherit the same limitations around state growth and scalability as all other networks, and I was actually assuming that Kaspa was planning to leverage its modularity to build a more scalable and fluid system where it is no longer necessary for “everybody to just globally store and execute everything” (even after separating execution from the L1).

I agree pruning provides a new flavour to the state-availability challenge, I disagree that it is a newly introduced challenge, or that the reliance on social consensus is a new assumption that Kaspa introduces.

I didn’t say that DA is a “new challenge” - what I am saying is that our system is “modular enough” for this to become a problem if we want to fully leverage our modularity and allow actors to only store and execute the parts of the global load that are relevant for them.

Cryptographic proofs-of-replication can be baked into the protocol, alleviating the reliance on social consensus. While this does not guarantee real time retrievability (replicas can still refuse to share the state on demand), this problem appears everywhere in crypto (eg L1 miners refusing to share the UTXO).

What kind of proofs do you envision? This is usually done with things like data availability sampling or data availability committees (utilizing threshold signatures to attest to the availability of data), which do not seem to translate well into the realm of PoW.

And yes, you are absolutely right - it is in fact very related to things like mining attacks that withhold data to prevent others from being able to extend the latest chain (i.e. [1912.07497] BDoS: Blockchain Denial of Service).

What makes this tricky is the fact that this can now be done by a user rather than a miner (who is at least bound by economic incentives to keep its own statements extendable and eventually reveal the missing data to stay relevant for the mining algorithm).

Imagine I spin up a new logic zone that nobody else tracks (and for which historic data is eventually lost) and then compose my state with yours (paying whatever fee is necessary to cover the assumed “externalities” of making this operation dynamically possible) while never revealing my input data / state to anybody else.

This not only makes me the only person on the planet who can prove correct execution and advance the state commitment on the L1, but it also means that if I decide to never reveal the missing input data, then everybody else will forever be locked out of accessing that shared state.
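
To make the lock-out concrete, here is a toy illustration, with plain hashes standing in for zk state commitments (all names are made up): since the L1 only anchors commitments, the withheld input is the sole way to open or extend the shared state.

```python
# Toy illustration of the lock-out: the L1 stores only commitments, so whoever
# holds the withheld input is the only party able to open or extend the state.
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

# Shared state commitment currently anchored on L1.
shared_commitment = h(b"genesis")

# The attacker composes with the shared state using private input data that is
# never published; L1 only sees the new commitment (plus a zk validity proof).
private_input = b"attacker-only state transition data"
new_commitment = h(shared_commitment, private_input)

# Anyone can verify the anchor chain, but without `private_input` nobody can
# recompute the pre-image, re-derive the new state, or build the next proof.
assert new_commitment == h(shared_commitment, private_input)
```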

P.S.
Ideally the platform would be rollup-unfriendly, so maybe we should use another term. In the past we used logic-zones as a placeholder, and now I propose vApps from Succinct’s white paper. I mean, the entire design effort is aimed at countering rollups, defined (hereby) as logic zones that optimize for more vApps joining under their own state, state commitment / proving; as opposed to vApps, which are apps with defined logic that will naturally optimize for interoping with other vApps. E.g. Arbitrum vs Aave. Perhaps we should elaborate more on the inherent L1-rollup misalignment; for now, referring to this quick comment https://x.com/hashdag/status/1886191148533944366

I agree that we should optimize for decentralization rather than specialized infra providers, but tbh I don’t really care what we call things in our discussions as long as we recognize that Kaspa’s design choices and default parameters result in unique challenges that need to be addressed if we want to securely leverage our modularity.

And what I am furthermore claiming is that solving these problems algorithmically does not work - they “have to be solved on the social consensus layer”. This means that the moment somebody launches a “vApp” that is supposed to be composable with other “vApps” (at some point in the future), there needs to be a mechanism in place (backed by strong game-theoretic guarantees) that ensures the state of that vApp is tracked by a sufficiently large group of actors (that will “never” forget its latest state).

Establishing the social consensus that everybody just tracks everything forever absolutely solves this, but if that is the goal / basic assumption for L2 nodes, then I don’t understand why we even discuss things like atomic sync composability - if everybody is assumed to have access to the state of all other vApps, then they can just natively call into each other.

PS: I think that we can do orders of magnitude better than this and actually “solve” not just some but all of the hardest problems around smart-contract-enabled chains (scalability, state growth and state expiry), but we first need to recognize the problem and the fact that possible solutions will significantly influence and constrain the “open questions” we are currently trying to answer.

Recently, the Igra/Kasplex/Sparkle teams have been working on L2s to enable smart contract features on Kaspa, while everyone seems to be jumping from one question to another and losing focus on the priority issues, i.e. fees, node requirements, and the economic model.

This post is to show that this will not be just “another EVM chain replica”. I am writing to organize all my questions and the difficulties I see in a structured way. I definitely welcome others to add more. Some were already mentioned by @FreshAir08: On the inherent tension between multileader consensus and inclusion-time proving - #3 by FreshAir08.


1. ETH Mempool/Txpool & Node Rules

When a TX nonce value is too large, or the gas fee is too low (e.g., during network congestion), ETH nodes cache the TX in the queue until conditions are met. Then the TX is executed.

Since the L2 lacks P2P consensus and relies solely on L1 sequencing, TX execution may differ across nodes. For instance, some nodes may cache transactions that others do not have and execute these TXs later in future blocks.

Specifically, nodes that have cached TXs with invalid nonce values will re-execute them once the conditions are met. Newly joined nodes cannot re-cache these TXs. It will then look as if “older nodes have more TXs,” when in fact those should have been discarded.

ETH tried to solve this issue by switching this feature off.
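
A toy model of this behavior (not geth’s actual txpool code) showing how nonce-gap caching makes two otherwise identical nodes diverge:

```python
# Toy txpool: a node that sees tx(nonce=1) before tx(nonce=0) parks it in
# `queued` and replays it later; a node that joins after the gap-filling tx
# never observes the parked one at all.
from collections import defaultdict

class TxPool:
    def __init__(self):
        self.expected_nonce = defaultdict(int)  # sender -> next valid nonce
        self.queued = defaultdict(dict)         # sender -> {nonce: tx}
        self.executed = []

    def submit(self, sender, nonce, tx):
        if nonce > self.expected_nonce[sender]:
            self.queued[sender][nonce] = tx     # cache until the gap closes
        elif nonce == self.expected_nonce[sender]:
            self._execute(sender, nonce, tx)    # stale nonces are dropped

    def _execute(self, sender, nonce, tx):
        self.executed.append(tx)
        self.expected_nonce[sender] = nonce + 1
        nxt = self.queued[sender].pop(nonce + 1, None)
        if nxt is not None:                     # promote newly executable txs
            self._execute(sender, nonce + 1, nxt)

old = TxPool(); old.submit("a", 1, "tx1"); old.submit("a", 0, "tx0")
new = TxPool(); new.submit("a", 0, "tx0")       # joined late, never saw tx1
assert old.executed == ["tx0", "tx1"] and new.executed == ["tx0"]
```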


2. L2 Node Reconstruction and Consistency

Based rollups mean that all TXs are sequenced and ordered by L1. The L2 functions solely as a layer for state storage and computation. In theory, L1 and L2 logic and consensus should be aligned. In that case, ideally, as long as the data sources are identical, all L2 data, state, and TXs can simply be rebuilt from any L1 archive.
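
As a minimal sketch of what this rebuild looks like, assuming a based rollup and a toy transfer format (names and types here are illustrative, not any team’s actual API):

```python
# Deterministic L2 reconstruction: fold a pure state-transition function over
# the L1 payloads in canonical order; same genesis + same archive = same state.
from dataclasses import dataclass

@dataclass
class L1Block:
    height: int
    l2_payloads: list   # L2 txs as embedded in L1, in canonical order

def apply_tx(state: dict, tx: tuple) -> dict:
    sender, receiver, amount = tx               # toy transfer format
    if state.get(sender, 0) >= amount:
        state[sender] = state.get(sender, 0) - amount
        state[receiver] = state.get(receiver, 0) + amount
    return state

def rebuild(archive: list, genesis: dict) -> dict:
    state = dict(genesis)
    for block in sorted(archive, key=lambda b: b.height):
        for tx in block.l2_payloads:
            state = apply_tx(state, tx)
    return state

# Two nodes replaying the same archive from the same genesis must agree,
# regardless of the order in which they received the archive.
archive = [L1Block(0, [("alice", "bob", 5)]), L1Block(1, [("bob", "carol", 2)])]
genesis = {"alice": 10}
assert rebuild(archive, genesis) == rebuild(list(reversed(archive)), genesis)
```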

By the way, this is also an overlooked feature of the KRC20 protocol. KRC20 needs no compatibility layer; KRC20 doesn’t generate any blocks. A TX is an execution. Every TX generates a checkpoint. KRC20 blocks = L1 blocks. It is the most “based”.

For EVM chains, in contrast, things are way more complicated. EVM L2s produce blocks by themselves, obviously. The biggest issue of typical EVM L2 designs is that TXs are NOT executed from L1 directly. This requires a LOT of compatibility machinery, including block binding, TX execution rules, mempool rules, etc. As a result, any two L2 nodes may diverge in transaction ordering due to caching (as mentioned above) or deep reorgs. Beyond the opcodes themselves, EVM chains need to incorporate EIP-xxxx, various gas fee mechanisms, and other protocols. All of this is baked into the ETH clients and is very difficult to change. I don’t know how the Igra/Kasplex teams will manage this problem.

One solution (maybe) is to support a multiple-supernodes mode: this assures that at least the supernodes produce the same results when executing against the same L1 data.


3. Additional Thoughts

  • ReOrg Handling: When an L1 reorg occurs (this is a very unique Kaspa issue), the usual resolution is to determine the affected VSPC blocks, and then the L2 must roll back the affected TXs/state and re-execute them based on the new L1 data (see the first sketch after this list). This would require significant modification of EVM’s core components if the EVM L2 is on Kaspa.
  • To be attractive to existing EVM devs and dapps, the L2 must achieve close-to-complete compatibility. Just allowing for programmability with Solidity is not good enough. Compatibility needs to extend across the infrastructure and protocol layers.
  • Wallet compatibility: Imo the EVM ecosystem is too heavy, but for the reasons above, if Kaspa makes an EVM L2, it must try its best to be as compatible as possible. Seamless wallet integration is required.
  • RPC Compatibility: Since TXs are submitted to L1 (if the L2 is based), read-only RPCs (on-chain data queries) can use existing EVM components directly. However, writes (transaction broadcasting) must be submitted to L1. There are maybe two possible solutions:

(a) Wallets directly construct and submit L1 transactions. The bad news is EVM wallets cannot be used then.

(b) Modify the RPC layer to intercept broadcast calls and relay them to L1 (a “relayer” compatible with EVM wallets; see the second sketch after this list). This works against decentralization.

  • L2 Explorer: L2 explorers must display the mapping between L2 blocks/transactions and L1 data, which is still not provided by either Kasplex or Igra.
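
Two sketches for the points above. First, for the reorg-handling bullet: a minimal model of an L2 follower that keeps per-L1-block state snapshots, rolls back to the fork point when L1 reorgs, and replays the new branch. All names here are illustrative assumptions, not any team’s actual code:

```python
# L2 follower that tracks L1's selected chain and replays across reorgs.
class L2Follower:
    def __init__(self, genesis_state):
        self.accepted = []                # L1 block hashes in accepted order
        self.snapshots = [genesis_state]  # state after each accepted block

    def on_l1_block(self, l1_hash, l2_txs, execute):
        self.accepted.append(l1_hash)
        self.snapshots.append(execute(self.snapshots[-1], l2_txs))

    def on_l1_reorg(self, new_chain, payloads, execute):
        # find the deepest common ancestor with the new selected chain
        fork = 0
        while (fork < len(self.accepted) and fork < len(new_chain)
               and self.accepted[fork] == new_chain[fork]):
            fork += 1
        # roll back everything past the fork point, then replay the new branch
        del self.accepted[fork:]
        del self.snapshots[fork + 1:]
        for l1_hash in new_chain[fork:]:
            self.on_l1_block(l1_hash, payloads[l1_hash], execute)

follow = L2Follower(0)
ex = lambda state, txs: state + sum(txs)         # toy "execution": add amounts
follow.on_l1_block("a", [1], ex); follow.on_l1_block("b", [2], ex)
follow.on_l1_reorg(["a", "c"], {"c": [5]}, ex)   # "b" reorged out, "c" in
assert follow.snapshots[-1] == 6                  # 0 + 1 + 5, tx "2" rolled back
```

Second, for RPC option (b): a sketch of the interception idea as a thin JSON-RPC shim that rewrites broadcast calls into L1 submissions and proxies everything else; `upstream` and `submit_to_l1` are assumed callables, not an existing API. Since anyone can run such a shim next to their own node, this softens, though does not remove, the decentralization concern:

```python
# JSON-RPC shim: broadcasts become L1 submissions, reads are proxied unchanged.
def handle_rpc(request, upstream, submit_to_l1):
    if request.get("method") == "eth_sendRawTransaction":
        raw_tx = request["params"][0]
        l1_txid = submit_to_l1(raw_tx)   # wrap the raw L2 tx in an L1 payload
        return {"jsonrpc": "2.0", "id": request["id"], "result": l1_txid}
    return upstream(request)             # eth_call, eth_getBalance, etc.
```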

------------ NON EVM PARTS ------------

I have a sense that Kaspa people don’t really know much about ETH clients. Imo, Kasplex’s design is more trustless and more challenging by design. It seems to be based, and it is consistent with their design of KRC20. Igra’s plan is not as based, although their transparent communication and updates are highly appreciated.

The actual challenge is not the question “can they build an L2 that can run”. The question hinges on boundary conditions, such as DA cost. A good question would be: under 10 BPS, if the L2 uses a lot of calldata (not memory or blobs), does the L2 have a strategy to manage potential congestion and bloat on L1 (where fees would then rise)? That’s kinda similar to what happened during the KRC20 launch. Unfortunately, such questions are rare. All I see are buzzwords like “decentralized nodes”, “trust model”, “pre-zk”, blah blah.

Thanks for raising these points. Let me clarify how Igra addresses them:

  1. EVM Compatibility – we’re using Reth with modifications for Kaspa’s reorgs. Our data component handles L1 → L2 translation while maintaining standard Ethereum RPC interfaces. Users interact normally through MetaMask, so everything else is compatible. In terms of known temporary limitations, there is a 20 kB limit for payloads.

  2. Reorgs. Kaspa’s frequent shallow reorgs are handled by following L1’s canonical chain. When L1 reorgs, L2 follows. We’ve implemented this in our Reth fork to handle Kaspa’s specific DAG structure.

  3. RPC Decentralization. Anyone can run an Igra node and provide RPC endpoints. There’s no centralized relayer – reading from L1 and serving RPC requests are two independent processes.

Handling congestion is a very valid concern. It would be a good problem to have, though. :slight_smile:
L2 fees naturally flow to L1 miners since all sequencing happens on Kaspa. High-priority transactions pay higher L1 fees, maintaining proper incentives. In line with Fees and throughput regulation dynamics - #4 by hashdag

Hi Pavel. So gas is paid in KAS on Igra L2? I assumed the gas was paid in an Igra-native token. Edit: I read the documentation about the KAS to iKAS bridge, so I answered my own question.

Hi Gordon! Yes, gas is paid in iKAS, which is the native asset on Igra — same as ETH on Ethereum. iKAS is issued via the canonical bridge from Kaspa L1, so every iKAS is fully backed 1:1 by KAS locked on L1.
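
For completeness, the 1:1 backing claim reduces to a simple lock-and-mint invariant; here is a toy model of it (illustrative only, not Igra’s actual bridge code):

```python
# Canonical lock-and-mint bridge invariant: iKAS minted on L2 always equals
# KAS locked on L1; withdrawals burn iKAS before releasing KAS.
class CanonicalBridge:
    def __init__(self):
        self.locked_kas = 0
        self.minted_ikas = 0

    def deposit(self, amount):       # lock KAS on L1, mint iKAS on L2
        self.locked_kas += amount
        self.minted_ikas += amount

    def withdraw(self, amount):      # burn iKAS on L2, release KAS on L1
        assert self.minted_ikas >= amount, "cannot burn more than minted"
        self.minted_ikas -= amount
        self.locked_kas -= amount

    def fully_backed(self):
        return self.minted_ikas == self.locked_kas

b = CanonicalBridge(); b.deposit(100); b.withdraw(40)
assert b.fully_backed()
```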