• Author: Antoine Rondelet (ar@clearmatics.com)
• Version: 0.2.0

In this notebook, we simulate the evolution of a simplified blockchain system on which Zeth is deployed, in order to better inform the choice of the various protocol parameters (number of input notes, number of output notes, depth of the Merkle tree, etc.). This notebook can also be used for further experiments (using parameters which are not yet documented).

This notebook focuses on key questions about the growth of the blockchain state under different configurations. State growth matters because it is a key factor in the number of nodes on the distributed system: it drives the hardware requirements for existing nodes and affects how easy it is for new nodes to join the network (i.e. to sync a new node). In other words, since the number of nodes on a blockchain "boils down to convenience", we want to see how convenient (easy/fast/cheap) it is to validate on a blockchain under various network assumptions. Studying the state growth provides valuable hints in that regard. Nevertheless, the reader is reminded that, by the very essence of modeling, we make several simplifying assumptions in the sections below, and thus ignore various aspects of a "real-life" running system.

"All models are wrong, but some are useful"

George E. P. Box.

Hopefully this one is somewhat useful...

## Open questions

This notebook is structured around some key open questions that aim to better understand the impact of Zeth on a blockchain system. Likewise, in future work, we will investigate how well the privacy-preserving scalability solution Zecale performs in terms of both data compression and TPS.

### Question 1

After how much time does the Zeth Merkle tree become full for a given Merkle tree depth?

• Assumption: all blocks mined are full and only made of Zeth transactions

### Question 2

How does the chain state size compare when only Zeth transactions are used, as opposed to the case where only "plain" EOA-to-EOA Ethereum transactions are used?

### Question 3

What is the gas cost per byte for EOA-to-EOA transactions and for Zeth transactions?
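The ratio asked for here is the gas cost of a transaction divided by its size in bytes. A minimal sketch follows; `gas_per_byte` is a hypothetical helper, and the transaction size used in the usage line is a placeholder rather than a measured value. The 21,000 gas figure is the intrinsic gas of a plain Ethereum transfer.

```python
def gas_per_byte(gas_cost: int, tx_size_bytes: int) -> float:
    """Return the gas paid per byte of transaction data."""
    if tx_size_bytes <= 0:
        raise ValueError("tx_size_bytes must be positive")
    return gas_cost / tx_size_bytes


DGAS = 21_000              # intrinsic gas of a plain EOA-to-EOA transfer
PLACEHOLDER_TX_SIZE = 110  # hypothetical size in bytes, NOT a measurement

# Ratio for a plain transaction under the placeholder size above.
plain_ratio = gas_per_byte(DGAS, PLACEHOLDER_TX_SIZE)
```

The same helper applies to Zeth transactions by passing the measured gas cost and byte size of a Mix call instead.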

### Question 4

After how much time does the chain data become larger than 1 TB? (1 TB is the maximum storage of the latest XPS-15 laptop. We use this threshold as an indicator of when running a node becomes inconvenient and requires some "specialized" hardware.)
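If every block adds the same amount of data, the time to reach the threshold follows from a simple division. The sketch below makes several assumptions that are not stated in the text: the terabyte is taken as the decimal $10^{12}$ bytes, the compression ratio is interpreted as stored size over raw size, and the function and parameter names are illustrative only.

```python
import math

TERABYTE = 10 ** 12  # decimal terabyte; an assumption, the text only says "1TB"


def seconds_until_threshold(raw_bytes_per_block: float,
                            compression_ratio: float,
                            btime: float,
                            threshold_bytes: int = TERABYTE) -> float:
    """Time (in seconds) until the stored chain data reaches `threshold_bytes`.

    Assumes `compression_ratio` = stored size / raw size, in (0, 1], and
    that every block adds the same amount of raw data.
    """
    if raw_bytes_per_block <= 0:
        raise ValueError("raw_bytes_per_block must be positive")
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    stored_bytes_per_block = raw_bytes_per_block * compression_ratio
    # Number of whole blocks needed to reach (or exceed) the threshold.
    blocks_needed = math.ceil(threshold_bytes / stored_bytes_per_block)
    return blocks_needed * btime
```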

### Question 5

What is the impact of Zeth on the TPS of the system?

• Assumption: all blocks mined are full and only made of Zeth transactions

### Question 6

How well does Zecale compress the state (compared to "vanilla Zeth")? Furthermore:

• How is TPS impacted when batching Zeth transactions with Zecale?
• Are data compression and TPS moving in the same direction?

This question is answered in another notebook dedicated to Zecale.

## Model

### Zeth state

We assume that the Zeth state ($\zeta_z$) is only made of the following:

• Zeth Merkle tree, simply modelled as a set of leaf nodes, i.e. the Merkle tree Leaves Set (denoted $\mathtt{MKLS}$). The full tree with intermediate nodes can be recovered by recursive hashing from the tree leaves.
• Nullifiers set (denoted $\mathtt{NS}$)
• Roots set (denoted $\mathtt{RS}$)

Importantly, we do not account for the storage cost of the Zeth contracts (a one-time operation carried out at initialization time) or of their various storage constants (i.e. constant protocol parameters).

### Blockchain state

We assume that the blockchain state ($\zeta_b$) is only made of:

• The chain of block headers
• The chain of block bodies (the set of transactions)
• The chain of receipts (past transaction results and contract logs)

### Further assumptions

Some of these assumptions are not strictly necessary, but making them for now further simplifies the system and removes potential unexpected moving pieces.

1. We assume that all transactions emitted are eventually mined. More precisely, we assume:
• no network failures (no messages are dropped/lost)
• miners have unbounded memory (no need to drop transactions from the pool)
• miners mine transactions in the order they receive them (no censorship etc)
2. We only consider two types of blockchain transactions:
• plain "EOA-to-EOA" transactions with no extra data
• Zeth transactions
3. We assume the number of accounts is fixed throughout the simulation (for now we simply reason in terms of the number of transactions, regardless of the accounts from which they originate).
4. We model the blockchain as a mere chain of blocks (no forks, no ommer blocks etc) that are made of a header and a list of transactions.
5. We assume that all blockchain related configuration parameters are fixed (fixed block gas limit etc.)
6. We assume that all Zeth contracts are already deployed (no deployment cost (size/storage-wise and gas-wise) to take into consideration).
7. We assume that the blockchain state is stored by the client in a database which supports automatic data compression:
• We assume that compression is instantaneous
• We assume a fixed compression ratio on the stored state as plain text data
• We ignore the potential overhead associated with accessing values associated with hashes on disk
• We assume that the state fits entirely in memory

Note: At the time of writing, the go-ethereum client uses the LevelDB database, which compresses with Snappy. Other clients may use other databases, however. For instance, the openethereum client uses RocksDB, whose compression can be further configured to use lz4 (though Snappy is kept as default). See also the documentation of Turbo-Geth, which proposes an alternative to go-ethereum for organising the persistent data in its database.

### Constants

Below are the parameters that remain constant across simulations.

• The size (in bytes) of a plain Ethereum transaction is denoted by $\mathtt{ETHTXSIZE}$
• The intrinsic/default gas cost of an Ethereum transaction ("plain EOA to EOA" transaction) is denoted by $\mathtt{DGAS}$
• The gas cost for the set of supported pre-compiled contracts is denoted by $\mathtt{PRECOMPILED\_CURVES}$
• The database compression ratio is denoted by $\mathtt{COMPRESSION\_RATIO}$

### Variable parameters

Below are the parameters that may change across (and during) simulations.

#### Zeth parameters

• We denote the curve used by Zeth as $\mathtt{ZETHCRV} \in \{\mathtt{BN254}, \mathtt{BLS12-377}\}$. The curve determines which set of precompiled contracts is needed to carry out the Zeth state transition (proof verification), and therefore how expensive the state transition is.
• Merkle tree depth $\mathtt{MDEPTH}$ (i.e. $|\mathtt{MKLS}| \leq 2^{\mathtt{MDEPTH}}$)
• The number of Zeth input notes $\mathtt{JSIN}$
• The number of Zeth output notes $\mathtt{JSOUT}$
• The size (in bytes) of the Zeth Mix transaction is denoted by $\mathtt{ZETHTXSIZE}$.
• The number of Zeth Mix inputs is denoted by $\mathtt{ZETHINPSIZE} = 1 + \mathtt{JSOUT} + \mathtt{JSIN} + 1 + \mathtt{JSIN} + 1$ (MK root + JSOUT commitments + JSIN nullifiers + h_sig + JSIN h_i tags + residual_bits)
• The gas cost of a Zeth Mix call is denoted by $\mathtt{ZETHGCOST}$
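The input count $\mathtt{ZETHINPSIZE}$ defined in the list above can be checked with a few lines of Python. This is a minimal sketch; the function name is an illustrative assumption and not part of the Zeth code base.

```python
def zeth_inp_size(jsin: int, jsout: int) -> int:
    """Number of Zeth Mix inputs for JSIN input notes and JSOUT output notes."""
    merkle_root = 1
    commitments = jsout    # one commitment per output note
    nullifiers = jsin      # one nullifier per input note
    h_sig = 1
    h_i_tags = jsin        # one h_i tag per input note
    residual_bits = 1
    return (merkle_root + commitments + nullifiers
            + h_sig + h_i_tags + residual_bits)
```

For example, with $\mathtt{JSIN} = \mathtt{JSOUT} = 2$, `zeth_inp_size(2, 2)` evaluates to 9.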

#### Blockchain parameters

• We denote the blockchain by $\mathcal{B}$ and see it as a mere chain of blocks
• We denote the block gas limit as $\mathtt{BGLIM}$ (important to know how many Zeth transactions can fit into a block)
• TODO: Consider treating $\mathtt{BGLIM}$ as an "elastic" param/variable (instead of a constant) as in EIP1559.
• We denote the block production time target as $\mathtt{BTIMETRGT}$
• (Optional) We denote the block production time lag $\mathtt{BTIMELAG}$ (randomly selected in a time window to account for potential delays due to PoW and/or due to network latency in block propagation)
• For now, $\mathtt{BTIMELAG} = 0$
• TODO: Consider adding a randomized block production lag in future iterations of the model in order to use Monte Carlo executions.
• We denote the block production time as $\mathtt{BTIME} = \mathtt{BTIMETRGT} + \mathtt{BTIMELAG}$
• The size (in bytes) of a block is denoted by $\mathtt{BLKSIZE}$
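The blockchain parameters above combine into the number of transactions per block and, from there, into the throughput asked about in Question 5. The sketch below assumes that blocks are always full and that every transaction in a block has the same gas cost; the function names are hypothetical.

```python
def block_time(btimetrgt: float, btimelag: float = 0.0) -> float:
    """BTIME = BTIMETRGT + BTIMELAG (the lag is 0 in the current model)."""
    return btimetrgt + btimelag


def txs_per_block(bglim: int, tx_gas_cost: int) -> int:
    """Number of identical transactions that fit under the block gas limit."""
    if tx_gas_cost <= 0:
        raise ValueError("tx_gas_cost must be positive")
    return bglim // tx_gas_cost


def tps(bglim: int, tx_gas_cost: int,
        btimetrgt: float, btimelag: float = 0.0) -> float:
    """Transactions per second, assuming every block is full."""
    return txs_per_block(bglim, tx_gas_cost) / block_time(btimetrgt, btimelag)
```

Passing $\mathtt{DGAS}$ as `tx_gas_cost` models blocks of plain transactions, while passing $\mathtt{ZETHGCOST}$ models blocks made only of Zeth transactions.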

### Initial state

These are the constant initialization values that do not vary across executions:

• $\mathtt{MKLS} = \emptyset$
• $\mathtt{NS} = \emptyset$
• $\mathtt{RS} = \emptyset$
• $\mathcal{B}$ = $\mathcal{B}_{genesis}$ (The chain is instantiated, an empty genesis block is mined)

### State transition

Each state transition adds:

• $\mathtt{JSOUT}$ leaves to the set $\mathtt{MKLS}$
• $\mathtt{JSIN}$ nullifiers to the set $\mathtt{NS}$
• $\mathtt{JSIN}$ roots to the set $\mathtt{RS}$
• $\lfloor \frac{\mathtt{BGLIM}}{\mathtt{ZETHGCOST}} \rfloor$ new transactions to the blockchain
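The transition can be sketched in Python as follows. This is an illustration under stated assumptions, not the notebook's implementation: the three sets are modelled by their cardinality only, the first three quantities are read as being added once per Zeth transaction and the last one once per block, and all names are hypothetical. The second function uses the bound $|\mathtt{MKLS}| \leq 2^{\mathtt{MDEPTH}}$ to estimate when the tree becomes full (Question 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZethState:
    mkls: int = 0  # |MKLS|: number of Merkle tree leaves
    ns: int = 0    # |NS|: number of nullifiers
    rs: int = 0    # |RS|: number of roots
    txs: int = 0   # number of Zeth transactions on chain


def apply_block(state: ZethState, jsin: int, jsout: int,
                bglim: int, zeth_gcost: int) -> ZethState:
    """Apply one full block of Zeth transactions to the state."""
    n = bglim // zeth_gcost  # Zeth transactions fitting in one block
    return ZethState(
        mkls=state.mkls + n * jsout,
        ns=state.ns + n * jsin,
        rs=state.rs + n * jsin,
        txs=state.txs + n,
    )


def blocks_until_tree_full(mdepth: int, jsout: int,
                           bglim: int, zeth_gcost: int) -> int:
    """Number of full blocks after which 2**mdepth leaves are reached."""
    leaves_per_block = jsout * (bglim // zeth_gcost)
    if leaves_per_block == 0:
        raise ValueError("no Zeth transaction fits in a block")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-(2 ** mdepth) // leaves_per_block)
```

Multiplying the result of `blocks_until_tree_full` by $\mathtt{BTIME}$ gives the corresponding duration in seconds.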

## Modeling Zeth with different protocol parameters

We start by tracking the blockchain state growth when only plain "EOA-to-EOA" transactions are carried out. Then, we model Zeth with different protocol parameters to see how the blockchain state size grows under different conditions, and to track the rate at which the Merkle tree of Zeth note commitments is filled.

We use A/B testing and "Parameters Sweep" simulations to study the state growth under different blockchain configurations (block gas limit etc.) and Zeth configurations (Merkle tree depth, JSIN/JSOUT etc.):

• Simulation A: Only "plain" (with no extra data) EOA-to-EOA transactions are mined. This simulation uses "Parameters Sweep" to simulate the system under various blockchain configurations.
• Simulation B: Only Zeth transactions are mined. This simulation uses "Parameters Sweep" to simulate the system under various blockchain configurations. The Zeth configuration tested is:
• Merkle tree depth = 32
• JSIN = JSOUT = 2
• Curve = BN254
• Simulation C: Only Zeth transactions are mined. This simulation uses "Parameters Sweep" to simulate the system under various blockchain configurations. The Zeth configuration tested is:
• Merkle tree depth = 32
• JSIN = JSOUT = 2
• Curve = BLS12_377
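A parameter sweep amounts to running the same model once per combination of parameter values. The sketch below builds such combinations with the standard library, independently of whichever simulation framework is used. The grid values are hypothetical placeholders, not the values used in the simulations; only the fixed Zeth configuration mirrors Simulation B above.

```python
from itertools import product


def sweep(param_grid: dict) -> list:
    """Return one configuration dict per combination of parameter values."""
    keys = list(param_grid)
    return [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]


# Hypothetical blockchain configurations to sweep over (placeholders).
blockchain_grid = {
    "BGLIM": [10_000_000, 12_500_000],
    "BTIMETRGT": [5, 13],
}

# Fixed Zeth configuration of Simulation B.
zeth_config = {"MDEPTH": 32, "JSIN": 2, "JSOUT": 2, "ZETHCRV": "BN254"}

# Each run combines one blockchain configuration with the Zeth configuration.
runs = [{**zeth_config, **cfg} for cfg in sweep(blockchain_grid)]
```

With two values for each of the two swept parameters, `runs` contains four configurations.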

All these simulations are deterministic (no Monte Carlo runs) and represent 24 hours' worth of data. Since no random runs are employed, the simulation results can be cached in a file to avoid multiple (expensive) runs of the model's simulations.
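Caching of this kind can be sketched as follows. This is a minimal illustration, assuming the results are JSON-serializable; `cached_run` and the `run_simulation` callable are hypothetical names, and the notebook may use a different file format.

```python
import json
from pathlib import Path


def cached_run(cache_path, run_simulation):
    """Return cached results if present, else run the simulation and cache it.

    This is only safe because the simulations are deterministic: the same
    inputs always produce the same results.
    """
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text())
    results = run_simulation()
    path.write_text(json.dumps(results))
    return results
```

The cache file should be deleted (or a new path used) whenever the model or its parameters change, since the helper does not detect stale results.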

### Simulation dataset

Before proceeding with the simulation, it is worth clarifying how the input dataset has been obtained.

Ideally, to determine the gas cost of a state transition, one would use the blockchain network's gas table along with the set of opcodes defining the state transition to derive a deterministic formula for the cost of the smart-contract call. However, such an approach is not sufficient, since several opcodes (such as SSTORE) have different costs depending on the smart-contract's state (i.e. depending on whether empty storage slots are initialized or existing ones are simply re-written). As a consequence, and to ease the process, the following data (transaction gas cost and byte-size) are obtained via empirical experiments, during which a set of transactions is fired on a test network. The results below are obtained by taking the arithmetic mean of a simulation's results. Importantly, certain Zeth configurations (i.e. certain curve selections: BLS12_377 and BW6_761) necessitate extensions to the EVM in order to support curve operations (point addition, scalar multiplication) and pairings for remarkable pairing groups. As such, Zeth-related simulations have been carried out on an extended version of ganache-cli.

We use some Ethereum mainnet data as basis to determine values for the blockchain-related variables and constants.

### System parameters of the simulation

The set of parameters (of interest) used for the A/B(/C) testing and "Parameter Sweep" simulation of Zeth is defined below.

### Simulations

If you have already carried out the simulation, cached its results, and simply want to plot the simulation results, please jump to this step (and do not execute the boxes below).