The principle and proof of Solidity Merkle trees

what is merkle trees

Definition

Merkle trees, named after Ralph Merkle, are a fundamental data structure used in cryptography and computer science. They are commonly employed in blockchain systems to ensure data integrity and enable efficient verification.

The principle behind Merkle trees is based on the concept of hash functions. A hash function is a mathematical algorithm that takes an input (data) and produces a fixed-size output, known as a hash value or digest. The key properties of a hash function are collision resistance and the avalanche effect.

how to generate merkle tree data via solidity

we can calculate hash value through solidity keccak256 function:

1
2
3
4
5
6
7
8
keccak256(abi.encodePacked(toHashValue)

e.g.:
before hash
0xAb8483F64d9C6d1EcF9b849Ae677dD3315835cb2

after hash
0x999bf57501565dbd2fdcea36efa2b9aef8340a8901e3459f4a4c926275d36cdb

After performing the hash operation on the values of the leaf nodes, the adjacent nodes are then hashed together until only a single root node remains.

Suppose there are two adjacent nodes, A and B. The order of the hash operation, whether it is hash(A+B) or hash(B+A), is determined by the sizes of A and B. In the corresponding Merkle code in OpenZeppelin, we can find the following code snippet:

1
2
3
function _hashPair(bytes32 a, bytes32 b) private pure returns (bytes32) {
return a < b ? _efficientHash(a, b) : _efficientHash(b, a);
}

In summary, the smaller value is typically placed in front to determine the order for computing the hash values. This may cause some confusion when performing the calculations manually.

In practical projects, it is common to store only the final result, the root hash value, in the contract. Storing a large number of addresses in the contract would consume a significant amount of gas fees. By using a Merkle tree calculation, the amount of data that needs to be stored is greatly reduced.

Let’s demonstrate how to calculate and store the root hash value using a setup example from Foundry:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

bytes32 public root;
bytes32[] public leafs;
bytes32[] public l2;

function setUp() public {
address[] memory addrss = new address[](4);
addrss[0] = 0xAb8483F64d9C6d1EcF9b849Ae677dD3315835cb2;
addrss[1] = 0x2d886570A0dA04885bfD6eb48eD8b8ff01A0eb7e;
addrss[2] = 0xed857ac80A9cc7ca07a1C213e79683A1883df07B;
addrss[3] = 0x690B9A9E9aa1C9dB991C7721a92d351Db4FaC990;

//Calculating the hash values of leaf nodes from a list of addresses.
leafs.push(keccak256(abi.encodePacked(addrss[0])));
leafs.push(keccak256(abi.encodePacked(addrss[1])));
leafs.push(keccak256(abi.encodePacked(addrss[2])));
leafs.push(keccak256(abi.encodePacked(addrss[3])));

//Calculating the hash values of the second level.
l2.push(keccak256(abi.encodePacked(leafs[0], leafs[1])));
l2.push(keccak256(abi.encodePacked(leafs[2], leafs[3])));

//Calculating root hash value.
root = keccak256(abi.encodePacked(l2[1], l2[0]));
}

For demonstration purposes, we have provided only four addresses. In actual projects, the number of addresses can be significantly larger.

how to proof merkle data

Once we have stored the root hash value in the contract, how do we verify if a client-provided address is a valid address or belongs to a whitelist?

Firstly, we need to hash the address as the third parameter. Then, we pass the hash value of the address, along with the adjacent hash values, as the proof to the verification function.

The proof list corresponds to the red-marked area in the image below.

test proof function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
function testVerify() public {
address proofAddress = 0xAb8483F64d9C6d1EcF9b849Ae677dD3315835cb2;

bytes32[] memory proof = new bytes32[](2);
proof[0] = leafs[1];
proof[1] = l2[1];

assert(
MerkleProof.verify(
proof,
root,
keccak256(abi.encodePacked(proofAddress))
)
);
}

Here is the complete Foundry testing code from github:https://gist.github.com/coffiasd/96022abd7f9a8a2189bda14bf9e755dc

Applications in Real-World Projects

  • airdrop
  • whitelist

Common Vulnerabilities in Smart Contract Audits

1
2
3
4
5
6
7
function parentHash(bytes32 a, bytes32 b) public pure returns (bytes32) {
if (a < b) {
return keccak256(abi.encode(a, b));
} else {
return keccak256(abi.encode(b, a));
}
}

The statement “abi.encode(address, uint)” will produce 64 bytes. Since “abi.encode(bytes32, bytes32)” also yields 64 bytes, hash collisions may potentially occur between leaf nodes and parent nodes.

Follow me on twitter :https://twitter.com/coffiasse
TG:@coffiasd

Comments