diff --git a/docs/docs/dev/architecture/index.md b/docs/docs/dev/architecture/index.md index 0e630c5182a..36dadfbc297 100644 --- a/docs/docs/dev/architecture/index.md +++ b/docs/docs/dev/architecture/index.md @@ -1,7 +1,7 @@ # Architecture This document describes the architecture of the current implementation -of a `hydra-node`. The following picture represents the main +of a `hydra-node`. The following diagram represents the main components of a Hydra node using the [C4 Component Diagram](https://c4model.com/#ComponentDiagram) notation. @@ -13,7 +13,7 @@ import ArchitectureSvg from './hydra-components.svg'; :::info -This diagram is manually produced using [PlantUML](https://plantuml.com) graphing tool with [C4 extensions](https://github.com/plantuml-stdlib/C4-PlantUML). +This diagram is manually produced using the [PlantUML](https://plantuml.com) graphing tool with [C4 extensions](https://github.com/plantuml-stdlib/C4-PlantUML). ``` $ plantuml -Tsvg architecture-c4.puml @@ -23,67 +23,51 @@ $ plantuml -Tsvg architecture-c4.puml ### Network -_Network_ component is responsible for all of the communications related to the off-chain part of the Hydra protocol between hydra-nodes. The [current implementation](./networking) is based on the [Typed Protocols](https://github.com/input-output-hk/typed-protocols) library which is also at the heart of the Cardano node's networking. It's asynchronous by nature and is using a push based protocol with a uniform _broadcast_ abstraction. +The _network_ component is responsible for all communications between Hydra nodes related to the off-chain part of the Hydra protocol. The [current implementation](./networking) is based on the [typed protocols](https://github.com/input-output-hk/typed-protocols) library, which is also used by the Cardano networking. It is asynchronous by nature and uses a push-based protocol with a uniform _broadcast_ abstraction. 
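To make the push-based _broadcast_ abstraction concrete, here is a minimal illustrative sketch in Python. This is not the actual Haskell implementation: `HydraNetwork`, `connect`, and the message shape are made-up names for illustration only.

```python
# Illustrative sketch only: a minimal model of a push-based network
# component exposing a uniform broadcast abstraction. All names are
# made up; the real hydra-node networking is written in Haskell and
# is asynchronous.
class HydraNetwork:
    def __init__(self):
        self.peers = []  # callbacks, each delivering a message to one peer

    def connect(self, deliver):
        self.peers.append(deliver)

    def broadcast(self, msg):
        # push-based: the sender pushes every message to all connected peers
        for deliver in self.peers:
            deliver(msg)

# Usage: two peers record what they receive.
inbox_a, inbox_b = [], []
net = HydraNetwork()
net.connect(inbox_a.append)
net.connect(inbox_b.append)
net.broadcast({"tag": "ReqTx", "tx": "tx1"})
```

The point of the abstraction is that the protocol logic only ever says 'send this to everyone'; delivery to individual peers is the network layer's concern.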
-Messages are exchanged between nodes on different internal transitions and are authenticated using each peer _Hydra Key_: Each message sent is signed by the emitter and the signature is verified by the transmitter. +Messages are exchanged between nodes during different internal transitions and are authenticated using each peer's _Hydra key_. Each message sent is signed by the sender, and the signature is verified by the receiver. -##### Authentication & Authorization +#### Authentication and authorization -The messages exchanged through the _Hydra Network_ layer between -participants are authenticated: Each message is -[signed](https://github.com/cardano-scaling/hydra/issues/727) using -the Hydra signing key of the emitting party which is identified by -the corresponding verification key. When a message with an unknown -or incorrect signature is received, it is dropped and a notification -is logged. +The messages exchanged through the _Hydra networking_ layer between participants are authenticated. Each message is [signed](https://github.com/input-output-hk/hydra/issues/727) using the Hydra signing key of the emitting party, which is identified by the corresponding verification key. When a message with an unknown or incorrect signature is received, it is dropped, and a notification is logged. -Messages are however not encrypted and therefore, if confidentiality is -required, some external mechanism needs to be put in place to prevent -other parties from observing the messages exchanged within a Head. +However, messages are not encrypted. If confidentiality is required, an external mechanism must be implemented to prevent other parties from observing the messages exchanged within a head. -##### Fault tolerance +#### Fault tolerance -The Hydra Head protocol guarantees safety of all (honest) -participants' funds, but does not in itself guarantee liveness, hence -all parties involved in a Hydra Head must be online and reactive for -the protocol to make progress. 
+The Hydra Head protocol guarantees the safety of all honest participants' funds but does not inherently guarantee liveness. Therefore, for the protocol to progress, all parties involved in a head must be online and reactive. -This implies that, should one or several participants' hydra-node -become _permanently_ unreachable from the others, through a crash, or -a network partition, no more transactions can happen in the Head and -it must be closed. However, the [Hydra network -layer](https://hydra.family/head-protocol/unstable/haddock/hydra-node/Hydra-Node-Network.html) -is tolerant to _transient_ disconnections and (non-Byzantine) crashes. +This means that if one or more participants' Hydra nodes become permanently unreachable due to a crash or network partition, no further transactions can occur in the head, and it must be closed. However, the [Hydra networking layer](https://hydra.family/head-protocol/unstable/haddock/hydra-node/Hydra-Node-Network.html) is tolerant to transient disconnections and (non-Byzantine) crashes. -### Chain Interaction +### Chain interaction #### Chain -The _Chain_ component is responsible for interfacing the Hydra node with the Cardano (aka. Layer 1) chain. The current, so-called `Direct`, implementation uses the `Node-to-Client` protocol to both "follow the chain" and observe new blocks and transactions which can change the state of the head, and submit transactions in response to client's requests or as needed to advance the protocol's state. It connects to a locally spun cardano-node using the _local socket_and contains the \_off-chain_ logic, based on cardano-api, that knows how to observe and build transactions relevant to the Hydra protocol. See [ADR-10](/adr/10) for more details. +The _chain_ component interfaces the Hydra node with the Cardano (layer 1) chain. 
The current implementation, called `Direct`, uses the `node-to-client` protocol to both 'follow the chain' and observe new blocks and transactions that can change the state of the head. It also submits transactions in response to client requests or as needed to advance the protocol's state. This component connects to a locally spun `cardano-node` using the _local socket_ and contains the _off-chain_ logic, based on `cardano-api`, which knows how to observe and build transactions relevant to the Hydra protocol. For more details, see [ADR-10](/adr/10). #### Wallet -The Hydra node maintains an internal wallet using the Cardano signing key provided to the `hydra-node`. This is used to handle the payment of transaction fees and signing transactions. +The Hydra node maintains an internal wallet using the Cardano signing key provided to the `hydra-node`. This wallet is used to handle the payment of transaction fees and to sign transactions. -### Head Logic +### Head logic -This is the component which is the heart of the Hydra node, implementing the protocol's _state machine_. It is structured around the concepts of `Input`s and `Effect`s: `Input`s from the outside world are interpreted against the current state, this may result in internal `Event`s, which are aggregated into an updated state, and `Effect`s which result in "side-effects" on the outside world. The state available to the _Head Logic_ consists of both, the content of the Head itself (eg. current Ledger, transactions pending) _and_ the data from the Layer 1 that's needed to observe and trigger on-chain transitions. +This component is at the core of the Hydra node, implementing the protocol's _state machine_. It is structured around the concepts of `Input`s and `Effect`s. `Input`s from the outside world are interpreted against the current state, potentially resulting in internal `Event`s, which are aggregated into an updated state. This can then produce `Effect`s, leading to 'side effects' on the outside world. 
The state available to the _head logic_ consists of the head's content (eg, current ledger, pending transactions) _and_ the data from layer 1 needed to observe and trigger on-chain transitions. -### Hydra Smart Contracts +### Hydra smart contracts -This "component" represents all of the Hydra smart contracts needed for Head protocol operation. Currently the contracts are written using `Plutus-Tx`. The scripts are optimized using custom `ScriptContext` and error codes for now. +This component represents all the Hydra smart contracts needed for the Hydra Head protocol operation. Currently, the contracts are written using `Plutus-Tx`. The scripts are optimized using a custom `ScriptContext` and specific error codes. ### API -`hydra-node` exposes an [Asynchronous API](https://hydra.family/head-protocol/unstable/api-reference) through a Websocket server. This API is available to _Hydra Client_ to send commands and observe changes in the state of the Hydra head. Upon startup, the API server loads all historical messages from persistence layer and serves them to clients in case they are interested in observing them. +The `hydra-node` component exposes an [asynchronous API](https://hydra.family/head-protocol/unstable/api-reference) through a WebSocket server. This API is available to the _Hydra client_ for sending commands and observing changes in the state of the Hydra head. Upon startup, the API server loads all historical messages from the persistence layer and serves them to clients interested in observing them. ### Persistence -All API server outputs and the `hydra-node` state is preserved on disk. The persistence layer is responsible for loading the historical messages/hydra state from disk and also storing them. For the time being there was no need to make this layer more complex or use a database. +All API server outputs and the `hydra-node` state are preserved on disk. 
The persistence layer is responsible for loading historical messages and the Hydra state from disk, as well as storing them. Currently, there hasn't been a need to increase the complexity of this layer or use a database. ### Logging -The Hydra node logs all side-effects occurring internally as JSON-formatted messages to the _standard output_ stream attached to its process. The format of the logs is documented as a [JSON schema](https://raw.githubusercontent.com/input-output-hk/hydra/master/hydra-node/json-schemas/logs.yaml), and follows the principles outlined in [ADR-9](/adr/9). +The Hydra node logs all internal side effects as JSON-formatted messages to its _standard output_ stream. The log format adheres to a documented [JSON schema](https://raw.githubusercontent.com/input-output-hk/hydra/master/hydra-node/json-schemas/logs.yaml), following the principles outlined in [ADR-9](/adr/9). ### Monitoring diff --git a/docs/docs/dev/architecture/networking.md b/docs/docs/dev/architecture/networking.md index 11a8c0dec00..d8fec0d6859 100644 --- a/docs/docs/dev/architecture/networking.md +++ b/docs/docs/dev/architecture/networking.md @@ -1,84 +1,83 @@ # Networking -This document provides details about the _Hydra Networking Layer_, eg. the network comprised of Hydra nodes upon which Heads can be opened. +This document provides details about the Hydra networking layer, which encompasses the network of Hydra nodes where heads can be opened. :::warning -🛠 This document is a work in progress. We know the current situation w.r.t. networking is less than ideal, it's just a way to get started and have _something_ that works. There is already a [proposal](https://github.com/input-output-hk/hydra/pull/237) to improve the situation by making the network more dynamic. +🛠 This document is a work in progress. We recognize that the current state of networking is suboptimal, serving as an initial implementation to establish a functional basis.
Efforts are underway to enhance the network dynamics through a proposed improvement initiative, detailed in [this proposal](https://github.com/input-output-hk/hydra/pull/237). ::: -# Questions +## Questions - What's the expected topology of the transport layer? - - Are connected peers a subset/superset/identity set of the Head parties? + - Are connected peers a subset, superset, or identical set of the head parties? - Do we need the delivery ordering and reliability guarantees TCP provides? - - TCP is full-duplex stream oriented persistent connection between nodes - - Our networking layer is based on asynchronous messages passing which seems better suited to UDP -- Do we need to care about nodes being reachable through firewalls? - - This could be something we push onto end users, eg. let them be responsible of configuring their firewalls/NATs to match Hydra node needs - - Probably easier for business/corporate/organisation players than for end-users -- Do we want _privacy_ within a Head? - - The details of txs should be opaque for outside observers, only the end result of the Head's fanout is observable -- How do we know/discover peers/parties? - - The paper assumes there exists a _Setup_ phase where - > In order to create a head-protocol instance, an initiator invites a set of participants \{p1,...,pn\} (himself being one of them) to join by announcing to them the protocol parameters: the list of participants, the parameters of the (multi-)signature scheme to be used, etc. - > Each party then establishes pairwise authenticated channels to all other parties. - - What exactly is a _list of participants_? It seems at the very least each participant should be _identified_, in order to be distinguished from each other, but how? Some naming scheme? IP:Port address? Public key? Certificate? - - What are "pairwise authenticated channels" exactly? Are these actual TCP/TLS connections? Or is it more a layer 4 (Transport) or layer 5 (Session) solution? 
+ - TCP provides full-duplex, stream-oriented, persistent connections between nodes + - The Hydra networking layer is based on asynchronous message passing, which seems better suited to UDP +- Do we need to consider nodes being reachable through firewalls? + - This responsibility could be delegated to end users, allowing them to configure their firewalls/NATs to align with Hydra node requirements + - This may be more manageable for business, corporate, or organizational parties than for individual end-users +- Do we want _privacy_ within a head? + - Transactions' details should be opaque to outside observers, with only the final outcome of the head's fanout being observable +- How do we identify/discover peers/parties? + - The paper assumes a _setup_ phase where: + > To create a head-protocol instance, an initiator invites a set of participants \{p1,...,pn\} (including themselves) to join by announcing protocol parameters: the participant list, parameters of the (multi-)signature scheme, etc. + > Each party subsequently establishes pairwise authenticated channels with all other parties involved. + - What constitutes a _list of participants_? Should each participant be uniquely identifiable? If so, what identification method should be used: a naming scheme, an IP:port address, a public key, a certificate? + - What do 'pairwise authenticated channels' entail? Are these actual TCP/TLS connections, or do they operate at the Transport (layer 4) or Session (layer 5) level? - How open do we want our network protocol to be? - - We are currently using Ouroboros stack with CBOR encoding of messages, this will make it somewhat hard to have other tools be part of the Hydra network + - Since we currently use the Ouroboros stack with CBOR message encoding, integrating other tools into the Hydra network may pose challenges.
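For a feel of what the CBOR wire format mentioned above looks like, here is a toy encoder covering only unsigned integers and text strings, following RFC 8949. It is purely illustrative: the actual encoding of Hydra messages is handled by Haskell CBOR libraries, not this code.

```python
# Toy CBOR encoder, per RFC 8949: unsigned integers (major type 0) and
# text strings (major type 3). Illustrative only; not the Hydra codec.
def encode_uint(n, major=0):
    mt = major << 5  # the major type occupies the top 3 bits
    if n < 24:
        return bytes([mt | n])            # value fits in the initial byte
    if n < 2**8:
        return bytes([mt | 24]) + n.to_bytes(1, "big")
    if n < 2**16:
        return bytes([mt | 25]) + n.to_bytes(2, "big")
    raise ValueError("larger sizes omitted in this sketch")

def encode_text(s):
    payload = s.encode("utf-8")
    # a text string is its length header followed by the UTF-8 bytes
    return encode_uint(len(payload), major=3) + payload

assert encode_uint(10) == bytes([0x0A])
assert encode_uint(500) == bytes([0x19, 0x01, 0xF4])
assert encode_text("hi") == bytes([0x62]) + b"hi"
```

The compactness is attractive, but any non-Haskell tool joining the network would need to reimplement the exact encoding of every message type, which is part of why openness is a concern.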
-# Investigations +## Investigations -## Ouroboros +### Ouroboros -We had a meeting with network team on 2022-02-14 where we investigated how Ouroboros network stack fits in Hydra. -Discussion quickly derived on performance, with Neil Davies giving some interesting numbers: +We held a meeting with the networking team on February 14, 2022, to explore the integration of the Ouroboros network stack into Hydra. During the discussion, there was a notable focus on performance, with Neil Davies providing insightful performance metrics. - World circumference: 600ms - Latency w/in 1 continent: 50-100ms - Latency w/in DC: 2-3ms - Subsecond roundtrip should be fine wherever the nodes are located -- basic reliability of TCP connections decrease w/ distance: - - w/in DC connection can last forever) - - outside DC: it's hard to keep a single TCP cnx up forever, if a reroute occurs because some intermediate node is down, it takes 90s to resettle a route - - This implies that as the number of connections goes up, the probability of having at least one connection down at all time increases -- closing of the head must be dissociated from network connections => a TCP cnx disappearing =/=> closing the head -- Within cardano network, propagation of a single empty block takes 400ms (to reach 10K nodes) - - ouroboros network should withstand 1000s of connections (there are some system-level limits) -- Modelling Hydra network - - A logical framework for modelling performance of network associate CDF with time for a message to appear at all nodes (this is what is done in the [hydra-sim](https://github.com/cardano-scaling/hydra-sim) - - We could define a layer w/ the semantics we expect, eg. Snocket = PTP connection w/ ordered guaranteed messages delivery. Do we need that in Hydra? 
+- Basic reliability of TCP connections decreases w/ distance: + - w/in DC connection can last forever + - outside DC: it's hard to keep a single TCP cnx up forever; if a reroute occurs because some intermediate node is down, it takes 90s to resettle a route + - this implies that as the number of connections goes up, the probability of having at least one connection down at all times increases +- Closing of the head must be dissociated from network connections => a TCP cnx disappearing =/=> closing the head +- Within the Cardano network, propagation of a single empty block takes 400ms (to reach 10K nodes) + - the Ouroboros network should withstand 1000s of connections (there are some system-level limits) +- Modelling the Hydra network + - a logical framework for modelling network performance associates a CDF with the time for a message to appear at all nodes (this is what is done in [hydra-sim](https://github.com/input-output-hk/hydra-sim)) + - we could define a layer w/ the semantics we expect; for example, Snocket = PTP connection w/ ordered guaranteed messages delivery – do we need that in Hydra? - How about [Wireguard](https://wireguard.io)? It's a very interesting approach, with some shortcomings: - no global addressing scheme - - there is one `eth` interface / connection + - there is one `eth` interface/connection - on the plus side, it transparently manages IP address changes - - does not help w/ Firewalls, eg. NAT needs to be configured on each node + - does not help w/ Firewalls, eg NAT needs to be configured on each node. -## Cardano Networking +### Cardano networking -See [this wiki page](https://github.com/cardano-scaling/hydra.wiki/blob/master/Networking.md#L1) for detailed notes about how Cardano network works and uses Ouroboros. +See [this Wiki page](https://github.com/input-output-hk/hydra.wiki/blob/master/Networking.md#L1) for detailed notes about how the Cardano network works and uses Ouroboros.
-- Cardano is a global network spanning 1000s of nodes, with nodes coming and going and a widely varying topology. Its main purpose is _block propagation_: Blocks produced by some nodes according to the consensus rules needs to reach every node in the network in less than 20 seconds. -- Nodes cannot be connected to all other nodes, as such block diffusion occurs through some form of _gossipping_ whereby a node is connected to a limited set of peers with which it exchanges blocks -- Nodes need to be resilient to adversarial behaviour from peers or other nodes, as such they need to control the amount and rate of data they want to ingest, hence the need for a _pull-based_ messaging layer -- Producer nodes are sensitive assets as they need access to signing keys, hence are usually run behind _relay nodes_ in order to increase safety and reduce the risk of DoS or other malicious activities -- Nodes are expected to run behind ADSL or cable boxes, firewalls, or in other complex networking settings preventing nodes to be _addressed_ directly, hence the need for nodes to _initiate_ connections to externally reachable _relay nodes_, and for _pull-based_ messaging +- Cardano is a global network spanning thousands of nodes, with nodes constantly joining and leaving, resulting in a widely varying topology. Its primary function is block propagation: blocks produced by certain nodes according to consensus rules must reach every node in the network within 20 seconds. +- Nodes cannot maintain direct connections to all other nodes; instead, block diffusion occurs through a form of _gossiping_. Each node is connected to a limited set of peers with whom it exchanges blocks. +- Nodes must withstand adversarial behavior from peers and other nodes, necessitating control over the amount and rate of data they ingest. Hence, a _pull-based_ messaging layer is essential. +- Producer nodes, which require access to signing keys, are considered sensitive assets. 
They are typically operated behind *relay nodes* to enhance security and mitigate the risks of DoS attacks or other malicious activities. +- Nodes often operate behind ADSL or cable modems, firewalls, or in other complex networking environments that prevent direct addressing. Therefore, nodes must initiate connections to externally reachable *relay nodes*, and rely on a *pull-based* messaging approach. -# Implementations +## Implementations -## Current State +### Current state -- Hydra nodes form a network of pairwise connected _peers_ using Point-to-point (e.g TCP) connections. Those connections are expected to be up and running at all time - - Nodes use [Ouroboros](https://github.com/input-output-hk/ouroboros-network/) as the underlying network abstraction, which takes care of managing connections with peers providing a reliable Point-to-Point stream-based communication abstraction called a `Snocket` +- Hydra nodes form a network of pairwise connected *peers* using point-to-point (eg, TCP) connections that are expected to remain active at all times: + - Nodes use [Ouroboros](https://github.com/input-output-hk/ouroboros-network/) as the underlying network abstraction, which manages connections with peers via a reliable point-to-point stream-based communication framework known as a `Snocket` - All messages are _broadcast_ to peers using the PTP connections - - Due to the nature of the Hydra protocol, the lack of connection to a peer prevents any progress of the Head -- A `hydra-node` can only open a Head with _all_ its peers, and only them. This implies the nodes need to know in advance the topology of the peers and Heads they want to open -- Connected nodes implement basic _failure detection_ through heartbeats and monitoring exchanged messages -- Messages between peers are signed using party's hydra key and also validated upon receiving. + - Due to the nature of the Hydra protocol, the lack of a connection to a peer halts any progress of the head. 
+- A `hydra-node` can only open a head with *all* its peers and exclusively with them. This necessitates that nodes possess prior knowledge of the topology of both peers and heads they intend to establish. +- Connected nodes implement basic _failure detection_ through heartbeats and monitoring exchanged messages. +- Messages exchanged between peers are signed using the party's Hydra key and validated upon receiving. -## Gossip diffusion network +### Gossip diffusion network -The following diagram is one possible implementation of a pull-based messaging system for Hydra, drawn from a discussion with IOG's networking engineers: +The following diagram illustrates one possible implementation of a pull-based messaging system for Hydra, developed from discussions with IOG’s networking engineers: ![Hydra pull-based network](./hydra-pull-based-network.jpg) diff --git a/docs/docs/dev/haskell-packages.md b/docs/docs/dev/haskell-packages.md index 354e9a949f1..40eb0aa8fff 100644 --- a/docs/docs/dev/haskell-packages.md +++ b/docs/docs/dev/haskell-packages.md @@ -1,22 +1,22 @@ # Haskell packages -The Hydra project is divided into several Haskell packages fulfilling different parts of the protocol. While some packages are internal and specific to the Hydra project, some are quite generic and may be useful to other projects facing similar issues. Regardless, we expose [Haddock](https://www.haskell.org/haddock/) documentation for all of them. +The Hydra project consists of several Haskell packages, each serving distinct parts of the protocol. While some packages are internal and tailored specifically to Hydra, others offer more generic functionalities that could benefit other projects tackling similar challenges. Comprehensive [Haddock](https://www.haskell.org/haddock/) documentation is provided for all packages. 
-## Public Packages +## Public packages | Package | Description | | --- | --- | -| [plutus-merkle-tree](pathname:///haddock/plutus-merkle-tree/index.html) | Implementation of Merkle Trees, compatible with on-chain Plutus validators. | -| [plutus-cbor](pathname:///haddock/plutus-cbor/index.html) | Implementation of CBOR encoders, compatible with on-chain Plutus validators. | -| [hydra-prelude](pathname:///haddock/hydra-prelude/index.html) | Custom Hydra Prelude used across other Hydra packages. | -| [hydra-cardano-api](pathname:///haddock/hydra-cardano-api/index.html) | A wrapper around the `cardano-api`, with era-specialized types and extra utilities. | +| [plutus-merkle-tree](pathname:///haddock/plutus-merkle-tree/index.html) | Implementation of Merkle trees, compatible with on-chain Plutus validators | +| [plutus-cbor](pathname:///haddock/plutus-cbor/index.html) | Implementation of CBOR encoders, compatible with on-chain Plutus validators | +| [hydra-prelude](pathname:///haddock/hydra-prelude/index.html) | Custom Hydra prelude used across other Hydra packages | +| [hydra-cardano-api](pathname:///haddock/hydra-cardano-api/index.html) | A wrapper around the `cardano-api`, with era-specialized types and extra utilities | -## Internal Packages +## Internal packages | Package | Description | | --- | --- | -| [hydra-node](pathname:///haddock/hydra-node/index.html) | The Hydra node. | -| [hydra-node tests](pathname:///haddock/hydra-node/tests/index.html) | The Hydra node test code. 
| +| [hydra-node](pathname:///haddock/hydra-node/index.html) | The Hydra node | +| [hydra-node tests](pathname:///haddock/hydra-node/tests/index.html) | The Hydra node test code | | [hydra-tui](pathname:///haddock/hydra-tui/index.html) | Terminal User Interface (TUI) for managing a Hydra node | -| [hydra-plutus](pathname:///haddock/hydra-plutus/index.html) | Hydra Plutus Contracts | -| [hydra-cluster](pathname:///haddock/hydra-cluster/index.html) | Integration test suite using a local cluster of Cardano and hydra nodes | +| [hydra-plutus](pathname:///haddock/hydra-plutus/index.html) | Hydra Plutus contracts | +| [hydra-cluster](pathname:///haddock/hydra-cluster/index.html) | Integration test suite using a local cluster of Cardano and Hydra nodes | diff --git a/docs/docs/dev/index.md b/docs/docs/dev/index.md index a9bd30ddb9f..054fa349b47 100644 --- a/docs/docs/dev/index.md +++ b/docs/docs/dev/index.md @@ -1,7 +1,7 @@ # About Hydra -So you already read the [User Manual](../index.md) and still want to learn more about the Hydra protocol, its inner workings and how it is implemented? +If you've read the [user manual](../index.md) and want to learn more about the Hydra protocol, its inner workings, and its implementation, you're in the right place. -The developer documentation can help you with that. It is a collection of materials expanding on _using_ into _understanding_ Hydra. It is aimed anyone wanting to get a deeper understanding of the Hydra protocol, protocol architects who want to build their own variants and of course developers who contribute to the [reference implementation](https://github.com/input-output-hk/hydra) of Hydra itself. +The developer documentation provides detailed materials that move beyond just _using_ Hydra to _understanding_ it. 
This section is for anyone looking to gain a comprehensive understanding of the Hydra protocol, including protocol architects interested in developing their own variants and developers contributing to the [reference implementation](https://github.com/input-output-hk/hydra) of Hydra. -Consequently, the following sections are more technical and maybe not as cohesive as the user manual, but the [architecture](./architecture) or the [specification](./specification) should be a good starting point. +The following sections are more technical and may be less cohesive than the user manual. However, the [architecture](./architecture) and the [specification](./specification) sections are good starting points. diff --git a/docs/docs/dev/rollbacks/index.md b/docs/docs/dev/rollbacks/index.md index f49a5cb1528..ce29ebbd92d 100644 --- a/docs/docs/dev/rollbacks/index.md +++ b/docs/docs/dev/rollbacks/index.md @@ -1,44 +1,46 @@ # Handling rollbacks -Rollbacks are an integral part of the behaviour of the Cardano chain: Any application built on top of Cardano and synchronizing its behaviour with the chain must be prepared to occasionally observe such _rollbacks_ and Hydra is no exception. +Rollbacks are fundamental to the operation of the Cardano chain. Any application built on Cardano, including Hydra, must anticipate occasional rollbacks, which are reversals of confirmed transactions due to chain reorganization or other consensus adjustments. -This short document explains what rollbacks are and where they come from, and how Hydra Heads handle them. +This document provides an overview of rollbacks, their origins, and details how Hydra heads manage them. -## What are rollbacks really? +## Understanding rollbacks -Rollbacks happen on the Cardano chain, or any other truly decentralised blockchain for that matter, because it is essentially _asynchronous_ in nature, eg. 
each node has its own view of the state of chain which it updates by communicating with other nodes, exchanging messages about known blocks, and this process takes time. New blocks are produced, which may be valid or invalid, and the state of the chain is _eventually consistent_, all nodes agreeing on the state of the chain only after some number of blocks have been processed. +Rollbacks occur on the Cardano chain, as on any other truly decentralized blockchain, because of its _asynchronous_ nature. Each node maintains its own view of the chain's state, updating it through communication with other nodes by exchanging messages about known blocks, and this process takes time. Meanwhile, new blocks keep being produced, which may be valid or invalid. As a result, the chain's state is _eventually consistent_, with all nodes agreeing on its state only after processing a certain number of blocks. -Actually, _Rollbacks_ is a misnomer and we should rather talk about _forks_. Let's see what this means from the perspective of three nodes running a Hydra Head. The following picture represents each node's view of the Layer 1 chain. +In reality, 'rollback' is a misnomer; it's more accurate to refer to these events as 'forks'. Let's delve into what this means from the perspective of three nodes running a Hydra head. The following diagram illustrates each node's view of the layer 1 chain. ![](rollbacks-1.jpg) -The _immutable part_ is guaranteed to be identical on all nodes, being `k` blocks in the past from current _tip_ (on the mainnet `k` is 2160). Here, node 2 receives a new block that's identical node 1 but node 3 receives a different block. Eventually, as node 3's chain is shorter than the other's it will be superseded by a longer one hence _rolled back_. +The _immutable part_ is guaranteed to be identical on all nodes, extending `k` blocks in the past from the current _tip_ (on the mainnet, `k` is 2160).
Here's an example scenario: node 2 receives a new block identical to node 1's view, but node 3 receives a different one. Eventually, because node 3's chain is shorter than the others, it will be superseded by a longer chain, resulting in a rollback. -What happens for the node's _Direct Chain_ observer is detailed in the following picture: +The impact on the node's _direct chain_ observer is detailed in the following diagram: ![](rollbacks-2.jpg) -When new blocks are available, the `ChainSync` client receives a `RollForward` message with each new block. When a fork happens, it will first receive a `RollBackward` message with a _point_, which identifies the slot and block hash at which point the chain has been rolled back (abstracted as a single number in the figure), then resume receiving new blocks through `RollForward` messages. +When new blocks become available, the `ChainSync` client receives a `RollForward` message for each new block. In the event of a fork, it first receives a `RollBackward` message indicating a _point_ that identifies the slot and block hash where the rollback occurred (represented as a single number in the figure). After this rollback point, the client resumes receiving new blocks through `RollForward` messages. -## How do they impact Hydra Node? +## How do rollbacks impact the Hydra node? -Rollbacks are problematic because, when a transaction is observed on-chain, it potentially changes the state of the Head, first by _Initialising_ it, then collecting the _Commits_, opening the head through the _CollectCom_ transaction and ultimately _Closing_ it and _Fanoutting_ the Head's final UTxO. +Rollbacks pose challenges because when a transaction is observed on-chain, it can alter the state of the head in several stages: _initializing_ it, collecting _commits_, opening the head via the `CollectCom` transaction, and ultimately _closing_ it and _fanning out_ the head's final UTXO. 
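To make the `RollForward`/`RollBackward` flow concrete, the client's chain view can be modeled as a fold over incoming messages. This is a hypothetical, simplified sketch for illustration only; the actual `ChainSync` protocol types in ouroboros-network are more involved:

```haskell
-- Simplified, illustrative types; not the real ChainSync API.
-- A point abstracts the slot and block hash to a single number,
-- as in the figure above.
type Point = Int

data ChainSyncMsg
  = RollForward Point   -- a new block extends the chain at this point
  | RollBackward Point  -- the chain was rolled back to this point

-- The client's view of the chain: known points, newest first.
-- RollForward extends the view; RollBackward discards everything
-- strictly newer than the rollback point, then new blocks arrive again.
applyMsg :: [Point] -> ChainSyncMsg -> [Point]
applyMsg view (RollForward p)  = p : view
applyMsg view (RollBackward p) = dropWhile (> p) view
```

Folding `applyMsg` over `[RollForward 1, RollForward 2, RollForward 3, RollBackward 1, RollForward 4]` leaves the view `[4, 1]`: blocks 2 and 3 belonged to the fork that was rolled back.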
-The following picture illustrates the issue of a rollback leading to potentially conflicting `Commit` transactions: +The following diagram illustrates the issue where a rollback can lead to potentially conflicting `Commit` transactions: ![](rollbacks-3.jpg) -If the Head does not properly handle the rollback, then it risks being in an inconsistent state w.r.t other nodes taking part in the Head. It is thus important that a rollback observed at the level of the `Direct` chain component be propagated to the `HeadLogic` in order for the latter to reset its state to be consistent with whatever happened on layer 1. +If the head does not properly handle the rollback, it risks becoming inconsistent with other nodes participating in the head. Therefore, any rollback observed at the `Direct` chain component level must be promptly communicated to the `HeadLogic`. This ensures that the `HeadLogic` can reset its state to maintain consistency with the changes on layer 1. -The consequences of a rollback on the Head's state are different depending at which point the Head is rolled back: -1. If the rollback happens before or after the Head is open, eg. before the `CollectCom` transaction or after the `Close`, then things are relatively straightforward: We can just reset the Head's state to the point it was before the rolled back transaction was observed, -2. If it happens while the Head is open, eg. the `CollectCom` transaction is rolled back, it's much more problematic because the node has already started exchanging messages with its peers and its state no longer depends only on the chain. +The consequences of a rollback on the head's state vary depending on when the rollback occurs: + +1. If the rollback occurs before or after the head is opened – for example, before the `CollectCom` transaction or after the `Close` – the resolution is relatively straightforward: the head's state can be reset to the point it was at before the rolled-back transaction was observed. + +2. 
If the rollback occurs while the head is open – for instance, if the `CollectCom` transaction is rolled back – it poses greater challenges. At this point, the node has already begun exchanging messages with its peers, and its state no longer depends solely on the blockchain. ## How do we handle them? :::warning -πŸ›  Hydra currently handles rollback gracefully in simple cases, eg. case 1 above, and does not try to do anything clever when a `CollectCom` happens which can lead easily to a Head becoming stale: Because one node is desynchronised from other nodes (it has observed a rollback of a `Collectcom` transaction, reset its state before that, thus lost track of everything that happened while the Head was open), it will be necessary to close the head. +πŸ›  Hydra currently handles rollbacks gracefully in simpler cases, such as scenario 1 above. However, when a `CollectCom` transaction is rolled back, the head can easily become stale: the affected node becomes desynchronized from its peers (having reset its state to before the `CollectCom`, it loses track of everything that happened while the head was open), so the head will need to be closed. ::: -Rollbacks handling has been partially deactivated in hydra by [ADR-23](https://github.com/input-output-hk/hydra/blob/master/docs/adr/2023-04-26_023-single-state.md). This section shall be updated with a more appropriate and detailed rollback handling with issue [#185](https://github.com/input-output-hk/hydra/issues/185). +Rollback handling has been partially deactivated in Hydra per [ADR-23](https://github.com/input-output-hk/hydra/blob/master/docs/adr/2023-04-26_023-single-state.md). This section will be updated with a more comprehensive and refined rollback handling approach as part of issue [#185](https://github.com/input-output-hk/hydra/issues/185).
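For illustration, the straightforward reset of case 1 could be modeled by keeping head-state snapshots keyed by the chain point at which each on-chain transition was observed, and discarding snapshots newer than the rollback point. This is a hypothetical sketch; the names `HeadState`, `observeAt`, and `rollbackTo` are not the hydra-node API:

```haskell
-- Illustrative sketch only; not the hydra-node implementation.
type Point = Int

data HeadState = Idle | Initializing | Open | Closed
  deriving (Eq, Show)

-- Snapshots of the head state, keyed by the chain point at which each
-- on-chain transition was observed, newest first.
type History = [(Point, HeadState)]

-- Record a state transition observed on-chain at a given point.
observeAt :: Point -> HeadState -> History -> History
observeAt p st history = (p, st) : history

-- Case 1: on a rollback to point p, discard every snapshot observed
-- after p and resume from the most recent surviving state.
rollbackTo :: Point -> History -> History
rollbackTo p = dropWhile ((> p) . fst)
```

For example, if `Initializing` was observed at point 2 and `Open` (the `CollectCom`) at point 5, then `rollbackTo 3` discards the `Open` snapshot and the node resumes from `Initializing`. This simple reset is only sound before the head is open; case 2 cannot be handled this way, because the node's state then also depends on off-chain messages exchanged with peers.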