
Prevents node from starting given persisted network state is inconsistent with configuration #1179

Merged
17 commits merged into master on Nov 28, 2023

Conversation

abailly-iohk
Contributor

@abailly-iohk abailly-iohk commented Nov 24, 2023

To fix #1174, we choose to die when the loaded persisted state for the network acknowledgments is inconsistent with the configured parties, e.g. when they don't have the same size. This is a bit lame as we don't really check much: it could very well be the case that a user restarts a node with a different configuration (e.g. different parties) but the same number of them, and still gets tripped by this issue.
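The check is deliberately shallow: only the sizes are compared. A minimal sketch of the idea, using hypothetical stand-in types and names (`Party`, `checkPersistedAcks`, and the acks representation are illustrative, not the actual hydra-node code):

```haskell
module Main where

import Control.Exception (Exception, throwIO, try)

-- Hypothetical stand-ins for the real node types; names are illustrative only.
newtype Party = Party String deriving (Eq, Show)

newtype ParameterMismatch = SavedNetworkPartiesInconsistent Int
  deriving (Show)

instance Exception ParameterMismatch

-- The check described above: compare only the *sizes* of the persisted
-- acknowledgments and the configured parties, and die on mismatch.
checkPersistedAcks :: [Int] -> [Party] -> IO ()
checkPersistedAcks savedAcks parties
  | length savedAcks /= length parties =
      throwIO $ SavedNetworkPartiesInconsistent (length parties)
  | otherwise = pure ()

main :: IO ()
main = do
  -- Same sizes pass, even if the parties themselves changed.
  checkPersistedAcks [0, 0] [Party "alice", Party "bob"]
  -- A different size makes the node die.
  r <- try (checkPersistedAcks [0, 0] [Party "alice", Party "bob", Party "carol"])
  case (r :: Either ParameterMismatch ()) of
    Left e -> putStrLn ("node dies with: " <> show e)
    Right () -> putStrLn "unexpectedly consistent"
```

Note how the first call succeeds despite the parties possibly differing, which is exactly the weakness acknowledged above.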



  • CHANGELOG updated or not needed
  • Documentation updated or not needed
  • Haddocks updated or not needed
  • No new TODOs introduced or explained hereafter


github-actions bot commented Nov 24, 2023

Transactions Costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters currently use arbitrary values, and results are not fully deterministic nor comparable to previous runs.

Metadata
Generated at 2023-11-27 14:49:20.344418406 UTC
Max. memory units 14000000
Max. CPU units 10000000000
Max. tx size (kB) 16384

Script summary

Name Hash Size (Bytes)
νInitial 4868d5365af5120ae0b3c93b819d3452a3cbdcc98595da2a7ae765b5 4069
νCommit 171a1e6bdbc8aa96d957a65b3f505517386af06ba265e3f784741f67 2050
νHead e89b0c4a6155bac2434d1e500bd49c155b2b56744ccf5a0efa72a82e 9092
μHead 6849328242b5912ad218f134378e6baff11f3e74f7e36dcf8e13d53e* 4062
  • The minting policy hash is only usable for comparison. As the script is parameterized, the actual script is unique per Head.

Cost of Init Transaction

Parties Tx size % max Mem % max CPU Min fee ₳
1 4585 10.97 4.35 0.47
2 4786 13.09 5.16 0.50
3 4988 15.29 6.00 0.54
5 5390 20.00 7.81 0.61
10 6395 31.04 12.02 0.77
41 12628 99.82 38.27 1.79

Cost of Commit Transaction

This is using ada-only outputs for better comparability.

UTxO Tx size % max Mem % max CPU Min fee ₳
1 534 12.22 4.81 0.31
2 724 15.93 6.48 0.36
3 907 19.77 8.20 0.41
5 1285 27.87 11.80 0.52
10 2222 50.58 21.69 0.81
18 3719 94.20 40.16 1.36

Cost of CollectCom Transaction

Parties UTxO (bytes) Tx size % max Mem % max CPU Min fee ₳
1 56 479 22.20 8.87 0.41
2 114 590 34.81 14.01 0.56
3 170 700 47.28 19.24 0.70
4 228 810 64.71 26.39 0.90
5 282 920 81.18 33.31 1.09

Cost of Close Transaction

Parties Tx size % max Mem % max CPU Min fee ₳
1 551 17.76 8.35 0.38
2 764 19.51 10.01 0.41
3 736 19.02 8.69 0.40
5 1199 24.19 14.43 0.50
10 2177 32.94 22.69 0.68
50 8907 96.67 83.33 1.92

Cost of Contest Transaction

Parties Tx size % max Mem % max CPU Min fee ₳
1 566 21.33 9.70 0.42
2 770 23.23 11.39 0.45
3 930 25.03 13.02 0.49
5 1301 28.92 16.42 0.56
10 2213 38.69 24.94 0.74
44 7937 98.26 78.35 1.86

Cost of Abort Transaction

Some variation is expected because of the random mixture of still-initial and already-committed outputs.

Parties Tx size % max Mem % max CPU Min fee ₳
1 4529 20.14 8.76 0.58
2 4712 33.62 14.82 0.74
3 4853 50.82 22.49 0.94
4 5046 69.69 30.98 1.16
5 5223 93.77 41.68 1.44

Cost of FanOut Transaction

Involves spending head output and burning head tokens. Uses ada-only UTxO for better comparability.

Parties UTxO UTxO (bytes) Tx size % max Mem % max CPU Min fee ₳
5 0 0 4417 8.98 3.78 0.44
5 1 57 4453 10.28 4.57 0.46
5 5 285 4597 15.46 7.75 0.53
5 10 570 4776 22.18 11.81 0.62
5 20 1138 5135 35.02 19.70 0.80
5 30 1709 5497 48.14 27.70 0.98
5 40 2274 5852 60.99 35.59 1.16
5 50 2846 6215 73.88 43.51 1.33
5 69 3928 6897 98.80 58.72 1.68

End-To-End Benchmark Results

This page collects the latest end-to-end benchmark results produced by Hydra's Continuous Integration system from the latest master code.

Please take these results with a grain of salt as they are currently produced from very limited cloud VMs, not controlled hardware. Instead of focusing on absolute results, the emphasis should be on relative results, e.g. how the timings for a scenario evolve as the code changes.

Generated at 2023-11-27 14:44:10.912697101 UTC

3-nodes Scenario

A rather typical setup, with 3 nodes forming a Hydra head.

Number of nodes 3
Number of txs 900
Avg. Confirmation Time (ms) 23.467601128
P99 114.94708048ms
P95 33.242046550000005ms
P50 20.895232ms
Number of Invalid txs 0

Baseline Scenario

This scenario represents a minimal case and as such is a good baseline against which to assess the overhead introduced by more complex setups. There is a single hydra-node with a single client submitting single-input, single-output transactions with a constant UTxO set of 1.

Number of nodes 1
Number of txs 300
Avg. Confirmation Time (ms) 4.563733836
P99 7.004485219999999ms
P95 5.592480100000002ms
P50 4.428237ms
Number of Invalid txs 0

@ffakenz force-pushed the fix-network-reliability-missconfigured-peer branch from bc99ad9 to 9e4fd02 on November 24, 2023 11:53

github-actions bot commented Nov 24, 2023

Test Results

372 tests, 126 suites, 5 files: 367 ✔️ passed, 5 💤 skipped, 0 failed (26m 17s ⏱️)

Results for commit 3989d2b.

♻️ This comment has been updated with latest results.

@ffakenz force-pushed the fix-network-reliability-missconfigured-peer branch 2 times, most recently from 1644732 to 9aae4a5 on November 27, 2023 09:11
@ffakenz ffakenz marked this pull request as ready for review November 27, 2023 09:37
Collaborator

@ch1bo ch1bo left a comment


Surprisingly large change. I think we can (and should) reduce the number of mentions of acks and other internal details of the network layer.

One thing I am missing is concrete instructions to the user. How would they know that they need to delete the <persistence-dir>/acks file? To solve this, I would still die with an exception, but implement https://hackage.haskell.org/package/base-4.19.0.0/docs/Control-Exception.html#v:displayException for the thrown exception accordingly, to give a human-friendly English instruction to delete acksFile.
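That suggestion might look roughly like the following sketch. The exception type, its name, and the message wording are hypothetical; only `displayException` and the `<persistence-dir>/acks` path come from the comment above:

```haskell
module Main where

import Control.Exception (Exception (..))

-- Hypothetical exception: the real one lives in hydra-node and may differ.
-- It carries the offending file path so the message can say what to delete.
newtype NetworkStateInconsistent = NetworkStateInconsistent FilePath
  deriving (Show)

instance Exception NetworkStateInconsistent where
  displayException (NetworkStateInconsistent acksFile) =
    "Persisted network state is inconsistent with the node configuration.\n"
      <> "If you changed the configured peers, delete '"
      <> acksFile
      <> "' and restart the node."

main :: IO ()
main = putStrLn (displayException (NetworkStateInconsistent "<persistence-dir>/acks"))
```

The point of overriding `displayException` rather than relying on derived `Show` is that the top-level handler (or any caller) can render an actionable, human-readable message instead of a bare constructor dump.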

hydra-node/src/Hydra/Node/Network.hs (resolved, outdated)
hydra-node/src/Hydra/Logging.hs (resolved)
hydra-node/test/Hydra/NetworkSpec.hs (resolved, outdated)
hydra-node/test/Hydra/NetworkSpec.hs (resolved, outdated)
hydra-node/json-schemas/logs.yaml (resolved, outdated)
@abailly-iohk
Contributor Author

You're right, there are some instructions missing, and we should provide better error messages to the user should we crash with an exception. However:

  1. this is not specific to this PR; there are other places where we throw exceptions without caring about the user-facing message
  2. I would not use displayException for this but would rather reuse our explain function: it avoids "scattering" error messages in the various places where the exceptions are defined, and is much more amenable to localisation and customisation, e.g. should we want to display this error using JSON, or even i18n. I think it's the responsibility of the user interface to provide sensible feedback to the user.

Doing so here would increase the surface of this PR even more, so I would like to propose we address this particular issue in a later PR.

ffakenz and others added 14 commits November 27, 2023 15:21
This is needed in order to see in our tests that the node has died with a misconfiguration, which is not the problem we are trying to solve but is important information to have.
Also renaming the spec to better match the scenario.
This is not strictly needed but allows fine-tuning the time we wait for connection depending on the exact context of the check, and not blindly on the number of nodes.
… specific function


This helps in finding relevant tests and eases future renaming of this function.

Co-authored-by: Sebastian Nagel <[email protected]>
@ffakenz force-pushed the fix-network-reliability-missconfigured-peer branch from 42d4d4c to d7095be on November 27, 2023 14:21
@ch1bo
Collaborator

ch1bo commented Nov 28, 2023

You're right, there are some instructions missing, and we should provide better error messages to the user should we crash with an exception. However:

1. this is not specific to this PR; there are [other places](https://github.com/input-output-hk/hydra/blob/00e9743f3e05243fc57991dc42540c986b0620bb/hydra-node/exe/hydra-node/Main.hs#L87) where we throw exceptions without caring about the user-facing message

2. I would not use `displayException` for this but would rather reuse our [`explain`](https://github.com/input-output-hk/hydra/blob/00e9743f3e05243fc57991dc42540c986b0620bb/hydra-node/src/Hydra/Options.hs#L639) function: it avoids "scattering" error messages in the various places where the exceptions are defined, and is much more amenable to localisation and customisation, e.g. should we want to display this error using JSON, or even i18n. I think it's the responsibility of the user interface to provide sensible feedback to the user.

Doing so here would increase the surface of this PR even more, so I would like to propose we address this particular issue in a later PR.

I'm okay with not doing that in this PR; it's just something I thought would be a nice small step towards less confusing error messages.

I see your point with different encodings and internationalization (although YAGNI?), but I would want to avoid arbitrary centralization of otherwise unrelated exceptions. The case we have here is a good example: I would want the information that some file at path "/acks" should be deleted to stay with the component which decides that we persist into a file at that path. Related to this is the (IMO unnecessary) addition of SavedNetworkPartiesInconsistent to ParameterMismatch, which has a completely different origin.

That being said, I can approve and might just PR what I think should be myself :)

@abailly-iohk
Copy link
Contributor Author

abailly-iohk commented Nov 28, 2023

I see your point with different encodings and internationalization (although YAGNI?), but I would want to avoid arbitrary centralization of otherwise unrelated exceptions.

Agreed, this is YAGNI. I just raised this point to highlight the need to avoid putting arbitrary strings into places where they do not belong: defining and using the (structured) error is one thing; presenting the issue in an understandable way is a different concern.

The case we have here is a good example: I would want the information that some file at path "/acks" should be deleted to stay with the component which decides that we persist into a file at that path.

I agree, this is currently missing from the exception and should be added.

Related to this is the IMO unnecessary addition of SavedNetworkPartiesInconsistent to ParameterMismatch, which has a completely different origin.

That's where our views differ: this issue comes from a discrepancy between the persisted state and the configuration passed to the node, which percolates down to the network layer, hence I think it's legitimate to make it part of the exceptions related to configuration "errors". We could perhaps refine the hierarchy more, but at this stage that's definitely YAGNI :)

To be clear, what I have in mind is something like this in the Main module:

```haskell
main :: IO ()
main =
  parseOptions >>= try . configureNode >>= \case
    Left (e :: PreFlightCheckError) -> die (explain e)
    Right node -> runNode node
```

@abailly-iohk abailly-iohk merged commit 60aac8c into master Nov 28, 2023
20 checks passed
@abailly-iohk abailly-iohk deleted the fix-network-reliability-missconfigured-peer branch November 28, 2023 07:09
@abailly-iohk abailly-iohk mentioned this pull request Nov 29, 2023
4 tasks
Successfully merging this pull request may close these issues.

Restarting a node with a different number of peers prevents it from connecting to the cluster