Draft
Changes from 187 commits
206 commits
0a63230
Add tla spec
cjen1-msft May 9, 2025
859d30d
Update spec to refine safety property
cjen1-msft May 9, 2025
9eb305d
Add basic fizzbee spec
cjen1-msft May 22, 2025
0bf26f9
Add stateright model
cjen1-msft May 28, 2025
2b8a1d6
Update stateright dr spec
cjen1-msft May 28, 2025
bad8a13
Update Readme.md
cjen1-msft May 28, 2025
1965453
Update Readme.md
cjen1-msft May 28, 2025
5f066e5
broken version
cjen1-msft Jun 3, 2025
09388e7
refactor
cjen1-msft Jun 4, 2025
b991b9d
Restore correct liveness property.
cjen1-msft Jun 16, 2025
4f6de45
Add more checked conditions
cjen1-msft Jun 25, 2025
5a98922
Add reasonably clean curlm support
cjen1-msft Jul 3, 2025
9edf637
Add proper curl and libuv interaction
cjen1-msft Jul 4, 2025
c745ade
Pass curl singleton over enclave barrier
cjen1-msft Jul 4, 2025
71c1fb3
Ensure singleton is initialised
cjen1-msft Jul 4, 2025
695f351
Make quote endorsement client use curl_multi
cjen1-msft Jul 4, 2025
2956c38
Add curl to public ccf linked libraryes
cjen1-msft Jul 7, 2025
709228f
fix cond
cjen1-msft Jul 7, 2025
32d1361
Initialise request
cjen1-msft Jul 7, 2025
f88d7b5
Fix handler
cjen1-msft Jul 7, 2025
4458c8b
fiddle with pointers
cjen1-msft Jul 7, 2025
cdebe29
Fix timeout
cjen1-msft Jul 7, 2025
4ea2bb7
Maybe fix issue?
cjen1-msft Jul 7, 2025
6214b6c
refmt
cjen1-msft Jul 21, 2025
a4be0c3
Merge branch 'main' into curlm
cjen1-msft Jul 21, 2025
fce77da
Update
cjen1-msft Jul 21, 2025
58eb20c
fmt
cjen1-msft Jul 21, 2025
5b52e3d
remove static_cast
cjen1-msft Jul 22, 2025
b876cca
Fix url query
cjen1-msft Jul 22, 2025
68aff99
Add kickstart for curlm and document interaction between libuv and curlm
cjen1-msft Jul 22, 2025
934010f
Refactor interface to make checks more careful.
cjen1-msft Jul 22, 2025
c84ba3f
move to a constructor pattern
cjen1-msft Jul 22, 2025
594f536
Add missing nullptr check in curl_socket_callback
cjen1-msft Jul 22, 2025
333c427
Update src/http/curl.h
cjen1-msft Jul 22, 2025
12edb67
Add check and warn of duplicate headers in responses
cjen1-msft Jul 23, 2025
14d827b
Migrate fetch.h to new interface
cjen1-msft Jul 24, 2025
6c44deb
fix
cjen1-msft Jul 24, 2025
1fbd015
Pass through config bits for self-heal-open
cjen1-msft Jul 22, 2025
c790f4d
Update test infra to test self-healing-open
cjen1-msft Jul 22, 2025
b153e53
Fix undefined request body and multi-threaded access to curl
cjen1-msft Jul 28, 2025
fee3559
Runnable checkpoint
cjen1-msft Jul 28, 2025
dc6a7ee
Config changes
cjen1-msft Jul 29, 2025
2058180
Add timeouts
cjen1-msft Jul 29, 2025
f8981ae
Fix curl put with empty body issue
cjen1-msft Jul 30, 2025
963b6c1
Add test for timeouts
cjen1-msft Jul 30, 2025
4d22d82
Get open working
cjen1-msft Jul 30, 2025
c67f032
Get join working (still requires trusting of replacement nodes)
cjen1-msft Aug 1, 2025
207b142
Changes to prevent repeated joins
cjen1-msft Aug 12, 2025
1073177
curl client fixes
cjen1-msft Aug 12, 2025
9d95055
Update network to better integrate with volatile node identities
cjen1-msft Aug 12, 2025
b3a1f9b
fmt
cjen1-msft Aug 12, 2025
39da991
Changes to curl to make it close carefully
cjen1-msft Aug 15, 2025
64e3dc8
e2e sho test
cjen1-msft Aug 15, 2025
bba91ce
Fix undefined request body and multi-threaded access to curl
cjen1-msft Jul 28, 2025
d28af46
Fix curl put with empty body issue
cjen1-msft Jul 30, 2025
b4e1d16
Changes to curl to make it close carefully
cjen1-msft Aug 15, 2025
20e54fe
Merge branch 'main' into curlm
cjen1-msft Aug 15, 2025
5626b86
Stop passing the singleton over the enclave boundary
cjen1-msft Aug 15, 2025
7b72ea7
refactor and format curl response interface
cjen1-msft Aug 15, 2025
54f0823
Add and fix for e2e test
cjen1-msft Aug 15, 2025
e55f9df
Add license
cjen1-msft Aug 15, 2025
af81c10
fmt
cjen1-msft Aug 15, 2025
eb0dcd0
Fix bug in fetch code
cjen1-msft Aug 18, 2025
86f624f
Reuse response to skip a copy
cjen1-msft Aug 18, 2025
ebf73b4
tidy
cjen1-msft Aug 18, 2025
0152a4e
Tidy up
cjen1-msft Aug 18, 2025
0d85ab8
Merge branch 'curlm' into self-healing-open
cjen1-msft Aug 18, 2025
64da579
Testing changes to test testing infra
cjen1-msft Aug 19, 2025
fa52e9b
transition_to_open immediately on OPENING rather than waiting for a t…
cjen1-msft Aug 19, 2025
5205458
Update src/http/curl.h
cjen1-msft Aug 20, 2025
080ded9
Ensure opening replica sends iamopen messages
cjen1-msft Aug 20, 2025
60a66c3
Make ownership more explicit.
cjen1-msft Aug 20, 2025
d0fe1f3
Fix clang-tidy gripe
cjen1-msft Aug 20, 2025
e6bfb0b
Separate response_body from response_headers
cjen1-msft Aug 20, 2025
d716c8b
Remove easy handle before throwing an error.
cjen1-msft Aug 20, 2025
62b99ed
Merge branch 'main' into curlm
achamayou Aug 20, 2025
b6352c0
Update src/http/curl.h
cjen1-msft Aug 21, 2025
a23a882
Update src/http/curl.h
cjen1-msft Aug 21, 2025
7f55f13
Snagging
cjen1-msft Aug 21, 2025
5282f19
Snags
cjen1-msft Aug 21, 2025
03a8d5d
Rejig logic around header processing
cjen1-msft Aug 21, 2025
8cfb104
Set a 1mb default maximum size
cjen1-msft Aug 21, 2025
9b59d25
fix maximum sizing to be sane but not yet configurable for quote endo…
cjen1-msft Aug 21, 2025
ef1f464
Make quote endorsements maximum response size configurable.
cjen1-msft Aug 22, 2025
acbdcb1
fmt
cjen1-msft Aug 22, 2025
2e1089c
Rephrase
cjen1-msft Aug 22, 2025
6e05563
reboop
cjen1-msft Aug 22, 2025
df7c7d4
Merge branch 'curlm' into self-healing-open
cjen1-msft Aug 22, 2025
6fa1587
Ensure attaching request check curl_request_curlm
cjen1-msft Aug 22, 2025
63c4383
Merge branch 'main' into curlm
achamayou Aug 22, 2025
51b4f3b
Reformat
cjen1-msft Aug 26, 2025
ad47f4f
Merge branch 'main' into curlm
achamayou Aug 26, 2025
5a8de4a
Add trace logging of timeout actions
cjen1-msft Aug 26, 2025
02bb533
Add tests for slow requests and timed out requests.
cjen1-msft Aug 26, 2025
ff72e76
Make e2e_curl a long test
cjen1-msft Aug 26, 2025
1dc290c
Add logging on all curl requests
cjen1-msft Aug 26, 2025
fd39352
Add debug print for all unclosed uv handles
cjen1-msft Aug 26, 2025
699445c
fix
cjen1-msft Aug 26, 2025
6fa7169
Refactor closing logic
cjen1-msft Aug 26, 2025
068cc59
Improve lifetime handling of the requestcontext uv_handle
cjen1-msft Aug 26, 2025
9e0ae3d
Revert "Improve lifetime handling of the requestcontext uv_handle"
cjen1-msft Aug 26, 2025
6830d02
Just close the handle when closing the socket
cjen1-msft Aug 26, 2025
d18a950
Merge branch 'main' into curlm
cjen1-msft Aug 26, 2025
a4235ac
Use a queue to manage curl requests.
cjen1-msft Aug 27, 2025
aeeef5f
Fix test
cjen1-msft Aug 27, 2025
a04a755
move for attachment
cjen1-msft Aug 27, 2025
20041ea
fmt
cjen1-msft Aug 27, 2025
9a2ba9a
Revert "Fix test"
cjen1-msft Aug 27, 2025
03f0590
All instantiating new proxy_ptrs from a pointer
cjen1-msft Aug 27, 2025
a4f234c
Fix asan errors
cjen1-msft Aug 27, 2025
6a083b3
refmt
cjen1-msft Aug 27, 2025
0578b15
Merge branch 'main' into curlm
achamayou Aug 27, 2025
8b7cedd
Don't have a default...
cjen1-msft Aug 27, 2025
31e1b5f
Explicitly drain deque
cjen1-msft Aug 27, 2025
29d6a9d
fmt
cjen1-msft Aug 28, 2025
4b910b8
Fix asan failure
cjen1-msft Aug 28, 2025
27be8e9
Snags
cjen1-msft Aug 28, 2025
259522d
Bump js max_execution_time from 1s to 5s
cjen1-msft Aug 28, 2025
47aff71
Also bump limits test limit
cjen1-msft Aug 28, 2025
ed0d4f0
e2e_curl should use a random port
cjen1-msft Aug 29, 2025
5198f4d
Merge branch 'main' into curlm
cjen1-msft Aug 29, 2025
b93ab79
Use ipv4 (127.0.0.1) and a random port
cjen1-msft Aug 29, 2025
a94de31
Merge branch 'curlm' into self-healing-open
cjen1-msft Aug 29, 2025
8e9e2ff
Merge branch 'main' into self-healing-open
cjen1-msft Sep 12, 2025
2ead593
fmt
cjen1-msft Sep 12, 2025
3ecfd7e
Refactor sho out of recovery config
cjen1-msft Sep 12, 2025
8384d1a
Fixup curl calls
cjen1-msft Sep 12, 2025
5a9fa04
Just stop when recv iamopen
cjen1-msft Sep 12, 2025
efe59bb
refactor config
cjen1-msft Sep 12, 2025
64779bd
Make build
cjen1-msft Sep 12, 2025
ef70b52
refmt
cjen1-msft Sep 12, 2025
c1a7aed
Get a single test to pass! woop woop
cjen1-msft Sep 12, 2025
30344fd
And do the other tests as well...
cjen1-msft Sep 12, 2025
84960c0
snags
cjen1-msft Sep 12, 2025
b55ac60
Cleanup
cjen1-msft Sep 15, 2025
e102af9
Cleanup
cjen1-msft Sep 15, 2025
a955313
Merge pull request #2 from cjen1-msft/modelling-autoopen
cjen1-msft Sep 15, 2025
4a7b3a5
Large refactor to pull out the self_healing_open code from node_state.h
cjen1-msft Sep 17, 2025
f16c452
Inline to prevent ODR violations
cjen1-msft Sep 17, 2025
8003292
make cmake happy
cjen1-msft Sep 17, 2025
d21d71f
fmt
cjen1-msft Sep 17, 2025
1dc62d0
Fixup todo
cjen1-msft Sep 17, 2025
3e04eb4
clean imports diff
cjen1-msft Sep 17, 2025
465e1df
Merge branch 'main' into self-healing-open
cjen1-msft Sep 18, 2025
9b5d6b6
Fix clang-tidy errors
cjen1-msft Sep 18, 2025
7950eb7
error reporter imports
cjen1-msft Sep 19, 2025
45f54f0
remove extra e2e_curl
cjen1-msft Sep 19, 2025
e0fa598
Merge branch 'main' into self-healing-open
cjen1-msft Sep 19, 2025
f0175d9
Basic running test
cjen1-msft Sep 19, 2025
e9cb10d
Allow curl handles to fix themselves during shutdown.
cjen1-msft Sep 19, 2025
8c8816c
Allow nodes to restart before refreshing network state
cjen1-msft Sep 19, 2025
1c8d6cb
Log restart
cjen1-msft Sep 19, 2025
753d511
Test timeout path
cjen1-msft Sep 22, 2025
58ffb4d
Local sealing self-healing-open
cjen1-msft Sep 22, 2025
0d9283a
Merge branch 'main' into self-healing-open
cjen1-msft Sep 22, 2025
bf08f9c
fmt
cjen1-msft Sep 22, 2025
abeccda
fixup test
cjen1-msft Sep 22, 2025
53a9139
Ensure sealed secrets are passed
cjen1-msft Sep 22, 2025
6b82bf3
fixup timeout path
cjen1-msft Sep 22, 2025
db6fb56
Improve test infra
cjen1-msft Sep 22, 2025
4c527e2
fixup
cjen1-msft Sep 22, 2025
d05b745
imports
cjen1-msft Sep 22, 2025
a4b6d83
Make NodeState a shared_ptr
cjen1-msft Sep 22, 2025
300b13b
Make clang-tidy happy
cjen1-msft Sep 22, 2025
cae83b7
Pass shared_ptr
cjen1-msft Sep 22, 2025
f0b0fd5
tidying
cjen1-msft Sep 22, 2025
2cf40b2
tidying 2
cjen1-msft Sep 22, 2025
2e4ca7c
Revert shared_ptr node_state
cjen1-msft Sep 22, 2025
cbc1513
Stop skipping timers
cjen1-msft Sep 22, 2025
2430e17
Ensure we initialise self-healing-open state
cjen1-msft Sep 22, 2025
ec72561
Ensure we use the correct timeout for failovers
cjen1-msft Sep 22, 2025
d5e96e0
sigh
cjen1-msft Sep 22, 2025
29b4de4
Update cchost_config
cjen1-msft Sep 22, 2025
85538ec
Reformat frontend
cjen1-msft Sep 22, 2025
2543e1d
fmt
cjen1-msft Sep 22, 2025
fbbf5f9
Remove old tla spec
cjen1-msft Sep 22, 2025
49cce91
Update network.py to coalesce ledger secrets
cjen1-msft Sep 22, 2025
22dca96
Fix network.py
cjen1-msft Sep 22, 2025
afb4a1d
Add docs
cjen1-msft Sep 23, 2025
fd8750b
Add flag for detecting whether a timeout has occurred during self-hea…
cjen1-msft Sep 23, 2025
13e05b7
Doc update
cjen1-msft Sep 23, 2025
a9ab437
typo
cjen1-msft Sep 23, 2025
474b199
Update path names
cjen1-msft Sep 23, 2025
c06895c
Revert "Allow curl handles to fix themselves during shutdown."
cjen1-msft Sep 23, 2025
499ea78
Merge branch 'main' into self-healing-open
cjen1-msft Sep 23, 2025
7e5af0d
Update docs
cjen1-msft Sep 23, 2025
009a1c1
Make clang-tidy happy
cjen1-msft Sep 23, 2025
9e7d6e0
Update doc/host_config_schema/cchost_config.json
cjen1-msft Sep 23, 2025
f01931a
Update doc/operations/recovery.rst
cjen1-msft Sep 23, 2025
125a0fb
Update src/common/configuration.h
cjen1-msft Sep 23, 2025
be6edc9
typoing
cjen1-msft Sep 23, 2025
7ee3d5b
config snags
cjen1-msft Sep 23, 2025
ccb43b7
inline restarter
cjen1-msft Sep 23, 2025
af6757f
Refactoring
cjen1-msft Sep 23, 2025
a83a44e
Don't use network.tables anymore
cjen1-msft Sep 23, 2025
56ebb3e
Refactor and document
cjen1-msft Sep 23, 2025
cb23343
rejig
cjen1-msft Sep 23, 2025
e977b74
de-replica-ing
cjen1-msft Sep 23, 2025
c3a8a46
improved error messages
cjen1-msft Sep 23, 2025
d6827c9
Refactor node_frontend
cjen1-msft Sep 23, 2025
eacbf66
fmt
cjen1-msft Sep 23, 2025
0e25ca7
Add model checking
cjen1-msft Sep 23, 2025
e9b5c4c
Setup rustfmt
cjen1-msft Sep 24, 2025
c3f2b79
fmt
cjen1-msft Sep 24, 2025
750e259
Remove rustfmt for separate PR
cjen1-msft Sep 24, 2025
b8126ee
fmt
cjen1-msft Sep 24, 2025
40 changes: 30 additions & 10 deletions CMakeLists.txt
Expand Up @@ -380,6 +380,7 @@ endif()
set(CCF_IMPL_SOURCE
${CCF_DIR}/src/enclave/main.cpp ${CCF_DIR}/src/enclave/thread_local.cpp
${CCF_DIR}/src/node/quote.cpp ${CCF_DIR}/src/node/uvm_endorsements.cpp
${CCF_DIR}/src/node/self_healing_open_impl.cpp
)

add_ccf_static_library(
Expand Down Expand Up @@ -688,11 +689,20 @@ if(BUILD_TESTS)
add_unit_test(
frontend_test
${CMAKE_CURRENT_SOURCE_DIR}/src/node/rpc/test/frontend_test.cpp
${CCF_DIR}/src/node/quote.cpp ${CCF_DIR}/src/node/uvm_endorsements.cpp
${CCF_DIR}/src/node/quote.cpp
${CCF_DIR}/src/node/uvm_endorsements.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/node/self_healing_open_impl.cpp
)
target_link_libraries(
frontend_test PRIVATE ${CMAKE_THREAD_LIBS_INIT} http_parser ccf_js
ccf_endpoints ccfcrypto ccf_kv
frontend_test
PRIVATE ${CMAKE_THREAD_LIBS_INIT}
http_parser
ccf_js
ccf_endpoints
ccfcrypto
ccf_kv
uv
curl
)

add_unit_test(
Expand All @@ -718,11 +728,20 @@ if(BUILD_TESTS)
add_unit_test(
node_frontend_test
${CMAKE_CURRENT_SOURCE_DIR}/src/node/rpc/test/node_frontend_test.cpp
${CCF_DIR}/src/node/quote.cpp ${CCF_DIR}/src/node/uvm_endorsements.cpp
${CCF_DIR}/src/node/quote.cpp
${CCF_DIR}/src/node/uvm_endorsements.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/node/self_healing_open_impl.cpp
)
target_link_libraries(
node_frontend_test PRIVATE ${CMAKE_THREAD_LIBS_INIT} http_parser ccf_js
ccf_endpoints ccfcrypto ccf_kv
node_frontend_test
PRIVATE ${CMAKE_THREAD_LIBS_INIT}
http_parser
ccf_js
ccf_endpoints
ccfcrypto
ccf_kv
uv
curl
)

add_unit_test(
Expand Down Expand Up @@ -1185,15 +1204,16 @@ if(BUILD_TESTS)
10000
--use-jwt
)
add_test_bin(
curl_test ${CMAKE_CURRENT_SOURCE_DIR}/src/http/test/curl_test.cpp
)
target_link_libraries(curl_test PRIVATE curl uv http_parser)

if(LONG_TESTS)
add_e2e_test(
NAME e2e_curl PYTHON_SCRIPT ${CMAKE_SOURCE_DIR}/tests/e2e_curl.py
)

add_test_bin(
curl_test ${CMAKE_CURRENT_SOURCE_DIR}/src/http/test/curl_test.cpp
)
target_link_libraries(curl_test PRIVATE curl uv http_parser)
endif()
endif()

Expand Down
22 changes: 22 additions & 0 deletions doc/host_config_schema/cchost_config.json
Expand Up @@ -417,6 +417,28 @@
"previous_sealed_ledger_secret_location": {
"type": ["string"],
"description": "Path to the sealed ledger secret folder, the ledger secrets for the recovered service will be unsealed from here instead of reconstructed from recovery shares."
},
"self_healing_open": {
"type": "object",
"properties": {
"addresses": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of addresses (host:port) of the cluster that should open via self-healing-open"
},
"retry_timeout": {
"type": "string",
"default": "100ms",
"description": "Interval (time string) at which the node re-sends self-healing-open messages. This should be less than 'timeout'"
},
"timeout": {
"type": "string",
"default": "2000ms",
"description": "Interval (time string) after which the node forcibly advances to the next phase of the self-healing-open protocol"
}
}
}
},
"required": ["previous_service_identity_file"],
Expand Down
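
The schema above asks that `retry_timeout` be less than `timeout`. A minimal sketch of how that constraint could be checked is shown below. The helpers `parse_time_string` and `timeouts_are_consistent` are hypothetical names introduced for illustration and are not part of this PR. Only the `ms` and `s` suffixes are handled here, whereas CCF's own `TimeString` type may accept more units.

```cpp
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse strings such as "100ms" or "2s" into
// milliseconds. Only "ms" and "s" are supported in this sketch.
inline std::chrono::milliseconds parse_time_string(const std::string& s)
{
  std::size_t pos = 0;
  // std::stoull throws std::invalid_argument if there are no leading digits
  const unsigned long long value = std::stoull(s, &pos);
  const std::string unit = s.substr(pos);
  if (unit == "ms")
  {
    return std::chrono::milliseconds(value);
  }
  if (unit == "s")
  {
    return std::chrono::seconds(value);
  }
  throw std::invalid_argument("unsupported time unit: " + unit);
}

// Hypothetical check: true when the retry interval is strictly shorter
// than the phase timeout, as the schema description recommends.
inline bool timeouts_are_consistent(
  const std::string& retry_timeout, const std::string& timeout)
{
  return parse_time_string(retry_timeout) < parse_time_string(timeout);
}
```

With the schema defaults, `timeouts_are_consistent("100ms", "2000ms")` holds, so several retries fit into one phase before the failover timeout fires.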
88 changes: 88 additions & 0 deletions doc/operations/recovery.rst
Expand Up @@ -145,6 +145,94 @@ Which of these two paths is taken is noted in the `public:ccf.internal.last_reco
...
$ /opt/ccf/bin/js_generic --config /path/to/config/file

Self-Healing-Open recovery
--------------------------

In environments with limited orchestration or limited operator access, it is desirable to allow a limited disaster recovery without operator intervention.
At a high level, Self-Healing-Open recovery allows recovering replicas to discover which replica has the most up-to-date ledger and automatically recover the network using that ledger.

There are two paths: an election path and a very-high-availability failover path.
The election path ensures that if all nodes restart with full network connectivity, a majority of nodes' on-disk ledgers contain every committed transaction, and no timeouts trigger, then only one network is recovered and every committed transaction is persisted.
However, the election path can become stuck, in which case the failover path is designed to ensure progress.

In the election path, nodes first gossip with each other, learning of the ledgers of other nodes.
Once they have heard from every node they vote for the node with the best ledger.
If a node receives votes from a majority of nodes, it invokes `transition-to-open` and notifies the other nodes to restart and join it.
This path is illustrated below, and is guaranteed to succeed if all nodes can communicate and no timeouts trigger.

.. mermaid::

sequenceDiagram
participant N1
participant N2
participant N3

Note over N1, N3: Gossip

N1 ->> N2: Gossip(Tx=1)
N1 ->> N3: Gossip(Tx=1)
N2 ->> N3: Gossip(Tx=2)
N3 ->> N2: Gossip(Tx=3)

Note over N1, N3: Vote
N2 ->> N3: Vote
N3 ->> N3: Vote

Note over N1, N3: Open/Join
N3 ->> N1: IAmOpen
N3 ->> N2: IAmOpen

Note over N1, N2: Restart

Note over N3: Transition-to-open

Note over N3: Local unsealing

Note over N3: Open

N1 ->> N3: Join
N2 ->> N3: Join
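
The election path described above can be sketched as two small decisions: pick the node advertising the best ledger, then open only with a majority of votes. The code below is an illustration under stated assumptions, not the PR's implementation. `NodeId`, `choose_best_ledger` and `has_majority` are hypothetical names, "best ledger" is taken to mean the highest transaction ID, and breaking ties by smallest node ID is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a node's identifier.
using NodeId = std::string;

// Pick the node advertising the highest transaction ID among the gossips
// received so far. std::map iterates in key order, so on a tie the
// smallest node ID wins (an assumption of this sketch).
inline NodeId choose_best_ledger(const std::map<NodeId, std::uint64_t>& gossips)
{
  NodeId best;
  std::uint64_t best_tx = 0;
  bool found = false;
  for (const auto& [node, tx] : gossips)
  {
    if (!found || tx > best_tx)
    {
      best = node;
      best_tx = tx;
      found = true;
    }
  }
  return best; // empty string if no gossips were received
}

// A candidate may open once strictly more than half of the cluster has
// voted for it.
inline bool has_majority(std::size_t votes, std::size_t cluster_size)
{
  return votes * 2 > cluster_size;
}
```

Applied to the diagram above, gossips of Tx=1, Tx=2 and Tx=3 from N1, N2 and N3 select N3, and the two votes shown (from N2 and N3) are a majority of three.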

In the failover path, each phase has a timeout to skip to the next phase if a failure has occurred.
For example, the election path requires all nodes to communicate to advance from the gossip phase to the vote phase.
However, if any node fails to recover, the election path is stuck.
In this case, after a timeout, nodes will advance to the vote phase regardless of whether they have heard from all nodes, and vote for the best ledger they have heard of at that point.

Unfortunately, this can lead to multiple forks of the service if different nodes cannot communicate with each other and time out.
Hence, we recommend setting the timeout substantially higher than the highest expected recovery time, to minimise the chance of this happening.
To allow auditing, the `public:ccf.gov.selfhealingopen.failover_open` table records whether timeouts were used to open the service.

This failover path is illustrated below.

.. mermaid::

sequenceDiagram
participant N1
participant N2
participant N3

Note over N1, N3: Gossip

N2 ->> N3: Gossip(Tx=2)
N3 ->> N2: Gossip(Tx=3)

Note over N1: Timeout
Note over N3: Timeout

Note over N1, N3: Vote

N1 ->> N1: Vote
N3 ->> N3: Vote
N2 ->> N3: Vote

Note over N1, N3: Open/Join

Note over N1: Transition-to-open
Note over N3: Transition-to-open
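
The difference between the two paths for the gossip phase can be sketched as a single decision function. This is a simplified illustration rather than the PR's code. `GossipProgress`, `Advance` and `should_leave_gossip` are hypothetical names, and the `via_failover` flag only mirrors the idea of recording that a timeout was used.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical snapshot of a node's progress in the gossip phase.
struct GossipProgress
{
  std::size_t heard_from; // nodes we hold gossip from, including ourselves
  std::size_t cluster_size; // number of configured addresses
  std::chrono::milliseconds elapsed; // time spent in the current phase
};

// Hypothetical result: whether to advance, and whether a timeout forced it.
struct Advance
{
  bool advance;
  bool via_failover;
};

inline Advance should_leave_gossip(
  const GossipProgress& p, std::chrono::milliseconds timeout)
{
  if (p.heard_from >= p.cluster_size)
  {
    return {true, false}; // election path: every node has been heard from
  }
  if (p.elapsed >= timeout)
  {
    return {true, true}; // failover path: give up waiting and vote anyway
  }
  return {false, false}; // keep gossipping
}
```

A node that advances with `via_failover` set votes for the best ledger it has heard of so far, which is why a generous timeout reduces the chance of forks.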


If the network fails during reconfiguration, each node will use its latest known configuration to recover. Since reconfiguration requires votes from a majority of nodes, the latest configuration should recover using the election path; however, nodes in the previous configuration may recover using the failover path.

Notes
-----

Expand Down
9 changes: 9 additions & 0 deletions include/ccf/node/startup_config.h
Expand Up @@ -102,6 +102,14 @@ namespace ccf
Snapshots snapshots = {};
};

struct SelfHealingOpenConfig
{
std::vector<std::string> addresses;
ccf::ds::TimeString retry_timeout = {"100ms"};
ccf::ds::TimeString timeout = {"2000ms"};
Member

Suggest we call this failover_timeout, if that's the term for the fallback state?

Member

Definitely not unqualified timeout, is this when we stop waiting for new votes? If so something like ballot_timeout or recovery_ballot_timeout would be better.

Contributor Author

@achamayou It is when we use the failover path to advance to the next phase rather than the election path.
I've tentatively renamed it to failover_retry.

bool operator==(const SelfHealingOpenConfig&) const = default;
};

struct StartupConfig : CCFConfig
{
StartupConfig() = default;
Expand Down Expand Up @@ -146,6 +154,7 @@ namespace ccf
std::nullopt;
std::optional<std::string> previous_sealed_ledger_secret_location =
std::nullopt;
std::optional<SelfHealingOpenConfig> self_healing_open = std::nullopt;
};
Recover recover = {};
};
Expand Down
76 changes: 76 additions & 0 deletions include/ccf/service/tables/self_healing_open.h
@@ -0,0 +1,76 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the Apache 2.0 License.
#pragma once

#include "ccf/ds/enum_formatter.h"
#include "ccf/ds/json.h"
#include "ccf/ds/quote_info.h"
#include "ccf/service/map.h"

using IntrinsicIdentifier = std::string;

struct SelfHealingOpenNodeInfo_t
{
ccf::QuoteInfo quote_info;
std::string published_network_address;
std::vector<uint8_t> cert_der;
std::string service_identity;
IntrinsicIdentifier intrinsic_id;
};

DECLARE_JSON_TYPE(SelfHealingOpenNodeInfo_t);
DECLARE_JSON_REQUIRED_FIELDS(
SelfHealingOpenNodeInfo_t,
quote_info,
published_network_address,
cert_der,
service_identity,
intrinsic_id);

enum class SelfHealingOpenSM
{
GOSSIPPING = 0,
VOTING,
OPENING, // by chosen replica
JOINING, // by all other replicas
OPEN,
};

DECLARE_JSON_ENUM(
SelfHealingOpenSM,
{{SelfHealingOpenSM::GOSSIPPING, "Gossipping"},
{SelfHealingOpenSM::VOTING, "Voting"},
{SelfHealingOpenSM::OPENING, "Opening"},
{SelfHealingOpenSM::JOINING, "Joining"},
{SelfHealingOpenSM::OPEN, "Open"}});

namespace ccf
{
using SelfHealingOpenNodeInfo =
ServiceMap<IntrinsicIdentifier, SelfHealingOpenNodeInfo_t>;
using SelfHealingOpenGossips =
ServiceMap<IntrinsicIdentifier, ccf::kv::Version>;
using SelfHealingOpenChosenReplica = ServiceValue<IntrinsicIdentifier>;
using SelfHealingOpenVotes = ServiceSet<IntrinsicIdentifier>;
using SelfHealingOpenSMState = ServiceValue<SelfHealingOpenSM>;
using SelfHealingOpenTimeoutSMState = ServiceValue<SelfHealingOpenSM>;
using SelfHealingOpenFailoverFlag = ServiceValue<bool>;

namespace Tables
{
static constexpr auto SELF_HEALING_OPEN_NODES =
"public:ccf.gov.selfhealingopen.nodes";
static constexpr auto SELF_HEALING_OPEN_GOSSIPS =
"public:ccf.gov.selfhealingopen.gossip";
static constexpr auto SELF_HEALING_OPEN_CHOSEN_REPLICA =
"public:ccf.gov.selfhealingopen.chosen_replica";
static constexpr auto SELF_HEALING_OPEN_VOTES =
"public:ccf.gov.selfhealingopen.votes";
static constexpr auto SELF_HEALING_OPEN_SM_STATE =
"public:ccf.gov.selfhealingopen.sm_state";
static constexpr auto SELF_HEALING_OPEN_TIMEOUT_SM_STATE =
"public:ccf.gov.selfhealingopen.timeout_sm_state";
static constexpr auto SELF_HEALING_OPEN_FAILOVER_FLAG =
"public:ccf.gov.selfhealingopen.failover_open";
}
}
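
To make the state machine in this header easier to follow, here is a standalone sketch of the phases and one plausible ordering between them. The enum values and labels are copied from the declarations above. The `to_label` and `next_state` functions are hypothetical, and the transitions are assumptions drawn from the enum comments and the recovery documentation, in particular that a `JOINING` node stays in that state because it restarts and joins the opened node.

```cpp
#include <string>

enum class SelfHealingOpenSM
{
  GOSSIPPING = 0,
  VOTING,
  OPENING, // by chosen replica
  JOINING, // by all other replicas
  OPEN,
};

// Same labels as the DECLARE_JSON_ENUM mapping in the header.
inline std::string to_label(SelfHealingOpenSM s)
{
  switch (s)
  {
    case SelfHealingOpenSM::GOSSIPPING:
      return "Gossipping";
    case SelfHealingOpenSM::VOTING:
      return "Voting";
    case SelfHealingOpenSM::OPENING:
      return "Opening";
    case SelfHealingOpenSM::JOINING:
      return "Joining";
    case SelfHealingOpenSM::OPEN:
      return "Open";
  }
  return "Unknown";
}

// Assumed phase ordering; is_chosen marks the node that won the vote.
inline SelfHealingOpenSM next_state(SelfHealingOpenSM s, bool is_chosen)
{
  switch (s)
  {
    case SelfHealingOpenSM::GOSSIPPING:
      return SelfHealingOpenSM::VOTING;
    case SelfHealingOpenSM::VOTING:
      return is_chosen ? SelfHealingOpenSM::OPENING :
                         SelfHealingOpenSM::JOINING;
    case SelfHealingOpenSM::OPENING:
      return SelfHealingOpenSM::OPEN;
    case SelfHealingOpenSM::JOINING: // terminal here: the node restarts
    case SelfHealingOpenSM::OPEN:
      return s;
  }
  return s;
}
```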
10 changes: 8 additions & 2 deletions src/common/configuration.h
Expand Up @@ -113,6 +113,10 @@ namespace ccf
node_to_node_message_limit,
historical_cache_soft_limit);

DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(SelfHealingOpenConfig);
DECLARE_JSON_REQUIRED_FIELDS(SelfHealingOpenConfig, addresses);
DECLARE_JSON_OPTIONAL_FIELDS(SelfHealingOpenConfig, retry_timeout, timeout);

DECLARE_JSON_TYPE(StartupConfig::Start);
DECLARE_JSON_REQUIRED_FIELDS(
StartupConfig::Start, members, constitution, service_configuration);
Expand All @@ -127,9 +131,11 @@ namespace ccf

DECLARE_JSON_TYPE(StartupConfig::Recover);
DECLARE_JSON_REQUIRED_FIELDS(
StartupConfig::Recover, previous_service_identity);
DECLARE_JSON_OPTIONAL_FIELDS(
StartupConfig::Recover,
previous_service_identity,
previous_sealed_ledger_secret_location);
previous_sealed_ledger_secret_location,
self_healing_open);

DECLARE_JSON_TYPE_WITH_BASE(StartupConfig, CCFConfig);
DECLARE_JSON_REQUIRED_FIELDS(
Expand Down
2 changes: 1 addition & 1 deletion src/crypto/csr.h
Expand Up @@ -13,7 +13,7 @@ namespace ccf::crypto
* @param signing_request CSR to extract the public key from
* @return extracted public key
*/
Pem public_key_pem_from_csr(const Pem& signing_request)
inline Pem public_key_pem_from_csr(const Pem& signing_request)
{
X509* icrt = NULL;
OpenSSL::Unique_BIO mem(signing_request);
Expand Down
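
The `inline` added above matches the commit message "Inline to prevent ODR violations". The general rule can be illustrated with a header-style sketch. The header name and the `greeting` function are hypothetical and unrelated to CCF.

```cpp
#include <string>

// Imagine this block is a header, say a hypothetical "greeting.h",
// included from both a.cpp and b.cpp.
//
// A non-inline function defined in a header gets one external definition
// per translation unit that includes it, which breaks the One Definition
// Rule and typically fails at link time with a multiple-definition error.
// Marking it `inline` permits identical definitions in several
// translation units, and the linker keeps a single copy.
inline std::string greeting(const std::string& name)
{
  return "hello " + name;
}
```

The same reasoning applies to `public_key_pem_from_csr` once `csr.h` is included from more than one source file linked into the same binary.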
6 changes: 5 additions & 1 deletion src/enclave/interface.h
Expand Up @@ -27,7 +27,10 @@ enum AdminMessage : ringbuffer::Message
DEFINE_RINGBUFFER_MSG_TYPE(tick),

/// Notify the host of work done since last message. Enclave -> Host
DEFINE_RINGBUFFER_MSG_TYPE(work_stats)
DEFINE_RINGBUFFER_MSG_TYPE(work_stats),

/// Notify the host that it should restart
DEFINE_RINGBUFFER_MSG_TYPE(restart)
};

DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(AdminMessage::fatal_error_msg, std::string);
Expand All @@ -36,6 +39,7 @@ DECLARE_RINGBUFFER_MESSAGE_NO_PAYLOAD(AdminMessage::stop_notice);
DECLARE_RINGBUFFER_MESSAGE_NO_PAYLOAD(AdminMessage::stopped);
DECLARE_RINGBUFFER_MESSAGE_NO_PAYLOAD(AdminMessage::tick);
DECLARE_RINGBUFFER_MESSAGE_PAYLOAD(AdminMessage::work_stats, std::string);
DECLARE_RINGBUFFER_MESSAGE_NO_PAYLOAD(AdminMessage::restart);

/// Messages sent from app endpoints
enum AppMessage : ringbuffer::Message
Expand Down
5 changes: 4 additions & 1 deletion src/host/configuration.h
Expand Up @@ -118,6 +118,8 @@ namespace host
std::string previous_service_identity_file;
std::optional<std::string> previous_sealed_ledger_secret_location =
std::nullopt;
std::optional<ccf::SelfHealingOpenConfig> self_healing_open =
std::nullopt;
bool operator==(const Recover&) const = default;
};
Recover recover = {};
Expand Down Expand Up @@ -168,7 +170,8 @@ namespace host
CCHostConfig::Command::Recover,
initial_service_certificate_validity_days,
previous_service_identity_file,
previous_sealed_ledger_secret_location);
previous_sealed_ledger_secret_location,
self_healing_open);

DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(CCHostConfig::Command);
DECLARE_JSON_REQUIRED_FIELDS(CCHostConfig::Command, type);
Expand Down
7 changes: 7 additions & 0 deletions src/host/handle_ring_buffer.h
Expand Up @@ -5,6 +5,8 @@
#include "../ds/files.h"
#include "../enclave/interface.h"
#include "ds/internal_logger.h"
#include "ds/non_blocking.h"
#include "self_healing_open.h"
#include "timer.h"

#include <chrono>
Expand Down Expand Up @@ -53,6 +55,11 @@ namespace asynchost
uv_stop(uv_default_loop());
LOG_INFO_FMT("Host stopped successfully");
});

DISPATCHER_SET_MESSAGE_HANDLER(
bp, AdminMessage::restart, [&](const uint8_t*, size_t) {
ccf::SelfHealingOpenRBHandlerSingleton::instance()->trigger_restart();
});
}

void on_timer()
Expand Down
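
The handler registered above follows a common pattern: a message type is mapped to a callback, and the payload-free `restart` message simply triggers an action on the host. The sketch below shows that pattern in isolation. `MiniDispatcher`, `RestartFlag`, `restart_msg` and `register_restart` are hypothetical stand-ins and are not the CCF ring-buffer API or the `SelfHealingOpenRBHandlerSingleton` used in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

using Message = std::uint32_t;
using Handler = std::function<void(const std::uint8_t*, std::size_t)>;

// Hypothetical minimal dispatcher: one handler per message type.
class MiniDispatcher
{
  std::map<Message, Handler> handlers;

public:
  void set_handler(Message m, Handler h)
  {
    handlers[m] = std::move(h);
  }

  // Returns false if no handler is registered for this message type.
  bool dispatch(Message m, const std::uint8_t* data, std::size_t size) const
  {
    const auto it = handlers.find(m);
    if (it == handlers.end())
    {
      return false;
    }
    it->second(data, size);
    return true;
  }
};

// Hypothetical stand-in for the object that performs the restart.
struct RestartFlag
{
  bool requested = false;
  void trigger_restart()
  {
    requested = true;
  }
};

constexpr Message restart_msg = 1; // hypothetical message identifier

// The restart message carries no payload, so both arguments are ignored.
// The flag must outlive the dispatcher, since it is captured by reference.
inline void register_restart(MiniDispatcher& d, RestartFlag& flag)
{
  d.set_handler(restart_msg, [&flag](const std::uint8_t*, std::size_t) {
    flag.trigger_restart();
  });
}
```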