Allow darknode migration across cloud provider regions #75

bakebrain · 2020-08-03T09:03:05Z

Summary

Allow darknodes to be migrated across regions while staying with the currently selected cloud provider. This is related to #22, which covers migration between providers.

Basic example (aws)

darknode migrate YOUR-DARKNODE-NAME --source-aws-region sa-east-1 --target-aws-region eu-west-1

Motivation

Allow an "internal" migration and avoid the lengthy deregister/destroy process, saving time, money, and fees.
React to issues within the current region, e.g. longer downtimes or changes in instance type availability.
Potentially save cost, since pricing differs between regions.

loongy · 2020-08-20T01:25:35Z

While this would be a great feature to have, it is very difficult to achieve without risk for the node operation.

Option 1:

a. set up the new node, but keep it off (easy)
b. keep the new node up-to-date with all of the internal state of the old node (very hard)
c. turn off the old node and then turn on the new node without losing track of messages in-flight (potentially impossible)

Without achieving these three steps reliably, the new node would be at risk of broadcasting information that is at odds with what the old node broadcast. In the worst case, for example, it might find itself proposing two different blocks in the same height/round, and this would result in slashing.
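To make the double-proposal risk concrete, here is a minimal Python sketch. It is not the darknode's actual consensus code: `SignGuard`, `may_propose`, and the block names are hypothetical, chosen only to show why a node that starts without the old node's record can contradict it.

```python
from dataclasses import dataclass, field


@dataclass
class SignGuard:
    """Hypothetical record of which block was proposed at each (height, round)."""

    proposed: dict = field(default_factory=dict)

    def may_propose(self, height: int, round_: int, block_hash: str) -> bool:
        # Remember the first block seen for this slot; only that block is allowed later.
        previous = self.proposed.setdefault((height, round_), block_hash)
        return previous == block_hash


old_node = SignGuard()
old_node.may_propose(10, 0, "block-A")

# A new node started with empty state has no record of block-A,
# so it would accept a conflicting proposal for the same height/round.
fresh_node = SignGuard()
conflict_allowed = fresh_node.may_propose(10, 0, "block-B")

# A new node started from a copy of the old node's state rejects the conflict.
synced_node = SignGuard(proposed=dict(old_node.proposed))
conflict_blocked = not synced_node.may_propose(10, 0, "block-B")
```

In this sketch the copy of `proposed` stands in for "all of the internal state of the old node"; the hard part described above is keeping that copy current while the old node is still running.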

Option 2:

a. set up the new node, but keep it off (easy)
b. turn off the old node (easy)
c. download the state and upload it to the new node (easy, but slow)
d. turn on the new node (easy)

This solves the problem in (1) by effectively cloning the state from the old node. This has to be done while the old node is off (and before the new node is on); otherwise the cloned state can become stale and we have the same problem as (1) again. But step (2.c) is slow and could result in extended downtime for the node, especially if there is an intermittent connectivity issue between the node operator's workspace and the node. Storage can be several GBs, so downloading and then uploading the entire backup will take time. Then, the new node will be (at best) a few minutes behind the rest of the network and will need to re-synchronise with it.

It is more than possible that this takes too long, and the node begins being slashed (and is forcibly deregistered) for not being online.
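As a rough illustration of how the downtime in Option 2 adds up, the following Python sketch sums the download, upload, and re-synchronisation times. All numbers (5 GB of state, 100 Mbit/s down, 20 Mbit/s up, 5 minutes of re-sync) are assumptions picked for the example, and the function names are hypothetical. The threshold at which slashing starts is not stated in this thread, so it is not modelled.

```python
def transfer_seconds(size_gb: float, mbit_per_s: float) -> float:
    """Seconds needed to move size_gb gigabytes over a link of mbit_per_s megabits/s."""
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return size_megabits / mbit_per_s


def option2_downtime_seconds(
    size_gb: float, download_mbit_s: float, upload_mbit_s: float, resync_s: float
) -> float:
    """Downtime while the old node is off: download, then upload, then re-sync."""
    download = transfer_seconds(size_gb, download_mbit_s)
    upload = transfer_seconds(size_gb, upload_mbit_s)
    return download + upload + resync_s


# Assumed values, for illustration only.
estimate = option2_downtime_seconds(
    size_gb=5, download_mbit_s=100, upload_mbit_s=20, resync_s=300
)
print(f"{estimate / 60:.0f} minutes offline")  # 45 minutes offline
```

The upload leg dominates in this example because the assumed uplink from the operator's workspace is slower than the downlink. An intermittent connection would stretch both legs further.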

loongy · 2020-08-20T01:26:54Z

Decided not to label this as wontfix, and to label it as help wanted instead. Open to suggestions here, but unless something compelling is suggested that minimises risk to the node operator, I am not sure migrations like this will actually be possible.

loongy added the feature, wontfix, and help wanted labels and removed the wontfix label on Aug 20, 2020

loongy mentioned this issue Aug 24, 2020

Migration function between cloud providers. #22

Closed
