Patch state non-deterministically failing on underpowered machines #8328

Open
ChaoticTempest opened this issue Jan 11, 2023 · 1 comment

Comments

@ChaoticTempest
Member

Describe the bug
Patching state in the sandbox fails intermittently when a large number of sandbox nodes run concurrently.
The failing test was patching the registrar account from testnet into the sandbox node. When it fails, it reports the following error:

Error: Failed to query access key: handler error: [Access key for public key ed25519:5WMgq6gKZbAr7xBZmXJHjnj4C3UZkNJ4F5odisUBFcRh has never been observed on the node]

This seems to happen on underpowered machines, such as those in a CI pipeline. Running the same tests locally (on an M1 MacBook) works fine.

This issue may be related to sharding: the state-patching code appears to have been moved when sharding was added. The following snippet is a likely suspect:

// XXX: This is a bit questionable -- sandbox state patching works
// only for a single shard. This so far has been enough.
let state_patch = state_patch.take();
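
To make the concern in that comment concrete, here is a minimal sketch of what `Option::take()` does when more than one consumer reads the same pending value. It is a simplified model, not nearcore code: `StatePatch` and `distribute_patch` are hypothetical names invented for illustration, and whether more than one shard is actually involved in this failure is not established.

```rust
/// Hypothetical stand-in for a pending sandbox state patch.
#[derive(Debug, Clone, PartialEq)]
struct StatePatch {
    records: Vec<String>,
}

/// Offers the pending patch to each shard in turn through `Option::take()`.
/// `take()` moves the value out and leaves `None` behind, so only the first
/// shard that asks receives the patch. Returns what each shard received.
fn distribute_patch(
    pending: &mut Option<StatePatch>,
    num_shards: usize,
) -> Vec<Option<StatePatch>> {
    (0..num_shards).map(|_| pending.take()).collect()
}
```

With three shards, the first entry of the returned vector holds the patch and the other two are `None`. If the patched account lived on a shard other than the first one to call `take()`, its records would never be applied.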

To Reproduce
This is hard to reproduce, since it needs a somewhat underpowered machine running many tests at once. It was first noticed in PR test runs on aurora-is-near/aurora-eth-connector, which has at least 36 tests. Each test spins up its own sandbox node and calls patch state at least once. Most of the tests pass, but 1 or 2 sometimes fail. It feels like there is data contention somewhere in the state-patching path.
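
One way to narrow this down, offered as a diagnostic sketch rather than a fix: poll the failing query for a bounded time after patching. If the access key shows up after a few attempts, the patch is being applied late; if it never shows up, the patch is being lost. The helper below is hypothetical and uses only the standard library. The `query` closure stands in for whatever call the test makes to look up the access key.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `query` until it returns `Ok` or `timeout` has elapsed, sleeping
/// for `interval` between attempts. On timeout, returns the last error.
fn poll_until_ok<T, E>(
    mut query: impl FnMut() -> Result<T, E>,
    timeout: Duration,
    interval: Duration,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match query() {
            Ok(value) => return Ok(value),
            // Out of time: surface the most recent error to the caller.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Still within the deadline: wait and try again.
            Err(_) => sleep(interval),
        }
    }
}
```

Logging the number of attempts on the CI runs that used to fail would show which of the two cases applies.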

Expected behavior
All tests pass as normal.

Version (please complete the following information):

  • nearcore: master as of Jan 4
  • sandbox

Additional context
First reported in near/near-workspaces-rs#253

@frol
Collaborator

frol commented Jun 29, 2023

@ChaoticTempest It seems that fast-forwarding can get stuck: near/near-workspaces-rs#266 (comment). Do you think it could be related?
