[Bug]: ".app" Folder lost and chain reinitialized when node runs out of space (Cosmovisor) #21617
Comments
That's a weird one, never saw this before. Cosmovisor never deletes anything in the node directory. Did this happen more than once? Directly after an upgrade? If so, what are the exact steps to reproduce? Could you show your upgrade handler logic?
The validator was jailed for inactivity (because the folder was broken) 3 days after this upgrade:
It happened again today to a tester from the public. (Otherwise I would have 100% assumed user error, which it still could be.) I just reached out to ask if they could add their findings here too. 🙏 I'll get help with the upgrade handler's logic and edit this comment with it in a bit.
Okay, your upgrade logic looks sane (I thought that maybe you were deleting the node home there, lol).
Update: I was able to reproduce the error (via the super scientific method of starting a genesis sync on the same machine). Logs are attached to this comment, but at the moment the folder changed...
I haven't heard back from the other person this happened to, so I don't know whether it was just a crazy coincidence that we both ran out of storage at the same time and lost data. (They were running multiple chains on the same server.) It is not ideal that the .layer (.app) directory is lost when this happens. I will keep this AWS machine as is for as long as the issue is open, in case anyone wants more information.
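Since the reproduction above coincided with the disk filling up, one cheap precaution is to watch free space on the filesystem that holds the node home. Below is a minimal sketch using Python's standard shutil.disk_usage; the 10 GiB threshold and the function names free_bytes and low_on_space are hypothetical choices, not something Cosmovisor or layerd provides. It only gives early warning and does not explain why the folder was lost.

```python
import shutil
from pathlib import Path

# Hypothetical threshold: 10 GiB is an arbitrary example value,
# not a recommendation from Cosmovisor or the Cosmos SDK.
MIN_FREE_BYTES = 10 * 1024**3


def free_bytes(path: Path) -> int:
    """Free bytes on the filesystem that holds `path` (which must exist)."""
    return shutil.disk_usage(path).free


def low_on_space(path: Path, min_free: int = MIN_FREE_BYTES) -> bool:
    """True when the filesystem holding `path` has less than `min_free` bytes free."""
    return free_bytes(path) < min_free
```

For example, a cron job or systemd timer could call low_on_space(Path("~/.layer").expanduser()) and alert the operator before the node stalls on a full disk.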
Interesting, I wonder how this happens.
Hey @0xSpuddy, we found that the issue isn't with … Closing this issue for now. If you find no issues with your code, feel free to reopen it.
Is there an existing issue for this?
What happened?
This is my first issue, hello! Let me know where I can provide more information (if desired) and I'll do my best.
What happened:
While running the latest Tellor testnet, a validator node that was running via systemd / Cosmovisor was found to be jailed. When the operator logged in, they found that the chain's ".app folder" (~/.layer in our case) was in an odd, newly initialized state. The .app/config directory had no genesis file, and the .app/data directory had no state data. The snapshots, keyring-test, and cosmovisor directories were gone.
I have been investigating all the logs I can find on the machine, but it seems the node was running normally prior to whatever event changed the .layer folder. I'll put the setup details under "How to reproduce?".
cosmovisor config variables:
Cosmos SDK Version
v0.50.9
How to reproduce?
The setup:
./layerd start --api.enable --api.swagger --price-daemon-enabled=false --panic-on-daemon-failure-enabled=false --home /home/user/.layer --key-name $ACCOUNT_NAME