Lotus daemon crashes on start in Badger: file size: 2207107581 greater than 4294967295 #1831
Comments
In general, Badger splits the value logs into 4 GB files, so it is very strange (barring data corruption) to see this assert fail; maybe we should report it to them.
Looks like it's happening right when the Badger datastore is being opened. Perhaps it's corrupted? I'll investigate a bit more tomorrow.
Looking at the largest files in my datastore, I have...
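A quick way to inspect this is to sort the datastore contents by size. A sketch, assuming the default Lotus repo path `~/.lotus` (override with `LOTUS_PATH`):

```shell
# Sketch: list the ten largest files under the Lotus datastore.
# LOTUS_PATH and the "datastore" subdirectory are assumptions; adjust to your setup.
DATASTORE="${LOTUS_PATH:-$HOME/.lotus}/datastore"
du -ah "$DATASTORE" 2>/dev/null | sort -rh | head -n 10
# Badger value logs are the *.vlog files; one growing far past ~4 GB
# is what trips the assert above.
```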
I've lost all my "deals" if I don't fix my node, so I'm going to try to see if I can dig into this and possibly fix it.
Yes, that
Might be related? ipfs/go-ds-badger#54 (comment)
@schomatis Can't just comment out the assert ... lots of code assumes the offsets are uint32s ... I'm hoping I can somehow split things and process it in batches.
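The ~4 GB ceiling in the error is just the maximum value of a uint32. A quick sanity check (a sketch; the vlog file name below is hypothetical):

```shell
# Offsets into a value log are stored as uint32, so the largest
# representable offset is 2^32 - 1 bytes:
echo $(( (1 << 32) - 1 ))    # prints 4294967295, the limit in the error
# Compare a suspect vlog's size against that limit (GNU stat syntax;
# the file name here is hypothetical):
# stat -c %s 000123.vlog
```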
I spent a few hours looking through the Badger code for how it handles the vlog files... I could possibly save the data, but it would be a lot of effort.
I'm very confused about how this happened ... I looked at some other nodes I have set up, and they don't appear to have runaway .vlog files. I'll set up a new node and watch to see if it happens again.
@jimpick At this point your best bet is to report this to Badger so they can best advise how to proceed. It is indeed a strange case; if the data itself is not corrupted, I'm confident you'll be able to retrieve most of it.
I think this is related to "too many open files"... This is probably unrelated, but I just saw this in my logs:
Now, when I restart my node, it looks like it's trying to do compaction on badger, but then it fails due to "too many open files". |
I'm not sure compaction was related to the log files (it's more about converting the SSTs into logs, I think, but I don't have the Badger model clearly in my head right now); anyway, try increasing the ulimit.
I'll close this now ... I increased the ulimit, and I'll re-open if it happens again. Definitely a weird failure mode, but I'm not sure how to reproduce. |
I have the same issue. Has anyone dealt with it?
I think I'm having similar issues. I've started the daemon a couple of times - first over SSH, but since I had to shut down my laptop I quit the daemon over SSH and restarted it on the machine itself so it could keep running. Then it started to get stuck at certain blocks - after a few restarts it started syncing from a few blocks earlier, and now I keep getting these errors, and I can't even connect to the daemon anymore to view the sync height;
The datastore file number seems to keep increasing but I can't ever connect anymore;
Like @roiger said - the .vlog file was ~1 GB in size. Renamed it to
After a while I started getting the "too many open files" error again. Increased it from 1024 to 4096 with
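For anyone else hitting the "too many open files" failure, the per-process limit can be checked and raised from the shell; a sketch (the 4096 value mirrors what was used above, tune as needed):

```shell
ulimit -n            # show the current soft limit on open files (often 1024)
ulimit -n 4096       # raise it for this shell and any daemon started from it
# To persist the change on Ubuntu, add a nofile entry for your user in
# /etc/security/limits.conf, e.g.:
#   youruser  soft  nofile  4096
```

Note the `ulimit` change only affects the current shell and its children, so the daemon must be restarted from a shell where the new limit is in effect.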
Describe the bug
Lotus daemon dies on restart.
Version (run lotus --version): lotus version 0.3.0'+git92d58ab1.dirty'
Additional context
Ubuntu Linux. I've been running this node since near the start of testnet.