Support Parallel Checking Under MergerFS #32
It would require a biggish change in the way scorch processes and represents data. Also, with mergerfs you can't know for sure that it'd actually improve performance. I don't want to add mergerfs-specific logic into scorch to ensure that it isn't, for example, trying to read two files from the same branch. That's mostly why I didn't make it parallel. If you're reading data off a single drive you expect to be IO limited, so parallel checks will actually slow things down unless you're CPU limited. Something like SnapRAID gets better-than-single-drive performance because it's literally, and knowingly, using multiple drives; then you're limited by CPU or controller bandwidth or whatnot. The intent was to use the limit (number of files or size) together with the random sort and just have it run every night. To explicitly check a set of files like that would require changing both the balance tool and scorch: the former to generate a file list and the latter to consume it.
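As a rough sketch of that nightly pattern: the wrapper below assumes scorch supports a `-s random` sort, an `-m` max-data budget, and a `check+add` instruction (verify against your version's `--help`); the database path, mount point, and size budget are placeholders.

```python
#!/usr/bin/env python3
# Nightly "check a random slice" wrapper for scorch (sketch only).
# Assumptions: scorch is on PATH and supports `-s random` (random sort),
# `-m` (max data per run), and the `check+add` instruction.
import subprocess

DB = "/var/lib/scorch/media.db"   # hypothetical database path
TARGET = "/mnt/media"             # hypothetical mergerfs mount
MAX_DATA = "720G"                 # data budget for one run

subprocess.run(
    ["scorch", "-d", DB, "-s", "random", "-m", MAX_DATA, "check+add", TARGET],
    check=True,
)
```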
How do you recommend running it like that then? Is there a percentage you recommend? I currently run it monthly at 100%. Is there an option to run at a percentage?
Now that I think about it, time might be a good limit to add. For now I just took the approximate speed of my drives and multiplied it by how much time I'm willing to give them. For instance: if you are willing to let it run for 2 hours and it runs at 100 MB/s, then you can set the data limit to 60 * 60 * 2 * 100 MB.
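Spelled out, that back-of-the-envelope calculation is just time budget times throughput. A quick sketch using the example numbers above (plug in your own hours and measured speed):

```python
# Data limit = time budget (seconds) * measured drive throughput (bytes/sec).
hours = 2
throughput_mb_s = 100

limit_bytes = 60 * 60 * hours * throughput_mb_s * 1000**2
print(f"max data per run: {limit_bytes / 1000**3:.0f} GB")  # -> 720 GB
```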
Could you add a feature where it remembers what it checked in a recent amount of time, like SnapRAID, so it doesn't check the same file 5 days in a row and then not for another 3 months? Since random is random and all. I don't know how hard that would be.
Statistically, over time, random should give the same coverage, but I understand the concern. I'd have to think about it. There are a number of ways you could do it but they all have gotchas.
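One of the simpler approaches, hedged as a sketch since nothing like it is built into scorch here: keep a sidecar record of when each file was last checked and filter out anything checked within the last N days before the random shuffle. The record format and filename below are hypothetical; one gotcha is that as the "recently checked" set grows, the candidate pool shrinks and can go empty.

```python
# Sketch: skip files checked within the last N days, assuming a simple
# {path: last_checked_epoch} JSON record kept alongside the hash database.
import json
import random
import time

RECORD = "last_checked.json"   # hypothetical sidecar file
MIN_AGE_DAYS = 30

with open(RECORD) as f:
    last_checked = json.load(f)

cutoff = time.time() - MIN_AGE_DAYS * 86400
candidates = [path for path, ts in last_checked.items() if ts < cutoff]
random.shuffle(candidates)     # preserve the random ordering over what's left
print(f"{len(candidates)} files eligible for checking")
```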
That's fair. Thanks for the clarifications. I'll definitely change my setup to be daily. That seems smarter.
Another idea could be to limit the throughput of scorch and then just run it in a loop as a service. You'd have to have a way to monitor the output though. Having it run via cron is nice because the output is mailed to you. |
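A loop-as-a-service version might look like the sketch below: run scorch repeatedly, append the output to a log, and sleep between passes. Rather than a hard throughput cap, this leans on `ionice`'s idle class to deprioritize the IO; the scorch flags, paths, size budget, and sleep interval are all assumptions to adapt.

```python
#!/usr/bin/env python3
# Run scorch in a loop as a low-priority service, appending output to a log.
# Assumptions: ionice(1) is available, and scorch supports `-s random` / `-m`
# as discussed above; paths and intervals are placeholders.
import subprocess
import time

LOG = "/var/log/scorch.log"   # hypothetical log location

while True:
    with open(LOG, "a") as log:
        subprocess.run(
            ["ionice", "-c3",                       # idle IO class: yield to other readers
             "scorch", "-s", "random", "-m", "720G",
             "check+add", "/mnt/media"],
            stdout=log, stderr=subprocess.STDOUT,
        )
    time.sleep(6 * 3600)                            # pause between passes
```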
I actually have it output to log files. I haven't figured out the mailing aspect; currently I'm just grepping for FAILED every time I remember that I have the software. I'm thinking of implementing Pushbullet. I use that to notify me when my backups fail.
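For what it's worth, that grep-for-FAILED step could be automated with a Pushbullet note. A sketch, assuming the `requests` package and Pushbullet's `/v2/pushes` endpoint; the log path and token are placeholders:

```python
# Scan a scorch log for FAILED lines and send a Pushbullet note if any exist.
import requests

LOG = "/var/log/scorch.log"   # hypothetical log location
TOKEN = "o.xxxxxxxx"          # hypothetical Pushbullet access token

with open(LOG) as f:
    failures = [line for line in f if "FAILED" in line]

if failures:
    requests.post(
        "https://api.pushbullet.com/v2/pushes",
        headers={"Access-Token": TOKEN},
        json={"type": "note",
              "title": "scorch: FAILED entries found",
              "body": "".join(failures[:20])},  # first 20 failing lines
        timeout=30,
    )
```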
https://github.com/trapexit/scorch/releases/tag/1.0.0 should address your saving-to-a-log problem. It also includes a last-checked timestamp and sorting by that timestamp. I don't have a "don't check things checked within x days" option but can add it if there's interest. I'm currently using something like this as part of my automatic scanning now.
Then you can use
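The snippets from that comment didn't survive in this copy, but a checked-timestamp-driven nightly scan might take a shape like the sketch below. It assumes the 1.0.0 `-s checked` sort (verify its direction against the release notes) plus the `-m` budget; the log path and mount are placeholders.

```python
#!/usr/bin/env python3
# Sketch: nightly scan driven by last-checked timestamps, capped by a data
# budget, with output saved to a dated log file.
# Assumptions: scorch 1.0.0's `-s checked` sort and `-m` max-data flag;
# confirm flags and sort direction against your version's docs.
import subprocess
import time

log_path = time.strftime("/var/log/scorch-%Y%m%d.log")  # hypothetical location
with open(log_path, "a") as log:
    subprocess.run(
        ["scorch", "-s", "checked", "-m", "720G", "check+add", "/mnt/media"],
        stdout=log, stderr=subprocess.STDOUT,
    )
```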
Sorry if this is the wrong spot for a feature request, but I use mergerfs as well (awesome and flawless) and have been using scorch to check for corruption (also awesome). But with the four 8TB HDDs I have at 85% full, it takes days for scorch to complete. I was wondering if it would be possible to somehow integrate with mergerfs and check all the drives under my /mnt in parallel. Or is there already a better way to do that which doesn't miss potential corruption when I use mergerfs.balance? Thanks for the awesome software!