
Support Parallel Checking Under MergerFS #32

Open
LrrrAc opened this issue Apr 14, 2020 · 9 comments

Comments

@LrrrAc

LrrrAc commented Apr 14, 2020

Sorry if this is the wrong spot for a feature request. I also use MergerFS (awesome and flawless), and have been using scorch to check for corruption (also awesome). But with the four 8TB HDDs I have at 85% full, it takes days for scorch to complete. I was wondering if it would be possible to somehow integrate with mergerfs and check all the drives under my /mnt in parallel. Or is there already a better way to do that which doesn't miss potential corruption when I use mergerfs.balance? Thanks for the awesome software!

@trapexit
Owner

It would require a fairly big change in the way it processes and represents data.

Also, with mergerfs, you can't know for sure that it'd actually improve performance. I don't want to add mergerfs-specific logic to scorch to ensure that it's not, for example, trying to read two files from the same branch.

That's mostly why I didn't make it parallel. If you're reading data off a single drive you expect to be IO limited, so parallel checks will actually slow things down unless you're CPU limited. Something like SnapRAID gets better-than-single-drive performance because it's literally using multiple drives, knowingly. Then you're limited by CPU or controller bandwidth or whatnot.

The intent was to use the limit (number of files or size) with the random sort and just have it run every night. To explicitly check a set of files like that would require changing both the balance tool and scorch: the former to generate a file list and the latter to consume it.

@LrrrAc
Author

LrrrAc commented Apr 14, 2020

How do you recommend running it like that then? Is there a percentage you recommend? I currently run it monthly at 100%. Is there an option to run at a percentage?

@trapexit
Owner

Now that I think about it, time might be a good limit to add. But I just took the approximate speed of my drives and multiplied it by how much time I'm willing to give them.

For instance: if you are willing to let it run for 2 hours and it runs at 100 MB/s, then you can set the data limit to 60 * 60 * 2 * 100MB = 720GB, so roughly 700G.

scorch -s random -M 700G check ...
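As a sanity check on that arithmetic (time budget in seconds times throughput equals the data budget), a quick shell calculation:

```shell
# Data budget = seconds * throughput: 2 hours at 100 MB/s.
seconds=$((60 * 60 * 2))        # 7200 seconds
budget_mb=$((seconds * 100))    # 720000 MB = 720 GB
echo "${budget_mb} MB (~720 GB, rounded down to 700G for the -M flag)"
```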

@LrrrAc
Author

LrrrAc commented Apr 14, 2020

Could you add a feature where it remembers what it checked recently, like SnapRAID, so it doesn't check the same file 5 days in a row and then not again for another 3 months? Since random is random and all. I don't know how hard that would be.

@trapexit
Owner

Statistically, over time, random should even out to the same coverage, but I understand the concern.

I'd have to think about it. There are a number of ways you could do it but they all have gotchas.

@LrrrAc
Author

LrrrAc commented Apr 14, 2020

That's fair. Thanks for the clarifications. I'll definitely change my setup to run daily. That seems smarter.

@trapexit
Owner

Another idea could be to limit the throughput of scorch and then just run it in a loop as a service. You'd have to have a way to monitor the output though. Having it run via cron is nice because the output is mailed to you.
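A minimal cron sketch of the nightly setup described above (the schedule, mail address, and mount path are assumptions, not part of scorch itself); cron mails the job's output to the MAILTO address:

```shell
# /etc/cron.d/scorch - hypothetical nightly check; adjust path and limits.
MAILTO=admin@example.com
# Every night at 02:00: random order, capped at 700G of data read.
0 2 * * * root scorch -s random -M 700G check /mnt
```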

@LrrrAc
Author

LrrrAc commented Apr 14, 2020

I actually have it output to log files. Haven't figured out the mailing aspect. Currently just grepping for FAILED every time I remember I have the software. I'm thinking of implementing Pushbullet; I use that to notify me when my backups fail.
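The grep-plus-notify idea above could be sketched as a small script. This is a hypothetical sketch, not part of scorch: the log path, token, and `notify_failures` helper are all assumptions, and it posts to the Pushbullet `/v2/pushes` API with an Access-Token header.

```shell
#!/bin/sh
# Hypothetical sketch: count scorch FAILED lines in a log and, if any
# are found and a token is configured, push a note via Pushbullet.

notify_failures() {
    log="$1"; token="$2"
    # grep -c prints the match count; suppress errors for a missing log.
    failures=$(grep -c 'FAILED' "$log" 2>/dev/null || true)
    [ -z "$failures" ] && failures=0
    if [ "$failures" -gt 0 ] && [ -n "$token" ]; then
        curl -s -X POST https://api.pushbullet.com/v2/pushes \
            -H "Access-Token: ${token}" \
            -H 'Content-Type: application/json' \
            -d "{\"type\":\"note\",\"title\":\"scorch\",\"body\":\"${failures} file(s) FAILED\"}" \
            > /dev/null
    fi
    echo "$failures"
}
```

For example, `notify_failures /var/log/scorch.log "$PUSHBULLET_TOKEN"` after a nightly run; with an empty token it only reports the count.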

@trapexit
Owner

trapexit commented May 18, 2020

https://github.com/trapexit/scorch/releases/tag/1.0.0

That should address your saving-to-a-log problem. It also includes a last-checked timestamp and sorting by that timestamp. I don't have a "don't check things checked within X days" option, but I can add it if you're interested. I'm currently using something like this as part of my automatic scanning now:

scorch -q -T 1h -s checked check /

Then you can use list-changed and list-failed to see what it found.
