Increasing memory usage #22
By the way, here's the current output from
Thanks for the detailed report! I too noticed the ever-increasing memory usage, which definitely looks like a memory leak. Could you try recompiling with a newer version of Go?
Thanks for the quick reply :-) This was freshly compiled using Go 1.10.3, which I believe is the latest. However, there's one important mistake I made: I had tested this other memory increase with restic, not zbackup (I bailed on zbackup early because it seemed too slow). So it's entirely possible it's due to restic's chunker (and it was restic that died after 6 TiB, not zbackup).

Later I'll try to run it with perf and pprof and see if I can figure out where the leak is coming from. I'm a Go newbie though, so it might be hard ;-)
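For anyone wanting to reproduce that kind of profiling run: a minimal way to get a heap profile out of a Go binary is to dump it with runtime/pprof and open it with `go tool pprof`. The snippet below is only an illustration (the helper name and file path are made up), not something that exists in scat today.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap allocations to a file that can
// later be inspected with `go tool pprof heap.pprof`. The function name
// and file path are purely illustrative.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.WriteHeapProfile(f)
}

func main() {
	// ... run the backup pipeline here, then dump the profile ...
	if err := writeHeapProfile("heap.pprof"); err != nil {
		log.Fatal(err)
	}
}
```

For a long-running process, importing net/http/pprof (plus a small HTTP listener) exposes /debug/pprof/heap for live inspection instead of writing a file at the end.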
OK, so I ran some initial tests with pprof. I started with a simple proc of

I discovered it seems to be leaky by design: in procs/index.go:62, it assigns the chunk hash to an in-memory map, and it later uses that map to see whether the chunk was already processed. I originally imagined it wouldn't do that, since it could check the same thing by seeing whether the appropriate filename already exists in the output directory. So I don't see a simple way of fixing it, short of changing how it works and possibly making it slower in the process (although the filesystem checks could perhaps be cached by the OS).

I then ran it again with the original proc and some real data from

My plan is to rewrite it so that it uses an on-disk database of chunks. I'll need this for other features as well, such as being able to restore only particular files rather than an entire backup, or being able to keep track of tape/disk changes (i.e. backing up a huge filesystem to many smaller BluRays, tapes, or USB HDDs, only a few of which are connected at a given time). This should also help with #23, as it'll then be easier to rename the output chunks (and group them into bigger ones, to also hide the individual chunks' sizes).
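To make the on-disk idea concrete, here is a rough sketch of what such a chunk index could look like, using bbolt as an example embedded key-value store. All names here (DiskIndex, Seen, Add) are hypothetical and not part of scat's current code; it's just one way the in-memory map in procs/index could be swapped for something persistent.

```go
package index

import (
	bolt "go.etcd.io/bbolt"
)

var bucket = []byte("chunks")

// DiskIndex is a hypothetical replacement for the in-memory map in
// procs/index.go: chunk hashes are persisted in a small key-value store
// on disk instead of being held in RAM for the lifetime of the process.
type DiskIndex struct {
	db *bolt.DB
}

// Open creates or opens the index database at the given path.
func Open(path string) (*DiskIndex, error) {
	db, err := bolt.Open(path, 0600, nil)
	if err != nil {
		return nil, err
	}
	if err := db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists(bucket)
		return err
	}); err != nil {
		db.Close()
		return nil, err
	}
	return &DiskIndex{db: db}, nil
}

// Seen reports whether a chunk hash has already been processed.
func (i *DiskIndex) Seen(hash []byte) (bool, error) {
	var seen bool
	err := i.db.View(func(tx *bolt.Tx) error {
		seen = tx.Bucket(bucket).Get(hash) != nil
		return nil
	})
	return seen, err
}

// Add records a chunk hash as processed.
func (i *DiskIndex) Add(hash []byte) error {
	return i.db.Update(func(tx *bolt.Tx) error {
		return tx.Bucket(bucket).Put(hash, []byte{1})
	})
}

// Close releases the underlying database file.
func (i *DiskIndex) Close() error { return i.db.Close() }
```

With something like this, memory stays flat regardless of how many chunks have been seen, at the cost of one read transaction (and one write for new chunks) per chunk; whether that overhead is acceptable compared to the current map lookup is exactly the trade-off discussed above.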
Hi @goblin - glad you're still active on this project, and thanks for having investigated the leak. I must admit though, I'm not using scat at the moment and I've forgotten most of the internals, nor do I have the incentive to look at them in detail. However, from what I understand, your idea of rewriting procs/index to use an on-disk database seems sensible. Index history would have to be stored within that database instead of git (since it wouldn't be a simple text file anymore), but other than that, why not. Good luck! I'd be curious to see if this fixes the leak. Hopefully it will 🍀

May I add, I still do believe in the idea behind the project and still need such a tool. I've since fallen back to cleartext syncing to Google Drive 😫 to at least have some kind of backup, despite the privacy issues and risk of loss. It's just that some open issues were preventing me from using scat as I initially envisioned it, and I didn't have the guts to address them head on. For the past few years I've had it brewing in mind to either give it another go in the current code base, or rewrite the whole thing in Ruby. Yes, single-threaded, slow Matz Ruby - so enjoyable to code in that everything feels possible: easy to experiment with, tinker with, tear apart and rewrite, or even... make performant, paradoxically. Should that last point prove infeasible, there's Crystal, hehe.
I tried Ruby a few years ago, and I'm much more fond of learning Go at the moment ;-) Especially given that you've done so much work on it in Go.
Which issues, specifically?
Hi, thanks for this great backup solution! :-)
I'm wondering why it's consuming so much RAM during backup. I'm using this proc:
I'm currently at about 10 TiB of data from tar, and scat is now using 57 GiB of RAM. This amount is always increasing; at about 5 TiB it was around 30 GiB. I've noticed zbackup behaved similarly (but it died with a stacktrace after about 6 TiB of input).
What's this RAM needed for?
The index produced on stdout is currently 2.5 GiB in size, so even if scat is storing all the checksums of all the chunks produced so far, it's still using over 40x as much memory as it should :-S But storing all the chunk checksums in RAM shouldn't be necessary, because the filesystem could be queried to see if they exist already...
Thanks :-)
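The "query the filesystem" idea from the report could look roughly like the sketch below. The flat, hex-named chunk layout is an assumption made purely for illustration and may not match how scat actually names its output files.

```go
package main

import (
	"encoding/hex"
	"os"
	"path/filepath"
)

// chunkExists checks whether a chunk with the given hash has already been
// written, by looking for its file in the output directory instead of
// consulting an in-memory map. The hex-named flat layout is an assumption
// for this sketch, not necessarily scat's real on-disk layout.
func chunkExists(outDir string, hash []byte) (bool, error) {
	name := hex.EncodeToString(hash)
	_, err := os.Stat(filepath.Join(outDir, name))
	if err == nil {
		return true, nil
	}
	if os.IsNotExist(err) {
		return false, nil
	}
	return false, err
}
```

Repeated os.Stat calls on the same directory are typically served from the OS's dentry cache, which is the caching effect mentioned elsewhere in the thread.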