
change default settings for the database, to allow for effective gc #94

Closed

Conversation


@RubenKelevra RubenKelevra commented May 29, 2020

fixes #54

settings are courtesy of @jsign, see [his post here](dgraph-io/badger#1297 (comment))

welcome bot commented May 29, 2020

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution guidelines
    if they apply.

Getting other community members to do a review would be a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviewed, discussed and approved as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@RubenKelevra RubenKelevra changed the title change default settings for the database, to allow for effective garb… change default settings for the database, to allow for effective gc May 29, 2020
// read-only and efficiently queried. We don't do that and hanging on
// stop isn't nice.
DefaultOptions.Options.CompactL0OnClose = false
// This is to optimize the database on closure
Contributor

how long does this cause things to hang on close?

Author

I haven't had any issues with hanging on close in my testing. The worst case was around 2 seconds to shut down a 250 GB database.

Member

We made this change intentionally. Otherwise, ipfs commands without a daemon running pause for a couple of seconds every time.

While this change would ensure that the database is as small as it can be, compacting on close doesn't target the real issue we're having: multiple gigabytes of data are left behind after garbage collecting.

Member

@Stebalien Stebalien left a comment

Thank you for trying to fix this, but there is a lot of nuance here. Please be careful to make sure you understand a solution before copy/pasting code.


DefaultOptions.Options.NumLevelZeroTablesStall = 2

// Reduce the max vlog size usage after compaction
DefaultOptions.Options.ValueLogFileSize = 10485760
Member

This sets the maximum value log file size to 10 MiB. That will definitely help fix the issue, however:

  1. It will set a maximum value size of 10MiB (probably fine but I'm not sure).
  2. I believe badger keeps value logs open. That will mean a 100x increase in the number of file descriptors used by badger, which won't work for us (20 GiB -> 2048 file descriptors versus 20 now).
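
For a rough sense of those numbers, here is a minimal sketch (illustrative only) of the file-count arithmetic; the 20 GiB store size comes from the comment above, and the ~1 GiB per-file size is inferred from the "20 now" figure:

package main

import "fmt"

func main() {
	const (
		GiB = 1 << 30
		MiB = 1 << 20

		storeSize       = 20 * GiB // hypothetical amount of data sitting in value logs
		currentFileSize = 1 * GiB  // roughly the stock ValueLogFileSize ("20 now")
		proposedSize    = 10 * MiB // value proposed in this PR
	)

	// Per the comment above, badger keeps value log files open, so each
	// file costs a file descriptor.
	fmt.Println("files at current size: ", storeSize/currentFileSize) // 20
	fmt.Println("files at proposed size:", storeSize/proposedSize)    // 2048
}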

Author

As far as I understand this option, badger won't try to compact value logs while they are below ValueLogFileSize, so the GC will only shrink a value log file when it's above this value.

Maybe @jarifibrahim can help us out a little and clear this up?


Hey @RubenKelevra, let me try to explain.
The value log file is the write-ahead log file for badger. All new writes are added to this file. A smaller vlog file size is recommended only if you'd like value log GC to clean up disk space easily.
Note - compaction is for SST files; GC is for value log files.

As a result of this change, you will have too many value log files, which means too many open file descriptors. I do not suggest setting the value log file size to such a small value. If your value log file is 10 MB, badger will not be able to store values larger than about 9 MB (there are headers and checksums as well).

If 1 GB seems too big for your use case, I suggest using 500 MB and evaluating whether the new change is an improvement or not. Dropping the file size to 10 MB could lead to strange issues (I've never seen anyone use a 10 MB file size; that's too low).
I haven't gone through the entire change in this PR and the issue, but are you guys having issues with the GC? If this is something new, can you please create a ticket on badger so that I can figure out the right fix for your issue?


Please feel free to ping me @jarifibrahim if something doesn't make sense. The last thing we would want is setting incorrect options for badger :)

@@ -79,15 +79,25 @@ var DefaultOptions Options

func init() {
DefaultOptions = Options{
GcDiscardRatio: 0.2,
GcDiscardRatio: 0.01,
Member

0.2 is likely good enough (although 0.1 may be better). We're not going for "delete everything", we're going for "recover space". 0.2 is 20% and leaving <=20% garbage on disk is fine in almost all cases (although it may make sense to expose this).

Dropping this down to 0.01 would force us to do a lot of extra work every time we garbage collect, just to save 1% of space.
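
For context on what this ratio controls: as far as I can tell, the datastore's GcDiscardRatio is the discard ratio handed to badger's RunValueLogGC. A minimal sketch of that pattern, assuming badger v1.6+ and an illustrative path (go-ds-badger runs an equivalent loop internally):

package main

import (
	"log"

	badger "github.com/dgraph-io/badger"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-gc-example")) // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// RunValueLogGC rewrites a value log file only if at least this fraction
	// of it is reclaimable. 0.2 means "rewrite files that are >=20% garbage";
	// 0.01 would rewrite a file to reclaim as little as 1% of its space.
	const discardRatio = 0.2

	// Each call rewrites at most one file, so loop until badger reports
	// nothing left to rewrite (it returns badger.ErrNoRewrite in that case).
	for {
		if err := db.RunValueLogGC(discardRatio); err != nil {
			break
		}
	}
}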

DefaultOptions.Options.CompactL0OnClose = true

// Remove elements which have been deleted from the database
DefaultOptions.Options.NumVersionsToKeep = 0
Member

@jsign this seems wrong. Shouldn't the default of 1 keep one version? From my reading of the code, 0 will just be buggy.

Author

Well, if we delete a CID, it will remain in the database, since a delete only marks the entry as deleted - it doesn't actually remove it.

You need to set it to 0 to let the badger GC clear up deleted entries.

That's what @jarifibrahim wrote here:

Also, please note that the default value of NumVersionsToKeep is 1 which means your deleted entries are also stored in the DB. You might want to set it to 0 if you don't want any deleted/expired keys to be stored in the DB.


@RubenKelevra @Stebalien please do not set it to 0. I made a mistake. I'll update my comment on the original issue. Apologies for the confusion.

DefaultOptions.Options.NumLevelZeroTables = 1

// Reduce the number of zero tables which are stalled
DefaultOptions.Options.NumLevelZeroTablesStall = 2
Member

This will hurt write throughput. All this does is block writes until compaction happens. Given that compaction will happen eventually anyways, this setting won't help us.

DefaultOptions.Options.NumVersionsToKeep = 0

// Reduce the number of zero tables (which are held in memory)
DefaultOptions.Options.NumLevelZeroTables = 1
Member

This will help compact tables faster, but is likely too aggressive and probably not a huge issue. Tables are 16MiB each so the default of 5 will cause compaction to kick in when we hit 80MiB.

Author

The idea was to reduce memory consumption and do compaction more often rather than at larger intervals.

Sure it will hurt the performance. But I didn't want to change the settings, just do a PR for @jsign since he seems to be busy.

@jsign jsign May 31, 2020

@RubenKelevra, @Stebalien, please read carefully what I originally wrote here, in particular:

Right tuning of these params might require some extra work to discover the perf cost and right tradeoff.

So, my comment was never meant to be directly applied by any means, nor did I ask for a PR to be created. What I commented was validated by this and a small program that shows how those knobs affect sizes. (I see now that Ibrahim seems to have given some wrong advice.)

So, my goal was to help shed more light on how someone using this repo could get control over this long-standing problem.

But I didn't want to change the settings, just do a PR for @jsign since he seems to be busy.

@RubenKelevra, I never asked you to do any PR for me. The ones that I know are ready to do, be sure I can do them myself (maybe that's why I might be busy ;))
What I did was invest some time in shedding more light on a long-standing problem this repo has had, just to help other people using this repo possibly deal with it. Jumping to say that was a definitive solution is wrong, as I quoted above.

@Stebalien Stebalien closed this May 31, 2020
@RubenKelevra
Author

RubenKelevra commented May 31, 2020

@jarifibrahim it would be nice if you could take a look at our usage scenario and make some recommendations. Thanks! :)

@jarifibrahim

@Stebalien is there an issue about improving badger options for effective gc? I can send a PR to set the correct options.

@jarifibrahim

@RubenKelevra please point me to the issue which has all the details. I would love to help you guys.

@RubenKelevra
Author

@Stebalien is there an issue about improving badger options for effective gc? I can send a PR to set the correct options.

I think #54 is all we got :)

@RubenKelevra
Author

RubenKelevra commented May 31, 2020

@jarifibrahim

@RubenKelevra please point me to the issue which has all the details. I would love to help you guys.

I'm not sure you're familiar with the IPFS application, so in short:

  • We read and write many small chunks of data for the DHT to the database.
  • We write (by default) up to 256 KByte chunks (maximum 1 MB chunks) to the database associated with a Content ID for storing the actual data
  • We need a somewhat guaranteed very low response time for reads.
  • High performance for large write operations would be nice, as long as it doesn't impact the low response time of other operations.
  • Most data stays for a long time.
  • When we delete data, we do this in one large batch (when we run our GC), which is initiated either manually or automatically when a specified database size is reached.
  • We know how large the database will be allowed to grow, so we could tweak some settings between the default (9 GB) and server usage (multiple TB).
  • We have some issues with high memory consumption of badger when it runs its GC with the current settings, sometimes exceeding the memory capacity of the system.

We're currently exploring the possibility of an upgrade to version 2; maybe you could give some hints on why this might be beneficial for our use case as well.

@jarifibrahim

Hey @RubenKelevra,

We read and write many small chunks of data for the DHT to the database.
We write (by default) up to 256 KByte chunks (maximum 1 MB chunks) to the database associated with a Content ID for storing the actual data
We need a somewhat guaranteed very low response time for reads.

Have you tried running badger.LSMOnlyOptions? This would improve your read speed. The LSMOnly options would also help with the badger value log GC. The GC won't have to read the value log files for the values, and it can also easily discard entries since everything is stored in the LSM tree.
https://github.com/dgraph-io/badger/blob/fd8989493b52f39957e89ff3a7679bc45ea92674/options.go#L205
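
A minimal sketch of what trying that profile could look like, assuming the badger v1.6+ API and an illustrative path; whether it drops into go-ds-badger's Options wrapper unchanged is untested:

package main

import (
	"log"

	badger "github.com/dgraph-io/badger"
)

func main() {
	// LSMOnlyOptions raises ValueThreshold so that small values live in the
	// LSM tree itself instead of the value log, which makes value log GC
	// cheaper (large blocks would still go to the value log).
	opts := badger.LSMOnlyOptions("/tmp/badger-lsm-example") // illustrative path

	// Other knobs can still be tuned on top of the profile.
	opts.CompactL0OnClose = false

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}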

We have some issues with high memory consumption of badger when it runs its GC with the current settings, sometimes exceeding the memory capacity of the system

Do you have a memory profile? I can optimize it if you have the memory profile :)

We're currently exploring the possibility of an upgrade to version 2; maybe you could give some hints on why this might be beneficial for our use case as well.

I would suggest you migrate to badger v2. One major benefit is that v2 has a cache via which you can limit the memory used by badger (but affects the read speed). The table index and bloom filter (one for each SST) are kept in the cache. We also have compression and encryption in badger v2 but I won't suggest enabling them unless you really need them. Compression/Encryption affects read latency.

We recently fixed two bugs related to SST compaction and value log GC. The clean up should be much more effective once these fixes are released (the fixes are in master).
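
For reference, a minimal sketch of the v2 knobs mentioned above, assuming the badger v2.0.x API that was current at the time (the cache size is an arbitrary example, not a recommendation):

package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

func main() {
	opts := badger.DefaultOptions("/tmp/badger-v2-example") // illustrative path

	// The cache is the lever v2 offers for bounding badger's memory use;
	// table indexes and bloom filters are kept in it.
	opts = opts.WithMaxCacheSize(256 << 20) // 256 MiB, arbitrary

	// Compression trades read latency for disk space; left disabled here,
	// as suggested above.
	opts = opts.WithCompression(options.None)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}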

@Stebalien
Member

Do you have a memory profile? I can optimize it if you have the memory profile :)

It's just dgraph-io/badger#1292, as far as I know.

@jarifibrahim

Do you have a memory profile? I can optimize it if you have the memory profile :)

It's just dgraph-io/badger#1292, as far as I know.

Cool. I've bumped up the priority. I'll try to fix it soon. It should be an easy fix.

@RubenKelevra
Author

RubenKelevra commented Jun 2, 2020

We read and write many small chunks of data for the DHT to the database.
We write (by default) up to 256 KByte chunks (maximum 1 MB chunks) to the database associated with a Content ID for storing the actual data
We need a somewhat guaranteed very low response time for reads.

Have you tried running badger.LSMOnlyOptions? This would improve your read speed. The LSMOnly options would also help with the badger value log GC. The GC won't have to read the value log files for the values, and it can also easily discard entries since everything is stored in the LSM tree.
https://github.com/dgraph-io/badger/blob/fd8989493b52f39957e89ff3a7679bc45ea92674/options.go#L205

This sounds promising. Can you create a PR for this here (and maybe include some other beneficial tweaks for our use case)?

I would suggest you migrate to badger v2. One major benefit is that v2 has a cache via which you can limit the memory used by badger (but affects the read speed). The table index and bloom filter (one for each SST) are kept in the cache. We also have compression and encryption in badger v2 but I won't suggest enabling them unless you really need them. Compression/Encryption affects read latency.

I think compression isn't something that would be beneficial, since as far as I understand the block size for it would be limited to 4K.

Thanks for the details on that :)

@jarifibrahim

This sounds promising. Can you create a PR for this here (and maybe include some other beneficial tweaks for our use case)?

Sure, I can do that. But since all datasets are different, the only way to verify whether the new options are better is to benchmark them. From badger's point of view it might be an improvement, but that might not necessarily translate into an improvement for this project. I'll still send a PR and you guys can compare its performance with your current code.

Successfully merging this pull request may close these issues.

GC: Data may remain after garbage collecting everything