go/store/{nbs,types}: GC: Move the reference walk from types to nbs.#8752
Conversation
Make the ChunkStore itself responsible for the reference walk, being given handles for walking references and excluding chunks as part of the GC process. This is an incremental step towards adding dependencies on read chunks during the GC process. The ChunkStore can better distinguish whether the read is part of the GC process itself or whether it came from the application layer. It also allows better management of cache impact and the potential for better memory usage.

This transformation gets rid of parallel reference walking and some manual batching which was present in the ValueStore implementation of reference walking. The parallel reference walking was necessary for reasonable performance in format LD_1, but it's actually not necessary in DOLT. For some use cases it's a slight win, but the simplification involved in getting rid of it is worth it for now.
max-hoffman
left a comment
LGTM, the simplification is nice. Just a few related questions I noticed while getting up to speed.
```go
	return nil, fmt.Errorf("NBS does not support copying garbage collection")
}

gcc, err := newGarbageCollectionCopier()
```
somewhat unrelated, but if this embedded tfp, their relationship might be clearer
Good suggestion! I'll take a pass and potentially send out a separate PR :)
```go
src      NBSCompressedChunkStore
dest     *NomsBlockStore
getAddrs chunks.GetAddrsCurry
filter   chunks.HasManyFunc
```
I'm hazy on what filter does. When would we discard hashes?
Great question. It's used for generational GC. So, when we collect newgen -> oldgen, we're walking refs and we want to stop the walk anytime we walk into the old gen. Then, after those chunks are in the old gen, when we collect newgen -> newgen, we want to stop the walk once again anytime we walk into the old gen.
```go
require.True(t, ok)

keepChan := make(chan []hash.Hash, numChunks)
require.NoError(t, st.BeginGC(nil))
```
is our GC testing this sparse? or do we have tests at other interface levels somewhere else
We have some tests further up at doltdb, and then we have bats tests and go-sql-server-driver tests. The coverage isn't fantastic currently though.