
Little bug when compacting with tombstones #36

Open
sebheitzmann opened this issue Nov 11, 2024 · 8 comments

Comments

@sebheitzmann
Contributor

When the first sstable has 0 records, it prevents the compaction of the tombstones in all the other sstables.

The first sstable is never selected for compaction. How can an sstable have 0 records? Should it still exist?

@thomasjungblut
Owner

I assume it's because of this filter here, right?
https://github.com/thomasjungblut/go-sstables/blob/main/simpledb/sstable_manager.go#L125-L126
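The effect of such a filter can be sketched in a few lines. This is an illustrative standalone sketch, not simpledb's actual code: `tableMeta`, `selectForCompaction`, and `selectForCompactionFixed` are hypothetical names, and the real selection logic in `sstable_manager.go` differs. The point is only that a `numRecords > 0` guard permanently excludes a tombstone-emptied table from compaction:

```go
package main

import "fmt"

// tableMeta is a hypothetical stand-in for per-sstable compaction metadata.
type tableMeta struct {
	path       string
	numRecords uint64
	sizeBytes  uint64
}

// selectForCompaction mimics a filter that skips empty tables: an sstable
// with zero records (fully tombstoned) is never chosen again, so the
// tombstones in the tables behind it can never be garbage collected.
func selectForCompaction(tables []tableMeta, maxSize uint64) []string {
	var selected []string
	for _, t := range tables {
		if t.numRecords > 0 && t.sizeBytes < maxSize {
			selected = append(selected, t.path)
		}
	}
	return selected
}

// selectForCompactionFixed drops the record-count guard so empty tables
// remain compaction candidates.
func selectForCompactionFixed(tables []tableMeta, maxSize uint64) []string {
	var selected []string
	for _, t := range tables {
		if t.sizeBytes < maxSize {
			selected = append(selected, t.path)
		}
	}
	return selected
}

func main() {
	tables := []tableMeta{
		{"sstable_000", 0, 0},         // fully tombstoned, now empty
		{"sstable_001", 100, 1 << 20}, // still holds tombstones for old keys
	}
	fmt.Println(selectForCompaction(tables, 1<<30))      // [sstable_001]
	fmt.Println(selectForCompactionFixed(tables, 1<<30)) // [sstable_000 sstable_001]
}
```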

@sebheitzmann
Contributor Author

Yes. That's it. I can make you a PR.

@thomasjungblut
Owner

I think there are a couple more edge cases that I didn't previously think of. I've just been trying to write some more unit tests in main...tombstone_edge_tests

First off, it seems that canRemoveTombstone isn't used anywhere. I reckon that should've been used to switch off the removal in ScanReduceLatestWins.

When the table manager selects for compaction, it matters whether there are holes in them or not.
Imagine the scenario:

Table 1 (size 2gig)
Table 2 (size 1gig)
Table 3 (size 3gig)
Table 4 (size 1gig)

Then we would try to compact (assuming a max size of 2 gigs): 1, 2 and 4, which completely disregards 3 and would destroy the data lineage. It's a more general bug, which may or may not happen depending on how the files are bunched together. The tombstoning just makes it more apparent, because suddenly you can have empty tables:

Table 1 (size 1gig, numbers from 1-100) 
Table 2 (size 1gig, deletes numbers from 1-100)

Now with the tombstones compacted, you end up with a new Table 1, which is empty. Then you can further run into your initial state, because that table will never become a candidate for compaction anymore.

That kinda diverges from the easy peasy algorithm that initially just lumped everything onto the lowest table that is below size X :)

@sebheitzmann
Contributor Author

You're right, the compaction cannot ignore a table due to its size if we want to delete tombstoned records.
I've done some tests, and we either need to add a second layer or remove the size limit.
If we had an L1 partitioned by key, we could merge all the tables in a segment without producing huge tables.

I imagine something like this:

sstable1/
sstable2
L1/
L1/0000/sstable1
L1/0000/sstable2
L1/xxxx/sstable1

etc.

where XXXX is the first 2 bytes of the key (or of a hash of the key).
The compaction could run on each of these segments without problems or size limits.
We just need a function (provided by the user) to convert a key into a segment folder.
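Such a user-provided mapping might look like the sketch below. This is purely illustrative: `segmentFolder` is a hypothetical name, and FNV is just one possible choice of hash; the thread leaves the concrete function open.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// segmentFolder maps a key to an L1 segment directory name by hashing
// the key and truncating to numSegments buckets. A hypothetical sketch
// of the "key to segment folder" function discussed above.
func segmentFolder(key []byte, numSegments uint32) string {
	h := fnv.New32a()
	h.Write(key)
	return fmt.Sprintf("L1/%04x", h.Sum32()%numSegments)
}

func main() {
	// All writes and compactions for this key would target one segment
	// directory, so each segment can be compacted independently.
	fmt.Println(segmentFolder([]byte("user-42"), 1<<16))
}
```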

What do you think about this design?

@thomasjungblut
Owner

a partition key would work, but the underlying issue is still the same, unless you want to always compact the entire segment

To keep things simple, how about something like Kadane's algorithm, where we just collect contiguous runs of tables that can be compacted, empty and non-empty alike.

OR we always compact from the other direction, from the most recent file to the oldest. Need to think about this a bit more though... That way we can keep the max size, I think.
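The contiguous-run idea could be sketched like this. A hypothetical, simplified sketch (not simpledb's code): it finds the longest run of adjacent tables that are each at or below the max size, counting empty tables too, so the data lineage is never broken by skipping a table in the middle:

```go
package main

import "fmt"

// longestCompactableRun returns the half-open [start, end) bounds of the
// longest contiguous run of adjacent tables whose individual sizes are
// all <= maxSize. Empty (zero-size) tables qualify as well, so
// tombstone-emptied tables stay eligible for compaction.
func longestCompactableRun(sizes []uint64, maxSize uint64) (int, int) {
	bestStart, bestEnd, start := 0, 0, 0
	for i, s := range sizes {
		if s > maxSize {
			start = i + 1 // an oversized table breaks the run
			continue
		}
		if i+1-start > bestEnd-bestStart {
			bestStart, bestEnd = start, i+1
		}
	}
	return bestStart, bestEnd
}

func main() {
	// Sizes in GiB for tables 1..4 from the example above.
	sizes := []uint64{2, 1, 3, 1}
	start, end := longestCompactableRun(sizes, 2)
	// Picks tables 1 and 2 only: table 3 breaks adjacency, so table 4
	// is left for a later, separate compaction instead of being merged
	// over the hole.
	fmt.Println(start, end) // 0 2
}
```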

@sebheitzmann
Contributor Author

Yes, the idea is to compact everything in the segment. But the segment can be small.

@sebheitzmann
Contributor Author

There can't be a compaction with garbage collection without a full compaction. There can be a version of a key in every sstable, so if you ignore a table the result can't be right.

In the original concept, sstables should have non-overlapping keyspaces for L1 and above.

Maybe we should separate the compaction from the garbage collection, so we can keep skipping some tables.

@thomasjungblut
Owner

> We can have a version of a key in every sstable. If you ignore a table the result can't be right.

Correct, but we have an ordering guarantee through the filenames. If we ensure that we only ever merge adjacent files, that would still work (I guess). We just can't skip any of them, as I've done with the maximum size.

> should have non overlapping keyspace for L1 and over.

Yep, that's fair. Currently it's just one big key space, which makes it somewhat inefficient to query all segments at once.
