
Little bug when compacting with tombstones #36

Open
sebheitzmann opened this issue Nov 11, 2024 · 8 comments

Comments

@sebheitzmann
Contributor

When the first sstable has 0 records, it prevents the compaction of the tombstones in all the other sstables.

The first sstable is never selected for compaction. How can an sstable have 0 records? Should it still exist?

@thomasjungblut
Owner

I assume it's because of this filter here, right?
https://github.com/thomasjungblut/go-sstables/blob/main/simpledb/sstable_manager.go#L125-L126
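The effect of such a filter can be sketched in a few lines. This is an illustrative standalone sketch, not simpledb's actual code: `tableMeta`, `selectForCompaction`, and `selectForCompactionFixed` are hypothetical names, and the real selection logic in `sstable_manager.go` differs. The point is only that a `numRecords > 0` guard permanently excludes a tombstone-emptied table from compaction:

```go
package main

import "fmt"

// tableMeta is a hypothetical stand-in for per-sstable compaction metadata.
type tableMeta struct {
	path       string
	numRecords uint64
	sizeBytes  uint64
}

// selectForCompaction mimics a filter that skips empty tables: an sstable
// with zero records (fully tombstoned) is never chosen again, so the
// tombstones in the tables behind it can never be garbage collected.
func selectForCompaction(tables []tableMeta, maxSize uint64) []string {
	var selected []string
	for _, t := range tables {
		if t.numRecords > 0 && t.sizeBytes < maxSize {
			selected = append(selected, t.path)
		}
	}
	return selected
}

// selectForCompactionFixed drops the record-count guard so empty tables
// remain compaction candidates.
func selectForCompactionFixed(tables []tableMeta, maxSize uint64) []string {
	var selected []string
	for _, t := range tables {
		if t.sizeBytes < maxSize {
			selected = append(selected, t.path)
		}
	}
	return selected
}

func main() {
	tables := []tableMeta{
		{"sstable_000", 0, 0},         // fully tombstoned, now empty
		{"sstable_001", 100, 1 << 20}, // still holds tombstones for old keys
	}
	fmt.Println(selectForCompaction(tables, 1<<30))      // [sstable_001]
	fmt.Println(selectForCompactionFixed(tables, 1<<30)) // [sstable_000 sstable_001]
}
```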

@sebheitzmann
Contributor Author

Yes. That's it. I can make you a PR.

@thomasjungblut
Owner

I think there are a couple more edge cases that I didn't previously think of. I've just been trying to write some more unit tests in main...tombstone_edge_tests

First off, it seems that canRemoveTombstone isn't used anywhere. I reckon that should've been used to switch off the removal in ScanReduceLatestWins.

When the table manager selects for compaction, it matters whether there are holes in them or not.
Imagine the scenario:

Table 1 (size 2gig)
Table 2 (size 1gig)
Table 3 (size 3gig)
Table 4 (size 1gig)

Then we would try to compact (assuming a max size of 2 gigs): 1, 2 and 4, which completely disregards 3 and would destroy the data lineage. It's a more general bug, which may or may not happen depending on how the files are bunched together. The tombstoning just makes it more apparent, because suddenly you can have empty tables:

Table 1 (size 1gig, numbers from 1-100) 
Table 2 (size 1gig, deletes numbers from 1-100)

Now with the tombstones compacted, you end up with a new Table 1, which is empty. Then you can further run into your initial state, because that table will never become a candidate for compaction anymore.

That kinda diverges from the easy peasy algorithm that initially just lumped everything onto the lowest table that is below size X :)

@sebheitzmann
Contributor Author

You're right, the compaction cannot ignore a table due to its size if we want to delete tombstoned records.
I've done some tests, and we either need to add a second layer or remove the size limit.
If we had an L1 partitioned by key, we could merge all the tables in a segment without producing huge tables.

I imagine something like this:

sstable1/
sstable2
L1/
L1/0000/sstable1
L1/0000/sstable2
L1/xxxx/sstable1

etc.

where XXXX is the first 2 bytes of the key (or of a hash of the key).
The compaction could run on each of these segments without problems or size limits.
We just need a function (provided by the user) to convert a key into a segment folder.
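Such a user-provided mapping might look like the sketch below. This is purely illustrative: `segmentFolder` is a hypothetical name, and FNV is just one possible choice of hash; the thread leaves the concrete function open.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// segmentFolder maps a key to an L1 segment directory name by hashing
// the key and truncating to numSegments buckets. A hypothetical sketch
// of the "key to segment folder" function discussed above.
func segmentFolder(key []byte, numSegments uint32) string {
	h := fnv.New32a()
	h.Write(key)
	return fmt.Sprintf("L1/%04x", h.Sum32()%numSegments)
}

func main() {
	// All writes and compactions for this key would target one segment
	// directory, so each segment can be compacted independently.
	fmt.Println(segmentFolder([]byte("user-42"), 1<<16))
}
```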

What do you think about this design?

@thomasjungblut
Owner

a partition key would work, but the underlying issue is still the same, unless you want to always compact the entire segment

To keep things simple, how about something like Kadane's algorithm, where we just collect contiguous runs of tables that can be compacted, empty and non-empty alike.

OR we always compact from the other direction, from the most recent file to the oldest. Need to think about this a bit more though... That way we can keep the max size, I think.
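The contiguous-run idea could be sketched like this. A hypothetical, simplified sketch (not simpledb's code): it finds the longest run of adjacent tables that are each at or below the max size, counting empty tables too, so the data lineage is never broken by skipping a table in the middle:

```go
package main

import "fmt"

// longestCompactableRun returns the half-open [start, end) bounds of the
// longest contiguous run of adjacent tables whose individual sizes are
// all <= maxSize. Empty (zero-size) tables qualify as well, so
// tombstone-emptied tables stay eligible for compaction.
func longestCompactableRun(sizes []uint64, maxSize uint64) (int, int) {
	bestStart, bestEnd, start := 0, 0, 0
	for i, s := range sizes {
		if s > maxSize {
			start = i + 1 // an oversized table breaks the run
			continue
		}
		if i+1-start > bestEnd-bestStart {
			bestStart, bestEnd = start, i+1
		}
	}
	return bestStart, bestEnd
}

func main() {
	// Sizes in GiB for tables 1..4 from the example above.
	sizes := []uint64{2, 1, 3, 1}
	start, end := longestCompactableRun(sizes, 2)
	// Picks tables 1 and 2 only: table 3 breaks adjacency, so table 4
	// is left for a later, separate compaction instead of being merged
	// over the hole.
	fmt.Println(start, end) // 0 2
}
```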

@sebheitzmann
Contributor Author

Yes, the idea is to compact everything in the segment. But the segment can be small.

@sebheitzmann
Contributor Author

There can't be a compaction with garbage collection without a full compaction. There can be a version of a key in every sstable, so if you ignore a table the result can't be right.

In the original concept, sstables should have non-overlapping keyspaces for L1 and above.

Maybe we should separate the compaction from the garbage collection, so we can keep skipping some tables.

@thomasjungblut
Owner

> We can have a version of a key in every sstable. If you ignore a table the result can't be right.

Correct, but we have an ordering guarantee through the filenames. If we ensure that we only ever merge adjacent files, that would still work (I guess). We just can't skip any of them, as I've done with the maximum size.

> should have non overlapping keyspace for L1 and over.

Yep, that's fair. Currently it's just one big key space, which makes it somewhat inefficient to query all segments at once.
