Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Data partitions for snapshot and durable state #515

Merged
merged 13 commits into from
Feb 9, 2024

Conversation

patriknw
Copy link
Member

@patriknw patriknw commented Feb 2, 2024

On top of #508 , and not complete yet

Base automatically changed from wip-data-partitions-patriknw to main February 5, 2024 09:33
@patriknw patriknw force-pushed the wip-more-data-partitions-patriknw branch from 73af6fe to fdfd25a Compare February 5, 2024 14:33
@patriknw patriknw marked this pull request as ready for review February 5, 2024 14:33
@patriknw
Copy link
Member Author

patriknw commented Feb 5, 2024

Ready for review, I think it should cover everything for snapshot and durable state.

@patriknw patriknw force-pushed the wip-more-data-partitions-patriknw branch from fdfd25a to 9f379a7 Compare February 5, 2024 14:37
Copy link
Member

@johanandren johanandren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a few nitpicks.

* the table name is without suffix.
*/
def snapshotTableWithSchema(slice: Int): String =
resolveTableName(snapshotsTableWithSchema, slice)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accidentally used in PostgresSnapshotDao, can we make the name factory private and have a public method that uses the table to avoid such mistakes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? This is the new method, which should always be used. The old snapshotsTableWithSchema without slice parameter could become private, but I didn't want to break bin compat (@InternalStableApi), but maybe worth doing that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deprecated the old tableWithSchema methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe I was unclear, what I meant is that the consuming code calls this, which generates a new string for every call instead of using the allSnapshotTablesWithSchema lookup table defined below. Probably not a big thing but seems wasteful when we anyway have the lookup table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now the lookup table is the reverse, name to slice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but allSnapshotTablesWithSchema isn't a lookup for each of the 1024 slices. It contains one entry per data partition, where the slice is the lower slice for that data partition.
E.g. snapshotTableWithSchema(slice = 17) can't be found easily in allSnapshotTablesWithSchema.

To make it a full lookup table it would have to contain 1024 entries, with the slice as key. Might be possible with some other structure such as binary search but not sure that is worth it. Then I'm more worried about all string construction of the full sql statements that now happen on each use. We could introduce a cache for them, and that would also solve this because the table names are always used within such sql.

allSnapshotTablesWithSchema was mostly added because in tests we delete all tables before running.

* The durable state table and schema name with data partition suffix for the given slice. When number-of-partitions
* is 1 the table name is without suffix.
*/
def durableStateTableWithSchema(slice: Int): String =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also used all over the place instead of the prepared lookup table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here. With lookup table, do you mean _durableStateTableByEntityType. For Durable State it was already possible to configure separate table per entity type, and the accessor for that is

def getDurableStateTableWithSchema(entityType: String, slice: Int)

but I think we also need this one for the default durable_state table

@patriknw patriknw force-pushed the wip-more-data-partitions-patriknw branch 2 times, most recently from f6b13dd to 3ec6548 Compare February 8, 2024 12:54
@patriknw patriknw mentioned this pull request Feb 8, 2024
@patriknw patriknw force-pushed the wip-more-data-partitions-patriknw branch from b8342d3 to 14c2d67 Compare February 9, 2024 09:15
@patriknw patriknw merged commit 28b95bc into main Feb 9, 2024
9 checks passed
@patriknw patriknw deleted the wip-more-data-partitions-patriknw branch February 9, 2024 10:11
@patriknw patriknw added this to the 1.2.2 milestone Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants