Try sharding sqlite for inflight taskactivations #228

Open
markstory opened this issue Mar 5, 2025 · 0 comments

We are unable to drive more throughput with larger worker pools. Our current peak throughput comes from 24x32 worker pools. However, taskbroker is not saturating disk, CPU, or network under this workload. My theory is that we're limited by contention on sqlite, and that additional sqlite databases would unlock additional broker throughput.

  • Refactor/extract a trait from InflightActivationStore. This trait can be used as a type by the consumer, upkeep and grpc activities.
  • Implement a sharded store that spreads inflight task storage across multiple sqlite databases, with the number of databases set by a configuration value.
  • During startup, taskbroker can create the required number of databases and connections, and apply migrations to each shard.
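
The three steps above could be sketched roughly as follows. Everything here is illustrative: `ActivationStore`, `MemoryStore`, `ShardedStore` and `open_shards` are hypothetical names, the trait methods are not the real `InflightActivationStore` API, and the in-memory map stands in for a sqlite database so the sketch stays self-contained and synchronous.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified activation record.
#[derive(Clone, Debug, PartialEq)]
pub struct Activation {
    pub id: String,
    pub payload: Vec<u8>,
}

/// Hypothetical trait that the consumer, upkeep and grpc code could depend on.
pub trait ActivationStore {
    fn store(&mut self, activation: Activation);
    fn get(&self, id: &str) -> Option<&Activation>;
    fn count(&self) -> usize;
}

/// In-memory stand-in for one sqlite-backed store.
#[derive(Default)]
pub struct MemoryStore {
    rows: HashMap<String, Activation>,
}

impl ActivationStore for MemoryStore {
    fn store(&mut self, activation: Activation) {
        self.rows.insert(activation.id.clone(), activation);
    }
    fn get(&self, id: &str) -> Option<&Activation> {
        self.rows.get(id)
    }
    fn count(&self) -> usize {
        self.rows.len()
    }
}

/// Routes each call to one of several inner stores.
pub struct ShardedStore<S: ActivationStore> {
    shards: Vec<S>,
}

impl<S: ActivationStore> ShardedStore<S> {
    pub fn new(shards: Vec<S>) -> Self {
        assert!(!shards.is_empty(), "at least one shard is required");
        Self { shards }
    }

    /// Hash the id and reduce it modulo the shard count.
    /// Assumption: ids are strings. DefaultHasher's algorithm is not
    /// guaranteed stable across Rust releases, so shards that persist
    /// across upgrades would need a fixed hash function instead.
    fn shard_index(&self, id: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        id.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }
}

impl<S: ActivationStore> ActivationStore for ShardedStore<S> {
    fn store(&mut self, activation: Activation) {
        let idx = self.shard_index(&activation.id);
        self.shards[idx].store(activation);
    }
    fn get(&self, id: &str) -> Option<&Activation> {
        self.shards[self.shard_index(id)].get(id)
    }
    fn count(&self) -> usize {
        // Whole-store numbers are sums over the shards.
        self.shards.iter().map(|s| s.count()).sum()
    }
}

/// Startup step: build `num_dbs` shards. A real version would open a
/// connection and run migrations for each shard here.
pub fn open_shards(num_dbs: usize) -> ShardedStore<MemoryStore> {
    ShardedStore::new((0..num_dbs).map(|_| MemoryStore::default()).collect())
}
```

Because `ShardedStore` implements the same trait as a single store, callers written against the trait would not need to know how many databases sit behind it.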

Changes to RPC

  • SetTaskStatus can use the activation_id to locate the correct shard to update.
  • GetTask will need to round-robin between the shards to ensure even consumption.
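
One possible shape for the GetTask rotation is a shared cursor that picks a different starting shard on each call and falls through to the remaining shards when the first one is empty. `RoundRobin`, `probe_order` and `get_task` are hypothetical names, and each shard is modelled as a plain `Vec` of pending ids rather than a sqlite query.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical cursor shared by all GetTask handlers.
pub struct RoundRobin {
    next: AtomicUsize,
    num_shards: usize,
}

impl RoundRobin {
    pub fn new(num_shards: usize) -> Self {
        assert!(num_shards > 0, "at least one shard is required");
        Self { next: AtomicUsize::new(0), num_shards }
    }

    /// Shard indexes to try for one request: start at the cursor,
    /// then wrap around so every shard is visited exactly once.
    pub fn probe_order(&self) -> Vec<usize> {
        let start = self.next.fetch_add(1, Ordering::Relaxed) % self.num_shards;
        (0..self.num_shards)
            .map(|offset| (start + offset) % self.num_shards)
            .collect()
    }
}

/// Return the first pending task found while probing in round-robin order.
/// Each inner Vec is a stand-in for one shard's pending activations.
pub fn get_task(cursor: &RoundRobin, shards: &mut [Vec<String>]) -> Option<String> {
    for idx in cursor.probe_order() {
        if let Some(id) = shards[idx].pop() {
            return Some(id);
        }
    }
    None
}
```

Probing the remaining shards before giving up means an empty shard does not cause a worker to be told there is no work while other shards still hold pending tasks.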

Changes to consumer

  • As activations are consumed from Kafka, we can choose which shard an activation is stored in by activation.id % num_dbs.
  • Free space calculation will have to take all shards into account.
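
A rough sketch of the free space calculation, assuming each shard has its own pending limit (the issue does not say whether the limit would be global or per shard). `ShardCapacity`, `total_free` and `can_accept` are hypothetical names.

```rust
/// Hypothetical per-shard capacity snapshot.
pub struct ShardCapacity {
    pub pending: usize,
    pub max_pending: usize,
}

impl ShardCapacity {
    /// Remaining room in this shard; never underflows if a shard is over its limit.
    pub fn free(&self) -> usize {
        self.max_pending.saturating_sub(self.pending)
    }
}

/// Free space for the whole broker is the sum over all shards.
pub fn total_free(shards: &[ShardCapacity]) -> usize {
    shards.iter().map(|s| s.free()).sum()
}

/// With id-based routing an activation can only go to one shard,
/// so the target shard's own free space is what decides acceptance.
pub fn can_accept(shards: &[ShardCapacity], shard_idx: usize) -> bool {
    shards[shard_idx].free() > 0
}
```

One consequence worth noting: because routing is fixed by the activation id, the consumer can have total free space available and still be unable to store a particular activation if its target shard is full.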

Changes to metrics

Having multiple databases will impact several metrics we collect from taskbroker.

  • Upkeep summary metrics would need to be aggregated across all shards.
  • Status count queries and metrics will need to be aggregated across all shards.
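
Aggregating status counts could be as simple as summing the per-shard results by status name. In this sketch `aggregate_status_counts` is a hypothetical helper, and each map stands in for the result of one shard's status count query.

```rust
use std::collections::BTreeMap;

/// Merge per-shard status counts into one broker-wide map.
/// Keys are status names; values are row counts for that status.
pub fn aggregate_status_counts(per_shard: &[BTreeMap<String, u64>]) -> BTreeMap<String, u64> {
    let mut total = BTreeMap::new();
    for counts in per_shard {
        for (status, n) in counts {
            // Statuses missing from a shard simply contribute nothing.
            *total.entry(status.clone()).or_insert(0) += *n;
        }
    }
    total
}
```

Emitting the summed values keeps existing dashboards comparable with the single-database numbers; per-shard values could additionally be reported with a shard tag if skew between shards needs to be visible.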

Desired result

Ideally, with additional sqlite shards we can get more throughput from a single broker and support larger worker pools per broker. This should also make better use of broker CPU.
