Try sharding sqlite for inflight taskactivations #228

markstory · 2025-03-05T20:27:58Z

We are unable to drive more throughput with larger worker pool. Our current peak yield comes from 24x32 worker pools. However, taskbroker is not saturating disk, cpu or network doing this workload. My theory is that we're limited by contention on sqlite, and that if we had additional sqlite databases we could unlock additional broker throughput.

Refactor/extract a trait from InflightActivationStore. This trait can be used as a type by the consumer, upkeep and grpc activities.
Implement an sharded store that spreads inflight task storage across multiple sqlite databases based on a configuration value for the number of databases.
During startup taskbroker can create the required number of connections, databases and apply migrations to each shard.

Change to RPC

SetTaskStatus can use the activation_id, to locate the correct shard to update.
GetTask will need to round robin between each of the shards to ensure even consumption

Changes to consumer

As activations are consumed from Kafka, we can choose which shard an activation is stored in by activation.id % num_dbs.
Free space calculation will have to take all shards into account.

Changes to metrics

Having multiple databases, will impact several metrics we collect from taskbroker.

upkeep summary metrics would need to be aggregated across each shard.
status count queries + metrics will need to be aggregated against each shard.

Desired result

Ideally with additional sqlite shards we are able to get more throughput from a single broker and support larger worker pools per broker. This should also help utilize broker CPU better.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try sharding sqlite for inflight taskactivations #228

Try sharding sqlite for inflight taskactivations #228

markstory commented Mar 5, 2025

Try sharding sqlite for inflight taskactivations #228

Try sharding sqlite for inflight taskactivations #228

Comments

markstory commented Mar 5, 2025

Change to RPC

Changes to consumer

Changes to metrics

Desired result