We are unable to drive more throughput with larger worker pools. Our current peak yield comes from 24x32 worker pools. However, taskbroker is not saturating disk, CPU, or network under this workload. My theory is that we're limited by contention on sqlite, and that additional sqlite databases could unlock more broker throughput.
Refactor/extract a trait from InflightActivationStore. This trait can be used as a type by the consumer, upkeep and grpc activities.
Implement a sharded store that spreads inflight task storage across multiple sqlite databases, with the number of databases set by a configuration value.
During startup, taskbroker can create the required number of connections and databases, and apply migrations to each shard.
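To illustrate the trait extraction, here is a minimal sketch. Everything in it is an assumption made for illustration: the trait name `ActivationStore`, its methods, the `Status` enum, and the in-memory `MemoryStore` are hypothetical stand-ins, not the real `InflightActivationStore` API, and the real store would likely be async and backed by sqlite.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced task status used only for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub enum Status {
    Pending,
    Processing,
    Complete,
}

/// Hypothetical trait extracted from the store. The consumer, upkeep and
/// grpc code would depend on this instead of a concrete type.
pub trait ActivationStore {
    fn store(&mut self, id: &str);
    fn get_pending(&mut self) -> Option<String>;
    fn set_status(&mut self, id: &str, status: Status) -> bool;
    fn count(&self) -> usize;
}

/// In-memory stand-in for a single sqlite-backed store.
#[derive(Default)]
pub struct MemoryStore {
    rows: HashMap<String, Status>,
}

impl ActivationStore for MemoryStore {
    fn store(&mut self, id: &str) {
        self.rows.insert(id.to_string(), Status::Pending);
    }

    fn get_pending(&mut self) -> Option<String> {
        // Pick any pending row and mark it as processing.
        let id = self
            .rows
            .iter()
            .find(|(_, s)| **s == Status::Pending)
            .map(|(id, _)| id.clone())?;
        self.rows.insert(id.clone(), Status::Processing);
        Some(id)
    }

    fn set_status(&mut self, id: &str, status: Status) -> bool {
        match self.rows.get_mut(id) {
            Some(s) => {
                *s = status;
                true
            }
            None => false,
        }
    }

    fn count(&self) -> usize {
        self.rows.len()
    }
}

/// Consumer-side code written against the trait, so a sharded
/// implementation could be swapped in without changing this function.
pub fn consume_batch(store: &mut dyn ActivationStore, ids: &[&str]) -> usize {
    for id in ids {
        store.store(id);
    }
    store.count()
}
```

With this shape, a sharded store would be another implementation of the same trait that wraps several single-database stores.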
Changes to RPC
SetTaskStatus can use the activation_id to locate the correct shard to update.
GetTask will need to round-robin between the shards to ensure even consumption.
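The round-robin behaviour for GetTask could look roughly like the sketch below. The `RoundRobin` type and its in-memory queues are hypothetical stand-ins for the per-shard sqlite stores. The point is the cursor logic: start at the next shard, skip shards with nothing pending, and advance past whichever shard served the task.

```rust
use std::collections::VecDeque;

/// Hypothetical round-robin reader over shards. Each VecDeque stands in
/// for the pending activations of one sqlite database.
pub struct RoundRobin {
    shards: Vec<VecDeque<String>>,
    next: usize,
}

impl RoundRobin {
    pub fn new(num_shards: usize) -> Self {
        assert!(num_shards > 0, "at least one shard is required");
        Self {
            shards: vec![VecDeque::new(); num_shards],
            next: 0,
        }
    }

    /// Test helper: place an activation id directly on a shard.
    pub fn push(&mut self, shard: usize, id: &str) {
        self.shards[shard].push_back(id.to_string());
    }

    /// Try each shard once, starting at the cursor, and return the first
    /// pending activation found. Empty shards are skipped so that one idle
    /// shard does not cause an empty response.
    pub fn get_task(&mut self) -> Option<String> {
        let n = self.shards.len();
        for offset in 0..n {
            let idx = (self.next + offset) % n;
            if let Some(id) = self.shards[idx].pop_front() {
                // Move the cursor past the shard that just served a task.
                self.next = (idx + 1) % n;
                return Some(id);
            }
        }
        None
    }
}
```

With concurrent gRPC handlers the cursor would need to be shared safely, for example with an atomic counter. That part is left out here.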
Changes to consumer
As activations are consumed from Kafka, we can choose which shard an activation is stored in by activation.id % num_dbs.
Free space calculation will have to take all shards into account.
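A sketch of both consumer changes follows, under a few assumptions. `activation.id % num_dbs` only works directly if the id is numeric. If the id is a string (a UUID, say), it has to be hashed first, which is what the hypothetical `shard_for` does. `free_space` assumes each shard has its own pending limit, which is a guess at how the capacity setting would be split.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map an activation id to a shard index (hypothetical helper).
/// The same function must be used by the consumer (on write) and by
/// SetTaskStatus (on update) so both land on the same database.
pub fn shard_for(activation_id: &str, num_dbs: usize) -> usize {
    assert!(num_dbs > 0, "num_dbs must be at least 1");
    let mut hasher = DefaultHasher::new();
    activation_id.hash(&mut hasher);
    (hasher.finish() % num_dbs as u64) as usize
}

/// Total free space across shards, assuming a per-shard pending limit.
/// A shard that is over its limit contributes zero rather than a
/// negative amount.
pub fn free_space(pending_per_shard: &[usize], max_pending_per_shard: usize) -> usize {
    pending_per_shard
        .iter()
        .map(|pending| max_pending_per_shard.saturating_sub(*pending))
        .sum()
}
```

One caveat: the algorithm behind `DefaultHasher` is not guaranteed to stay the same across Rust releases. Since shard contents persist on disk, a real implementation would probably want a hash with a fixed, documented algorithm. Changing `num_dbs` on an existing deployment would also remap ids, so rows already stored would need handling.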
Changes to metrics
Having multiple databases will impact several metrics we collect from taskbroker.
Upkeep summary metrics would need to be aggregated across all shards.
Status count queries and metrics would need to be aggregated across all shards.
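Aggregating status counts could be as simple as summing per-shard results before emitting the metric. In the sketch below, `aggregate_counts` and the string status keys are assumptions. Each map represents the result of running the existing status count query against one shard.

```rust
use std::collections::HashMap;

/// Sum per-status counts from every shard into one map, so the emitted
/// metric still describes the broker as a whole.
pub fn aggregate_counts(per_shard: &[HashMap<String, u64>]) -> HashMap<String, u64> {
    let mut total: HashMap<String, u64> = HashMap::new();
    for shard in per_shard {
        for (status, count) in shard {
            *total.entry(status.clone()).or_insert(0) += count;
        }
    }
    total
}
```

Emitting the per-shard values with a shard tag as well would make it easier to spot uneven distribution, at the cost of higher metric cardinality.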
Desired result
Ideally, with additional sqlite shards we can get more throughput from a single broker and support larger worker pools per broker. This should also help utilize broker CPU better.