Background
Each worker in the Tezos Cluster keeps a local Docker cache of the images it uses.
When this cache becomes too full, the worker pauses and runs `docker system prune` to free up space. This prune currently takes 4 hours, so the worker is unavailable for that entire time.
On top of losing one worker for 4 hours each time, the Octez pipeline seems to produce many large Docker images (10 GB or more) as both inputs and outputs. We need to understand and document why that is and whether they are all necessary.
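As a starting point for documenting which images are large, here is a minimal sketch that filters the output of `docker image ls --format '{{.Repository}}:{{.Tag}}\t{{.Size}}'` for images at or above a threshold. The function names, the 10 GB default, and the assumption that sizes are printed with decimal suffixes (`kB`, `MB`, `GB`) are mine, not something the pipeline defines.

```python
import re

# Decimal multipliers for the size suffixes printed by `docker image ls`
# (assumed: B, kB, MB, GB, TB).
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a human-readable size such as '10.5GB' to a byte count."""
    match = re.fullmatch(r"([\d.]+)\s*([A-Za-z]+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit.upper()])


def large_images(listing, threshold_bytes=10 * 10**9):
    """Return (name, bytes) pairs at or above the threshold, largest first.

    `listing` is the tab-separated text produced by the
    `docker image ls --format` command named above.
    """
    found = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, size_text = line.split("\t")
        size = parse_size(size_text)
        if size >= threshold_bytes:
            found.append((name, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Running this against each worker's listing would give a per-worker inventory of the 10 GB+ images, which can then be matched against the pipeline stages that produce or consume them.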
Solution
Some possible solutions to try (in rough order of suitability):
1. Run a nightly job to prune the Docker cache, when the cluster is less busy.
2. Add an extra worker to allow for pruning time.
3. Clean up Docker images that are produced by the pipeline, if they're not being published somewhere.
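The nightly prune option could be sketched roughly as follows. This is only an illustration: the 24-hour age cutoff, the function names, and the idea of triggering the script from a scheduler such as cron are assumptions, not the cluster's actual configuration. The `--force`, `--all`, and `--filter until=...` options are standard `docker system prune` flags.

```python
import subprocess


def build_prune_command(max_age_hours=24, prune_all=False):
    """Build a `docker system prune` command that skips the confirmation prompt.

    The `until` filter restricts pruning to objects created more than
    `max_age_hours` ago (the 24-hour default is an assumed value).
    """
    command = [
        "docker", "system", "prune",
        "--force",
        "--filter", f"until={max_age_hours}h",
    ]
    if prune_all:
        # Also remove unused images, not only dangling ones.
        command.append("--all")
    return command


def run_prune(command, runner=subprocess.run):
    """Run the prune command; `runner` is injectable so it can be dry-run."""
    return runner(command, check=True)
```

A scheduler entry on each worker could then call `run_prune(build_prune_command())` during off-peak hours. Pruning smaller amounts every night may also shorten each run, although that would need to be measured on a real worker.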