Investigate docker image usage and production in Tezos Cluster #55

tmcgilchrist · 2022-06-20T23:28:29Z

Background

Each worker in the Tezos Cluster keeps a local Docker cache of the images it uses.
When this cache fills up, the worker pauses and runs docker system prune to free up space. This prune currently takes 4 hours, and the worker is unavailable for that entire time.

Besides losing one worker for 4 hours at each prune, the Octez pipeline appears to produce many large Docker images (10 GB or more) as both inputs and outputs. We need to understand and document why that is, and whether all of these images are necessary.
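As a starting point for that investigation, here is a minimal sketch that lists the images on a worker at or above a size threshold. It assumes the `docker` CLI is on the worker's PATH and relies on `docker image ls --format "{{json .}}"`, which prints one JSON object per image with human-readable sizes. The function names and the 10 GB default are illustrative and are not part of the cluster's tooling.

```python
import json
import subprocess

# Decimal unit multipliers as printed by `docker image ls` (e.g. "10.5GB").
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a human-readable size such as '10.5GB' to bytes."""
    text = text.strip()
    # Try longer suffixes first so "GB" is not mistaken for "B".
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return round(float(text[: -len(suffix)]) * _UNITS[suffix])
    raise ValueError(f"unrecognised size: {text!r}")


def large_images(json_lines, threshold_bytes=10 * 10**9):
    """Return (name, size_bytes) for images at or above the threshold, largest first."""
    found = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        image = json.loads(line)
        size = parse_size(image["Size"])
        if size >= threshold_bytes:
            found.append((f'{image["Repository"]}:{image["Tag"]}', size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def list_local_images():
    """Ask the local Docker daemon for its images, one JSON object per line."""
    result = subprocess.run(
        ["docker", "image", "ls", "--format", "{{json .}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On a worker, `large_images(list_local_images())` would give the list of candidates to trace back to the pipeline stage that produced or consumed them.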

Solution

Some possible solutions to try (in rough order of suitability):

Run a nightly job to prune the Docker cache while the cluster is less busy.
Add an extra worker to cover the time spent pruning.
Clean up Docker images produced by the pipeline if they are not published anywhere.
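The first option could be sketched roughly as follows. This is an illustration only, assuming the job is started periodically by something like cron on each worker. The quiet window of 01:00 to 05:00 and the 24-hour age filter are hypothetical values, since the issue does not say when the cluster is least busy. The command uses the existing `--force`, `--filter until=...` and `--all` options of `docker system prune`.

```python
import subprocess
from datetime import datetime, time

# Hypothetical quiet window; adjust to when the cluster is actually idle.
QUIET_START = time(1, 0)
QUIET_END = time(5, 0)


def in_quiet_window(now, start=QUIET_START, end=QUIET_END):
    """True if `now` falls inside the window, including windows that cross midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end


def prune_command(max_age_hours=24, all_images=False):
    """Build a prune command that only removes objects older than max_age_hours."""
    cmd = ["docker", "system", "prune", "--force",
           "--filter", f"until={max_age_hours}h"]
    if all_images:
        # Also remove unused images that are tagged, not only dangling ones.
        cmd.append("--all")
    return cmd


def nightly_prune(now=None, dry_run=False):
    """Run the prune if we are inside the quiet window; return the command or None."""
    now = now or datetime.now()
    if not in_quiet_window(now):
        return None
    cmd = prune_command()
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Pruning by age on a schedule keeps each run smaller than a prune triggered only when the cache is full, though whether that shortens the 4 hour pause would need to be measured on a real worker.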

tmcgilchrist assigned mtelvers Jun 20, 2022
