Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add emerge jobs tmpdir blocks and files threshold options #1351

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zmedico
Copy link
Member

@zmedico zmedico commented Jun 19, 2024

This supersedes #1345 and distinguishes between both used blocks and used files (inodes).

--jobs-tmpdir-blocks-threshold[=RATIO]

Specifies the maximum ratio of used blocks allowed (a
floating-point number between 0.0 and 1.0) in PORTAGE_TMPDIR
when starting a new job. With no argument, removes a previous
blocks ratio threshold. For example, use a ratio of 0.50 to stop
starting new jobs when the blocks usage in PORTAGE_TMPDIR
exceeds 50%. This option conflicts with FEATURES="keepwork".
WARNING: Since the job scheduler is unable to predict the future
consumption of jobs that it has scheduled, users are advised to
set a threshold that provides a significant amount of headroom,
in order to decrease the probability that jobs will fail due
to ENOSPC errors.

--jobs-tmpdir-files-threshold[=RATIO]

Specifies the maximum ratio of used files (inodes) allowed (a
floating-point number between 0.0 and 1.0) in PORTAGE_TMPDIR
when starting a new job. With no argument, removes a previous
files ratio threshold. For example, use a ratio of 0.50 to stop
starting new jobs when the files usage in PORTAGE_TMPDIR
exceeds 50%. This option conflicts with FEATURES="keepwork".
WARNING: Since the job scheduler is unable to predict the future
consumption of jobs that it has scheduled, users are advised to
set a threshold that provides a significant amount of headroom,
in order to decrease the probability that jobs will fail due
to ENOSPC errors.

Bug: https://bugs.gentoo.org/934382

Copy link
Member

@thesamesam thesamesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs (functional) typo fixing but otherwise looks good, thanks! Hopefully @akhuettel can test it though to see if it works well for him.

lib/_emerge/main.py Outdated Show resolved Hide resolved
@zmedico zmedico force-pushed the bug_934382_jobs_tmpdir_blocks_files_threshold branch from ca05b6d to cdee46d Compare June 19, 2024 18:58
lib/_emerge/Scheduler.py Outdated Show resolved Hide resolved
@zmedico zmedico force-pushed the bug_934382_jobs_tmpdir_blocks_files_threshold branch from cdee46d to 0f2a0b2 Compare June 20, 2024 00:09
Copy link
Member

@Flowdalic Flowdalic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that an absolute threshold is more sensible than a relative threshold. And considering the fact that it would be good to minimize the number of configuration options of Portage, we may want to start with an absolute threshold. A relative one can always be added later on, if the need arises.

It is also probably sufficient that users are only able to specify the absolute value in GiB, so we do not need to parse bytes in a human readable format.

Therefore, I believe instead of
--jobs-tmdir-blocks-threshold
we should start with
--jobs-tmpdir-require-free-gb
(or so)

Again, just my personal opinion. While I believe what I suggest is a small change resulting in a relevant improvement, this does not constitute in any shape or form me blocking this PR in its current form.

@zmedico
Copy link
Member Author

zmedico commented Jun 22, 2024

I still think that an absolute threshold is more sensible than a relative threshold. And considering the fact that it would be good to minimize the number of configuration options of Portage, we may want to start with an absolute threshold. A relative one can always be added later on, if the need arises.

It is also probably sufficient that users are only able to specify the absolute value in GiB, so we do not need to parse bytes in a human readable format.

Therefore, I believe instead of --jobs-tmdir-blocks-threshold we should start with --jobs-tmpdir-require-free-gb (or so)

I'm afraid this --jobs-tmpdir-require-free-gb option might give users a false sense of security, thinking the last scheduled job will have at least this amount of space reserved.

When emerge starts multiple jobs around the same time, the corresponding space usage does not manifest itself until later in time, after the scheduling decisions were already made. We can try to mitigate this issue by adding a long delay in between starting jobs, but this strategy could slow things down unnecessarily.

One way to deal with this kind of issue is to set a threshold that requires plenty of head room, like a ratio of 0.5 or so. An option like --jobs-tmpdir-require-free-gb makes less sense when we consider that space usage can be difficult to predict.

@Flowdalic
Copy link
Member

I'm afraid this --jobs-tmpdir-require-free-gb option might give users a false sense of security, thinking the last scheduled job will have at least this amount of space reserved.

That is, broadly speaking, also true for --jobs-tmdir-blocks-threshold 0.85, which could give the impression that the last scheduled jobs will have 15% of space reserved.

Unrelated if the setting denotes an absolute or relative value, one way to mitigate this is to make this a setting scale with the number of emerge jobs. For example, assume --jobs 3, then we could simply multiply required free space by three.1

And this is a example why an absolute threshold is preferable over a relative one. Assume an analysis of ::gentoo showed that the worst-case PORTAGE_TMPDIR consumption per package is 6 GiB. Then we could simply set the default of --jobs-tmpdir-require-free-gb to 6. But you can't hardcode such an value in portage which --jobs-tmdir-blocks-threshold. You can only dynamically calculate it one you know PORTAGE_TMPDIR.

The gist is, a good threshold is derived from our packages, not from the size of the filesystem that holds PORTAGE_TMPDIR. Therefore, a setting that specifies an absolute value is preferable over one that specifies a relative one.

Again, not blocking this PR. However, I strongly believe a relative threshold is not sensible for the reasons explained above. And, if it helps to avoid a relative threshold, then I am happy to work on --jobs-tmpdir-require-free-gb.

1: Instead of linearly scaling the free-space requirements with the number of emerge jobs, we could also use some more elaborate scheme using a decay function if we deem this sensible (which I believe it is).

Flowdalic added a commit to Flowdalic/portage that referenced this pull request Jun 23, 2024
This adds
  --jobs-tmpdir-require-free-gb=GB
  --jobs-tmpdir-require-free-kilo-inodes=INODES
as emerge emerge options.

When used with --jobs, makes portage/emerge check that PORTAGE_TMPDIR
has sufficient free resources before a new job is started.

Thanks goes out to Zac Medico, as this was inspired by
gentoo#1351, with the following
differences:

- options are absolute values, not relatives ones
- defaults for both options are specified
- option values are scaled, using a decaying function, considering
  the number or runnings jobs
- emit a warning once a threshold is reached

Bugs: https://bugs.gentoo.org/934382
Signed-off-by: Florian Schmaus <[email protected]>
Flowdalic added a commit to Flowdalic/portage that referenced this pull request Jun 23, 2024
This adds
  --jobs-tmpdir-require-free-gb=GB
  --jobs-tmpdir-require-free-kilo-inodes=INODES
as emerge emerge options.

When used with --jobs, makes portage/emerge check that PORTAGE_TMPDIR
has sufficient free resources before a new job is started.

Thanks goes out to Zac Medico, as this was inspired by
gentoo#1351, with the following
differences:

- options are absolute values, not relatives ones
- defaults for both options are specified
- option values are scaled, using a decaying function, considering
  the number or runnings jobs
- emit a warning once a threshold is reached

Bugs: https://bugs.gentoo.org/934382
Signed-off-by: Florian Schmaus <[email protected]>
Flowdalic added a commit to Flowdalic/portage that referenced this pull request Jun 23, 2024
This adds
  --jobs-tmpdir-require-free-gb=GB
  --jobs-tmpdir-require-free-kilo-inodes=INODES
as emerge emerge options.

When those are used with --jobs, makes portage/emerge check that
PORTAGE_TMPDIR has sufficient free resources before a new job is
started.

Thanks goes out to Zac Medico, as this was inspired by
gentoo#1351, with the following
differences:

- options are absolute values, not relatives ones
- defaults for both options are specified
- option values are scaled, using a decaying function, considering
  the number or running jobs
- emit a warning once a threshold is reached

Note that the scaling of the resource constraints can not be perfect
in the presence of concurrently running emerge jobs and without
_can_add_job() being provided with the number of jobs that are
potentially added. It is always possible that a emerge job has not yet
used much of the filesystem when we check the remaining filesystem
resources, and later on uses much more than the scaling function
accounted for it.

Ultimately, there is a tradeoff between portage limiting parallelism
needlessly (but still being able to emerge all packages) and portage
failing due to missing resources in PORTAGE_TMPDIR. The chosen
defaults are rather large and most packages use much less
filesystem resources then the scaling function accounts for
them. Therefore, the implemented approach idea is to lean towards
favoring functionality over parallelism.

Bugs: https://bugs.gentoo.org/934382
Signed-off-by: Florian Schmaus <[email protected]>
Flowdalic added a commit to Flowdalic/portage that referenced this pull request Jun 23, 2024
This adds
  --jobs-tmpdir-require-free-gb=GB
  --jobs-tmpdir-require-free-kilo-inodes=INODES
as emerge emerge options.

When those are used with --jobs, makes portage/emerge check that
PORTAGE_TMPDIR has sufficient free resources before a new job is
started.

Thanks goes out to Zac Medico, as this was inspired by
gentoo#1351, with the following
differences:

- options are absolute values, not relatives ones
- defaults for both options are specified
- option values are scaled, using a decaying function, considering
  the number or running jobs
- emit a warning once a threshold is reached

Note that the scaling of the resource constraints can not be perfect
in the presence of concurrently running emerge jobs and without
_can_add_job() being provided with the number of jobs that are
potentially added. It is always possible that a emerge job has not yet
used much of the filesystem when we check the remaining filesystem
resources, and later on uses much more than the scaling function
accounted for it.

Ultimately, there is a tradeoff between portage limiting parallelism
needlessly (but still being able to emerge all packages) and portage
failing due to missing resources in PORTAGE_TMPDIR. The chosen
defaults are rather large and most packages use much less
filesystem resources then the scaling function accounts for
them. Therefore, the implemented approach idea is to lean towards
favoring functionality over parallelism.

Bugs: https://bugs.gentoo.org/934382
Signed-off-by: Florian Schmaus <[email protected]>
@zmedico
Copy link
Member Author

zmedico commented Jun 23, 2024

I'm afraid this --jobs-tmpdir-require-free-gb option might give users a false sense of security, thinking the last scheduled job will have at least this amount of space reserved.

That is, broadly speaking, also true for --jobs-tmdir-blocks-threshold 0.85, which could give the impression that the last scheduled jobs will have 15% of space reserved.

Yes, so now I'd like to fix the documentation to explain the difficulties in predicting space usage, and suggest a safer ratio that provides more headroom.

Again, not blocking this PR. However, I strongly believe a relative threshold is not sensible for the reasons explained above. And, if it helps to avoid a relative threshold, then I am happy to work on --jobs-tmpdir-require-free-gb.

1: Instead of linearly scaling the free-space requirements with the number of emerge jobs, we could also use some more elaborate scheme using a decay function if we deem this sensible (which I believe it is).

Yes, please do work on --jobs-tmpdir-require-free-gb. However, it seems clear that it will introduce some significant complexities (decay functions and so on), and I think a relative threshold is a reasonable way to keep things simple for the first iteration of this feature.

@zmedico zmedico force-pushed the bug_934382_jobs_tmpdir_blocks_files_threshold branch from 0f2a0b2 to 34bb5bc Compare June 23, 2024 20:33
--jobs-tmpdir-blocks-threshold[=RATIO]

  Specifies the maximum ratio of used blocks allowed (a
  floating-point number between 0.0 and 1.0) in PORTAGE_TMPDIR
  when starting a new job. With no argument, removes a previous
  blocks ratio threshold. For example, use a ratio of 0.50 to stop
  starting new jobs when the blocks usage in PORTAGE_TMPDIR
  exceeds 50%. This option conflicts with FEATURES="keepwork".
  WARNING: Since the job scheduler is unable to predict the future
  consumption of jobs that it has scheduled, users are advised to
  set a threshold that provides a significant amount of headroom,
  in order to decrease the probability that jobs will fail due
  to ENOSPC errors.

--jobs-tmpdir-files-threshold[=RATIO]

  Specifies the maximum ratio of used files (inodes) allowed (a
  floating-point number between 0.0 and 1.0) in PORTAGE_TMPDIR
  when starting a new job. With no argument, removes a previous
  files ratio threshold. For example, use a ratio of 0.50 to stop
  starting new jobs when the files usage in PORTAGE_TMPDIR
  exceeds 50%. This option conflicts with FEATURES="keepwork".
  WARNING: Since the job scheduler is unable to predict the future
  consumption of jobs that it has scheduled, users are advised to
  set a threshold that provides a significant amount of headroom,
  in order to decrease the probability that jobs will fail due
  to ENOSPC errors.

Bug: https://bugs.gentoo.org/934382
Signed-off-by: Zac Medico <[email protected]>
@zmedico zmedico force-pushed the bug_934382_jobs_tmpdir_blocks_files_threshold branch from 34bb5bc to 3fd6642 Compare June 23, 2024 21:11
@Flowdalic
Copy link
Member

However, it seems clear that it will introduce some significant complexities (decay functions and so on)

I don't think that this is how the situation presents. There is no additional complexity in my PR besides the scale_to_jobs(num) function. Which isn't very complex and attempts to mitigate an issue that exists also in this PR. In other words, it is not specific to an absolute value. We could also drop it, if it is deemed to much complexity for "just" a mitigation.

and I think a relative threshold is a reasonable way [...] for the first iteration of this feature.

Unfortunately, I'd rather we do not. The my main motivation of authoring what I present in #1353 is that we do not end up with

  • --jobs-tmpdir-blocks-threshold
  • --jobs-tmpdir-files-threshold
  • --jobs-tmpdir-require-free-gb
  • --jobs-tmpdir-require-free-kilo-inodes

portage/emerge already has many command line arguments. Therefore I hope we can decide for either the absolute option or the relative one.

Fortunately, both approaches are now implemented, and I've provided the advantages of an absolute value over an relative one in my previous posts, so people can make an informed decision what they feel is best.

Again, not blocking this PR. I just personally believe that we should not go with --jobs-tmpdir-blocks-threshold (and --jobs-tmpdir-files-threshold).

@zmedico
Copy link
Member Author

zmedico commented Jun 26, 2024

Unfortunately, I'd rather we do not. The my main motivation of authoring what I present in #1353 is that we do not end up with

  • --jobs-tmpdir-blocks-threshold
  • --jobs-tmpdir-files-threshold
  • --jobs-tmpdir-require-free-gb
  • --jobs-tmpdir-require-free-kilo-inodes

portage/emerge already has many command line arguments. Therefore I hope we can decide for either the absolute option or the relative one.

We're in one of those situations where a plugin system is needed in order to satisfy everyone's needs. We can use a single --jobs-tmpdir option that can be specified multiple times in order to load multiple plugins, and then prefix the arguments with plugin names like --jobs-tmpdir=require-free-gb:16 --jobs-tmpdir=require-free-kilo-inodes:32.

@Flowdalic
Copy link
Member

We're in one of those situations where a plugin system is needed in order to satisfy everyone's needs.

We should avoid introducing additional complexity, especially since portage has already plenty of it.

Therefore, if I couldn't establish to you, or anybody else, that the relative-threshold approach of PR isn't ideal and that the absolute-threshold approach I suggest has unique advantages, just merge this PR and not mine. Carrying the semantics of both PRs is unnecessary and just adds even more knobs to portage. If it is decided to go with this PR (not sure how such decisions are made, portage dev team vote?), then please consider adjusting this PR so that emerge also emits a one-time warning if the threshold is it (see my PR). I am also not sure why this PR marks the threshold conflicting with the keepwork feature.

Copy link
Member

@thesamesam thesamesam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm already hesitant to add the 2 new options but I feel like they should make most people happy.

@@ -1304,6 +1350,27 @@ def emerge_main(args: Optional[list[str]] = None):
emerge_config.action, emerge_config.opts, emerge_config.args = parse_opts(
tmpcmdline
)
if (
"--jobs-tmpdir-blocks-threshold" in emerge_config.opts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we should check for merge-wait too?

Copy link
Member Author

@zmedico zmedico Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid to use this option with or without merge-wait because either way the user could desire to use disk pressure to limit starting of new jobs.

On the other hand, #1349 only makes sense with merge-wait.

@@ -1033,6 +1043,42 @@ def parse_opts(tmpcmdline, silent=False):

myoptions.jobs = jobs

if myoptions.jobs_tmpdir_blocks_threshold == "True":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just coming back to this: all of this really makes me think we need to revisit the option handling :(

But it's of course fine for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True here indicates a missing argument, and this can mean different things for different options. For example, for --jobs a missing argument means unlimited jobs, and so it's handled like this instead:

        if myoptions.jobs == "True":
            jobs = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants