Move build-mechanical to 12:00 UTC #1231
base: main
This time was originally chosen (1da5e54) based on when Fedora composes would finish for
why? wouldn't you want to wait until after new content was available?
/hold
We could make the cron spec a global pipecfg knob if it comes to it. But that said, if we want perfect timing, obviously it's possible to trigger a job whenever content changes. I think ART today watches RHEL repos already, and also knows how to trigger jobs, so they're well-positioned for this.
All of the last 10 rhel-9.6 jobs showing up in Jenkins were triggered by timer or me. I can check to see why we're not getting builds triggered on change by ART for at least rhel-9.6.
The RHCOS Jenkins doesn't seem to have any Fedora-related jobs; I assume some other pipeline instance is running the same jobs against Fedora?
I'm OK with any timing that we think yields fewer failed builds and content that's no more than 24 hours old as of Wednesday at 16:00 UTC. RHEL builds associated with GA versions of OCP are the highest priority within the set, both in terms of success and freshness.
yeah. we have an upstream pipeline, but share the same code that powers the pipeline. I'm not opposed to setting the times based on the pipeline, just trying to understand the requirements.
To be clear I wasn't implying that ART is doing this today. But that it could.
I'd actually prefer ART didn't. Since RHEL content shouldn't really change more than once a day, is it really needed?
Don't we today already trigger once a day? ISTM the ask here is to have better timing wrt RHEL, not necessarily have more builds. (This would actually be a reduction in builds if RHEL content changes less often than that... though I'm not sure in practice how often that happens.) But I guess if we go this way, we should probably also ask ART to cap the rate as well in case it sometimes churns faster for some reason.
Just summarizing. Right now build-mechanical is only triggered by timer and it initiates builds for c10s, c9s, rhel-10.1 and rhel-9.6, in that order. It's never initiated by ART. Builds of c10s, c9s, rhel-10.1 and rhel-9.6 are pretty much exclusively triggered by build-mechanical or a human replaying a failure. I guess Fedora pipelines happen too but I don't know about those. build-mechanical has a median runtime of six hours and a maximum runtime of 13 hours. We should expect that runtime to grow whenever we add 9.8 and 10.2, though we'll remove 10.1 at roughly the same time. There are almost always changes to c10s and c9s Tuesday through Saturday, so this means that rhel-10.1 or rhel-9.6 typically start 13:00-14:00 UTC. build-node-image IS triggered by ART whenever OCP content changes, potentially multiple times a day. I'm not sure how we can please everyone with just one knob to set the start time across two pipelines and three OSes' publishing schedules. Do we know if moving four hours earlier would be too early for Fedora? It seems like CS9 composes are complete by 04:00, CS10 composes by 04:30, Rawhide by 05:30. RHEL is a bit of a wildcard unfortunately, but let's assume it's generally done by 16:00 UTC. With just one knob and no further changes I'd probably propose moving things two hours later, in hopes of RHEL pushes having completed by the time we start RHEL builds but still with enough time to complete the process before the daily cut-off. Any appetite for that change? An alternative would be to try to go two hours earlier, but I'd like to try later first.
A single knob controls both FCOS and RHCOS pipelines here. C9S, C10S, and Rawhide composes seem to run between 04:00 and 05:00 UTC. Meanwhile RHEL content is generally pushed 09:00 to 16:00 UTC. If there are changes, each build takes approximately 2.5 hours, so with a 12:00 UTC start the two CentOS Stream builds finish around 17:00 UTC. RHEL builds would therefore start at 17:00 UTC and should have enough time to complete by end-of-day cut-offs, though it'll be tight. This will likely require additional tweaking in the future when we add 10.2 and 9.8.
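To make the timing arithmetic concrete, here is a minimal sketch of the schedule under stated assumptions: builds run back to back in the order c10s, c9s, rhel-10.1, rhel-9.6, every stream has changes, and each build takes the approximate 2.5 hours quoted above. The function and constant names are hypothetical and are not taken from the pipeline code.

```python
from datetime import datetime, timedelta, timezone

# Assumptions (not from the pipeline code): sequential builds, every stream
# has changes, and each build takes roughly 2.5 hours.
STREAMS = ["c10s", "c9s", "rhel-10.1", "rhel-9.6"]
BUILD_DURATION = timedelta(hours=2, minutes=30)


def stream_start_times(first_start, streams=STREAMS, duration=BUILD_DURATION):
    """Return {stream: start time} for builds that run one after another."""
    starts = {}
    current = first_start
    for stream in streams:
        starts[stream] = current
        current += duration
    return starts


if __name__ == "__main__":
    # The date is arbitrary; only the time of day matters here.
    kickoff = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    for stream, start in stream_start_times(kickoff).items():
        print(f"{stream}: {start:%H:%M} UTC")
```

With a 12:00 UTC kickoff this prints 12:00 for c10s, 14:30 for c9s, 17:00 for rhel-10.1 and 19:30 for rhel-9.6, so the last build would end around 22:00 UTC. If a stream has no changes and is skipped, the later streams start correspondingly earlier.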
682dd21 to a39368a
Typically the time when the compose completes isn't the time that directory was created. For today's rawhide build I'd use the timestamp on this file which says:
Another option here (I think) would be to change the order the
Yeah. Later by two hours seems fine. The only problem I see here is that it leans more heavily on more western timezone folks to investigate failures because it will be later in the day before failures trickle in (more towards the end of eastern TZ folks' day). If you don't mind, can we clear up what the actual goal is? In your original description you mention:
but I don't really understand why we wouldn't want to pick up the newest content sooner after it's available rather than later. I would think the goal would be "to have the newest content available by XYZ cutoff time on W day of the week". Can you put it in those terms? FTR I'm not opposed to having multiple knobs here. It just makes things slightly more complicated.
The goal being to ensure build-mechanical and the downstream build-node-image jobs finish before we typically see content changed, which tends to be EMEA to US East AM hours. In situations where we're hoping to pick up late changes this may cause delays; if that becomes an issue I'd propose we consider adding a second run at 17:00 UTC (12:00 EST + 5 hr), but ideally we'd limit that to only versions where we hope to have rapid turnaround. Those versions would be 9.6 today, then 9.6, 9.8, and 10.2 starting approximately May 2026.
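Since the cron spec is in UTC, the local-time equivalent of a noon US East run shifts with daylight saving. A minimal sketch of the conversion, using fixed offsets (EDT is UTC-4, EST is UTC-5) so it does not depend on a system timezone database; the helper name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern time: daylight (EDT) and standard (EST).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def noon_in_utc(tz):
    """Convert 12:00 local time in the given fixed-offset zone to UTC."""
    # The date is arbitrary because the offsets above are fixed.
    local_noon = datetime(2025, 1, 1, 12, 0, tzinfo=tz)
    return local_noon.astimezone(timezone.utc)


if __name__ == "__main__":
    for tz in (EDT, EST):
        print(f"12:00 {tz.tzname(None)} = {noon_in_utc(tz):%H:%M} UTC")
```

This prints 16:00 UTC for EDT and 17:00 UTC for EST, so a fixed 17:00 UTC run lands at 13:00 local time during the daylight saving months.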