
Collecting input: validating releases and improving cache #2375

Closed
jbergstroem opened this issue Jul 3, 2020 · 8 comments
Labels
cdn (changes or problems related to our cdn: cloudflare), help wanted, question, stale

Comments

@jbergstroem
Member

I'm looking at how we can improve our end-to-end storage of distfiles. Part of that also touches our release process: pushing distfiles and making them public (in the eyes of a user as well as the cache).

To extend my train of thought a bit: I want our end storage to be something more distributed as well as "colder", be it GCS or something S3-like. This way we can offload a lot of traffic from our server(s) as well as improve resiliency when catering to the majority of our bandwidth.

As far as I understand, we invalidate cache (read: changing user expectations) as part of release processes. One reason seems to be that some releases need to be re-baked. What other reasons do we have? Incomplete releases, perhaps?

@jbergstroem added the help wanted, question and cdn (changes or problems related to our cdn: cloudflare) labels Jul 3, 2020
@jbergstroem
Member Author

jbergstroem commented Jul 3, 2020

My first idea was to have any CI output with destination "dist" land in a staging folder. Before passing it on to cold storage (read: never write twice, never delete) there would be some kind of validation. Preferably automatic (checksums, extraction, binary checks, ..) or alternatively manual, since parts of our release are already intentionally manual. A rough sketch of what the automatic path could look like is below.

Based on the input I collect, this is subject to change.
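
A minimal sketch of such a "validate then promote" step, assuming a staging directory plus an S3-like cold-storage target; the paths, version, bucket name and tooling are placeholders, not our actual setup:

```sh
#!/usr/bin/env bash
# Hypothetical promotion step: validate staged artifacts before they ever
# reach cold storage. Paths and the bucket name are illustrative only.
set -euo pipefail

STAGING=/home/staging/nodejs/release/v14.5.0   # where CI dropped the artifacts
COLD=s3://node-dist-cold/release/v14.5.0       # write-once cold-storage target

cd "$STAGING"

# 1. Every file listed in SHASUMS256.txt must match its checksum.
sha256sum --check --strict SHASUMS256.txt

# 2. Archives must at least be listable (catches truncated uploads).
for f in *.tar.gz *.tar.xz; do
  [ -e "$f" ] || continue
  tar -tf "$f" > /dev/null
done

# 3. Only after validation does anything leave staging. "Never write twice,
#    never delete" would be enforced by the bucket policy, not this script.
aws s3 sync "$STAGING" "$COLD"
```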

@rvagg
Member

rvagg commented Jul 3, 2020

We invalidate cache because brute-force "invalidate all" is the best tool we have, and when we publish new website updates and directory indexes we need those updates pushed out to the edges rather than left stale. CF has more fine-grained controls now, but I think it means tagging everything into groups with nginx so we can invalidate just certain things. That work hasn't been done because ... it's work, and it's not simple.
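
For reference, the nginx side of that grouping could be as small as attaching a Cache-Tag response header per content group, which Cloudflare's purge-by-tag (an Enterprise feature) can then target. A sketch only; the location patterns below are illustrative, not our actual config:

```nginx
# Tag responses into coarse groups so the edge can be purged per group
# instead of "invalidate all". Real locations would still carry the usual
# root/proxy directives; only the tagging is shown here.
location ~ ^/(en|es|ja|zh-cn)/ {
    add_header Cache-Tag "website";
}

location ~ ^/(dist|download|docs|api)/ {
    add_header Cache-Tag "releases";
}
```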

@jbergstroem
Member Author

We invalidate cache because brute-force "invalidate all" is the best tool we have and when we publish new website updates and directory indexes we need those updates to be pushed out to the edges and not stay stale.

I usually read this as "I don't know what I'm changing". My dream scenario would be to understand these changes and introduce a flow where we can replace "restarting Windows" with "killall Finder".

CF has more fine-grained controls now but I think it means tagging everything into groups with nginx so we can invalidate just certain things.

Cache tags are one solution. The landscape has changed a bit, so I would say we have more options now.

Discovering all exceptions would probably help us make the best decision and Make It So.

@rvagg
Member

rvagg commented Jul 3, 2020

Right, it's always been a crappy setup, but originally we didn't have much choice, and as options started appearing they were just too complicated and time-consuming to implement. We have two broad cases which overlap:

  • website redeploy - all of the stuff we serve as website content would need to be invalidated, or magically just the things that have changed if that's possible. There might be flexibility on the urgency of this, but I would expect that people deploying new versions of the website would be surprised if their changes didn't show up straight away, so I'm not sure relying on a timed cache invalidation is acceptable. This content is under /en/ etc., and we should be able to isolate it because of the localisation setup and because in nginx pretty much everything that's not this website content is pulled in as aliases.
  • releases - nightlies, tests, RCs and actual releases. We want to invalidate the various directory indexes (/dist/, /download/*/, /docs/, others?) so the new items show up. index.json and index.tab in the respective release type directory need invalidation. And, when there's a new /release/ build, we have to rebuild the website again to get it properly listed on the front page and on the downloads page. Although we have a separate trigger for website rebuild, so this step could be rolled into the website redeploy step above. There's also /api/, which probably needs invalidation on new /release/ builds.

The tragedy of our current situation is all of the download binaries getting invalidated. Even if we could just say "any .tar.?z, .exe, .msi, .etc should never be invalidated, you can have them for as long as you like" then it'd be a massive step forward.
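
As a sketch of what "invalidate just the things that change" could look like from the release side: purge only the directory indexes and metadata by URL through Cloudflare's purge_cache endpoint and leave the binaries alone. The zone id, token and URL list below are placeholders:

```sh
# Hypothetical targeted purge after a release lands: only indexes and
# metadata files are refreshed; .tar.?z/.exe/.msi are never touched.
curl -sS -X POST \
  "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache" \
  -H "Authorization: Bearer ${CF_API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{
    "files": [
      "https://nodejs.org/dist/",
      "https://nodejs.org/dist/index.json",
      "https://nodejs.org/dist/index.tab",
      "https://nodejs.org/download/release/"
    ]
  }'
```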

@jbergstroem
Member Author

Even if we could just say "any .tar.?z, .exe, .msi, .etc should never be invalidated, you can have them for as long as you like" then it'd be a massive step forward.

This is also what I see as the biggest win (hence #2376).

And, when there's a new /release/ build,

I don't remember this - can you share a link?

[paraphrase] Generic indexes and front pages

Doable, but ultimately hand-tooling that doesn't scale. We can set timeouts from nginx based on file types and make exceptions for indexes (roughly as sketched below). The more maintainable solution here is usually having a caching server in front that can express the logic for these cases more easily, be it ATS or Varnish (or Cloudflare or Fastly). If we split out downloads, we don't have to change much of this behavior for now since the cache populates quickly.
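
A sketch of those nginx-side file-type rules; the extensions and max-age values are illustrative, not a proposal for the real config:

```nginx
# Long-lived, effectively immutable binaries vs. short-lived indexes.
location ~* \.(tar\.gz|tar\.xz|7z|zip|exe|msi|pkg)$ {
    add_header Cache-Control "public, max-age=31536000, immutable";
}

location ~* /index\.(json|tab)$ {
    add_header Cache-Control "public, max-age=300";
}
```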

@rvagg
Member

rvagg commented Jul 3, 2020

And, when there's a new /release/ build,

I don't remember this - can you share a link?

What I mean is that "release"-type builds are special cases among all the build types. nightly, test and rc builds just get auto-promoted and don't interact with the rest of the website at all; they're just a new subdirectory, a change to index.json and index.tab in their respective parent directories, and a refresh of that parent directory index. For release builds that are on either the current "current" line or the current "lts" line, the main website page gets updated to list them and https://nodejs.org/en/download/ (and friends) gets updated to show them.

We run this from crontab every 5 minutes: https://github.com/nodejs/build/blob/master/ansible/www-standalone/resources/scripts/check-build-site.sh. It compares the release build index.tab against the website index.html; if the former is newer, a website rebuild is needed.
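
The core of that check is just a timestamp comparison; roughly the following, with the paths and the rebuild entry point as stand-ins (see the linked script for the real logic):

```sh
#!/usr/bin/env bash
# Rough sketch of the cron check: if the release index.tab is newer than the
# built site's index.html, a website rebuild is due. Paths are stand-ins.
set -euo pipefail

INDEX_TAB=/home/dist/nodejs/release/index.tab   # touched when a release is promoted
SITE_INDEX=/home/www/nodejs/index.html          # written by the last site build

if [ "$INDEX_TAB" -nt "$SITE_INDEX" ]; then
  /usr/local/bin/rebuild-site.sh                # hypothetical rebuild trigger
fi
```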

This is also very relevant, and also in need of attention if you're keen for something to chew on: #2123 (comment)

@jbergstroem
Member Author

This is also very relevant, and also in need of attention if you're keen for something to chew on: #2123 (comment)

This seems highly relevant to #2359 as well.

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
