Use generated resource allocation options for veda hub #3566

Merged: 3 commits, Jan 4, 2024

Member

@yuvipanda yuvipanda commented Jan 3, 2024

While starting to work on #3565, I realized that VEDA was still using the older style 'node share' rather than the generated 'resource allocation' options. I've swapped over the options to now be based on images for users to choose and resource allocation options generated by our resource allocation script. This matches openscapes, and there has generally been pretty big positive feedback on this mode. It also gives more visibility to the R & QGIS options.

I've kept the initial cloning to only happen on the pangeo image as it currently exists, without making any changes. That should be cleaned up as part of #3565

I also had to update the node-info.json file by running the appropriate command.
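As a rough illustration of what "generated resource allocation options" could look like, here is a minimal sketch. It is not the actual 2i2c generator script: the function name `generate_choices`, the halving strategy, and the example node size are all assumptions made for illustration. Only the output shape (a `choices` mapping whose entries carry a `display_name` and a `kubespawner_override` with `mem_guarantee`, `mem_limit`, `cpu_guarantee` and `cpu_limit`) follows KubeSpawner's `profile_options` convention.

```python
GIB = 1024**3


def generate_choices(allocatable_mem_bytes: int, allocatable_cpu: float,
                     num_choices: int = 3) -> dict:
    """Hypothetical generator: split a node's allocatable resources by powers of two.

    The smallest choice comes first; the largest choice takes the whole node.
    """
    choices = {}
    for i in range(num_choices):
        divisor = 2 ** (num_choices - 1 - i)
        mem = allocatable_mem_bytes // divisor
        cpu = allocatable_cpu / divisor
        mem_gib = mem / GIB
        key = f"mem_{mem_gib:.1f}".replace(".", "_")
        choices[key] = {
            "display_name": f"{mem_gib:.1f} GiB RAM, up to {allocatable_cpu:g} CPUs",
            "kubespawner_override": {
                # Guarantee and limit are equal for memory, so users get what is shown.
                "mem_guarantee": mem,
                "mem_limit": mem,
                # CPU is guaranteed proportionally but may burst up to the full node.
                "cpu_guarantee": cpu,
                "cpu_limit": allocatable_cpu,
            },
        }
    return choices


# Assumed example: a node with 8 GiB and 4 CPUs left over after system overhead.
for key, choice in generate_choices(8 * GIB, 4.0).items():
    print(key, "->", choice["display_name"])
```

Because the choices are derived from what is actually allocatable on a node, the displayed memory values are often not round numbers, which is what the rounding discussion further down is about.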

Old:

image image

New:

image image

@yuvipanda yuvipanda requested a review from a team as a code owner January 3, 2024 23:27

github-actions bot commented Jan 3, 2024

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| aws | nasa-veda | No | | Yes | Core infrastructure has been modified |
| aws | nasa-esdis | No | | Yes | Core infrastructure has been modified |
| aws | gridsst | No | | Yes | Core infrastructure has been modified |
| aws | ubc-eoas | No | | Yes | Core infrastructure has been modified |
| kubeconfig | utoronto | No | | Yes | Core infrastructure has been modified |
| aws | catalystproject-africa | No | | Yes | Core infrastructure has been modified |
| aws | openscapes | No | | Yes | Core infrastructure has been modified |
| gcp | 2i2c-uk | No | | Yes | Core infrastructure has been modified |
| gcp | hhmi | No | | Yes | Core infrastructure has been modified |
| gcp | leap | No | | Yes | Core infrastructure has been modified |
| gcp | qcl | No | | Yes | Core infrastructure has been modified |
| aws | 2i2c-aws-us | No | | Yes | Core infrastructure has been modified |
| aws | nasa-ghg | No | | Yes | Core infrastructure has been modified |
| gcp | awi-ciroh | No | | Yes | Core infrastructure has been modified |
| gcp | 2i2c | No | | Yes | Core infrastructure has been modified |
| aws | nasa-cryo | No | | Yes | Core infrastructure has been modified |
| gcp | catalystproject-latam | No | | Yes | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | No | | Yes | Core infrastructure has been modified |
| gcp | callysto | No | | Yes | Core infrastructure has been modified |
| gcp | pangeo-hubs | No | | Yes | Core infrastructure has been modified |
| gcp | cloudbank | No | | Yes | Core infrastructure has been modified |
| aws | smithsonian | No | | Yes | Core infrastructure has been modified |
| aws | victor | No | | Yes | Core infrastructure has been modified |
| gcp | linked-earth | No | | Yes | Core infrastructure has been modified |
| gcp | meom-ige | No | | Yes | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| aws | nasa-esdis | prod | Core infrastructure has been modified |
| aws | gridsst | prod | Core infrastructure has been modified |
| aws | ubc-eoas | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |
| aws | catalystproject-africa | nm-aist | Core infrastructure has been modified |
| aws | catalystproject-africa | must | Core infrastructure has been modified |
| aws | openscapes | prod | Core infrastructure has been modified |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| gcp | hhmi | prod | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| gcp | qcl | prod | Core infrastructure has been modified |
| aws | 2i2c-aws-us | showcase | Core infrastructure has been modified |
| aws | 2i2c-aws-us | ncar-cisl | Core infrastructure has been modified |
| aws | 2i2c-aws-us | go-bgc | Core infrastructure has been modified |
| aws | 2i2c-aws-us | itcoocean | Core infrastructure has been modified |
| aws | 2i2c-aws-us | cosmicds | Core infrastructure has been modified |
| aws | nasa-ghg | prod | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| gcp | 2i2c | imagebuilding-demo | Core infrastructure has been modified |
| gcp | 2i2c | demo | Core infrastructure has been modified |
| gcp | 2i2c | ohw | Core infrastructure has been modified |
| gcp | 2i2c | aup | Core infrastructure has been modified |
| gcp | 2i2c | temple | Core infrastructure has been modified |
| gcp | 2i2c | ucmerced | Core infrastructure has been modified |
| gcp | 2i2c | climatematch | Core infrastructure has been modified |
| gcp | 2i2c | mtu | Core infrastructure has been modified |
| gcp | 2i2c | tufts | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| gcp | catalystproject-latam | unitefa-conicet | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | prod | Core infrastructure has been modified |
| gcp | callysto | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | coessing | Core infrastructure has been modified |
| gcp | cloudbank | bcc | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | dvc | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | evc | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | howard | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | humboldt | Core infrastructure has been modified |
| gcp | cloudbank | laney | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | sbcc-dev | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | lamission | Core infrastructure has been modified |
| gcp | cloudbank | mills | Core infrastructure has been modified |
| gcp | cloudbank | mission | Core infrastructure has been modified |
| gcp | cloudbank | norco | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | sacramento | Core infrastructure has been modified |
| gcp | cloudbank | srjc | Core infrastructure has been modified |
| gcp | cloudbank | saddleback | Core infrastructure has been modified |
| gcp | cloudbank | santiago | Core infrastructure has been modified |
| gcp | cloudbank | sjsu | Core infrastructure has been modified |
| gcp | cloudbank | sierra | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | wlac | Core infrastructure has been modified |
| gcp | cloudbank | csulb | Core infrastructure has been modified |
| gcp | cloudbank | csum | Core infrastructure has been modified |
| aws | smithsonian | prod | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |
| gcp | linked-earth | prod | Core infrastructure has been modified |
| gcp | meom-ige | prod | Core infrastructure has been modified |

@yuvipanda
Member Author

I've pinged @freitagb and @slesaad on slack to take a look.

@wildintellect
Contributor

I'd suggest just rounding down the numbers to the nearest 0.5?
e.g. 1.9 = 1.5, 28.937 = 28.5, etc.
As long as you're not overpromising, it's accurate enough.
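
A minimal sketch of the rounding suggested above, assuming the displayed value should never exceed the real one. The function name `round_down_to_half` is made up for illustration and is not taken from the infrastructure repository.

```python
import math


def round_down_to_half(value: float) -> float:
    """Round down to the nearest multiple of 0.5, so the label never overpromises."""
    return math.floor(value * 2) / 2


# The two examples from the comment above.
for raw in (1.9, 28.937):
    print(raw, "->", round_down_to_half(raw))
```

Doubling, flooring, and halving keeps the result at or below the input, which is the "not overpromising" property the comment asks for.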

@yuvipanda
Member Author

Positive feedback from Alex Mandel on slack:

image

Although there was a suggestion to do better rounding of the numbers.

@wildintellect
Contributor

This does seem like an improvement, though I wonder why more than one section is necessary:
box1 - What image type
box2 (based on box1) - what machine size
box3 - only if needed for a custom image

Also consider linking to external descriptions/docs on what software is installed or how to use the custom image choice.

@yuvipanda
Member Author

@wildintellect that issue is pending jupyterhub/kubespawner#778 getting fixed.

@sgibson91
Member

I've approved pending any tweaks asked for by the community :)

@yuvipanda
Member Author

I'll open another issue to deal with the rounding as it's reasonably complex, and deploy this now. The community seemed to not object to this on slack :)

@yuvipanda yuvipanda merged commit e6ed45a into 2i2c-org:master Jan 4, 2024
33 checks passed

github-actions bot commented Jan 4, 2024

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/7413484076

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request Jan 4, 2024
2i2c-org#3569 changed
the cryptnono daemonset to have different resource requests
for the init containers as well as the container. While working
on 2i2c-org#3566, I noticed
this was generating wrong choices - the overhead was calculated
incorrectly (too small).

We were intentionally ignoring init containers while calculating
overhead, but it turns out the scheduler and the autoscaler both
take them into consideration. The effective resource request
for a pod is the higher of the summed resource requests of the containers
*or* the largest resource request of any single init container - this
ensures that a pod with higher
requests for init containers than containers (like our cryptnono pod!)
will actually run. This is documented at
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resource-sharing-within-containers,
and implemented in Kubernetes itself at
https://github.com/kubernetes/kubernetes/blob/9bd0ef5f173de3cc2d1d629a4aee499d53690aee/pkg/api/v1/resource/helpers.go#L50
(this is the library code that the cluster autoscaler uses).
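
The rule described above can be sketched as follows. This is a simplified illustration, not the code changed in this commit: the `Requests` type, the `effective_pod_requests` name, and the example numbers are assumptions, and the sketch ignores pod overhead and sidecar-style init containers.

```python
from dataclasses import dataclass

MIB = 1024**2


@dataclass(frozen=True)
class Requests:
    cpu_millis: int    # CPU request in millicores
    memory_bytes: int  # memory request in bytes


def effective_pod_requests(containers, init_containers):
    """Return the request the scheduler effectively reserves for a pod.

    Regular containers run at the same time, so their requests are summed.
    Init containers run one after another, so only the largest one matters.
    The pod needs whichever of the two is higher, per resource.
    """
    sum_cpu = sum(c.cpu_millis for c in containers)
    sum_mem = sum(c.memory_bytes for c in containers)
    init_cpu = max((c.cpu_millis for c in init_containers), default=0)
    init_mem = max((c.memory_bytes for c in init_containers), default=0)
    return Requests(max(sum_cpu, init_cpu), max(sum_mem, init_mem))


# Made-up numbers: an init container that requests more than the main container.
pod = effective_pod_requests(
    containers=[Requests(1, 32 * MIB)],
    init_containers=[Requests(5, 100 * MIB)],
)
print(pod)
```

Ignoring the init containers here would report 1m CPU and 32 MiB of overhead instead of 5m and 100 MiB, which is the "too small" overhead described above.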

This PR updates the two places we currently have that calculate
effective resource requests (I assume eventually these will be
merged into one - I haven't kept up with the team's work last
quarter here).

I've updated the node-capacity-info.json file, which is what seems
to be used by the generator script right now.
@yuvipanda
Member Author

@wildintellect I opened #3584 with the explanation for why we show the odd numbers in memory.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request Jan 11, 2024
Brings in 2i2c-org#3566
(and follow-ups) to the GHG hub

Ref 2i2c-org#3565