Add logic on nbc-webhook to pickup internal or external registry #329

Merged

atheo89
Member

@atheo89 atheo89 commented May 17, 2024

Related to: https://issues.redhat.com/browse/RHOAIENG-6617

Description

This PR adds logic to the odh-notebook-webhook controller that checks whether an internal registry is present and sets the container.image value accordingly. If an internal registry is detected, it keeps the default value specified in the Notebook Custom Resource (CR). Otherwise, it reads the last-image-selection annotation to find the image stream, fetches the image from status.dockerImageReference, and assigns it to container.image.
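
The decision described above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `imageStreams` map is a hypothetical stand-in for the real ImageStream lookup, `resolveImage` is an illustrative name, and the sketch assumes the annotation value has the form `<imagestream name>:<tag>`.

```go
package main

import (
	"fmt"
	"strings"
)

// internalRegistryHost is the in-cluster registry address checked in the diff.
const internalRegistryHost = "image-registry.openshift-image-registry.svc:5000"

// lastImageSelectionKey is the annotation read from the Notebook CR.
const lastImageSelectionKey = "notebooks.opendatahub.io/last-image-selection"

// imageStreams is a hypothetical stand-in for an ImageStream lookup:
// image stream name -> tag -> dockerImageReference.
type imageStreams map[string]map[string]string

// resolveImage returns the value to use for container.image.
func resolveImage(currentImage string, annotations map[string]string, streams imageStreams) (string, error) {
	// Internal registry in use: keep the value already set in the Notebook CR.
	if strings.Contains(currentImage, internalRegistryHost) {
		return currentImage, nil
	}
	selection, ok := annotations[lastImageSelectionKey]
	if !ok {
		return "", fmt.Errorf("annotation %s is not set", lastImageSelectionKey)
	}
	// Assumption: the annotation holds "<imagestream name>:<tag>".
	name, tag, found := strings.Cut(selection, ":")
	if !found {
		return "", fmt.Errorf("unexpected image selection %q", selection)
	}
	ref, ok := streams[name][tag]
	if !ok {
		return "", fmt.Errorf("no dockerImageReference for %s:%s", name, tag)
	}
	return ref, nil
}

func main() {
	streams := imageStreams{
		"jupyter-minimal-notebook": {"2024a": "registry.example.com/ml/workbench-images@sha256:abc"},
	}
	annotations := map[string]string{lastImageSelectionKey: "jupyter-minimal-notebook:2024a"}
	image, err := resolveImage("jupyter-minimal-notebook:2024a", annotations, streams)
	fmt.Println(image, err)
}
```

Returning an error instead of silently leaving the image untouched makes it easier for the caller to log why no external reference was picked.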

How Has This Been Tested?

It was tested in a cluster with and without an internal registry.

  1. Replace the deployment image of the odh-notebook-controller with
    quay.io/opendatahub/odh-notebook-controller:pr-329

  2. Spin up a new notebook; it should start as usual.

    • Scale the notebook down and up
    • Edit the notebook and change the image
    • Check that the odh-notebook-controller logs contain: Internal registry found. Will pickup the default value from image field.
  3. To test without an internal registry, update the Registry CR, changing spec.managementState from Managed to Removed (for more, look here)

    • Spin up a new notebook; it should be up and running
    • Check that the image that was picked up comes from the external registry and corresponds to the tag selected by the user
    • Check that the odh-notebook-controller logs contain: No Internal registry found, pick up imageHash from status.tag.dockerImageReference

Important Note: When the internal registry is down, the dashboard displays a red !Deleted label. For this we are working with @lucferbux to identify and fix it.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@atheo89 atheo89 requested review from jstourac and harshad16 and removed request for LaVLaS and VaishnaviHire May 17, 2024 10:06

annotations := notebook.GetAnnotations()
if annotations != nil {
if imageSelection, exists := annotations["notebooks.opendatahub.io/last-image-selection"]; exists {
Member

this is never true in kubeflow tests, so the code below is untested, as indicated by the red bar on the left side

Member Author

How do you check this? I cannot see it in VS Code.

Member

Code coverage should be part of the go vscode plugin, https://github.com/golang/vscode-go/blob/master/docs%2Ffeatures.md

Member Author

@atheo89 atheo89 May 21, 2024

Hmmm, I don't see anything. I do use this one:
image

Member

@rsc in his talk https://research.swtch.com/testing uses

go tool cover -html

that visualizes the coverage data as HTML


Member

@harshad16 harshad16 left a comment

/lgtm

Great approach, just one question:
In a disconnected cluster, can there be a case
where there is an existing notebook in the cluster?

@atheo89
Member Author

atheo89 commented May 23, 2024

/lgtm

Great approach, just one question: In a disconnected cluster, can there be a case where there is an existing notebook in the cluster?

Thank you, Harshad, for your review. 🙂

This approach doesn't affect already created notebooks; it only applies during their creation. In the disconnected cluster I used for testing, I didn't notice anything unusual with existing notebooks. However, since the cluster was shared, I couldn't perform extensive testing.

Member

@harshad16 harshad16 left a comment

Tested in a cluster; works for new workbenches.
/lgtm
/approve


openshift-ci bot commented May 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16


@openshift-merge-bot openshift-merge-bot bot merged commit 660c3ac into opendatahub-io:v1.7-branch May 23, 2024
7 checks passed
@harshad16
Member

/cherrypick stable

@openshift-cherrypick-robot

@harshad16: new pull request created: #332


@shalberd

shalberd commented May 24, 2024

@harshad16 @atheo89 ah, I see, you are not using the image change trigger annotation here in odh notebook controller. I will have a closer look at it today and tomorrow.
Fetching the info from status.dockerImageReference ... what about a floating tag, i.e. the same docker image tag but changing digests? I am checking this because I have that situation, too.
I have a fixed imagestream name:tag jupyter-minimal-notebook:jupyter-minimal-py39-2024a with changing digests behind the underlying docker.from image, due to continuous updates every couple of weeks (see my code snippet below).
As it is now in my implementation, with the image change trigger annotation (not your approach here) and paused: False, the image field always updates to the latest dockerImageReference entry in status.tags for a given imagestream tag (not image tag).
Do you take into account the possibility of multiple entries for a given imagestream tag and get the latest?
You do that on first creation of the notebook only, not on updates of already-running notebooks, right? Otherwise, your container would restart.
As mentioned, I am just sharing some thoughts for now. I will analyze your code in detail tomorrow. I think I see why you went with this approach for now, but in effect it replicates what the image content change trigger annotation of OpenShift is doing.

status:
  dockerImageRepository: ''
  tags:
    - tag: jupyter-minimal-py39-2024a
      items:
        - created: '2024-05-16T07:28:59Z'
          dockerImageReference: >-
            registry.bla.ch/ml/workbench-images@sha256:cf96395eee320f6e7cde1bf2ea223a9d235d32668bbe86a2aca0bd73d09cc95a
          image: >-
            sha256:cf96395eee320f6e7cde1bf2ea223a9d235d32668bbe86a2aca0bd73d09cc95a
          generation: 14
        - created: '2024-03-28T13:00:01Z'
          dockerImageReference: >-
            registry.bla.ch/ml/workbench-images@sha256:0653a015552f859c33d5041b514a650412045a55bc0fe292cc0ab39916534b03
          image: >-
            sha256:0653a015552f859c33d5041b514a650412045a55bc0fe292cc0ab39916534b03
          generation: 12
        - created: '2024-03-27T22:27:01Z'
          dockerImageReference: >-
            registry.bla.ch/ml/workbench-images@sha256:21f28945b0491a9db6761f5f9bf44d9be8c81640a5e3b861a23192077b955376
          image: >-
            sha256:21f28945b0491a9db6761f5f9bf44d9be8c81640a5e3b861a23192077b955376
          generation: 10

imagestream tag spec:

spec:
  lookupPolicy:
    local: true
  tags:
    - name: jupyter-minimal-py39-2024a
      from:
        kind: DockerImage
        name: >-
          registry.ch/mine/workbench-images:floatingtag
      referencePolicy:
        type: Local
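
To illustrate the question about multiple entries behind one imagestream tag, here is a small sketch of picking the most recent dockerImageReference from status.tags. It assumes the ImageStream status has been decoded into generic maps and slices (as the type assertions in the PR diff suggest); `latestImageReference` is a hypothetical helper name, not a function from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// latestImageReference returns the dockerImageReference of the most recently
// created item recorded for the given tag in a decoded status.tags list.
func latestImageReference(tags []interface{}, tag string) (string, error) {
	for _, t := range tags {
		tagMap, ok := t.(map[string]interface{})
		if !ok {
			continue
		}
		if name, _ := tagMap["tag"].(string); name != tag {
			continue
		}
		// A failed assertion leaves items nil, so the loop below is skipped.
		items, _ := tagMap["items"].([]interface{})
		var bestRef string
		var bestTime time.Time
		for _, it := range items {
			item, ok := it.(map[string]interface{})
			if !ok {
				continue
			}
			ref, _ := item["dockerImageReference"].(string)
			if ref == "" {
				continue
			}
			created, _ := item["created"].(string)
			// ts is the zero time if "created" is absent or malformed.
			ts, _ := time.Parse(time.RFC3339, created)
			if bestRef == "" || ts.After(bestTime) {
				bestRef, bestTime = ref, ts
			}
		}
		if bestRef == "" {
			return "", fmt.Errorf("tag %q has no usable items", tag)
		}
		return bestRef, nil
	}
	return "", fmt.Errorf("tag %q not found in status.tags", tag)
}

func main() {
	tags := []interface{}{
		map[string]interface{}{
			"tag": "jupyter-minimal-py39-2024a",
			"items": []interface{}{
				map[string]interface{}{"created": "2024-03-28T13:00:01Z", "dockerImageReference": "registry.example.com/img@sha256:old"},
				map[string]interface{}{"created": "2024-05-16T07:28:59Z", "dockerImageReference": "registry.example.com/img@sha256:new"},
			},
		},
	}
	fmt.Println(latestImageReference(tags, "jupyter-minimal-py39-2024a"))
}
```

Comparing the created timestamps avoids relying on the order of the items list, and the comma-ok assertions avoid a panic if a field is missing or has an unexpected type.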

@shalberd

shalberd commented May 24, 2024

general thought:

I think you are really replicating the task of the image content change annotation image.openshift.io/triggers with paused: true here. I get the reasoning, in that your sprint task is to just "make it work without internal openshift docker registry". What I am missing is some kind of architectural discussion, though your work in the PR is impressive.
Are there plans to instead construct the image change trigger annotation in odh notebook controller in future, possibly with paused: false or true depending on a to-be-defined notebook annotation like "long-running" or "apply image changes immediately", instead of filling the container image field value? https://docs.openshift.com/container-platform/4.12/openshift_images/triggering-updates-on-imagestream-changes.html
I could imagine odh notebook controller playing a much smaller part in setting paused: true; on the other hand, odh-dashboard could also allow people to set paused: true for long-running notebooks after the first notebook / container startup.
Thank you and Adriana for the hard work and for keeping me involved.
Not only that, but the image change trigger annotation also handles always pulling through vs. pulling from the internal registry cache if present, depending on spec.tags[tagname].referencePolicy.type: Local vs spec.tags[tagname].referencePolicy.type: Source ... as described in the lower 2/3 of the first comment of kubeflow-notebook-controller (not odh-notebook-controller) PR-133

@shalberd

shalberd commented May 24, 2024

@atheo89 regarding working with @lucferbux on "When the internal registry is down, the dashboard displays a red !Deleted label (and no package info is available)":
This might help in your discussions. At least for the package info, it is due to getting the imagestream tag from the container[0].image value, which does not work when the internal registry is down and also does not work when the image value is a digest. See getImageTagByContainer at https://github.com/opendatahub-io/odh-dashboard/pull/800/files#diff-827cc077af838c919ae33204b8e50d538225dd04c4ad5fce0f8783ec22d111c8L17 and https://github.com/opendatahub-io/odh-dashboard/pull/800/files#diff-c87489fce539bf912c474f6890d251f50b251e99e02acaada0284c12912b64f6L28
possibly in other places, too.

Also, I'd suggest not updating the JUPYTER_IMAGE env var to, e.g., the myregistry.com/bla/image@sha256:3d92929292 kind of digest notation. It was myregistryinternalopenshift:5000/imagestreamname:imagestreamtag before, I know.

Instead, just put imagestreamname:imagestreamtag in there, it is fine, checked with upstream.

https://github.com/opendatahub-io/odh-dashboard/pull/800/files#diff-0e61ca9f1a1c0dd011d4c2e693c5ced1687b2f220aeed574ad3787a7cb388e79L52
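
A rough sketch of the suggestion above: keep the digest in the container image field, but store imagestreamname:imagestreamtag in JUPYTER_IMAGE. The `EnvVar` type here is a simplified stand-in for the Kubernetes env entry (name and value only), and `setEnv` is a hypothetical helper.

```go
package main

import "fmt"

// EnvVar is a simplified stand-in for a container env entry (name/value only).
type EnvVar struct {
	Name  string
	Value string
}

// setEnv sets name to value, updating an existing entry in place or
// appending a new one, and returns the resulting slice.
func setEnv(env []EnvVar, name, value string) []EnvVar {
	for i := range env {
		if env[i].Name == name {
			env[i].Value = value
			return env
		}
	}
	return append(env, EnvVar{Name: name, Value: value})
}

func main() {
	env := []EnvVar{{Name: "JUPYTER_IMAGE", Value: "myregistry.com/bla/image@sha256:3d92929292"}}
	// Keep the imagestream name and tag here rather than the resolved digest.
	env = setEnv(env, "JUPYTER_IMAGE", "jupyter-minimal-notebook:jupyter-minimal-py39-2024a")
	fmt.Println(env[0].Value)
}
```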

@shalberd

shalberd commented May 24, 2024

I think it'd help if we were to arrange for a 30-45 minute Zoom call some time in the morning US time, in the afternoon or evening European time.

})
imageHash := items[0].(map[string]interface{})["dockerImageReference"].(string)
notebook.Spec.Template.Spec.Containers[0].Image = imageHash
// Update the JUPYTER_IMAGE environment variable
@shalberd shalberd May 28, 2024

Hi all, putting a sha256-style name into the env var JUPYTER_IMAGE, especially for an image (docker.from.name origin) rather than an imagestream, is definitely not ok.
What I can say for sure is that, during my work on PR-800, I found out that the JUPYTER_IMAGE env var is used during commercial RHOAI / upstream CI testing
https://github.com/red-hat-data-services/ods-ci/blob/master/ods_ci/tests/Resources/Page/ODH/JupyterHub/JupyterHubSpawner.robot#L374
to determine imagestream name and tag.

Meaning I checked with someone back then, and also with @VaishnaviHire, that it is ok to now put imagestreamname:imagestreamtag into that env var, as I did in my changes to dashboard PR-800.

@shalberd shalberd May 28, 2024

Also, as I mentioned before, ODH notebook controller is the wrong place to handle this sort of stuff.
Much better in my view to start declaratively with odh dashboard and the image change trigger annotation.
There are ways to handle long-running notebooks, avoid container restarts, even with that, but we have to put in a slider or checkmark in dashboard GUI to make notebook long-running, set annotation field to paused: true after initial startup.

@shalberd

shalberd commented May 28, 2024

@lucferbux @harshad16 about long-running notebooks and avoiding changes to the image field when the underlying digest of an imagestream tag / docker.image.from changes:

The way to do this is:

Setting paused: true right away does not work; understandably, the image field of the container is not updated.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    image.openshift.io/triggers: >-
      [{"from":{"kind":"ImageStreamTag","name":"jupyter-minimal-notebook:jupyter-minimal-py39-2024a",
      "namespace":"tst-opendatahub"},"fieldPath":"spec.template.spec.containers[?(@.name==\"notebook\")].image",
      "paused": true}]
  name: annotationtest
  namespace: tst-datascienceprojectsns
spec:
  serviceName: notebook
  replicas: 1
  selector:
    matchLabels:
      app: notebook
  template:
    metadata:
      labels:
        app: notebook
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: notebook
          image: anyplaceholdertext

resulting spec, image field is not resolved:

spec:
  replicas: 1
  selector:
    matchLabels:
      app: notebook
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: notebook
    spec:
      containers:
        - name: notebook
          image: anyplaceholdertext
          resources: {}

first, set paused: false

kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: annotationtest
  namespace: tst-datascienceprojectsnamespace
  uid: 03b8169a-2990-4685-9e7f-61c4866bb510
  resourceVersion: '760152699'
  generation: 1
  creationTimestamp: '2024-05-28T06:57:30Z'
  annotations:
    image.openshift.io/triggers: >-
      [{"from":{"kind":"ImageStreamTag","name":"jupyter-minimal-notebook:jupyter-minimal-py39-2024a",
      "namespace":"tst-opendatahub"},"fieldPath":"spec.template.spec.containers[?(@.name==\"notebook\")].image",
      "paused": false}]

image field is resolved, we're letting image change admission plugin resolve the name of the container image

spec:
  replicas: 1
  selector:
    matchLabels:
      app: notebook
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: notebook
    spec:
      containers:
        - name: notebook
          image: >-
            registry.bla.com/blasubrepo/workbench-images@sha256:cf96395eee320f6e7cde1bf2ea223a9d235d32668bbe86a2aca0bd73d09cc95a

Now, as mentioned, if we want to avoid any pod restarts due to image digest changes behind a floating tag for long-running notebooks, we need to enable a GUI slider in odh dashboard that sets paused: true in the notebook annotation, so that no more updates happen should the underlying digest change, and thus no restarts.
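
For illustration, the image.openshift.io/triggers annotation value shown in the manifests above could be generated like this. It is only a sketch: the struct and function names are hypothetical, and the JSON field names are taken from the YAML in this comment, not from an OpenShift client library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// triggerFrom and imageTrigger mirror the JSON fields used in the
// image.openshift.io/triggers annotation shown above (hypothetical types).
type triggerFrom struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

type imageTrigger struct {
	From      triggerFrom `json:"from"`
	FieldPath string      `json:"fieldPath"`
	Paused    bool        `json:"paused"`
}

// triggerAnnotation builds the annotation value for one container,
// selecting it by name rather than by index.
func triggerAnnotation(streamTag, namespace, containerName string, paused bool) (string, error) {
	triggers := []imageTrigger{{
		From:      triggerFrom{Kind: "ImageStreamTag", Name: streamTag, Namespace: namespace},
		FieldPath: "spec.template.spec.containers[?(@.name==\"" + containerName + "\")].image",
		Paused:    paused,
	}}
	raw, err := json.Marshal(triggers)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	// First startup: paused is false so the image field gets resolved.
	value, err := triggerAnnotation("jupyter-minimal-notebook:jupyter-minimal-py39-2024a", "tst-opendatahub", "notebook", false)
	if err != nil {
		panic(err)
	}
	fmt.Println(value)
}
```

A dashboard toggle for long-running notebooks would then only need to regenerate the value with paused set to true after the first startup.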

log.Info("Imagestream not found in any of the specified namespaces", "imagestreamName", imagestreamName, "tag", tag)
}
}
}
Member

It would be nice to provide some log/warning for the else branches in these 2 cases (ifs on lines 473 and 474).
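
One possible shape for this, assuming the two ifs are the annotations != nil check and the last-image-selection lookup shown in the diff context: return a distinct error for each else branch so the caller can log it. `lastImageSelection` and the error variables are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

const lastImageSelectionKey = "notebooks.opendatahub.io/last-image-selection"

// Distinct errors for the two branches that currently fall through silently.
var (
	errNoAnnotations = errors.New("notebook has no annotations")
	errNoSelection   = errors.New("annotation " + lastImageSelectionKey + " is not set")
)

// lastImageSelection returns the annotation value, or an error that says
// which of the two checks failed.
func lastImageSelection(annotations map[string]string) (string, error) {
	if annotations == nil {
		return "", errNoAnnotations
	}
	selection, exists := annotations[lastImageSelectionKey]
	if !exists {
		return "", errNoSelection
	}
	return selection, nil
}

func main() {
	if _, err := lastImageSelection(map[string]string{}); err != nil {
		// In the controller this would go through its logger instead.
		fmt.Println("warning:", err)
	}
}
```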

if annotations != nil {
if imageSelection, exists := annotations["notebooks.opendatahub.io/last-image-selection"]; exists {
// Check if the image selection has an internal registry, if so will pickup this. This value constructed on the initialization of the Notebook CR.
if strings.Contains(notebook.Spec.Template.Spec.Containers[0].Image, "image-registry.openshift-image-registry.svc:5000") {
Member

@jstourac jstourac May 28, 2024

Are we sure the container we're interested in is always first in all cases? Wouldn't it be better to check based on the container name, which should match the name of the Notebook? The other container - oauth-proxy - contains direct quay link from the beginning.

@shalberd shalberd May 29, 2024

That is how the kind: Notebook objects with their podspec are assembled by odh dashboard, i.e. the notebook container is always the first one, index 0. But it is not guaranteed.
Here, also, as you can see in my comments elsewhere, the admission plugin works with fieldPath-based lookups of the container by container name, which is a good idea.

@shalberd shalberd May 29, 2024

"The other container - oauth-proxy - contains direct quay link from the beginning."
Correct, that is done by odh notebook controller. It is part of an oauth-argument in the odh-notebook-controller-manager pod container and evaluated by odh notebook controller.
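
A name-based lookup along the lines discussed in this thread might look like the sketch below. `Container` is a simplified stand-in for the pod spec container type (name and image only), and `notebookContainer` is a hypothetical helper; it assumes the notebook container is named after the Notebook.

```go
package main

import "fmt"

// Container is a simplified stand-in for a pod spec container (name/image only).
type Container struct {
	Name  string
	Image string
}

// notebookContainer returns a pointer to the container whose name matches the
// Notebook name, so the caller can update its image without assuming index 0.
func notebookContainer(containers []Container, notebookName string) (*Container, bool) {
	for i := range containers {
		if containers[i].Name == notebookName {
			return &containers[i], true
		}
	}
	return nil, false
}

func main() {
	containers := []Container{
		{Name: "oauth-proxy", Image: "quay.io/example/oauth-proxy:latest"},
		{Name: "my-notebook", Image: "jupyter-minimal-notebook:2024a"},
	}
	if c, ok := notebookContainer(containers, "my-notebook"); ok {
		c.Image = "registry.example.com/ml/workbench-images@sha256:abc"
	}
	fmt.Println(containers[1].Image)
}
```

Returning a pointer into the slice means the update is applied to the original pod spec rather than to a copy.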

if annotations != nil {
if imageSelection, exists := annotations["notebooks.opendatahub.io/last-image-selection"]; exists {
// Check if the image selection has an internal registry, if so will pickup this. This value constructed on the initialization of the Notebook CR.
if strings.Contains(notebook.Spec.Template.Spec.Containers[0].Image, "image-registry.openshift-image-registry.svc:5000") {
Member

Another question - are we sure that the value for the internal registry will always be this way? Is there a possibility that there could be some custom value?

Why don't we check for the presence of the internal registry via the ImageStream status.dockerImageRepository value: empty (no internal registry enabled) vs non-empty (internal registry present)? This seems to me a more proper approach than checking what the Notebook CR was created with 🤔

@shalberd shalberd May 29, 2024

Agreed, that is what the image content change trigger admission plugin, which we are not using up to now (part of PR 800), is doing on the openshift side as well, openshift-internally via golang.
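
The check proposed in this thread could be sketched as follows, assuming the ImageStream has been decoded into a generic map (the PR diff uses map[string]interface{} assertions in a similar way). `hasInternalRegistry` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// hasInternalRegistry reports whether a decoded ImageStream object carries a
// non-empty status.dockerImageRepository, which this thread proposes as the
// signal that the internal registry is enabled.
func hasInternalRegistry(imageStream map[string]interface{}) bool {
	status, ok := imageStream["status"].(map[string]interface{})
	if !ok {
		return false
	}
	repo, _ := status["dockerImageRepository"].(string)
	return strings.TrimSpace(repo) != ""
}

func main() {
	withoutRegistry := map[string]interface{}{
		"status": map[string]interface{}{"dockerImageRepository": ""},
	}
	fmt.Println(hasInternalRegistry(withoutRegistry))
}
```

This avoids hard-coding the registry service address, so a custom internal registry hostname would still be detected.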
