Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long Running Notebook Testing Support #151

Open
harshad16 opened this issue Aug 7, 2023 · 2 comments
Open

Long Running Notebook Testing Support #151

harshad16 opened this issue Aug 7, 2023 · 2 comments
Labels
kind/feature New feature priority/minor An issue with the product that doesn't impact the user much or not at all (ie tech debt)

Comments

@harshad16
Copy link
Member

harshad16 commented Aug 7, 2023

/kind feature

Why you need this feature:

With Notebook-controller changes, we need to be careful with making changes to the reconciliation loop,
the change would result in restarting all the Notebooks already running in a cluster.
As a user of the notebook, might be working/not saving work when the upgrade happens,
so this makes it a reason for concern.

Describe the solution you'd like:

Include Tests for checking long-running notebooks to catch changes that would cause such issues.

Anything else you would like to add:

Previously, we have fallen into the trap as we were missing this kind of testing.
Example:
Original inclusion: 41baf49
reverted due to the restart of notebooks: 5881426

@openshift-ci openshift-ci bot added the kind/feature New feature label Aug 7, 2023
@harshad16 harshad16 added the priority/minor An issue with the product that doesn't impact the user much or not at all (ie tech debt) label Aug 7, 2023
@shalberd
Copy link

shalberd commented Aug 21, 2023

the change would result in restarting all the Notebooks already running in a cluster

It depends on which change you mean.

Regarding the new image change trigger annotation in the notebook CR metadata:

@atheo89 @harshad16 No, it would not. Or, to be more precise: the coming change for auto-fill in of image field value when a Notebook and StatefulSet has the image content change trigger annotation is not critical here. The image change trigger annotation is only added on newly-created notebooks, not to old notebooks.

The controller only changes reconcile logic when annotation is set https://github.com/opendatahub-io/kubeflow/pull/133/files#diff-a436986c5e93996f9035eee6798d064ab8659957e8443eb44b3a4d3db289bde4R207

About the annotation: Done in odh-dashboard code for newly-created notebook CRs via odh-dashboard PR-800. Now, if someone were to deliberately modify an existing notebook via ODH dashboard, setting a different underlying imagestream name and tag, only then would the odh-dashboard annotation be added in a notebook CR. But I don't think that is your use case with long-running notebooks and the notebook controller.

Regarding PodSpec changes done by odh-notebook-controller:

Regarding any changes in the PodSpec, you are correct, though. When for example any argument of the ose-oauth container is modified via odh-notebook-controller, that does have an impact on notebook restarts, independent of the image change trigger annotation mechanism. You could exclude the ose-oauth-section PodSpec from DeepCompare in kubeflow-notebook-controller as well, but I don't think that is a good idea. Better way would be to only change redirectUrl on newly-created notebooks, not on already existing ones (probably odh-notebook-controller code).

@harshad16 @atheo89 regarding long-running notebooks: The fact that a container restarts when any of its parts are modified in the PodSpec is not possible to avoid, I think, in Kubernetes and Openshift terms. You'd have to selectively set the Pod restartPolicy to Never, which opens another can of worms. https://stackoverflow.com/questions/53262133/kubernetes-stop-containers-from-restarting-automatically. The previous change to add shared memory specific volumes and volumemounts to the podSpec was such a case, of course that led to Pod / container restarts.

It really depends on what the team working with long-running notebooks wants and how deep its integration with ODH dashboard is wished to be. When ODH dashboard assembles a notebook CR, it does not set any specific value for the podTemplate restartPolicy, meaning behavior is "Always" https://github.com/opendatahub-io/odh-dashboard/blob/10818c8cd977f46909a29809474504b5e942e6a4/frontend/src/api/k8s/notebooks.ts#L119

@harshad16
Copy link
Member Author

@shalberd this issue is for a general test support, for each PR.
The long running notebook get restarted when there is any change to statefulsets or PodSpec upgrade.
As noted in comment as well.

As pointed in the description , this was opened in response to the some changes , we made
which caused issues with the long running notebook and we had to revert.
and is not related to changes in annotations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature priority/minor An issue with the product that doesn't impact the user much or not at all (ie tech debt)
Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants