Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

show event details when starting a new notebook server #1608

Merged

Conversation

christianvogt
Copy link
Contributor

Closes: #1607

Description

The notebook server is starting but cannot start due to insufficient resources. If the Notebook resource doesn't have a notebooks.kubeflow.org/last-activity annotation, the UI doesn't display any errors.

Now the user gets an error:
image
And events log:
image

How Has This Been Tested?

  1. create a new project
  2. create a new workbench and use a large size. Consider adding a new notebookSizes entry to the OdhDashboardConfig if your notebook server still starts.
  3. Observe the pod not starting.
  4. Check the dashboard UI by clicking on Starting... and viewing the event log
  5. Expect errors to be displayed if there is insufficient resources to start the server
  6. Check the Notebook k8s resource for the notebooks.kubeflow.org/last-activity annotation. It should not be present.

Test Impact

no new tests added

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@christianvogt christianvogt changed the title show event details if spawn in progress without last activity timestamp [WIP] show event details if spawn in progress without last activity timestamp Jul 28, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress This PR is in WIP state label Jul 28, 2023
@christianvogt
Copy link
Contributor Author

put PR into WIP state while I try to understand more about the notebook controller culling feature and how notebooks.kubeflow.org/last-activity gets updated.

@andrewballantyne
Copy link
Member

put PR into WIP state while I try to understand more about the notebook controller culling feature and how notebooks.kubeflow.org/last-activity gets updated.

This is done by the Notebook Controller. We don't ever mess with it. The Culler is not a Dashboard feature, we just set the values so it will cull idle pods and cleanup resource usage.

@christianvogt
Copy link
Contributor Author

christianvogt commented Jul 28, 2023

Well I learned a bit about culling now.

There is no last-activity annotation at all unless culling is enabled. Therefore the checks made where if no activity we bail on showing events is wrong when culling is disabled.

We have two notebook utilities with duplicate code:
utilities/notebookControllerUtils and pages/projects/notebook/utils

#640 added a change for some edge case related to last activity which is in one utility but not the other.

@shalberd
Copy link
Contributor

There is no last-activity annotation at all unless culling is enabled

Correct, nice catch in odh dashboard code (when to show events)

@shalberd
Copy link
Contributor

We have two notebook utilities with duplicate code: utilities/notebookController and pages/projects/notebook/utils

Isn't one for the dashboard project jupyter tile and the other for other projects called data science projects?

@christianvogt
Copy link
Contributor Author

We have two notebook utilities with duplicate code: utilities/notebookController and pages/projects/notebook/utils

Isn't one for the dashboard project jupyter tile and the other for other projects called data science projects?

TBH I don't know the history of these files. But I'd like to minimize duplicate logic so that we don't apply fixes in one area only when it needs to be in both.

@DaoDaoNoCode
Copy link
Member

@christianvogt @shalberd Yes, there are some duplicated codes for Jupyter Notebook Controller (KFNBC) and Data Science Project Workbenches. The notebook controller relies a lot on the old backend code which we want to deprecate in the future, but the util functions were implemented there first. Then we have the concept of data science projects and workbenches, where @andrewballantyne imports the pass-through API, where we move all the concepts after that to the frontend. However, the spawner page of both are similar, but also different, and we (I think it was probably me) tried to differentiate the concept of the notebook controller and the workbenches. So I copied some of the util functions instead of reusing them.

@shalberd
Copy link
Contributor

shalberd commented Jul 31, 2023

@DaoDaoNoCode yeah, ok, so I remembered half-way correctly

It is ok for me, I think we leave eventually getting rid of backend / traditional jupyter tile logic to you.

@christianvogt
Copy link
Contributor Author

@DaoDaoNoCode gave me a bit of a history lesson on the code and helped me understand more about the difference in where data is fetched.

For now the simple solution is to address the immediate issue. I've updated both places with fallbacks when the last-activity annotation isn't present.

@christianvogt christianvogt changed the title [WIP] show event details if spawn in progress without last activity timestamp show event details when starting a new notebook server Jul 31, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress This PR is in WIP state label Jul 31, 2023
@DaoDaoNoCode
Copy link
Member

I think the PR could also be a fix for #579 ?

@lucferbux
Copy link
Contributor

I think the PR could also be a fix for #579 ?

Yes, that might be the case, after we merge this @christianvogt I think we might wanna do a rundown of all the issues related, it seems that this bug has been tracked several times already 😂

Copy link
Contributor

@lucferbux lucferbux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@lucferbux
Copy link
Contributor

/hold

we are not expecting to be merging features until next week cc @andrewballantyne

@openshift-ci openshift-ci bot added the do-not-merge/hold This PR is hold for some reason label Aug 11, 2023
@Gkrumbach07 Gkrumbach07 removed their assignment Aug 14, 2023
@alexcreasy
Copy link
Contributor

Just to reaffirm the hold on this PR - there's ambiguity around what should make it into rods 1.32, until we clear that up we're not merging anything.

@andrewballantyne
Copy link
Member

/unhold

Release done, unholding.

@openshift-ci openshift-ci bot removed the do-not-merge/hold This PR is hold for some reason label Aug 17, 2023
Copy link
Member

@Gkrumbach07 Gkrumbach07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 17, 2023

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: Gkrumbach07, lucferbux

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 0b041ae into opendatahub-io:main Aug 17, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: event log empty for new notebook that is failing to start
8 participants