Issues connecting TLJH with Enterprise Gateway #1058

Open
lumberbot-app bot opened this issue Mar 29, 2022 · 8 comments
Labels
configuration, jupyter hub, waiting for author (waiting for information from item's author)

Comments

lumberbot-app bot commented Mar 29, 2022

Hi,

I have a Spark+Hadoop+Yarn cluster, with TLJH (The Littlest JupyterHub) and Jupyter Enterprise Gateway installed on the head node of the cluster.

I am trying to connect TLJH to Enterprise Gateway using DockerSpawner and the image elyra/nb2kg.
I can start a regular Jupyter notebook, connect it to the Enterprise Gateway, and see the notebook running as a container in Yarn. However, it fails to start when I try to do the same through JupyterHub.

I have been following this tutorial: https://ideonate.com/DockerSpawner-in-TLJH/ but using the elyra image mentioned above.
I have also followed the steps in the issue jupyter/nb2kg#32, but they do not seem to work.

This is what my TLJH config file looks like:

c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

c.DockerSpawner.image = 'elyra/nb2kg:2.4.0'

c.DockerSpawner.args = '--gateway-url='http://node-head:8887' --NotebookApp.session_manager_class=nb2kg.managers.SessionManager --NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager --NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager'

from jupyter_client.localinterfaces import public_ips

c.JupyterHub.hub_ip = public_ips()[0]

c.DockerSpawner.name_template = "{prefix}-{username}-{servername}"

And JEG is started with this command:
jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --port=8887 --config='/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/kernel.json' --EnterpriseGatewayApp.yarn_endpoint='http://node-head:8088/ws/v1'

Am I missing any configuration, or is any of the config I have in place wrong?

Thanks in advance.

Eduardo


Originally opened as jupyter/nb2kg#51 by @EToledoR, migration requested by @kevin-bates


lumberbot-app bot commented Mar 29, 2022

@EToledoR commented: Additionally, when I run the image directly with Docker using:

sudo docker run -t --rm -e gateway-url='http://node-head:8887' -e LOG_LEVEL=DEBUG -e NotebookApp.session_manager_class=nb2kg.managers.SessionManager -e NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager -e NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager -v ${HOME}/notebooks/:/tmp/notebooks -p 8888:8888 -w /tmp/notebooks elyra/nb2kg:2.4.0

The container is up and I can navigate to it and open a Jupyter notebook, but I cannot see the kernels specified in the gateway.
I get the following error:

Uncaught exception GET /api/kernelspecs (X.X.X.X)
HTTPServerRequest(protocol='http', host='localhost:8003', method='GET', uri='/api/kernelspecs', version='HTTP/1.1', remote_ip='X.X.X.X'


welcome bot commented Mar 29, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉


lumberbot-app bot commented Mar 29, 2022

@kevin-bates commented: Hi @EToledoR.

As of the Notebook 6.0 release, NB2KG is no longer necessary to communicate with an EG server for kernel lifecycle management - the extension is "built-in" to the server. If you're using JupyterLab >= 3, then that server is Jupyter Server (which supports the same behavior). The fact that you're defining both --gateway-url and all of the class overrides indicates you're conflating these configurations. You should only need to set the --gateway-url option and the class overrides will be taken care of.

That said, there will be additional configuration steps necessary and, because EG is running within Docker, I suspect you'll need the single-response address changes in the 2.6 release to make this work so spawned kernels know where to send their connection information back to the Docker container.

Besides the issue you reference, you might also want to check out this one: #620. We could really use a section on configuring EG for use by Hub in our new Operators Guide, and it would be fantastic if this could be contributed once this is resolved.

Btw, this is not a valid configuration option: --config='/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/kernel.json' and your kernel spec directories should be properly configured within your EG image (or, preferably, via a mount so you can make adjustments as necessary).
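
As a rough sketch of the launch command from the original post with the invalid --config option dropped: the helper name eg_command is hypothetical, and every option is copied from the command quoted above rather than newly recommended. It assumes the kernel spec directory (for example the spark_python_yarn_cluster directory under /usr/local/share/jupyter/kernels) is already on the Jupyter data path, so it does not need to be passed on the command line.

```python
import shlex


def eg_command(port: int, yarn_endpoint: str) -> list[str]:
    """Build the EG launch command from the original post, without --config.

    Hypothetical helper for illustration; all flags are the ones the
    original poster already used.
    """
    return [
        "jupyter",
        "enterprisegateway",
        "--ip=0.0.0.0",
        "--port_retries=0",
        f"--port={port}",
        f"--EnterpriseGatewayApp.yarn_endpoint={yarn_endpoint}",
    ]


# Host name and ports are taken from the original post.
cmd = eg_command(8887, "http://node-head:8088/ws/v1")

# Print a shell-ready form of the command for copy and paste.
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd) as-is, which avoids shell quoting problems around the endpoint URL.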

Because NB2KG is no longer used and this should be documented in the EG repo, I'm going to transfer this issue. (See you over on EG!)


lumberbot-app bot commented Mar 29, 2022

@kevin-bates commented: Hi @blink1073 - I just realized I'm not an admin on this repo, which I (now) suspect is why the meeseeks bot isn't performing the migrate command. Are you able to add admin rights for me? (Thanks)


lumberbot-app bot commented Mar 29, 2022

@blink1073 commented: > Are you able to add admin rights for me?

Done!


lumberbot-app bot commented Mar 29, 2022

@EToledoR commented: Hi Kevin,

First of all thanks so much for answering this so quickly.

I think I didn't explain my setup properly.
I don't have EG running in a Docker container. At the moment, both EG and JupyterHub are running on the head node of my Yarn cluster. The idea is to spawn Jupyter notebooks through JupyterHub using DockerSpawner, with the resources needed for those notebooks assigned by EG in the Yarn cluster. My understanding was that, to achieve that, I should use the nb2kg image for DockerSpawner and tweak the configuration a bit, but I am a bit confused now that those turn out not to be the moving pieces I need for the system.

We are not using standalone notebooks (I tested the gateway with them, though) or Jupyter Server, as we would like to have multiple users, hence the use of JupyterHub.

On the other hand, is YarnSpawner currently out of support? I believe that this Spawner lets JupyterHub spawn notebooks that take resources from Yarn directly, which would make this setup easier. But it seems that the project is not being actively supported.

Thanks again. I am probably making some basic mistakes here.

Eduardo


lumberbot-app bot commented Mar 29, 2022

@kevin-bates commented:

> I don't have EG running in a Docker container. At the moment, both EG and JupyterHub are running on the head node of my Yarn cluster. The idea is to spawn Jupyter notebooks through JupyterHub using DockerSpawner, with the resources needed for those notebooks assigned by EG in the Yarn cluster. My understanding was that, to achieve that, I should use the nb2kg image for DockerSpawner and tweak the configuration a bit, but I am a bit confused now that those turn out not to be the moving pieces I need for the system.

Per my comment, and assuming the version of notebook you're using is >= 6.0, you do not need NB2KG nor do you need to configure the KernelManager, KernelSpecManager or SessionManager classes that NB2KG provides. Instead, you minimally need to set --gateway-url and, when set, the Notebook (or JupyterServer) instance (depending on which front-end you're using) will automatically configure the kernel lifecycle management classes to redirect to the EG located at wherever --gateway-url points.
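
A minimal sketch of what that could look like on the Hub side, assuming the spawned image ships Notebook >= 6.0 (the image choice is not settled in this thread) and that the gateway is reachable from inside the container at the address used in the original post. The helper gateway_args is hypothetical and only exists to make the point that a single option is passed, with none of the NB2KG class overrides.

```python
def gateway_args(gateway_url: str) -> list[str]:
    """Return the single-user server arguments needed to proxy kernels to EG.

    Hypothetical helper, not part of JupyterHub or DockerSpawner.
    """
    # Only the gateway URL is passed. Notebook >= 6.0 switches its kernel,
    # kernel spec, and session managers to the gateway-aware ones on its own
    # once this option is set.
    return [f"--gateway-url={gateway_url}"]


# Address copied from the original post; adjust to wherever EG listens.
ARGS = gateway_args("http://node-head:8887")

# In the Hub config file, where `c` is provided by JupyterHub, this would be
# used roughly as follows (Spawner.args expects a list of strings):
#   c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
#   c.DockerSpawner.args = ARGS
```

Passing a list rather than one long string keeps each option as a separate argument to the single-user server.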

Because Hub spawns Notebook servers, and because (it sounds like) you want your kernels to utilize resources within the YARN cluster, I believe you have two choices.

  1. You can try to leverage YarnSpawner, which (I'm assuming) spawns Notebook servers across a YARN cluster. With this approach, the Notebook server will be fixed on a node within the cluster, so all kernels launched from that Notebook server will reside on that node consuming those resources. In this case, no EG server would come into play.
  2. You can leverage EG and its YarnClusterProcessProxy kernel configuration where the Notebook server (spawned from Hub) can be running anywhere. In this case, you do not need Hub on the head node of the YARN cluster nor do its spawned Notebook servers need to be on the head node (or any node) of the YARN cluster. With this configuration, the Notebook servers are instructed to proxy their kernel lifecycle management to the EG server and the EG server, via its YarnClusterProcessProxy, will let the various kernels land within the YARN cluster wherever the YARN resource manager schedules them. As a result, multiple kernels launched from a given Notebook server may reside on different nodes within the YARN cluster.

You should be able to confirm option 2 without Hub by simply pointing a standalone Notebook server at the EG server configured to use YARN. This is a recommended step anyway, prior to configuring Hub-spawned Notebook servers to use EG. YARN deployment information can be found in the Operators Guide here.
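
Before pointing a Notebook server at EG, one quick sanity check is to ask the gateway which kernel specs it serves, since /api/kernelspecs is the same endpoint that fails in the error reported earlier in this thread. The sketch below assumes the gateway answers unauthenticated requests at the address from the original post; the function names and the sample payload are made up for illustration, and the payload only mimics the general shape of the Jupyter kernelspecs response.

```python
import json
from urllib.request import urlopen


def kernelspec_names(payload: dict) -> list[str]:
    """Return the sorted kernel spec names from a kernelspecs response body."""
    return sorted(payload.get("kernelspecs", {}))


def fetch_kernelspecs(gateway_url: str, timeout: float = 10.0) -> dict:
    """GET <gateway_url>/api/kernelspecs and decode the JSON body."""
    url = f"{gateway_url.rstrip('/')}/api/kernelspecs"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Made-up payload so the parsing step can be tried without a running gateway.
sample = {
    "default": "python3",
    "kernelspecs": {
        "spark_python_yarn_cluster": {"name": "spark_python_yarn_cluster"},
        "python3": {"name": "python3"},
    },
}
names = kernelspec_names(sample)
print(names)
```

Against a live gateway the call would be kernelspec_names(fetch_kernelspecs("http://node-head:8887")). If the YARN kernel spec is missing from that list, the problem is on the EG side rather than in the Hub or Notebook configuration.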

> On the other hand, is YarnSpawner currently out of support? I believe that this Spawner lets JupyterHub spawn notebooks that take resources from Yarn directly, which would make this setup easier. But it seems that the project is not being actively supported.

I have no idea, I don't deal with Hub. You might try asking about the status of YarnSpawner in the Hub portion of the Jupyter Community Forum.

@kevin-bates
Member

@EToledoR any updates on this?
