This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Issues connecting TLJH with Enterprise Gateway #51

Closed
EToledoR opened this issue Mar 28, 2022 · 8 comments

Comments

@EToledoR

Hi,

I have a Spark+Hadoop+YARN cluster, plus an installation of TLJH (The Littlest JupyterHub) and Jupyter Enterprise Gateway on the head node of the cluster.

I am trying to connect TLJH to Enterprise Gateway using DockerSpawner and the elyra/nb2kg image.
I can start a regular Jupyter notebook, connect it to Enterprise Gateway, and see the notebook running as a container in YARN. However, it fails to start when I launch it through JupyterHub.

I have been following this tutorial: https://ideonate.com/DockerSpawner-in-TLJH/, but using the elyra image as mentioned.
I have also followed the steps in issue #32, but it does not seem to work.

This is what my TLJH config file looks like:

c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

c.DockerSpawner.image = 'elyra/nb2kg:2.4.0'

# Spawner.args is a list of extra command-line arguments for the single-user server
c.DockerSpawner.args = [
    '--gateway-url=http://node-head:8887',
    '--NotebookApp.session_manager_class=nb2kg.managers.SessionManager',
    '--NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager',
    '--NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager',
]

from jupyter_client.localinterfaces import public_ips

# Bind the Hub to this host's first public IP so spawned containers can reach it
c.JupyterHub.hub_ip = public_ips()[0]

c.DockerSpawner.name_template = "{prefix}-{username}-{servername}"

And EG is started with:
jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0 --port=8887 --config='/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/kernel.json' --EnterpriseGatewayApp.yarn_endpoint='http://node-head:8088/ws/v1'

Am I missing any configuration, or is some of the configuration I have in place wrong?

Thanks in advance.

Eduardo

@EToledoR
Author

EToledoR commented Mar 28, 2022

Additionally, when I run the image directly with Docker:

sudo docker run -t --rm -e gateway-url='http://node-head:8887' -e LOG_LEVEL=DEBUG -e NotebookApp.session_manager_class=nb2kg.managers.SessionManager -e NotebookApp.kernel_manager_class=nb2kg.managers.RemoteKernelManager -e NotebookApp.kernel_spec_manager_class=nb2kg.managers.RemoteKernelSpecManager -v ${HOME}/notebooks/:/tmp/notebooks -p 8888:8888 -w /tmp/notebooks elyra/nb2kg:2.4.0

The container starts and I can navigate to and open a Jupyter notebook, but I cannot see the kernels specified in the gateway.
I get the following error:

Uncaught exception GET /api/kernelspecs (X.X.X.X)
HTTPServerRequest(protocol='http', host='localhost:8003', method='GET', uri='/api/kernelspecs', version='HTTP/1.1', remote_ip='X.X.X.X'

@kevin-bates
Member

Hi @EToledoR.

As of the Notebook 6.0 release, use of NB2KG is no longer necessary to communicate with an EG server for kernel lifecycle management - the extension is "built-in" to the server. If you're using JupyterLab >= 3, then that server is Jupyter Server (which supports the same behavior). The fact that you're defining both --gateway-url and all of the class overrides indicates you're conflating these configurations. You should only need to set the --gateway-url option, and the class overrides will be taken care of.

That said, additional configuration steps will be necessary. Because EG is running within Docker, I suspect you'll need the single-response address changes in the 2.6 release so that spawned kernels know where to send their connection information back to the Docker container.

Besides the issue you reference, you might also want to check out this one: jupyter-server/enterprise_gateway#620. We could really use a section on configuring EG for use by Hub in our new Operators Guide, and it would be fantastic if this could be contributed once this is resolved.

Btw, this is not a valid configuration option: --config='/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/kernel.json'. Your kernel spec directories should be properly configured within your EG image (or, preferably, via a mount so you can make adjustments as necessary).
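Once the kernel spec directories are in place, it can help to confirm which specs EG will find and which process proxy each one declares. This is a minimal sketch, assuming EG-style kernel.json files that carry `metadata.process_proxy.class_name`, as EG's sample YARN cluster kernel specs do; `scan_kernelspecs` is a hypothetical helper, not part of EG, and the directory is the one from the original `--config` argument.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Process-proxy class used by EG's sample YARN cluster-mode kernel specs
# (assumption: check against the kernel specs shipped with your EG version).
YARN_PROXY = "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"


def scan_kernelspecs(kernels_dir: str) -> Dict[str, Optional[str]]:
    """Map each kernel spec name under kernels_dir to its process-proxy class.

    A kernel spec is a sub-directory containing a kernel.json file. Specs
    without a metadata.process_proxy.class_name entry map to None.
    """
    specs: Dict[str, Optional[str]] = {}
    for kernel_json in sorted(Path(kernels_dir).glob("*/kernel.json")):
        with kernel_json.open() as f:
            spec = json.load(f)
        proxy = spec.get("metadata", {}).get("process_proxy", {}).get("class_name")
        specs[kernel_json.parent.name] = proxy
    return specs


if __name__ == "__main__":
    # Hypothetical usage: inspect the directory referenced in the original post.
    for name, proxy in scan_kernelspecs("/usr/local/share/jupyter/kernels").items():
        label = "YARN cluster" if proxy == YARN_PROXY else (proxy or "local (no process proxy)")
        print(f"{name}: {label}")
```

A spec that reports "local (no process proxy)" would run its kernel on the EG host itself rather than being scheduled by the YARN resource manager.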

Because NB2KG is no longer used and this should be documented in the EG repo, I'm going to transfer this issue. (See you over on EG!)

@kevin-bates
Member

Hi @blink1073 - I (now) suspect the meeseeks bot isn't performing the migrate command because I'm not an admin on this repo, which I just realized. Are you able to add admin rights for me? (Thanks)

@blink1073
Member

Are you able to add admin rights for me?

Done!

@EToledoR
Author

Hi Kevin,

First of all, thanks so much for answering this so quickly.

I don't think I explained my setup properly.
I don't have EG running in a Docker container. At the moment, both EG and JupyterHub are running on the head node of my YARN cluster. The idea is to spawn Jupyter notebooks through JupyterHub using DockerSpawner and have EG allocate the resources those notebooks need in the YARN cluster. My understanding was that, to achieve this, I should use the nb2kg image with DockerSpawner and tweak the configuration a bit, but now I am a bit confused, since those are apparently not the moving pieces I need.

We are not using standalone notebooks (although I did test the gateway with them) or Jupyter Server, since we would like to support multiple users; hence the use of JupyterHub.

On the other hand, is YarnSpawner no longer supported? I believe that spawner lets JupyterHub spawn notebooks using resources from YARN directly, which would make this setup easier, but the project does not seem to be actively maintained.

Thanks again. I am probably making some basic mistakes here.

Eduardo

@kevin-bates
Member

I don't have EG running in a Docker container. At the moment, both EG and JupyterHub are running on the head node of my YARN cluster. The idea is to spawn Jupyter notebooks through JupyterHub using DockerSpawner and have EG allocate the resources those notebooks need in the YARN cluster. My understanding was that, to achieve this, I should use the nb2kg image with DockerSpawner and tweak the configuration a bit, but now I am a bit confused, since those are apparently not the moving pieces I need.

Per my comment, and assuming the version of Notebook you're using is >= 6.0, you need neither NB2KG nor the KernelManager, KernelSpecManager, or SessionManager class overrides that NB2KG provides. Instead, you minimally need to set --gateway-url; when it is set, the Notebook (or Jupyter Server) instance, depending on which front-end you're using, will automatically configure its kernel lifecycle management classes to redirect to the EG server that --gateway-url points to.
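Following that advice, the DockerSpawner arguments reduce to the gateway URL alone. The sketch below is a hypothetical helper (`gateway_spawner_args` is not part of JupyterHub or DockerSpawner) that builds `Spawner.args` for a single-user image assumed to ship Notebook >= 6.0 or Jupyter Server; it does basic URL validation and deliberately omits the NB2KG class overrides. The `http://node-head:8887` address is the one from the original post.

```python
from typing import List
from urllib.parse import urlparse


def gateway_spawner_args(gateway_url: str) -> List[str]:
    """Return the extra single-user server arguments needed to use EG.

    Assumes Notebook >= 6.0 or Jupyter Server in the spawned image, where
    --gateway-url alone switches the server to its built-in gateway classes,
    so no NB2KG manager overrides are included.
    """
    parsed = urlparse(gateway_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {gateway_url!r}")
    return [f"--gateway-url={gateway_url}"]


# Hypothetical use in jupyterhub_config.py (the image name is a placeholder):
#   c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
#   c.DockerSpawner.image = '<an image with notebook >= 6.0>'
#   c.DockerSpawner.args = gateway_spawner_args('http://node-head:8887')
```

If the single-user entry point in a given image does not recognize the `--gateway-url` alias, the long form `--GatewayClient.url=...` should set the same trait.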

Because Hub spawns Notebook servers, and because (it sounds like) you want your kernels to utilize resources within the YARN cluster, I believe you have two choices.

  1. You can try to leverage YarnSpawner, which (I'm assuming) spawns Notebook servers across a YARN cluster. With this approach, the Notebook server will be fixed on a node within the cluster, so all kernels launched from that Notebook server will reside on that node consuming those resources. In this case, no EG server would come into play.
  2. You can leverage EG and its YarnClusterProcessProxy kernel configuration where the Notebook server (spawned from Hub) can be running anywhere. In this case, you do not need Hub on the head node of the YARN cluster nor do its spawned Notebook servers need to be on the head node (or any node) of the YARN cluster. With this configuration, the Notebook servers are instructed to proxy their kernel lifecycle management to the EG server and the EG server, via its YarnClusterProcessProxy, will let the various kernels land within the YARN cluster wherever the YARN resource manager schedules them. As a result, multiple kernels launched from a given Notebook server will reside on different nodes within the YARN cluster.

You should be able to confirm option 2 without Hub by simply pointing a standalone Notebook server at the EG server configured to use YARN. This is a recommended step anyway, prior to configuring Hub-spawned Notebook servers to use EG. YARN deployment information can be found in the Operators Guide.
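Before involving Hub at all, the EG server itself can be sanity-checked: the Jupyter kernelspecs REST endpoint (`/api/kernelspecs`) returns a JSON object whose `kernelspecs` key maps spec names to their definitions. The sketch below queries it with the standard library only; `fetch_kernelspecs` and `kernelspec_names` are hypothetical helpers, and it assumes EG is reachable at the `http://node-head:8887` address from the original post with no authentication in front of it.

```python
import json
from typing import List
from urllib.request import urlopen


def kernelspec_names(payload: dict) -> List[str]:
    """Extract the kernel spec names from a GET /api/kernelspecs response body."""
    return sorted(payload.get("kernelspecs", {}))


def fetch_kernelspecs(gateway_url: str, timeout: float = 10.0) -> List[str]:
    """Ask the gateway which kernel specs it advertises."""
    url = gateway_url.rstrip("/") + "/api/kernelspecs"
    with urlopen(url, timeout=timeout) as resp:
        return kernelspec_names(json.load(resp))


# Hypothetical usage from any host that can reach the gateway:
#   print(fetch_kernelspecs("http://node-head:8887"))
```

If the YARN kernel spec appears in this list, EG is serving it, and a standalone Notebook server pointed at the same URL should offer the same kernels.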

On the other hand, is YarnSpawner no longer supported? I believe that spawner lets JupyterHub spawn notebooks using resources from YARN directly, which would make this setup easier, but the project does not seem to be actively maintained.

I have no idea; I don't deal with Hub. You might try asking about the status of YarnSpawner in the Hub portion of the Jupyter Community Forum.

@kevin-bates
Member

@meeseeksdev migrate to jupyter-server/enterprise_gateway

@lumberbot-app

lumberbot-app bot commented Mar 29, 2022

@lumberbot-app lumberbot-app bot closed this as completed Mar 29, 2022
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants