Unable to allocate 5.27 EiB ... when trying to access cluster dashboard through wrong URL #8368
Comments
What browser are you using for this?
If I just put in the IP, I get errors like
Firefox actually manages to get through to the server and just prints the handshake info (the data we send over the network to a connecting server). I can see how your exception can be triggered from our server code, but the browser must send something rather specific to trigger such a response. |
Ah interesting, didn't consider this. Chrome through JupyterLab proxy. |
I ran into a very similar issue today. tl;dr Running Docker images for all three distributed layers:
I open the Jupyter notebook in Google Chrome via the
I see the printed "yay" in my local worker, and I see the task completion debug logs in my scheduler.
I should also mention that I am able to see the tcp connections from the GCP scheduler into my local worker at the gather step by monitoring the ngrok tunnel stats, so I was able to verify connectivity.
Does my error line up with your expectations here? Any ideas on why this might be happening here? |
I'm not familiar with ngrok, so I can't tell what's going on in your case. The way I think the original exception was triggered is that the browser connected to the dask server and the dask server tried to engage in its application-side handshake (where it reads and writes things to the TCP socket). However, instead of receiving plain bytes that correspond to our protocol, it encountered an HTTP message that ended up triggering this exception. Our protocol uses the first couple of bytes in a message to infer how much data is incoming, and we use this information to allocate memory efficiently. If the first couple of bytes are anything else, such as random bytes, they are easily interpreted as a very big integer. I'm not sure what ngrok does, but if it changes the bytestream even slightly, this could cause such an exception. It could also happen if ngrok erroneously thinks this connection is using HTTP. |
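To make the explanation above concrete, here is a minimal sketch of how the start of an HTTP request can be misread as a length. It assumes the length prefix is an 8-byte little-endian unsigned integer, which is my assumption about the wire format and is not stated in this thread; `claimed_length` is a hypothetical helper, not part of distributed.

```python
import struct


def claimed_length(first_bytes: bytes) -> int:
    """Interpret the first 8 bytes as a little-endian unsigned 64-bit length.

    Assumption for illustration only: the real protocol may frame
    messages differently.
    """
    (nbytes,) = struct.unpack("<Q", first_bytes[:8])
    return nbytes


# What a browser typically sends first when it opens http://host:port/
http_request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"

nbytes = claimed_length(http_request)
print(f"{nbytes} bytes is about {nbytes / 2**60:.2f} EiB")
```

Under that assumption, the ASCII bytes `GET / HT` decode to roughly 5.27 EiB, which is consistent with the size in the issue title.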
Ah, I see - that makes sense. Thanks for the insight :) |
just starting |
Any updates? Followed the guide to provision a new cluster with k8s operator and hitting this error |
As @fjetter says, I think a lot of people landing on this issue are coming here because this error happens when you try to open the Dask TCP port used for communication in a browser. Reproducer steps
This results in the
This is expected behaviour. You're opening a TCP-only connection in a web browser. If you're trying to access the dashboard you need to connect to a different port: http://localhost:8787.
The discussion about ngrok is interesting. Ngrok supports HTTP proxying (layer 7) and TCP proxying (layer 4). They support both modes as there are pros and cons to each; see this article to learn more. I assume that folks who are running into issues are using HTTP proxying instead of TCP proxying, which results in the same error as when you open the TCP port in a browser. The fix for this should just be to use TCP proxying.
I'm going to close this issue out as "wontfix" as hopefully this comment solves most folks' problems. I've also opened #8905 to track improving the failure mode of opening the TCP port in a browser. If there are still ngrok-related issues that happen when using TCP proxying, then I encourage folks to open a new issue with steps to reproduce so we can look further into it. |
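As a rough sketch of the port mix-up described above: the helper below takes a scheduler comm address and guesses the dashboard URL on the same host. `guess_dashboard_url` is a hypothetical name, the example address `tcp://127.0.0.1:8786` is illustrative, and it assumes the dashboard listens on 8787 as in the comment above; a real deployment may use a different port or a proxy.

```python
from urllib.parse import urlsplit

# Port mentioned in this thread; a deployment may be configured differently.
DASHBOARD_PORT = 8787


def guess_dashboard_url(comm_address: str, dashboard_port: int = DASHBOARD_PORT) -> str:
    """Swap the TCP comm port for the (assumed) HTTP dashboard port."""
    parts = urlsplit(comm_address)
    if parts.hostname is None:
        raise ValueError(f"cannot find a host in {comm_address!r}")
    # Note: IPv6 hosts would need brackets; this sketch ignores that case.
    return f"http://{parts.hostname}:{dashboard_port}"


print(guess_dashboard_url("tcp://127.0.0.1:8786"))
```

The point is only that the `tcp://` address is for Dask's own protocol, while the browser should be pointed at the HTTP port.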
It seems that this is indeed a bug that needs to be fixed. |
@tbazadaykin they need to ping the port via TCP, not HTTP. You mentioned Kubernetes so I assume you're talking about a liveness probe, so you need to use a TCP probe. If you make an HTTP request to a non-HTTP port there is no guarantee that it will behave as expected. |
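For anyone scripting a health check, a plain TCP check in the spirit of the comment above could look like the sketch below. It only tests that the port accepts a connection and sends no HTTP request; `tcp_port_open` is a hypothetical helper, and the host and port in the usage line are placeholders.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    No bytes are written, so nothing HTTP-shaped reaches the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address; substitute your scheduler's host and comm port.
    print(tcp_port_open("localhost", 8786))
```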
Sorry for screenshot, I don't have copy and paste or GitHub access on that machine.
Describe the issue:
When you try to open the dashboard through the link printed by print(client), you trigger this exception in the scheduler.
Minimal Complete Verifiable Example:
Try to open that URL in the browser (I thought it was the dashboard URL).
Anything else we need to know?:
Environment: