
Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.util.start.registerListenerAndStartRefresh. : java.net.SocketTimeoutException: connect timed out #10

Open
uzairahmadxy opened this issue Oct 24, 2022 · 16 comments

@uzairahmadxy

Hi guys. I'm trying to run Spark NLP for Healthcare locally. I seem to have compatible versions of Spark and Java, but it still throws an error (screenshots attached).
Has anyone faced this?
[screenshots]


```python
import json
import os

# Loading license key
with open('key.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)

# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

!pyspark --version

!pip show spark-nlp-jsl

!pip show spark-nlp
```
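The cell above pushes every entry of `key.json` into `locals()` and `os.environ`, so a missing or misspelled key only shows up later as a confusing pip or licensing error. A minimal sketch that fails early instead, assuming the three keys the install cell references (`SECRET`, `JSL_VERSION`, `PUBLIC_VERSION`) are the ones that must be present; `load_license` is a hypothetical helper name.

```python
import json
import os
from pathlib import Path

# Keys referenced by the install cell ($SECRET, $JSL_VERSION, $PUBLIC_VERSION)
REQUIRED_KEYS = ("SECRET", "JSL_VERSION", "PUBLIC_VERSION")


def load_license(path: str = "key.json") -> dict:
    """Load the license JSON, fail early on missing keys, and export all entries as env vars."""
    keys = json.loads(Path(path).read_text())
    missing = [k for k in REQUIRED_KEYS if not keys.get(k)]
    if missing:
        raise KeyError(f"{path} is missing or has empty values for: {', '.join(missing)}")
    # Environment variables must be strings
    os.environ.update({k: str(v) for k, v in keys.items()})
    return keys
```

Usage: `license_keys = load_license()` followed by an explicit `SECRET = license_keys["SECRET"]`. Writing to `locals()` only happens to work at notebook top level (where it is the module namespace), so the explicit assignment is the more predictable choice.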

```python
import json
import os

from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import SparkSession

import sparknlp
import sparknlp_jsl

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
from sparknlp.util import *
from sparknlp.pretrained import ResourceDownloader
from pyspark.sql import functions as F

import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

import string
import numpy as np

params = {"spark.driver.memory": "16G",
          "spark.kryoserializer.buffer.max": "2000M",
          "spark.driver.maxResultSize": "2000M"}

spark = sparknlp_jsl.start(secret=SECRET, params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark
```
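The `java.net.SocketTimeoutException: connect timed out` in the title means a TCP connection attempt from the JVM did not complete in time, so one thing worth ruling out is a firewall or proxy on this machine. A minimal stdlib sketch that tests raw TCP reachability from Python; this thread does not say which host `registerListenerAndStartRefresh` contacts, so any host you pass (for example `pypi.johnsnowlabs.com`, taken from the install cell) is an assumption.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts (all OSError subclasses)
        return False


# Usage (host is an assumption, see above):
# print(can_connect("pypi.johnsnowlabs.com"))
```

A `False` here only shows that Python cannot reach the host; the JVM may use different proxy settings, so a `True` does not fully rule out a network problem on the Java side.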
@maziyarpanahi maziyarpanahi transferred this issue from JohnSnowLabs/spark-nlp Oct 24, 2022
@uzairahmadxy
Author

uzairahmadxy commented Oct 24, 2022

I forgot to mention I have a trial Healthcare license.

@C-K-Loan
Member

@uzairahmadxy, can you share the full error trace from the notebook, and also check your Jupyter shell for any errors and share those?
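One way to get the complete trace into a single text file that can be pasted (for example on pastebin) is to catch the exception and write out `traceback.format_exc()`. A small stdlib sketch; `run_and_save_trace` and the file name `error_trace.txt` are hypothetical choices, not part of any library.

```python
import traceback
from pathlib import Path


def run_and_save_trace(fn, *args, out_path="error_trace.txt", **kwargs):
    """Call fn(*args, **kwargs); on failure, save the full traceback to out_path and re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        # format_exc() includes the exception message; for a Py4JJavaError that
        # message typically also contains the Java-side stack trace
        Path(out_path).write_text(traceback.format_exc())
        print(f"Full trace written to {out_path}")
        raise
```

Usage: `spark = run_and_save_trace(sparknlp_jsl.start, secret=SECRET, params=params)`.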

@C-K-Loan C-K-Loan assigned C-K-Loan and unassigned C-K-Loan Oct 27, 2022
@uzairahmadxy
Author

Hi @C-K-Loan, here's the additional information:

[screenshots]

@C-K-Loan
Member

Thank you for sharing, @uzairahmadxy.
It looks like something is not set up correctly with your Hadoop utils.
Make sure to follow every step listed here precisely: https://nlp.johnsnowlabs.com/docs/en/install#windows-support
This should fix all your issues.
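On Windows, Hadoop's shell utilities look for `winutils.exe` under `%HADOOP_HOME%\bin`. A quick sanity check from Python before starting the session, as a minimal sketch; it only checks that the variable is set and the file exists, not that the binary matches your Hadoop version. `check_hadoop_home` is a hypothetical helper name.

```python
import os
from pathlib import Path


def check_hadoop_home(env=None) -> list:
    """Return a list of problems with the Hadoop setup Spark needs on Windows (empty list = OK)."""
    env = os.environ if env is None else env
    problems = []
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
        return problems
    # Expected location of the helper binary
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    if not winutils.is_file():
        problems.append(f"{winutils} not found")
    return problems
```

Usage: `print(check_hadoop_home() or "HADOOP_HOME looks OK")`.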

@uzairahmadxy
Author

Hi @C-K-Loan

I reinstalled everything following the instructions. It still throws the error (note: I no longer see the Hadoop utils error in the Jupyter kernel, though).
[screenshot]

@C-K-Loan
Member

C-K-Loan commented Oct 31, 2022

Nice, that's one less error!
@uzairahmadxy, can you try running this open-source notebook and see whether it works?

https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/1.SparkNLP_Basics.ipynb
You can skip the cells with pip install.

Also, could you copy-paste the entire error trace you get here or on https://pastebin.com/?

@uzairahmadxy
Author

Hi @C-K-Loan, this is the trace for the healthcare notebook kernel: https://pastebin.com/cV6ymZvR


Also, the training notebook doesn't run. Here are the traces for the open source notebook:
Python Interpreter Error: https://pastebin.com/XiXLxnnT
Jupyter Kernel: https://pastebin.com/v7jn0EBr

Side note: PySpark works OK, as shown in the screenshot (I previously thought there was an issue with Spark).
[screenshot]

@C-K-Loan
Member

C-K-Loan commented Nov 3, 2022

Thank you for sharing, @uzairahmadxy.

It looks like the jar loaded into your Spark session is missing some classes.
However, running sparknlp.start() should have downloaded the fat jar, i.e. the one with all the dependencies.
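A jar is a zip archive, so one way to check whether the jar Spark picked up actually contains a given class is to list its entries. A minimal sketch; `jar_has_class` is a hypothetical helper, and the path and class name in the usage line are placeholders, not taken from the traces.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar contains the compiled class for a fully qualified name like 'a.b.C'."""
    # Classes are stored as path/to/ClassName.class inside the archive
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Usage (placeholder values): `jar_has_class("path/to/spark-nlp-assembly-4.2.2.jar", "com.johnsnowlabs.nlp.DocumentAssembler")`.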

@uzairahmadxy
Can you try manually downloading the Spark NLP jar and then starting a Spark session by passing the path to it?
I.e. download: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.2.2.jar

Then, instead of sparknlp.start(), run the following and try to continue with the rest of Notebook 1:

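```python
from pyspark.sql import SparkSession

# Start Spark with the manually downloaded fat jar instead of sparknlp.start()
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[*]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
    .config("spark.jars", "path/to/the/spark-nlp.jar") \
    .getOrCreate()
# "0" for spark.driver.maxResultSize means no limit;
# replace the spark.jars value with the local path of the downloaded jar
```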
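The download step can also be scripted. A sketch, assuming jars for other versions follow the same `spark-nlp-assembly-{version}.jar` naming as the 4.2.2 URL above (only that one URL comes from this thread); `spark_nlp_jar_url` and `download_jar` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Base location taken from the 4.2.2 URL above; the per-version naming is an assumption
BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"


def spark_nlp_jar_url(version: str) -> str:
    """Build the assembly (fat) jar URL for a Spark NLP version (hypothetical helper)."""
    return f"{BASE_URL}/spark-nlp-assembly-{version}.jar"


def download_jar(version: str, dest_dir: str = ".") -> str:
    """Download the jar unless it already exists and return its absolute path."""
    dest = Path(dest_dir) / f"spark-nlp-assembly-{version}.jar"
    if not dest.exists():
        urllib.request.urlretrieve(spark_nlp_jar_url(version), str(dest))
    return str(dest.resolve())
```

Usage: `.config("spark.jars", download_jar("4.2.2"))` in the builder chain. Skipping the download when the file exists avoids fetching the large jar on every restart.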

Maybe this is a Windows-specific bug. I think @josejuanmartinez is on Windows; have you seen this?

@josejuanmartinez
Copy link
Contributor

Hey, I am not on Windows anymore, sorry.

@uzairahmadxy
Copy link
Author

Thanks @C-K-Loan. Manually loading the jar worked for basic Spark NLP.

I guess the same has to be done for the healthcare library as well. Can you please share where I can get those jars?

@C-K-Loan
Copy link
Member

C-K-Loan commented Nov 7, 2022

Hi @uzairahmadxy, great, good to know that this works, and sorry for the bug.

To get the healthcare jar, replace {secret} with your healthcare secret and {lib_version} with your library version in this URL:
https://pypi.johnsnowlabs.com/{secret}/spark-nlp-jsl-{lib_version}.jar
For example, if the secret is 4.2.1.agdfgdgdl, the URL would be
https://pypi.johnsnowlabs.com/4.2.1.agdfgdgdl/spark-nlp-jsl-4.2.1.jar
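The URL rule above as a small helper, plus the comma-separated form that `spark.jars` accepts when both jars are needed. In the example the secret starts with the library version (`4.2.1.agdfgdgdl`), so the sketch derives `lib_version` from the secret's first three dot-separated parts; that is an inference from this one example, so pass the version explicitly if your secret looks different. Both function names are hypothetical.

```python
from typing import Optional


def healthcare_jar_url(secret: str, lib_version: Optional[str] = None) -> str:
    """Build https://pypi.johnsnowlabs.com/{secret}/spark-nlp-jsl-{lib_version}.jar."""
    if lib_version is None:
        # ASSUMPTION: the secret begins with the version, as in '4.2.1.agdfgdgdl'
        lib_version = ".".join(secret.split(".")[:3])
    return f"https://pypi.johnsnowlabs.com/{secret}/spark-nlp-jsl-{lib_version}.jar"


def spark_jars_value(*jar_paths: str) -> str:
    """spark.jars takes a comma-separated list of jar paths or URLs."""
    return ",".join(jar_paths)
```

Usage (hypothetical local paths): `.config("spark.jars", spark_jars_value("C:/jars/spark-nlp-assembly-4.2.2.jar", "C:/jars/spark-nlp-jsl-4.2.1.jar"))`.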

@Meryem1425 can you see if you run into the same issue on Windows?

@uzairahmadxy
Copy link
Author

Thank you for sharing, @C-K-Loan.

The jars now load, but the problem persists when I try to load pretrained healthcare models/pipelines.
[screenshot]

Error Trace: https://pastebin.com/xtkJKVLk
Jupyter Kernel: https://pastebin.com/fznqEBvq

Side note: to manually download a healthcare model from the Models Hub, I assume I have to specify the secret. How do we download it?

@Cabir40
Copy link
Contributor

Cabir40 commented Nov 10, 2022

Can you test whether your license is valid by running it in this notebook?

Can you share the exact versions you used (Java, PySpark, spark-nlp, spark-nlp-jsl)?
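One way to collect the Python-side versions in one go is `importlib.metadata` (Python 3.8+). A minimal sketch; the package names are the pip distribution names from the install cell at the top of this issue, `installed_versions` is a hypothetical helper, and the Java version still has to be checked separately (e.g. with `java -version`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pyspark", "spark-nlp", "spark-nlp-jsl")) -> dict:
    """Map each pip distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


# Usage (run inside the same environment as the notebook kernel):
# for name, ver in installed_versions().items():
#     print(f"{name}: {ver}")
```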

If you want to download a model manually, you can use this script; the same example is in this notebook:

```python
from sparknlp.pretrained import ResourceDownloader

# Download the clinical embeddings archive directly from clinical/models
ResourceDownloader.downloadModelDirectly("clinical/models/embeddings_clinical_en_2.4.0_2.4_1580237286004.zip", "clinical/models")
```

@uzairahmadxy
Copy link
Author

uzairahmadxy commented Nov 10, 2022

The license works in the notebook (tried on Colab).

Here are the versions used:

  • Java 8 (OpenJDK 64-Bit Server VM, 1.8.0_345)
  • Pyspark (Version 3.3.1)
  • Spark-NLP (4.2.0)
  • Spark-NLP-JSL (4.2.0)

@Meryem1425
Copy link
Contributor

@uzairahmadxy, I followed https://nlp.johnsnowlabs.com/docs/en/install#windows-support and set everything up correctly. I didn't run into any bug. Please make sure every step is applied correctly.

[screenshot]

You have to create java, spark, hadoop, and tmp folders under C:\, and then make sure the environment variables are set. See steps 4 and 5.

Could you delete everything and then follow the installation steps again? Thank you.
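The folder and environment-variable checks from steps 4 and 5 can be scripted as a quick self-check. A sketch; the four folder names come from this comment, while checking `JAVA_HOME`, `SPARK_HOME`, and `HADOOP_HOME` (and only that they point to existing directories) is an assumption, so compare it with the install page before relying on it. `check_windows_layout` is a hypothetical helper.

```python
import os
from pathlib import Path


def check_windows_layout(root: str = "C:\\", env=None) -> list:
    """Return a list of problems with the folder layout and environment variables (empty = OK)."""
    env = os.environ if env is None else env
    problems = []
    # Folders this comment says to create under C:\
    for folder in ("java", "spark", "hadoop", "tmp"):
        if not (Path(root) / folder).is_dir():
            problems.append(f"missing folder {Path(root) / folder}")
    # ASSUMPTION: these are the variables steps 4 and 5 refer to
    for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{var} points to a missing directory: {value}")
    return problems
```

Usage: `print(check_windows_layout() or "layout looks OK")`.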

@C-K-Loan
Copy link
Member

@uzairahmadxy, I notice you are using OpenJDK, but AdoptOpenJDK is recommended.
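To see which JVM is on `PATH`, you can run `java -version` from Python and parse its output, which the JVM prints to stderr. A minimal sketch; it reports the version string, the major version (Java 8 reports itself as `1.8.0_345` in the version list above), and the "Runtime Environment" line where the vendor usually shows up. It assumes `java` on `PATH` is the JVM Spark uses, which is not guaranteed if `JAVA_HOME` points elsewhere. Both function names are hypothetical.

```python
import re
import subprocess


def parse_java_version(output: str) -> dict:
    """Extract the version string, major version and runtime line from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    if not match:
        raise ValueError("no version string found")
    full = match.group(1)
    parts = full.split(".")
    # Java 8 and earlier use the 1.x scheme ("1.8.0_345"); Java 9+ start with the major number
    major = int(parts[1]) if parts[0] == "1" else int(parts[0].split("-")[0])
    runtime = next((line.strip() for line in output.splitlines() if "Runtime Environment" in line), "")
    return {"version": full, "major": major, "runtime": runtime}


def java_version() -> dict:
    """Run `java -version`; the JVM writes this information to stderr."""
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return parse_java_version(proc.stderr)
```

Usage: `print(java_version())`.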


6 participants