
HTTPS Proxy Support #38

Open
NateDawg97 opened this issue Feb 11, 2021 · 2 comments

@NateDawg97

NateDawg97 commented Feb 11, 2021

Is there support for using this to connect from a local workstation to a remote AWS Glue Hive Catalog when the local client workstation has to go through an HTTP proxy?

For instance, with Spark, one can set the following options to route S3 access through an HTTP proxy when loading remote data into a Spark DataFrame. Is there an equivalent for this Glue Hive catalog?

    .config("spark.hadoop.fs.s3a.proxy.host","myproxy_url.com") \
    .config("spark.hadoop.fs.s3a.proxy.port","2929") \
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", True) \

@ams1

ams1 commented Sep 6, 2022

Hi,

First of all thanks for making this available!

I am also trying to connect from a local Spark session to a remote Glue Data Catalog through a proxy.

I tried to set the proxy on the JVM via:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.driver.extraJavaOptions", "-Dhttps.proxyHost=aaa -Dhttps.proxyPort=aaa -Dhttps.proxyUser=aaa -Dhttps.proxyPassword=aaa")
        .getOrCreate()
    )

but I still get ... Caused by: java.net.UnknownHostException: glue.hidden_region.amazonaws.com (I've hidden the region, which is otherwise as expected).

Anything else I could try?

Thanks!

P.S.: @NateDawg97: did you manage to fix it?

@ams1

ams1 commented Sep 6, 2022

Well, for anyone interested, I managed to get the proxy configured from pyspark via:

    spark._jvm.java.lang.System.setProperty("https.proxyHost","aaa")
    spark._jvm.java.lang.System.setProperty("https.proxyPort","aaa")
    spark._jvm.java.lang.System.setProperty("https.proxyUser","aaa")
    spark._jvm.java.lang.System.setProperty("https.proxyPassword","aaa")

Maybe it's like shooting an ant with a cannon, but it works 😄.

Now, when I run spark.sql("show databases").show() in local Spark, I can see the databases from the AWS Glue Data Catalog.
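
For anyone who wants to reuse this, the setProperty calls can be wrapped in a small helper. This is only a sketch of the same workaround, not something the project provides: the function names proxy_properties and apply_proxy_settings and the host proxy.example.com are made up for illustration, and spark._jvm is an internal PySpark attribute, so it may behave differently across versions.

```python
def proxy_properties(host, port, user=None, password=None):
    """Build the https.* JVM system properties used in the workaround above."""
    props = {
        "https.proxyHost": host,
        "https.proxyPort": str(port),  # system properties must be strings
    }
    # Credentials are optional; only include them when the proxy needs auth.
    if user is not None:
        props["https.proxyUser"] = user
    if password is not None:
        props["https.proxyPassword"] = password
    return props


def apply_proxy_settings(spark, props):
    """Set each property on the driver JVM through the session's py4j gateway."""
    # Assumption: `spark` is an active SparkSession exposing the private `_jvm` handle.
    system = spark._jvm.java.lang.System
    for key, value in props.items():
        system.setProperty(key, value)
```

With an active session, this would be called as apply_proxy_settings(spark, proxy_properties("proxy.example.com", 2929)) before running spark.sql("show databases").show(). Presumably the properties only affect the driver JVM, which is where the catalog lookup happened in my local setup.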
