Connection refused issue #1183

Open
isbn390 opened this issue Oct 16, 2024 · 2 comments

isbn390 commented Oct 16, 2024

Hi, I am a beginner working with Spark and .NET. Let me explain my setup first.
I have a Spark master/worker setup deployed with the Bitnami Helm chart. The image is custom-built to include Delta Lake, and I have a Delta table created and stored in Azure Data Lake. My requirement is to create an API that takes some arguments and queries that Delta table. To do the processing I need Spark, right? So my idea was to call the Spark cluster in my AKS, pass the arguments to it, have Spark query the Delta table in Azure, and return the output to me. I created the API, but on running it I get the error below:

System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (111): Connection refused 127.0.0.1:5567

I tried the master URL with an external IP, the Spark headless service, and even a local port. I presume the host and port come from the Visual Studio debug configuration, but how can I configure the session builder in the code? Is there an alternative solution for my requirement? Please share some insights; I believe Spark on Kubernetes with Spark .NET is a fairly common setup, so I am hoping someone can help.
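
For reference, the session builder and read I am attempting look roughly like this (a minimal sketch; the abfss path, app name, and filter are placeholders, not my real values):

using Microsoft.Spark.Sql;

// Build (or reuse) the session. With Microsoft.Spark the master URL normally
// comes from spark-submit rather than from the builder itself.
SparkSession spark = SparkSession
    .Builder()
    .AppName("delta-query-api")
    .GetOrCreate();

// Load the Delta table from the data lake and apply the API's predicate.
DataFrame df = spark.Read()
    .Format("delta")
    .Load("abfss://mycontainer@myaccount.dfs.core.windows.net/tables/mytable");

df.Filter("id = 42").Show();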
Thanks


dbeavon commented Oct 17, 2024

If you just want to read a Parquet file or Delta table from a storage account, then Spark may be overkill, especially for a beginner.

Can you start with Parquet.Net from NuGet and point your API at the file? That is what I would do. I probably wouldn't even use Delta tables if you can get by with regular Parquet files.
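
A minimal sketch of that approach, assuming Parquet.Net 4.x on .NET 6+ (the file path is a placeholder):

using Parquet;
using Parquet.Data;

// Open the Parquet file and read every row group, column by column.
using Stream fileStream = File.OpenRead("/data/mytable/part-00000.parquet");
using ParquetReader reader = await ParquetReader.CreateAsync(fileStream);

for (int g = 0; g < reader.RowGroupCount; g++)
{
    DataColumn[] columns = await reader.ReadEntireRowGroupAsync(g);
    foreach (DataColumn column in columns)
    {
        // column.Data is a plain .NET array you can hand straight to your API.
        Console.WriteLine($"{column.Field.Name}: {column.Data.Length} values");
    }
}

No cluster, no JVM, no spark-submit: the API process reads the file directly.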

Spark is for massively parallel algorithms and transformations. If you aren't working with millions of rows and you don't have timing constraints, then you probably don't need Spark.


isbn390 commented Oct 17, 2024

Thanks @dbeavon, I will check it out. But still, do you have any idea about the connection issue?

Update: For simplicity, my API will only read the file and show the results. I installed Spark locally, started a master and a worker, and built the code manually with dotnet build. Then I used the resulting .dll to run spark-submit from my command line:

spark-submit ^
--packages io.delta:delta-core_2.12:1.2.0,org.apache.hadoop:hadoop-azure:3.2.0 ^
--class org.apache.spark.deploy.dotnet.DotnetRunner ^
--master spark://192.168.1.53:7077 ^
microsoft-spark-3-2_2.12-2.1.1.jar ^
dotnet ConsoleApp.dll

This setup works fine, but it's local. To replicate the entire setup remotely, I created a Docker image from the code and deployed it in my AKS; there is a Swagger interface for testing, and it shows the connection issue. Is there any way to set the host and port? Even when I set my Spark cluster address in the session builder, the API connects to localhost:5567.
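
From what I can tell, 5567 is the default port of the DotnetBackend socket that DotnetRunner opens for the .NET app, so if the app is started on its own (as my API container is), nothing is listening and the connect fails. If I understand the docs correctly, the backend can also be started by itself in debug mode, with the .NET side picking its port from the DOTNETBACKEND_PORT environment variable. A sketch of what I plan to try (same jar and master as above):

REM Start only the JVM backend and leave it listening (default port 5567)
REM so a separately launched .NET app can attach to it.
spark-submit ^
--class org.apache.spark.deploy.dotnet.DotnetRunner ^
--master spark://192.168.1.53:7077 ^
microsoft-spark-3-2_2.12-2.1.1.jar ^
debug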
