Connection refused issue #1183

Open
isbn390 opened this issue Oct 16, 2024 · 2 comments

isbn390 commented Oct 16, 2024

Hi, I am a beginner working with Spark and .NET. Let me explain my setup first.
I have a Spark master/worker setup deployed with the Bitnami Helm chart. The image is custom-built to include Delta Lake, and I have a Delta table created and stored in Azure Data Lake. My requirement is to create an API that takes some arguments and queries that Delta table. To do the processing I need Spark, right? So my idea was to call the Spark cluster in my AKS, pass the arguments to it, have Spark query the Delta table in Azure, and return the output to me. I created the API, but on running it I get the error below:

System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (111): Connection refused 127.0.0.1:5567

I tried the master URL with an external IP, the Spark headless service, and even a local port. I presume the host and port come from the Visual Studio debug configuration, but how can I configure the session builder in the code? Is there an alternative solution for my requirement? Please share some insights; I believe Spark on Kubernetes with Spark .NET is a fairly common setup, so I am hoping someone can help.
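
For reference, the session builder and read I am attempting look roughly like this (a minimal sketch; the abfss path, app name, and filter are placeholders, not my real values):

using Microsoft.Spark.Sql;

// Build (or reuse) the session. With Microsoft.Spark the master URL normally
// comes from spark-submit rather than from the builder itself.
SparkSession spark = SparkSession
    .Builder()
    .AppName("delta-query-api")
    .GetOrCreate();

// Load the Delta table from the data lake and apply the API's predicate.
DataFrame df = spark.Read()
    .Format("delta")
    .Load("abfss://mycontainer@myaccount.dfs.core.windows.net/tables/mytable");

df.Filter("id = 42").Show();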
Thanks


dbeavon commented Oct 17, 2024

If you just want to read a Parquet file or Delta table from a storage account, then Spark may be overkill, especially for a beginner.

Can you start with Parquet.Net from NuGet and point your API at the file? That is what I would do. I probably wouldn't even use Delta tables if you can get by with regular Parquet files.
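
A minimal sketch of that approach, assuming Parquet.Net 4.x on .NET 6+ (the file path is a placeholder):

using Parquet;
using Parquet.Data;

// Open the Parquet file and read every row group, column by column.
using Stream fileStream = File.OpenRead("/data/mytable/part-00000.parquet");
using ParquetReader reader = await ParquetReader.CreateAsync(fileStream);

for (int g = 0; g < reader.RowGroupCount; g++)
{
    DataColumn[] columns = await reader.ReadEntireRowGroupAsync(g);
    foreach (DataColumn column in columns)
    {
        // column.Data is a plain .NET array you can hand straight to your API.
        Console.WriteLine($"{column.Field.Name}: {column.Data.Length} values");
    }
}

No cluster, no JVM, no spark-submit: the API process reads the file directly.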

Spark is for massively parallel algorithms and transformations. If you aren't working with millions of rows and you don't have timing constraints, then you probably don't need Spark.


isbn390 commented Oct 17, 2024

Thanks @dbeavon, I will check it out. But still, do you have any idea about the connection issue?

Update: For simplicity, my API will only read the file and show the results. I installed Spark locally, started a master and a worker, and built the code manually with dotnet build. Then I used the resulting .dll to run spark-submit from my command line:

spark-submit ^
--packages io.delta:delta-core_2.12:1.2.0,org.apache.hadoop:hadoop-azure:3.2.0 ^
--class org.apache.spark.deploy.dotnet.DotnetRunner ^
--master spark://192.168.1.53:7077 ^
microsoft-spark-3-2_2.12-2.1.1.jar ^
dotnet ConsoleApp.dll

This setup works fine, but it's local. To replicate the entire setup remotely, I created a Docker image from the code and deployed it in my AKS; there is a Swagger interface for testing, and it shows the connection issue. Is there any way to set the host and port? Even when I set my Spark cluster address in the session builder, the API connects to localhost:5567.
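
From what I can tell, 5567 is the default port of the DotnetBackend socket that DotnetRunner opens for the .NET app, so if the app is started on its own (as my API container is), nothing is listening and the connect fails. If I understand the docs correctly, the backend can also be started by itself in debug mode, with the .NET side picking its port from the DOTNETBACKEND_PORT environment variable. A sketch of what I plan to try (same jar and master as above):

REM Start only the JVM backend and leave it listening (default port 5567)
REM so a separately launched .NET app can attach to it.
spark-submit ^
--class org.apache.spark.deploy.dotnet.DotnetRunner ^
--master spark://192.168.1.53:7077 ^
microsoft-spark-3-2_2.12-2.1.1.jar ^
debug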
