Description
Hi team,
I’m hitting a runtime failure when using DataStreamWriter.Foreach(IForeachWriter) on Azure Databricks Runtime 16.4 LTS (Apache Spark 3.5.2, Scala 2.12) with Microsoft.Spark 2.3.0. The job fails inside SQLUtils.createPythonFunction with a java.lang.NoSuchMethodError referring to the org.apache.spark.api.python.SimplePythonFunction constructor (the overload that takes a PythonAccumulatorV2 in its signature).
Environment
Platform: Azure Databricks (cluster)
Databricks Runtime: 16.4 LTS (Apache Spark 3.5.2, Scala 2.12)
Microsoft.Spark (NuGet): 2.3.0
Cluster JAR: microsoft-spark-3-5_2.12-2.3.0.jar
Microsoft.Spark.Worker: 2.3.0 (linux-x64)
DOTNET_WORKER_DIR (driver + executors):
/usr/local/bin/spark-dotnet/Microsoft.Spark.Worker-2.3.0/Microsoft.Spark.Worker
SPARK_HOME: /databricks/spark (Databricks default)
Target framework: net8.0 (app)
OS/Java: Databricks managed (Ubuntu base, Java 17 on DBR 16.x)
What happens
The streaming query crashes right after calling WriteStream().Foreach(...).
Driver log excerpt:
[Error] [JvmBridge] JVM method execution failed: Static method 'createPythonFunction' failed for class 'org.apache.spark.sql.api.dotnet.SQLUtils' when called with 7 arguments (
    [Index=1, Type=Byte[], Value=System.Byte[]],
    [Index=2, Type=Hashtable, Value=Microsoft.Spark.Interop.Internal.Java.Util.Hashtable],
    [Index=3, Type=ArrayList, Value=Microsoft.Spark.Interop.Internal.Java.Util.ArrayList],
    [Index=4, Type=String, Value=Microsoft.Spark.Worker],
    [Index=5, Type=String, Value=2.3.0.0],
    [Index=6, Type=ArrayList, Value=Microsoft.Spark.Interop.Internal.Java.Util.ArrayList],
    [Index=7, Type=null, Value=null], )
[Error] [JvmBridge] java.lang.NoSuchMethodError: 'void org.apache.spark.api.python.SimplePythonFunction.<init>(scala.collection.Seq, java.util.Map, java.util.List, java.lang.String, java.lang.String, java.util.List, org.apache.spark.api.python.PythonAccumulatorV2)'
    at org.apache.spark.sql.api.dotnet.SQLUtils$.createPythonFunction(SQLUtils.scala:35)
    at org.apache.spark.sql.api.dotnet.SQLUtils.createPythonFunction(SQLUtils.scala)
    ...
Unhandled exception. System.Exception: JVM method execution failed: Static method 'createPythonFunction' failed for class 'org.apache.spark.sql.api.dotnet.SQLUtils' ...
 ---> Microsoft.Spark.JvmException: java.lang.NoSuchMethodError: 'void org.apache.spark.api.python.SimplePythonFunction.<init>(scala.collection.Seq, java.util.Map, java.util.List, java.lang.String, java.lang.String, java.util.List, org.apache.spark.api.python.PythonAccumulatorV2)'
    at org.apache.spark.sql.api.dotnet.SQLUtils$.createPythonFunction(SQLUtils.scala:35)
    ...
    at Microsoft.Spark.Utils.UdfUtils.CreatePythonFunction(...)
    at Microsoft.Spark.Sql.Streaming.DataStreamWriter.Foreach(IForeachWriter writer)
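For context on where this fails: the trace points at the dotnet/spark JVM-side shim, SQLUtils.createPythonFunction (SQLUtils.scala:35), which wraps the serialized .NET UDF in Spark's SimplePythonFunction. Below is a rough sketch of that call site; the parameter types are reconstructed from the error message above, not copied from the actual dotnet/spark source, so treat it as illustrative only.

// Sketch of the failing call site, reconstructed from the stack trace; not the
// verbatim dotnet/spark source. The package declaration matters here:
// SimplePythonFunction is private[spark], so this helper must live under
// org.apache.spark to reference it (as the real SQLUtils does).
package org.apache.spark.sql.api.dotnet

import java.util.{List => JList, Map => JMap}

import org.apache.spark.api.python.{PythonAccumulatorV2, PythonBroadcast, PythonFunction, SimplePythonFunction}
import org.apache.spark.broadcast.Broadcast

object SQLUtilsSketch {
  def createPythonFunction(
      command: Array[Byte],
      envVars: JMap[String, String],
      pythonIncludes: JList[String],
      pythonExec: String,
      pythonVersion: String,
      broadcastVars: JList[Broadcast[PythonBroadcast]],
      accumulator: PythonAccumulatorV2): PythonFunction =
    // Compiled against upstream Spark 3.5.x, this binds to
    // SimplePythonFunction.<init>(Seq, Map, List, String, String, List,
    // PythonAccumulatorV2). If the runtime class lacks a constructor with
    // exactly that shape, the JVM throws the NoSuchMethodError seen above.
    SimplePythonFunction(
      command.toSeq, // scala.collection.Seq[Byte]
      envVars,
      pythonIncludes,
      pythonExec,
      pythonVersion,
      broadcastVars,
      accumulator)
}

Against a stock Apache Spark 3.5.x classpath that constructor exists and the call links fine, which matches the "works locally" note further down.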
Minimal repro
using System;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;

class NoopWriter : IForeachWriter
{
    public bool Open(long partitionId, long epochId) => true;
    public void Process(Row value) { /* no-op */ }
    public void Close(Exception errorOrNull) { /* no-op */ }
}

class Program
{
    static void Main(string[] args)
    {
        var spark = SparkSession.Builder()
            .AppName("Foreach-NoSuchMethod-Repro")
            .GetOrCreate();

        var df = spark.ReadStream().Format("rate").Load();

        var query = df.WriteStream()
            .OutputMode("append")
            .Foreach(new NoopWriter())
            .Start();

        query.AwaitTermination();
    }
}
Cluster setup notes
Only one microsoft-spark JAR present: microsoft-spark-3-5_2.12-2.3.0.jar.
Microsoft.Spark.Worker 2.3.0 installed on all nodes at
/usr/local/bin/spark-dotnet/Microsoft.Spark.Worker-2.3.0/Microsoft.Spark.Worker.
Spark config (driver & executors) points DOTNET_WORKER_DIR to that path.
Expected behavior
The streaming sink starts; NoopWriter receives rows.
Actual behavior
The query fails immediately with a NoSuchMethodError in SQLUtils.createPythonFunction, referencing the SimplePythonFunction constructor that takes a PythonAccumulatorV2 parameter.
Notes / Hypothesis
This looks like a binary signature mismatch against Spark’s internal Python function wrapper on Databricks Spark 3.5.2 when going through the Foreach(IForeachWriter) path.
The same app works locally against upstream Spark 3.5.x; the failure appears specific to this DBR build, where SQLUtils.createPythonFunction invokes a SimplePythonFunction constructor signature that the runtime class doesn’t provide (presumably because Databricks ships a modified SimplePythonFunction).
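One way to check this (a reflection probe I'm suggesting, not something from the logs; the class name is taken from the error above) is to list the constructors the runtime's SimplePythonFunction actually exposes on the cluster:

// Diagnostic sketch for a Scala notebook cell on the affected cluster: prints
// every public constructor the runtime's SimplePythonFunction exposes. If none
// matches (Seq, Map, List, String, String, List, PythonAccumulatorV2), the
// NoSuchMethodError above is explained by a DBR-modified class.
val cls = Class.forName("org.apache.spark.api.python.SimplePythonFunction")
cls.getConstructors.foreach(println)

Comparing that output against the signature quoted in the NoSuchMethodError would show exactly which parameter Databricks added, removed, or retyped.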