[BUG]: NoSuchMethodError in SQLUtils.createPythonFunction on Databricks 16.4 LTS (Spark 3.5.2) with Microsoft.Spark 2.3.0 when using DataStreamWriter.Foreach #1220

@ngquoctrong

Description

Hi team,
I’m hitting a runtime failure when using DataStreamWriter.Foreach(IForeachWriter) on Azure Databricks Runtime 16.4 LTS (includes Apache Spark 3.5.2, Scala 2.12) with Microsoft.Spark 2.3.0. The job fails inside SQLUtils.createPythonFunction with a java.lang.NoSuchMethodError referring to org.apache.spark.api.python.SimplePythonFunction constructor (the one that includes PythonAccumulatorV2 in the signature).

Environment

Platform: Azure Databricks (cluster)
Databricks Runtime: 16.4 LTS (Apache Spark 3.5.2, Scala 2.12)
Microsoft.Spark (NuGet): 2.3.0
Cluster JAR: microsoft-spark-3-5_2.12-2.3.0.jar
Microsoft.Spark.Worker: 2.3.0 (linux-x64)
DOTNET_WORKER_DIR (driver + executors):
/usr/local/bin/spark-dotnet/Microsoft.Spark.Worker-2.3.0/Microsoft.Spark.Worker
SPARK_HOME: /databricks/spark (Databricks default)
Target framework: net8.0 (app)
OS/Java: Databricks managed (Ubuntu base, Java 17 on DBR 16.x)

What happens

The streaming query crashes right after calling WriteStream().Foreach(...).
Driver log excerpt:
[Error] [JvmBridge] JVM method execution failed: Static method 'createPythonFunction' failed for class 'org.apache.spark.sql.api.dotnet.SQLUtils' when called with 7 arguments (
    [Index=1, Type=Byte[], Value=System.Byte[]],
    [Index=2, Type=Hashtable, Value=Microsoft.Spark.Interop.Internal.Java.Util.Hashtable],
    [Index=3, Type=ArrayList, Value=Microsoft.Spark.Interop.Internal.Java.Util.ArrayList],
    [Index=4, Type=String, Value=Microsoft.Spark.Worker],
    [Index=5, Type=String, Value=2.3.0.0],
    [Index=6, Type=ArrayList, Value=Microsoft.Spark.Interop.Internal.Java.Util.ArrayList],
    [Index=7, Type=null, Value=null])
[Error] [JvmBridge] java.lang.NoSuchMethodError: 'void org.apache.spark.api.python.SimplePythonFunction.<init>(scala.collection.Seq, java.util.Map, java.util.List, java.lang.String, java.lang.String, java.util.List, org.apache.spark.api.python.PythonAccumulatorV2)'
    at org.apache.spark.sql.api.dotnet.SQLUtils$.createPythonFunction(SQLUtils.scala:35)
    at org.apache.spark.sql.api.dotnet.SQLUtils.createPythonFunction(SQLUtils.scala)
    ...
Unhandled exception. System.Exception: JVM method execution failed: Static method 'createPythonFunction' failed for class 'org.apache.spark.sql.api.dotnet.SQLUtils' ...
 ---> Microsoft.Spark.JvmException: java.lang.NoSuchMethodError: 'void org.apache.spark.api.python.SimplePythonFunction.<init>(scala.collection.Seq, java.util.Map, java.util.List, java.lang.String, java.lang.String, java.util.List, org.apache.spark.api.python.PythonAccumulatorV2)'
    at org.apache.spark.sql.api.dotnet.SQLUtils$.createPythonFunction(SQLUtils.scala:35)
    ...
    at Microsoft.Spark.Utils.UdfUtils.CreatePythonFunction(...)
    at Microsoft.Spark.Sql.Streaming.DataStreamWriter.Foreach(IForeachWriter writer)

Minimal repro

using System;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;

class NoopWriter : IForeachWriter
{
    public bool Open(long partitionId, long epochId) => true;
    public void Process(Row value) { /* no-op */ }
    public void Close(Exception errorOrNull) { /* no-op */ }
}

class Program
{
    static void Main(string[] args)
    {
        var spark = SparkSession.Builder()
            .AppName("Foreach-NoSuchMethod-Repro")
            .GetOrCreate();

        var df = spark.ReadStream().Format("rate").Load();

        var query = df.WriteStream()
            .OutputMode("append")
            .Foreach(new NoopWriter())
            .Start();

        query.AwaitTermination();
    }
}

Cluster setup notes

Only one microsoft-spark JAR present: microsoft-spark-3-5_2.12-2.3.0.jar.
Microsoft.Spark.Worker 2.3.0 installed on all nodes at
/usr/local/bin/spark-dotnet/Microsoft.Spark.Worker-2.3.0/Microsoft.Spark.Worker.
Spark config (driver & executors) points DOTNET_WORKER_DIR to that path.
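For anyone reproducing this, a quick way to confirm the worker path from inside the app itself is to print the relevant environment variables on the driver (a minimal sketch; it only checks the driver process, so executor environments would need a separate check):

```csharp
using System;

class EnvCheck
{
    static void Main()
    {
        // Print what the .NET driver process actually sees. On this cluster,
        // DOTNET_WORKER_DIR should be the Microsoft.Spark.Worker path above.
        Console.WriteLine($"DOTNET_WORKER_DIR = {Environment.GetEnvironmentVariable("DOTNET_WORKER_DIR")}");
        Console.WriteLine($"SPARK_HOME        = {Environment.GetEnvironmentVariable("SPARK_HOME")}");
    }
}
```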

Expected behavior

The streaming sink starts; NoopWriter receives rows.

Actual behavior

The query fails immediately with NoSuchMethodError in SQLUtils.createPythonFunction referencing the SimplePythonFunction constructor (with PythonAccumulatorV2 parameter).

Notes / Hypothesis

This looks like a signature mismatch against Spark’s internal Python function wrapper (SimplePythonFunction) on Databricks Spark 3.5.2 when going through the Foreach(IForeachWriter) path.
The same app works locally with upstream Spark 3.5.x; the failure appears specific to this DBR build, where the SimplePythonFunction constructor signature that SQLUtils.createPythonFunction was compiled against isn’t present (possibly because Databricks patched the constructor in its Spark fork).
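A possible interim workaround (untested on this cluster, and assuming ForeachBatch goes through the callback mechanism rather than the SimplePythonFunction wrapper that Foreach uses) would be to switch the sink to ForeachBatch:

```csharp
// Hypothetical workaround sketch: ForeachBatch hands each micro-batch to a
// .NET callback instead of wrapping a per-row IForeachWriter, so it may not
// hit SQLUtils.createPythonFunction at all.
var query = df.WriteStream()
    .OutputMode("append")
    .ForeachBatch((batchDf, batchId) =>
    {
        batchDf.Show(); // per-batch processing replaces per-row Process() calls
    })
    .Start();
```

Note this changes the processing model from per-row to per-micro-batch, so it only helps where batch-level writes are acceptable.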
