Conversation

@zuston zuston commented Aug 28, 2025

What changes were proposed in this pull request?

This is an experimental feature that introduces the Fory serializer to replace the vanilla Spark serializer, to speed up shuffle serialization.

Why are the changes needed?

for #2596

Does this PR introduce any user-facing change?

Yes.

spark.rss.client.shuffle.serializer=FORY
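For illustration, the proposed option could also be set programmatically when building the Spark configuration. This is only a sketch: the config key comes from this PR, and the object/method names below are hypothetical.

```scala
import org.apache.spark.SparkConf

object ForySerializerConf {
  // Sketch: enable the experimental Fory serializer via the config key
  // proposed in this PR.
  def build(): SparkConf =
    new SparkConf().set("spark.rss.client.shuffle.serializer", "FORY")
}
```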

How was this patch tested?

Unit test.

@zuston zuston linked an issue Aug 28, 2025 that may be closed by this pull request
@zuston
Member Author

zuston commented Aug 28, 2025

cc @chaokunyang . If you have time, could you help review this integration with Fory?

So far, this implementation hasn’t shown significant improvements. I would greatly appreciate any guidance you could provide on using Fory.


github-actions bot commented Aug 28, 2025

Test Results

 2 731 files (−359)   2 731 suites (−359)   4h 10m 44s ⏱️ (−2h 38m 52s)
 1 112 tests (−86):   1 026 ✅ (−171)   1 💤 (±0)    2 ❌ (+2)    83 🔥 (+83)
14 465 runs (−701):  14 252 ✅ (−899)  15 💤 (±0)   32 ❌ (+32)  166 🔥 (+166)

For more details on these failures and errors, see this check.

Results for commit c2a7d46. ± Comparison against base commit d5e689c.

This pull request removes 99 and adds 13 tests. Note that renamed tests count towards both.
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateFallback
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateInDriver
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateInDriverDenied
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testCreateInExecutor
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testDefaultIncludeExcludeProperties
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testExcludeProperties
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testIncludeProperties
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testTryAccessCluster
org.apache.spark.shuffle.FunctionUtilsTests ‑ testOnceFunction0
org.apache.spark.shuffle.RssShuffleManagerTest ‑ testCreateShuffleManagerServer
…
org.apache.spark.serializer.ForySerializerTest ‑ ForyDeserializationStream should handle stream operations after close
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializationStream should handle empty stream
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializationStream should handle stream operations after close
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializationStream should serialize and deserialize simple objects
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializer should create new instance
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializer should support relocation of serialized objects
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializerInstance should handle byte arrays
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializerInstance should handle large strings
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializerInstance should handle null values
org.apache.spark.serializer.ForySerializerTest ‑ ForySerializerInstance should serialize and deserialize simple case class
…

♻️ This comment has been updated with latest results.

<dependency>
  <groupId>org.apache.fory</groupId>
  <artifactId>fory-core</artifactId>
  <version>0.12.0</version>
</dependency>


Please also introduce the fory-scala dependency: https://mvnrepository.com/artifact/org.apache.fory/fory-scala

Member Author


It's a pity, but Spark still uses Scala 2.x.

.withRefTracking(true)
.withCompatibleMode(CompatibleMode.COMPATIBLE)
.requireClassRegistration(false)
.build()


You should also register Scala serializers and enable Scala serialization optimization:

    val f = Fory.builder()
      .withLanguage(Language.JAVA)
      .withRefTracking(true)
      .withCompatibleMode(CompatibleMode.COMPATIBLE)
      .requireClassRegistration(false)
      .withScalaOptimizationEnabled(true)
      .build()
    ScalaSerializers.registerSerializers(f)

See more details on https://fory.apache.org/docs/docs/guide/scala_guide#fory-creation

}

override def deserialize[T: ClassTag](bytes: ByteBuffer): T = {
val array = if (bytes.hasArray) {


You can pass the ByteBuffer to Fory directly without an extra copy.
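A minimal sketch of that suggestion, assuming Fory 0.12.0 kept the `deserialize(ByteBuffer)` overload its predecessor Fury exposed (verify against the actual API); the wrapper class name is hypothetical.

```scala
import java.nio.ByteBuffer

import org.apache.fory.Fory

// Sketch only: with a deserialize(ByteBuffer) overload, the hasArray
// branch and the defensive byte-array copy can both be dropped.
class ZeroCopyDeserialize(fory: Fory) {
  def deserialize[T](bytes: ByteBuffer): T =
    // Fory reads straight from the buffer, whether heap-backed or direct.
    fory.deserialize(bytes).asInstanceOf[T]
}
```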

throw new IllegalStateException("Stream is closed")
}

val bytes = fury.serialize(t.asInstanceOf[AnyRef])


Maybe hold a Fory MemoryBuffer as an instance field in the class and serialize objects into that buffer; then you can get the heap buffer from it and write that into `out`. In this way, you can avoid an extra copy.
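One way to sketch this, assuming the Fury-era buffer API (`MemoryUtils.buffer`, `writerIndex`, `getHeapMemory`, and the `serialize(MemoryBuffer, Object)` overload) carried over to Fory 0.12.0; the class and method names are hypothetical.

```scala
import java.io.OutputStream

import org.apache.fory.Fory
import org.apache.fory.memory.{MemoryBuffer, MemoryUtils}

// Sketch of the reviewer's suggestion: one reusable buffer per stream,
// so serialization does not allocate a fresh byte[] for every record.
class BufferedForyStream(fory: Fory, out: OutputStream) {
  // Reused across records; grows on demand.
  private val buffer: MemoryBuffer = MemoryUtils.buffer(64)

  def writeObject(obj: AnyRef): Unit = {
    buffer.writerIndex(0)        // reset instead of allocating per record
    fory.serialize(buffer, obj)  // serialize into the reusable buffer
    // Write the heap-backed bytes directly, skipping the byte[] copy
    // that fory.serialize(obj) would have produced.
    out.write(buffer.getHeapMemory, 0, buffer.writerIndex)
  }
}
```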

}

private def writeInt(value: Int): Unit = {
  out.write((value >>> 24) & 0xFF)
  out.write((value >>> 16) & 0xFF)
  out.write((value >>> 8) & 0xFF)
  out.write(value & 0xFF)
}


Just use:

  public void writeInt64(MemoryBuffer buffer, long value) {
    LongSerializer.writeInt64(buffer, value, longEncoding);
  }

  public long readInt64(MemoryBuffer buffer) {
    return LongSerializer.readInt64(buffer, longEncoding);
  }

This will be faster and simpler, and it also compresses the data.

@chaokunyang

Shuffle data should already be binary; is there anything that needs to be serialized?

Have you ever benchmarked your job to see whether there is a bottleneck in serialization?

@zuston
Member Author

zuston commented Aug 28, 2025

Big thanks for your quick and patient review. @chaokunyang

Shuffle data should already be binary; is there anything that needs to be serialized?

If using vanilla Spark, each record is a Java object that is serialized into bytes before being pushed to the remote shuffle server. If using Gluten/Auron/DataFusion Comet, there is no need to serialize.

Have you ever benchmarked your job to see whether there is a bottleneck in serialization?

Not yet; this PR is still in its initial phase.

@chaokunyang

Only if you are using Spark RDDs with raw Java objects will there be a serialization bottleneck. Such cases are similar to DataStream in Flink. We've observed several-fold e2e performance speedups in multiple cases.

@zuston
Member Author

zuston commented Aug 28, 2025

Only if you are using Spark RDDs with raw Java objects will there be a serialization bottleneck. Such cases are similar to DataStream in Flink. We've observed several-fold e2e performance speedups in multiple cases.

Thanks for sharing. Do you mean there is no need to optimize the serialization performance of vanilla Spark SQL shuffle?

@chaokunyang

Data records in Spark SQL are already binary, so no serialization happens. I suggest benchmarking first before optimizing.

@zuston
Member Author

zuston commented Aug 28, 2025

Data records in Spark SQL are already binary, so no serialization happens. I suggest benchmarking first before optimizing.

It seems that serialization is still happening. https://github.com/apache/spark/blob/2de0248071035aa94818386c2402169f6670d2d4/core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala#L57

The Product2 contains the key/value that will be serialized. Refer to: https://github.com/apache/spark/blob/47991b074a5a277e1fb75be3a5cc207f400b0b0c/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java#L243

@jerqi
Contributor

jerqi commented Sep 1, 2025

Spark's serialization happens during the shuffle write stage.

Development

Successfully merging this pull request may close these issues.

[FEATURE] Dedicated faster serialization when shuffle writing/reading