
[Improvement]: Object Serialization Optimization and Support #3355

Open
3 tasks done
czy006 opened this issue Dec 9, 2024 · 2 comments

@czy006
Contributor

czy006 commented Dec 9, 2024

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, we use Java serialization and Kryo serialization, and both have issues, including low performance. For example, we use Kryo serialization for PUT and GET operations on RocksDB, which backs the lookup join feature in Mixed Format.

The objects we store in the database also have to be serialized and deserialized. During upgrades, we occasionally run into deserialization errors (a stack trace of one such error is included in the comment below).

Our research suggests that Apache Fury can improve serialization performance and solve these deserialization problems. We will follow up with performance test reports comparing results before and after the replacement.

How should we improve?

  • Abstract a resource serialization interface, with one implementation for the current Java native serialization and one for Kryo serialization
  • Implement Fury serialization and provide configuration options for it, while marking the other serialization methods as deprecated
  • Once the Amoro LTS version is complete, remove the Kryo serialization implementation
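A minimal sketch of the first step, assuming a generic byte-array interface. The names `ResourceSerializer`, `JavaNativeSerializer`, `Serializers.forName` and the config values `"java"`, `"kryo"`, `"fury"` are hypothetical and not Amoro's actual API. Only the Java-native implementation is shown, since Kryo and Fury implementations would depend on those external libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical abstraction: callers depend only on this interface,
// so the backing format can be switched by configuration.
interface ResourceSerializer<T> {
  byte[] serialize(T obj);

  T deserialize(byte[] bytes);
}

// Java-native implementation backed by ObjectOutputStream / ObjectInputStream.
final class JavaNativeSerializer<T extends Serializable> implements ResourceSerializer<T> {
  private final Class<T> type;

  JavaNativeSerializer(Class<T> type) {
    this.type = type;
  }

  @Override
  public byte[] serialize(T obj) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new IllegalArgumentException("serialization error", e);
    }
    return bytes.toByteArray();
  }

  @Override
  public T deserialize(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      // Cast checks that the stream really contained the expected type.
      return type.cast(in.readObject());
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalArgumentException("deserialization error", e);
    }
  }
}

// Hypothetical factory keyed by a configuration value.
final class Serializers {
  static <T extends Serializable> ResourceSerializer<T> forName(String name, Class<T> type) {
    switch (name) {
      case "java":
        return new JavaNativeSerializer<>(type);
      // "kryo" and "fury" would plug in here; they need the Kryo and
      // Apache Fury libraries, so they are left out of this sketch.
      default:
        throw new IllegalArgumentException("unsupported serializer: " + name);
    }
  }
}
```

Keeping the format behind one interface means the deprecated implementations can later be deleted without touching the call sites that store objects in RocksDB or the database.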

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

@czy006
Contributor Author

czy006 commented Dec 9, 2024

This problem is caused by upgrading to a different Iceberg version. There seems to be nothing we can do to intervene.

java.lang.IllegalArgumentException: deserialization error 
	at org.apache.amoro.utils.SerializationUtil.simpleDeserialize(SerializationUtil.java:68)
	at org.apache.amoro.optimizer.spark.SparkOptimizerExecutor.jobDescription(SparkOptimizerExecutor.java:85)
	at org.apache.amoro.optimizer.spark.SparkOptimizerExecutor.executeTask(SparkOptimizerExecutor.java:58)
	at org.apache.amoro.optimizer.common.OptimizerExecutor.start(OptimizerExecutor.java:53)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = 4493876333706690896, local class serialVersionUID = -6272254142325460014
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1883)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1883)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1973)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1565)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.amoro.utils.SerializationUtil.simpleDeserialize(SerializationUtil.java:65)
	... 4 more
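The `InvalidClassException` above comes from Java serialization comparing the `serialVersionUID` recorded in the stream with the one of the class on the local classpath; when the two Iceberg versions disagree for `org.apache.iceberg.BaseFile`, deserialization fails. The sketch below, using hypothetical classes `ImplicitUid`, `ExplicitUid` and `UidCheck`, shows how to read the UID that Java serialization would use via `ObjectStreamClass.lookup`, and how declaring it explicitly pins it.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// No explicit serialVersionUID: the JVM computes a hash of the class structure
// (name, interfaces, fields, constructors, methods), so adding a field changes it.
class ImplicitUid implements Serializable {
  long recordCount;
}

// Explicit serialVersionUID: stays fixed unless the author changes it.
class ExplicitUid implements Serializable {
  private static final long serialVersionUID = 1L;
  long recordCount;
}

final class UidCheck {
  // Returns the serialVersionUID that Java serialization would write for cls.
  static long uidOf(Class<?> cls) {
    // lookup returns null for classes that do not implement Serializable.
    ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
    if (desc == null) {
      throw new IllegalArgumentException(cls.getName() + " is not Serializable");
    }
    return desc.getSerialVersionUID();
  }
}
```

Because the UID belongs to the library class, a downstream project cannot change it; this is why formats that do not rely on Java class descriptors are being considered.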

@ihadoop
Contributor

ihadoop commented Dec 21, 2024

I think JSON is the best format for object serialization, since it can be parsed by any language. In any case, Java's default serialization isn't efficient; Kryo is enough.
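One way to see the overhead of Java's default serialization is to compare encoded sizes. The sketch below uses a hypothetical `FileEntry` record and a hand-written `DataOutputStream` encoding as a stand-in for a compact binary format; it is not Kryo, Fury or JSON, which would need external libraries.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical record used only for this size comparison.
final class FileEntry implements Serializable {
  private static final long serialVersionUID = 1L;
  final long recordCount;
  final String path;

  FileEntry(long recordCount, String path) {
    this.recordCount = recordCount;
    this.path = path;
  }
}

final class SizeComparison {
  // Java built-in serialization: writes a stream header and a class descriptor
  // (class name, serialVersionUID, field names and types) before the values.
  static byte[] javaSerialize(FileEntry entry) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(entry);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Field values only: 8 bytes for the long, then a 2-byte length prefix
  // followed by the modified UTF-8 bytes of the string.
  static byte[] compactEncode(FileEntry entry) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(entry.recordCount);
      out.writeUTF(entry.path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

For `new FileEntry(42L, "file-1")` the compact encoding is 16 bytes, while the Java-serialized form is larger because of the class metadata. The compact form carries no type information, so reader and writer must agree on field order; richer formats handle that trade-off in different ways.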
