This repository was archived by the owner on Sep 18, 2023. It is now read-only.

Description
Python UDFs need extra executorMemoryOverhead, so we decided to implement the categorify UDF as a Scala UDF.
However, tests show significant GC overhead with the Scala UDF.
This needs to be fixed.
Possible causes:
- The Categorify class extends UserDefinedFunction, but it is not a case class
- The input and output encoders?
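
To help separate the two possible causes, it may be useful to compare against a baseline that does not use a custom UserDefinedFunction subclass at all. The sketch below is only an illustration, not the project's Categorify implementation: the object name `CategorifySketch`, the `lookup` helper, the dictionary contents, and the column names are all hypothetical. It builds the UDF with Spark's standard `udf` helper, which derives the input and output encoders from the lambda's types, and reads the dictionary from a broadcast variable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object CategorifySketch {
  // Hypothetical helper: map a category string to its id, -1 if unknown.
  def lookup(dict: Map[String, Int], value: String): Int =
    dict.getOrElse(value, -1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("categorify-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical dictionary; a real one would be built from the data.
    val dict = Map("a" -> 0, "b" -> 1, "c" -> 2)
    // Broadcast once so each executor holds a single shared copy.
    val dictBc = spark.sparkContext.broadcast(dict)

    // Plain lambda UDF: no custom UserDefinedFunction subclass involved.
    val categorify = udf((value: String) => lookup(dictBc.value, value))

    val df = Seq("a", "c", "z").toDF("category")
    df.withColumn("category_id", categorify(col("category"))).show()

    spark.stop()
  }
}
```

If GC overhead is much lower with this baseline on the same data, that would point at the custom class. If it is similar, the encoders or the dictionary handling are more likely candidates.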