Standardization can fail with Task not serializable when casting to date is requested #57

Open
yruslan opened this issue Aug 22, 2024 · 0 comments
Labels
bug Something isn't working

yruslan commented Aug 22, 2024

Describe the bug

When converting an integer to a date in the format yyyyMM (possibly YYYYMM), Standardization threw a large error message beginning with:

Job aborted. (Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: za.co.absa.standardization.config.DefaultStandardizationConfig$
Serialization stack:
  - object not serializable (class: za.co.absa.standardization.config.DefaultStandardizationConfig$, value: za.co.absa.standardization.config.DefaultStandardizationConfig$@3ba3e058)
  - element of array (index: 3)
  - array (class [Ljava.lang.Object;, size 5)
  - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
  - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class za.co.absa.standardization.udf.UDFBuilder$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic za/co/absa/standardization/udf/UDFBuilder$.$anonfun$stringUdfViaNumericParser$1:(Lza/co/absa/standardization/types/parsers/NumericParser;ZLjava/lang/String;Lza/co/absa/standardization/config/StandardizationConfig;Lscala/Option;Ljava/lang/String;)Lza/co/absa/standardization/udf/UDFResult;, instantiatedMethodType=(Ljava/lang/String;)Lza/co/absa/standardization/udf/UDFResult;, numCaptured=5])
  - writeReplace data (class: java.lang.invoke.SerializedLambda)
  - object (class za.co.absa.standardization.udf.UDFBuilder$$$Lambda$5214/360225670,
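
The serialization stack suggests that the lambda built in UDFBuilder captured the config object itself (element 3 of capturedArgs), and that object is not serializable. The following is a minimal sketch of that mechanism using only the JDK, written in Java rather than the library's Scala. `PlainConfig`, `SerFn`, and the helper methods are hypothetical names for illustration and are not part of the standardization library; whether the real fix is to make the config serializable or to capture only the needed values is a design decision for the maintainers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

class CapturedConfigDemo {
    // Hypothetical stand-in for a config singleton that does NOT implement Serializable.
    static final class PlainConfig {
        static final PlainConfig INSTANCE = new PlainConfig("Africa/Johannesburg");
        final String timezone;
        PlainConfig(String timezone) { this.timezone = timezone; }
    }

    // A serializable function type, similar in spirit to what a UDF closure must be.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Captures the whole config object, so serializing the lambda must also serialize PlainConfig.
    static SerFn<String, String> capturingConfig(PlainConfig cfg) {
        return s -> s + "@" + cfg.timezone;
    }

    // Captures only the String it needs, which is serializable.
    static SerFn<String, String> capturingValue(PlainConfig cfg) {
        final String tz = cfg.timezone;
        return s -> s + "@" + tz;
    }

    // Returns null on success, or the message of the NotSerializableException
    // (the name of the offending class) on failure.
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("capturing config: " + trySerialize(capturingConfig(PlainConfig.INSTANCE)));
        System.out.println("capturing value:  " + trySerialize(capturingValue(PlainConfig.INSTANCE)));
    }
}
```

In this sketch the first lambda fails to serialize because its captured argument is the config object, while the second succeeds because it only holds a String.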

To Reproduce

Steps to reproduce the behavior OR commands run:

  1. Go to '...'
  2. Use value '...'
  3. Run using
  4. See error

Also try an integer-typed field with the following metadata:

{"name": "MY_DATE", "type": "integer", "metadata": {"pattern": "yyyyMM", "timezone": "Africa/Johannesburg"}, "nullable": true}

Expected behavior

  • Standardization should record data errors when an incorrect date format is passed, rather than failing the pipeline.
  • If that is not an option, the error message should at least say which column or condition caused the error.
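
As a rough illustration of the behavior requested above, the sketch below parses an integer such as 202408 with the pattern yyyyMM and returns an error that names the column instead of throwing. It uses plain `java.time` in Java, not the library's Scala code or Spark's pattern handling, which may treat pattern letters differently. `YearMonthColumnParser`, `Result`, and `parse` are hypothetical names, not part of the standardization library.

```java
import java.time.DateTimeException;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

class YearMonthColumnParser {
    // Hypothetical result holder: exactly one of value / error is non-null.
    static final class Result {
        final YearMonth value;
        final String error;
        Result(YearMonth value, String error) {
            this.value = value;
            this.error = error;
        }
    }

    // Parses an integer like 202408 using the given pattern. A bad value or a bad
    // pattern produces an error message naming the column instead of an exception.
    static Result parse(String column, int raw, String pattern) {
        try {
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
            return new Result(YearMonth.parse(String.valueOf(raw), fmt), null);
        } catch (IllegalArgumentException | DateTimeException e) {
            String msg = "Column '" + column + "': cannot parse " + raw
                    + " with pattern '" + pattern + "': " + e.getMessage();
            return new Result(null, msg);
        }
    }

    public static void main(String[] args) {
        Result ok = parse("MY_DATE", 202408, "yyyyMM");
        Result bad = parse("MY_DATE", 202413, "yyyyMM"); // month 13 is invalid
        System.out.println(ok.value + " / " + ok.error);
        System.out.println(bad.value + " / " + bad.error);
    }
}
```

Returning the error as data keeps one bad row from aborting the whole job, and including the column name addresses the second bullet.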


@yruslan yruslan added the bug Something isn't working label Aug 22, 2024