@hvanhovell

Why are the changes needed?

This PR fixes a number of problems with literal handling in Spark Connect:

  • It removes the dataType field from literal; that field was an unshipped change. The problem with that change is that it makes it far too easy to create an inconsistent state (e.g. an int can be a binary now...), and it causes duplication.
  • ... TBD...
  • ...TBD...
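
To illustrate the inconsistency concern in the first bullet, here is a minimal sketch. It does not use the Spark Connect proto; `TaggedLiteral`, `IntLiteral`, `BinaryLiteral` and `isConsistent` are hypothetical names invented for this example. It assumes only that a literal carrying both a value and a separate type tag lets the two disagree, whereas a literal whose type follows from its variant cannot.

```scala
// Hypothetical model for illustration only; not the Spark Connect proto.
object LiteralShapes {
  sealed trait DataType
  case object IntegerType extends DataType
  case object BinaryType extends DataType

  // Shape A: value and type are stored independently, so they can disagree.
  final case class TaggedLiteral(value: Any, dataType: DataType)

  // Shape B: the type is derived from the variant, so no mismatch is possible.
  sealed trait Literal { def dataType: DataType }
  final case class IntLiteral(value: Int) extends Literal {
    val dataType: DataType = IntegerType
  }
  final case class BinaryLiteral(value: Array[Byte]) extends Literal {
    val dataType: DataType = BinaryType
  }

  // Shape A needs a runtime check that Shape B gets from the type system.
  def isConsistent(l: TaggedLiteral): Boolean = (l.value, l.dataType) match {
    case (_: Int, IntegerType)        => true
    case (_: Array[Byte], BinaryType) => true
    case _                            => false
  }

  def main(args: Array[String]): Unit = {
    // An int tagged as binary: representable in Shape A, rejected only at runtime.
    println(isConsistent(TaggedLiteral(1, BinaryType)))
    // In Shape B the type cannot drift away from the value.
    println(IntLiteral(1).dataType)
  }
}
```

Shape A also stores the type twice (once implied by the value, once in the tag), which is the duplication the bullet mentions.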

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Modified existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Difference in Decimal handling:

[89.976200000000000000,89.976210000000000000] AS ARRAY(89.976200000000000000BD, 89.976210000000000000BD)#0, [89889.766723100000000000,89889.766723100000000000] AS ARRAY(89889.766723100000000000BD, 89889.766723100000000000BD)#0
[89.97620,89.97621] AS ARRAY(89.97620BD, 89.97621BD)#0, [89889.7667231,89889.7667231] AS ARRAY(89889.7667231BD, 89889.7667231BD)#0
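
The two plan strings above appear to hold the same numeric values at different scales (18 fractional digits versus 5 or 7). The sketch below reproduces that rendering difference with plain `java.math.BigDecimal`. It is not Spark code, and `DecimalScaleDemo` is a hypothetical name; which line corresponds to the old or new behavior is not stated in the comment, so the example makes no claim about that.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Illustration only: the same number rendered at two different scales.
object DecimalScaleDemo {
  // Scale 5, as in the shorter plan string.
  val compact: JBigDecimal = new JBigDecimal("89.97620")
  // Widening to scale 18 pads with zeros and needs no rounding.
  val widened: JBigDecimal = compact.setScale(18)

  def main(args: Array[String]): Unit = {
    println(compact)                          // 89.97620
    println(widened)                          // 89.976200000000000000
    // Numerically equal, but not equal as (unscaled value, scale) pairs.
    println(compact.compareTo(widened) == 0)  // true
    println(compact.equals(widened))          // false
  }
}
```

Because `equals` is scale-sensitive, a plan comparison based on string or structural equality would flag the two forms as different even though the values match.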

Difference in Binary handling:

0x0806 AS X'0806'#0, 
[8,6] AS ARRAY(8Y, 6Y)#0
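
The two strings above look like the same two bytes read once as a binary literal and once as an array of byte values. The sketch below shows how one `Array[Byte]` can be rendered either way. The helper names `asBinaryLiteral` and `asByteArrayLiteral` are hypothetical, and the formatting only imitates the strings quoted above; it is not taken from Spark.

```scala
// Illustration only: one byte sequence, two textual interpretations.
object BinaryRenderDemo {
  // Render as a hex binary literal, e.g. X'0806'.
  def asBinaryLiteral(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString("X'", "", "'")

  // Render as an array of tinyint literals, e.g. ARRAY(8Y, 6Y).
  def asByteArrayLiteral(bytes: Array[Byte]): String =
    bytes.map(b => s"${b}Y").mkString("ARRAY(", ", ", ")")

  def main(args: Array[String]): Unit = {
    val bytes = Array[Byte](8, 6)
    println(asBinaryLiteral(bytes))     // X'0806'
    println(asByteArrayLiteral(bytes))  // ARRAY(8Y, 6Y)
  }
}
```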

expressions.Literal(value, dataType)
}

private object FromProtoToCatalystConverter extends FromProtoConvertor {

This directly converts the literal to its catalyst representation.
