What's the issue
In the Parquet files generated by the conversion, strings appear base64-encoded. This affects every string field, which is probably not what users intend.
Take RelationWriteSupport.java as an example:
memberRoleType = new PrimitiveType(REQUIRED, BINARY, "role");
This PrimitiveType constructor leaves logicalTypeAnnotation set to null. The Parquet converter therefore knows nothing about the field's actual type and falls back to its default handling for binary values, which is base64.
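To make the symptom concrete, here is a minimal sketch using only the Java standard library (no Parquet involved). It assumes the reader prints untyped binary values as base64, as described above. The class name, the method names, and the sample value "outer" are hypothetical and only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryRenderingDemo {
    // Assumed behaviour of a reader that sees BINARY with no logical type:
    // the raw bytes are printed as base64 text.
    static String renderAsRawBinary(byte[] value) {
        return Base64.getEncoder().encodeToString(value);
    }

    // Behaviour once the reader knows the bytes are a UTF-8 string.
    static String renderAsString(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The stored bytes are identical in both cases; only the rendering differs.
        byte[] stored = "outer".getBytes(StandardCharsets.UTF_8);
        System.out.println(renderAsRawBinary(stored));
        System.out.println(renderAsString(stored));
    }
}
```

The bytes written to the file are the same either way, so the difference lies only in how a reader interprets them.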
How to fix
To fix this, we can set the logicalTypeAnnotation parameter to the string type (for example, LogicalTypeAnnotation.stringType()). We know the tags are actually strings, so this should be safe, and the Parquet converter will then know the field is a string and convert it as UTF-8 instead of base64.