DO NOT MERGE: Make binary string encoding configurable, add support for base64 #770
base: master
Conversation
@sfc-gh-lsembera what do you think about supporting base64 as well? Also, could we do it automatically instead of adding a parameter?
@sfc-gh-lsembera what do you think about supporting base64 as well?
I am fine with supporting base64 as well. I just don't like the removal of surrounding quotes: we don't do it for any other data type, and I think we shouldn't do it for base64 either.
Also, could we do it automatically instead of adding a parameter?
We cannot do it automatically. Without additional information, we cannot tell if a string is base64- or hex-encoded.
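To make the ambiguity concrete, here is a small illustration (not from the PR): the four-character string "abcd" is valid in both encodings, yet decodes to different bytes, so guessing the encoding could silently corrupt data.

```java
import java.util.Base64;

public class AmbiguousBinaryString {
  public static void main(String[] args) {
    String input = "abcd";
    // Interpreted as hex, "abcd" is two bytes: 0xAB 0xCD.
    byte[] hexBytes = new byte[] {(byte) 0xAB, (byte) 0xCD};
    // Interpreted as base64, the same four characters decode to three bytes.
    byte[] base64Bytes = Base64.getDecoder().decode(input);
    // Prints "2 bytes as hex vs. 3 bytes as base64"
    System.out.println(hexBytes.length + " bytes as hex vs. " + base64Bytes.length + " bytes as base64");
  }
}
```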
if(binaryStringEncoding == Constants.BinaryStringEncoding.BASE64) {
  try {
    String stringInput = ((String) input).trim();
    // Remove double quotes if present
We should not strip surrounding quotes. We don't do it for hex, and it does not work in Snowflake Worksheets either.
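A minimal sketch of the suggested behavior (assumed shape, not the PR's final code), mirroring the quoted diff fragment: trim whitespace but keep any surrounding quotes, exactly as the hex path does, and let the decoder reject invalid input.

```java
if (binaryStringEncoding == Constants.BinaryStringEncoding.BASE64) {
  String stringInput = ((String) input).trim();
  // No quote stripping: a quoted value such as "\"aGk=\"" fails to decode,
  // matching how quoted hex strings behave in Snowflake Worksheets.
  return java.util.Base64.getDecoder().decode(stringInput);
}
```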
@@ -888,76 +890,76 @@ public void testValidateAndParseBinary() throws DecoderException {
     assertArrayEquals(
         "honk".getBytes(StandardCharsets.UTF_8),
         validateAndParseBinary(
-            "COL", "honk".getBytes(StandardCharsets.UTF_8), Optional.empty(), 0));
+            "COL", "honk".getBytes(StandardCharsets.UTF_8), Optional.empty(), 0, Constants.BinaryStringEncoding.HEX));
We need the same test coverage for base64 as we have for hex.
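A sketch of what the mirrored base64 coverage might look like, reusing the names visible in the diff (the encoded input here is "aG9uaw==", i.e. "honk" base64-encoded):

```java
assertArrayEquals(
    "honk".getBytes(StandardCharsets.UTF_8),
    validateAndParseBinary(
        "COL",
        Base64.getEncoder().encodeToString("honk".getBytes(StandardCharsets.UTF_8)),
        Optional.empty(),
        0,
        Constants.BinaryStringEncoding.BASE64));
```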
-import java.util.Map;
-import java.util.Optional;
-import java.util.Set;
+import java.util.*;
no wildcard imports.
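The fix is to keep the explicit imports that the wildcard replaced:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
```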
@@ -64,6 +67,9 @@ public class ParameterProvider {
   public static final Constants.BdecParquetCompression BDEC_PARQUET_COMPRESSION_ALGORITHM_DEFAULT =
       Constants.BdecParquetCompression.GZIP;

+  public static final Constants.BinaryStringEncoding BINARY_STRING_ENCODING_DEFAULT =
Please add tests to ParameterProviderTest.
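A hypothetical test along the requested lines; the constructor arguments and getter name are assumptions modeled on the other default-value checks in ParameterProviderTest:

```java
@Test
public void testBinaryStringEncodingDefault() {
  // Assumed constructor shape: no parameter overrides, no connection properties.
  ParameterProvider parameterProvider = new ParameterProvider(new HashMap<>(), null);
  // getBinaryStringEncoding() is an assumed getter for the new parameter.
  Assert.assertEquals(
      Constants.BinaryStringEncoding.HEX, parameterProvider.getBinaryStringEncoding());
}
```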
Thanks for confirming. Then I think this should be a channel-level configuration instead of a client-wide one; take a look at what we did for
The problem with having the configuration per-channel is that it is not as easily configurable from KC (the Kafka connector).
@sfc-gh-tzhang @sfc-gh-lsembera, based on the requirement to work with the Kafka connector, I think this config should apply to the whole client. Also, I suspect that most streaming use cases with embedded binary in Avro messages will be either 100% hex-encoded or 100% base64-encoded, depending on the particular tooling used in the customer's environment, so the need to support mixed encodings is less likely.
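The reference at the end of the channel-level suggestion above is truncated in this capture. For illustration only (not necessarily the setting that comment meant), OnErrorOption is an existing setting configured per channel rather than per client, so two channels on the same client can use different values:

```java
import net.snowflake.ingest.streaming.OpenChannelRequest;

// Channel-level options are supplied when the channel is opened.
OpenChannelRequest request =
    OpenChannelRequest.builder("MY_CHANNEL")
        .setDBName("MY_DB")
        .setSchemaName("MY_SCHEMA")
        .setTableName("MY_TABLE")
        .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
        .build();
```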
…#759) * SNOW-1357377 Add request Id in all streaming ingest APIs * Fix tests and log requestId * Format * Remove request_id and no star imports
…sor 0.21 (snowflakedb#774) SNOW-1457523: Fix CVE for snowflake-ingest-java io.airlift:aircompressor 0.21
…b#782) * Add microbenchmark for insertRows * Various performance improvements in the insertRows path * Fix tests and format * Make flush threads daemon to allow the JVM to exit if these threads are active * Remove commented-out line * Make memory info provider a singleton because we don't want multiple instances of it * Address review comments * Lower benchmark row count to make GH Actions happy * Review comments * Mark these deps as test-only and check memory every 100ms
…akedb#779) SNOW-1457523: Upgrade Parquet dependencies version to fix CVE
…eption from JDBC (snowflakedb#780) We have seen many complaints from customers about false-positive ERROR logs from JDBC, since we refresh the token only after it expires. This change switches to proactively refreshing the token before it expires to avoid the ERROR log. Refresh stage metadata automatically in the background to avoid exceptions encountered during file upload in uploadWithoutConnection snowflakedb#753. Testing is covered by an existing long-running end-to-end test.
…e client was created with (snowflakedb#794) * Reject new stage metadata if the deployment id does not match what the client was created with * Review comments
* SNOW-1545879 Reduce the max channel/chunk sizes * Fix flaky test * Trigger build
* update protobuf * license merge
Co-authored-by: Hitesh Madan <[email protected]>
Fix PRIMARY_FILE_ID_KEY
V2.2.0 Release
Addresses an issue where the streaming client receives base64-encoded strings for binary data. Adds a new parameter `BINARY_STRING_ENCODING`, which supports the values `hex` and `base64`. The default value is `hex`.
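A minimal sketch of how the client-level parameter would be supplied. The connection properties follow the SDK's usual pattern; the lowercase key "binary_string_encoding" is an assumption based on how other parameters are named, not confirmed by the PR:

```java
import java.util.Properties;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

Properties props = new Properties();
props.put("url", "https://<account>.snowflakecomputing.com:443");
props.put("user", "<user>");
props.put("private_key", "<private_key>");
// Assumed key for the new parameter; when omitted, the default is hex.
props.put("binary_string_encoding", "base64");

SnowflakeStreamingIngestClient client =
    SnowflakeStreamingIngestClientFactory.builder("MY_CLIENT")
        .setProperties(props)
        .build();
```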