
Support converting zero bytes and high-bit-set bytes to octal sequences for VARBINARY #405

Open
Obsecurus opened this issue Nov 5, 2020 · 4 comments


@Obsecurus

Obsecurus commented Nov 5, 2020

PostgreSQL has a function encode(data, 'escape') where data is a bytea.
https://www.postgresql.org/docs/9.4/functions-binarystring.html

Currently, when using sqlline with the Athena JDBC driver, VARBINARY values are output as hex by default. It would be extremely useful to be able to enable this output mode directly in sqlline for any VARBINARY column.
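PostgreSQL's documentation describes the `escape` format as converting zero bytes and high-bit-set bytes to octal sequences (`\nnn`) and doubling backslashes, leaving all other bytes unchanged. A minimal Java sketch of that rendering, as it might be applied to a VARBINARY value fetched with `ResultSet.getBytes`, is below. `PgEscapeSketch` and `encode` are hypothetical names, not part of sqlline's API.

```java
/** Hypothetical helper: renders bytes in the style of PostgreSQL's encode(data, 'escape'). */
final class PgEscapeSketch {
    private PgEscapeSketch() {}

    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned (0..255)
            if (v == 0 || v >= 0x80) {
                // zero bytes and high-bit-set bytes become a three-digit octal escape
                sb.append(String.format("\\%03o", v));
            } else if (v == '\\') {
                // backslashes are doubled
                sb.append("\\\\");
            } else {
                // every other byte, including control bytes such as \n, is emitted as-is
                sb.append((char) v);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, the bytes 0x00, 'A', 0xE0, '\' render as `\000A\340\\`.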

@snuyanzin
Collaborator

@Obsecurus thank you for raising the issue.
I'm not sure this covers all your needs, but there is the escapeOutput property, which also does escaping:
!set escapeOutput true

@Obsecurus
Author

We've tried escapeOutput, but it does not produce the desired results. We've used the Athena JAR from plain Python and got the desired result. I imagine anyone working with binary data containing ASCII would want this mode. I suspect sqlline doesn't offer this rendering mode with any driver.

@Obsecurus
Author

Here is an example SELECT on a VARBINARY column from Athena using the Python pyathenajdbc library, which is a wrapper for the Athena JAR. Notice how the ASCII bytes are rendered as text, while the other bytes are left as escaped byte values:

b'\x03\x00\x00+&\xe0\x00\x00\x00\x00\x00Cookie: mstshash=hello\r\n\x01\x00\x08\x00\x03\x00\x00\x00'
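The output above is Python's `repr` of a `bytes` object: printable ASCII bytes appear as text, tab, newline and carriage return use `\t`, `\n` and `\r`, and every other byte is written as `\xNN` in lowercase hex. A rough Java sketch of that style follows. It is an approximation that assumes the value is always wrapped in single quotes (Python switches to double quotes when the data contains `'` but no `"`), and `PyBytesReprSketch` is a hypothetical name.

```java
/** Hypothetical helper approximating Python's repr() of a bytes object. */
final class PyBytesReprSketch {
    private PyBytesReprSketch() {}

    static String repr(byte[] data) {
        // Simplification: always use single quotes, escaping any ' inside the data
        StringBuilder sb = new StringBuilder("b'");
        for (byte b : data) {
            int v = b & 0xFF; // unsigned byte value
            switch (v) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    if (v < 0x20 || v >= 0x7F) {
                        // control bytes, DEL and high-bit-set bytes become \xNN
                        sb.append(String.format("\\x%02x", v));
                    } else {
                        // printable ASCII is shown as text
                        sb.append((char) v);
                    }
            }
        }
        return sb.append('\'').toString();
    }
}
```

Fed the bytes from the example, this sketch should reproduce the line above exactly.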

@snuyanzin
Collaborator

Yes, I understand what you mean.
Currently I see two ways:

  1. Use Java's existing byte-array-to-String conversion. Since the value is just a byte representation inside the DB, it makes sense to let the user specify the encoding, e.g. via a property (it could be ASCII, UTF-8, UTF-16, etc.). However, all unmappable bytes will by default be replaced with \uFFFD (Java's default when converting from a byte array).
  2. Replace each byte with something that depends on its value. This needs extra effort, and it would also make sense to have a property that defines the format, e.g. octal sequences, hex sequences, and so on.
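The two options could be sketched in Java roughly as follows. `decode` uses the standard `new String(byte[], Charset)` constructor, which replaces malformed or unmappable input with the charset's replacement string (U+FFFD when decoding). `escape` keeps printable ASCII readable and renders every other byte in a selectable format. The `EscapeFormat` enum stands in for the hypothetical format property from option 2; none of these names exist in sqlline.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch of the two rendering options; not sqlline API. */
final class BinaryRenderingSketch {
    private BinaryRenderingSketch() {}

    // Option 1: decode with a user-chosen charset; invalid bytes become U+FFFD
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    // Option 2: hypothetical setting choosing how non-printable bytes are shown
    enum EscapeFormat { OCTAL, HEX }

    static String escape(byte[] data, EscapeFormat format) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            int v = b & 0xFF; // unsigned byte value
            if (v == '\\') {
                sb.append("\\\\"); // double backslashes so the output stays unambiguous
            } else if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v); // printable ASCII stays readable
            } else if (format == EscapeFormat.OCTAL) {
                sb.append(String.format("\\%03o", v)); // e.g. \000, \340
            } else {
                sb.append(String.format("\\x%02x", v)); // e.g. \x00, \xe0
            }
        }
        return sb.toString();
    }
}
```

Option 1 is lossy, since distinct invalid bytes all collapse to U+FFFD; option 2 keeps every byte value recoverable.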
