feat: Binary value type for optimized binary arrays #6

Open

@nebkat nebkat commented May 11, 2024

The original UBJSON solution for binary data was an array of uint8 values. While this does sufficiently address the encoding of such data in the UBJSON format, it does not allow parsers to differentiate between a generic list of numbers and binary data.


When dealing with large quantities of binary data, this can have a significant negative impact on performance, as many languages provide optimized storage for binary data that is much more efficient than a standard array.

In the nlohmann C++ JSON library for example, a standard array can require 16 bytes per byte of data, while an optimized binary format would require exactly one.
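The same kind of gap can be observed in other languages. The snippet below is only a rough illustration in Python, not a measurement of the nlohmann library: it compares the container overhead of a generic list of integers with a `bytearray` holding the same values. The exact numbers depend on the interpreter and platform, and the list figure counts only the list's own pointer storage, so treat the result as indicative.

```python
import sys

# 1 MiB of sample data (values 0..255 repeated).
payload = bytes(range(256)) * 4096

as_list = list(payload)        # generic array of integer objects
as_bytes = bytearray(payload)  # dedicated binary storage

# sys.getsizeof reports the container itself; for the list this is mostly
# one pointer per element, before counting the integer objects it refers to.
list_bytes_per_elem = sys.getsizeof(as_list) / len(as_list)
bytes_per_elem = sys.getsizeof(as_bytes) / len(as_bytes)

print(f"list:      {list_bytes_per_elem:.2f} bytes per element")
print(f"bytearray: {bytes_per_elem:.2f} bytes per element")
```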

The introduction of the other unsigned data types in BJData furthers the need for a dedicated byte type. uint8 is no longer the only unsigned data type, so having parsers treat uint8 arrays as a special case, as suggested in the UBJSON solution, would lead to further confusion.


This proposal aims to address this issue with the introduction of a dedicated byte (B) type. This type would be identical to a uint8, but would be explicitly recommended for serializers/parsers to implement as an optimized data format type. Where such a type is not available, or parsers have not been upgraded to support the format, a standard integer array can be used instead.
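As a minimal sketch of how a parser might act on this distinction, the Python function below decodes a strongly typed array header of the form `[$<type>#U<count>` followed by the payload. It is an illustration under assumptions, not the reference implementation: `decode_optimized_array` is a hypothetical name, only the `U` (uint8) and proposed `B` (byte) element types are handled, and the count is assumed to be a single uint8.

```python
def decode_optimized_array(buf: bytes):
    """Decode `[ $ <type> # U <count> <payload>` for element types U and B.

    Hypothetical helper for illustration; real parsers support more
    element types and count types than this sketch does.
    """
    if len(buf) < 6 or buf[0:2] != b"[$":
        raise ValueError("not a strongly typed array")
    elem_type = buf[2:3]
    if buf[3:5] != b"#U":
        raise ValueError("this sketch only supports a uint8 count")
    count = buf[5]
    payload = buf[6:6 + count]
    if len(payload) != count:
        raise ValueError("truncated payload")

    if elem_type == b"B":
        # Proposed byte type: hand back the native binary type directly.
        return bytes(payload)
    if elem_type == b"U":
        # Plain uint8 array: a generic list of numbers.
        return list(payload)
    raise ValueError("unsupported element type in this sketch")


print(decode_optimized_array(b"[$B#U\x03\x01\x02\x03"))
print(decode_optimized_array(b"[$U#U\x03\x01\x02\x03"))
```

The payload bytes are identical in both cases; only the type marker tells the parser which in-memory representation to choose. A parser without a native binary type could treat `B` exactly like `U`.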

C++ provides std::vector<std::byte> or std::vector<uint8_t>, JavaScript provides Uint8Array, Dart provides Uint8List, and Python provides bytearray.
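On the serializing side, the native type can drive the choice of marker. The following Python sketch assumes a hypothetical `encode_array` helper that emits `B` for `bytes`/`bytearray` values and `U` for a plain list of integers in the range 0..255. For brevity it only supports arrays of up to 255 elements, so that the count fits in a single uint8.

```python
def encode_array(value) -> bytes:
    """Encode a byte-like value or a list of small ints as a typed array.

    Hypothetical helper for illustration: `B` is the marker proposed here,
    `U` is the existing uint8 marker, and the count is always a uint8.
    """
    if len(value) > 255:
        raise ValueError("this sketch only supports up to 255 elements")

    if isinstance(value, (bytes, bytearray)):
        marker = b"B"  # native binary type maps to the byte type
    else:
        marker = b"U"  # generic list of numbers stays a uint8 array

    # bytes() raises ValueError if any list element is outside 0..255.
    payload = bytes(value)
    return b"[$" + marker + b"#U" + bytes([len(value)]) + payload


print(encode_array(bytearray(b"\x01\x02")))
print(encode_array([1, 2]))
```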


UBJSON also states:

BSON, for example, defines types for binary data, regular expressions, JavaScript code blocks and other constructs that have no equivalent data type in JSON. BJSON defines a binary data type as well, again leaving the door wide open to interpretation that can potentially lead to incompatibilities between two implementations of the spec and Smile, while the closest, defines more complex data constructs and generation/parsing rules in the name of absolute space efficiency. These are not short-comings, just trade-offs the different specs made in order to service specific use-cases.

This solution does not fundamentally add any complexity, and without it many may be forced to use these other data formats along with all their baggage in order to achieve the desired efficiency.
