@Else00 Else00 commented Aug 11, 2025

This PR introduces support for the BSON (Binary JSON) format, allowing jaq to process BSON files and streams.

Motivation

BSON is a widely used data serialization format, especially in conjunction with MongoDB. Extending jaq's capabilities to handle BSON makes it a more versatile tool for developers working in these ecosystems.

Implementation

The implementation is non-invasive: BSON data is decoded into a JSON string representation before it is passed to the existing filter engine. This lets jaq reuse its existing JSON processing and rendering logic without modification.

Specifically, BSON documents are converted to their Canonical Extended JSON string representation.
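To make the target representation concrete, here is a minimal sketch of what Canonical Extended JSON looks like for a few BSON types. The `Bson` enum and the `to_canonical_extjson` function are hypothetical stand-ins written for this illustration; they are not the `bson` crate's API and not the code in this PR.

```rust
/// Hypothetical, simplified stand-in for a BSON value (not the `bson` crate's type).
pub enum Bson {
    Int32(i32),
    Int64(i64),
    Bool(bool),
    String(String),
    /// The 12 raw bytes of an ObjectId.
    ObjectId([u8; 12]),
    /// Milliseconds since the Unix epoch.
    DateTime(i64),
}

/// Escape a string so that it can be placed inside a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one value in Canonical Extended JSON, where numeric types are
/// wrapped in `$number...` objects so that the BSON type survives.
pub fn to_canonical_extjson(v: &Bson) -> String {
    match v {
        Bson::Int32(n) => format!("{{\"$numberInt\":\"{}\"}}", n),
        Bson::Int64(n) => format!("{{\"$numberLong\":\"{}\"}}", n),
        Bson::Bool(b) => b.to_string(),
        Bson::String(s) => format!("\"{}\"", escape(s)),
        Bson::ObjectId(bytes) => {
            let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
            format!("{{\"$oid\":\"{}\"}}", hex)
        }
        Bson::DateTime(ms) => format!("{{\"$date\":{{\"$numberLong\":\"{}\"}}}}", ms),
    }
}
```

With this sketch, a 32-bit integer 42 renders as an object with a single `$numberInt` key rather than as a bare JSON number, so a filter such as `.n + 1` would see an object and not a number unless the wrapper is unpacked first.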

Key Changes

  • New CLI Flags: Adds a --bson-input (short: -b) flag to explicitly treat input as BSON.
  • Automatic Detection: Automatically detects files with the .bson extension. This detection can be overridden by other format flags.
  • BSON Parser Module: Introduces a new jaq/src/bson.rs module to encapsulate all BSON decoding logic.
  • Dependencies: Adds the bson crate as a dependency and enables the serde_json feature on jaq-json for the conversion.
  • Testing: Includes unit tests for the BSON parser, covering both simple documents and BSON-specific types like ObjectId and DateTime.
  • Documentation: The help.txt file has been updated to include the new flags.
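The precedence between the explicit flag and extension-based detection could be sketched roughly as follows. The `Format` enum and `choose_format` function are hypothetical names for this illustration, not identifiers from the PR, and the case-sensitive match on `bson` is an assumption.

```rust
use std::path::Path;

/// Hypothetical input formats, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Json,
    Bson,
}

/// Pick the input format: an explicit format flag always wins;
/// otherwise fall back to the file extension, defaulting to JSON.
pub fn choose_format(explicit: Option<Format>, path: Option<&Path>) -> Format {
    if let Some(format) = explicit {
        return format;
    }
    match path.and_then(|p| p.extension()).and_then(|e| e.to_str()) {
        // Assumption: the extension is matched case-sensitively.
        Some("bson") => Format::Bson,
        _ => Format::Json,
    }
}
```

Under these assumptions, `choose_format(None, Some(Path::new("data.bson")))` selects BSON, while passing an explicit JSON format for the same path selects JSON.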

Example Usage

# Automatically detect .bson file
jaq . data.bson

# Explicitly parse BSON from stdin
cat data.bson | jaq --bson-input '.data | length'

@01mf02 (Owner) commented Aug 12, 2025

Hi @Else00, thanks for your PR!

First of all, may I ask whether you used an LLM to create your code? I am critical of using LLMs and do not want machine-generated code in my repository.

Next, there is a large PR going on (#284) that adds functionality which will clash with your CLI changes, in particular --bson-input. This will be --from bson. If you want to have this PR merged, you will need to base it on my work in #284.

Finally, I'm very critical of your path from BSON -> serde_json::Value -> jaq_json::Val. This will be pretty bad for performance, and I'm also concerned about whether it can preserve certain kinds of values (e.g. big integers), because every transformation step may lose information. In general, I do not want serde as a regular dependency for jaq if I can avoid it.
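As a generic illustration of how an intermediate representation can silently lose information, the sketch below routes a 64-bit integer through a 64-bit float and back. It makes no claim about what serde_json::Value or jaq_json::Val actually do with integers; `via_f64` is a hypothetical helper that only shows the class of problem.

```rust
/// Hypothetical conversion step that stores an integer as an `f64`
/// and then reads it back, as a lossy intermediate format might.
pub fn via_f64(n: i64) -> i64 {
    (n as f64) as i64
}

/// Returns true if the value survives the round trip unchanged.
pub fn survives_round_trip(n: i64) -> bool {
    via_f64(n) == n
}
```

Small values such as 42 survive, but 9007199254740993 (2^53 + 1) does not, because an `f64` has only 53 bits of mantissa and cannot represent it exactly.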

For an idea of how I imagine BSON support could look in jaq, take a look at cbor.rs in my PR. There, I deserialise CBOR from lexer tokens, which is the route I also take for XML and JSON, to name just two. That approach is not that hard and is much better for performance, dependencies, etc. If you manage to get something like that working for BSON, I will be much more positive about integrating it.
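For readers unfamiliar with the idea, here is a rough sketch of decoding BSON bytes straight into a value type, with no serde and no intermediate JSON. It is not the code from cbor.rs or #284: the `Val` enum is a hypothetical stand-in for jaq_json::Val, all function names are invented for this example, and only a handful of BSON element types are handled.

```rust
/// Hypothetical value type for illustration; jaq's real type is `jaq_json::Val`.
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Arr(Vec<Val>),
    Obj(Vec<(String, Val)>),
}

/// Split off the first `n` bytes of `input`, or return `None` if it is too short.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let slice: &'a [u8] = *input;
    if slice.len() < n {
        return None;
    }
    let (head, tail) = slice.split_at(n);
    *input = tail;
    Some(head)
}

/// Read a little-endian 32-bit integer.
fn read_i32(input: &mut &[u8]) -> Option<i32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(take(input, 4)?);
    Some(i32::from_le_bytes(buf))
}

/// Read 8 raw bytes (used for int64 and double).
fn read_8(input: &mut &[u8]) -> Option<[u8; 8]> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(take(input, 8)?);
    Some(buf)
}

/// Read a NUL-terminated UTF-8 string (a BSON `cstring`).
fn read_cstring(input: &mut &[u8]) -> Option<String> {
    let end = input.iter().position(|&b| b == 0)?;
    let bytes = take(input, end)?;
    take(input, 1)?; // skip the NUL terminator
    String::from_utf8(bytes.to_vec()).ok()
}

/// Read one BSON document: int32 total length, elements, then a 0x00 terminator.
pub fn read_document(input: &mut &[u8]) -> Option<Vec<(String, Val)>> {
    let len = read_i32(input)?;
    if len < 5 {
        return None; // the smallest document is 4 length bytes plus the terminator
    }
    // The length includes its own four bytes.
    let mut body = take(input, len as usize - 4)?;
    let mut fields = Vec::new();
    loop {
        let tag = take(&mut body, 1)?[0];
        if tag == 0x00 {
            // The terminator must be the last byte of the document.
            return if body.is_empty() { Some(fields) } else { None };
        }
        let key = read_cstring(&mut body)?;
        let val = match tag {
            0x01 => Val::Float(f64::from_le_bytes(read_8(&mut body)?)),
            0x02 => {
                // The string length counts the trailing NUL byte.
                let n = read_i32(&mut body)?;
                if n < 1 {
                    return None;
                }
                let (last, text) = take(&mut body, n as usize)?.split_last()?;
                if *last != 0 {
                    return None;
                }
                Val::Str(String::from_utf8(text.to_vec()).ok()?)
            }
            0x03 => Val::Obj(read_document(&mut body)?),
            // Arrays are documents keyed "0", "1", ...; keep only the values.
            0x04 => Val::Arr(read_document(&mut body)?.into_iter().map(|(_, v)| v).collect()),
            0x08 => Val::Bool(take(&mut body, 1)?[0] != 0),
            0x0A => Val::Null,
            0x10 => Val::Int(i64::from(read_i32(&mut body)?)),
            0x12 => Val::Int(i64::from_le_bytes(read_8(&mut body)?)),
            _ => return None, // ObjectId, DateTime, binary, etc. are omitted here
        };
        fields.push((key, val));
    }
}
```

Because each element is turned into a `Val` as soon as it is read, 64-bit integers stay integers and no second tree of values is built, which is the kind of benefit the comment above attributes to the direct route.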

@01mf02 (Owner) commented Sep 23, 2025

I'm closing this for lack of response.

@01mf02 01mf02 closed this Sep 23, 2025