-
Notifications
You must be signed in to change notification settings - Fork 17
feat!: switch CBOR library to cborg #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
BREAKING CHANGE! * Will be much more strict with data types on encode and decode, only IPLD Data Model values will be accepted, all others will be rejected (including `undefined`). No other tags than 42 will be involved in encode or decode. * Will strictly reject decode of data where integers are encoded using more bytes than are required, as per deterministic DAG-CBOR rules. * Strictly only tag 42 and no others. Also no exotic types like Simple Values. * Indefinite length items will not be accepted for decode. * BigInt will be used for integers decoded outside of the safe integer range. They are also accepted for encodes, within the 64-bit range. The exotic bignumber type from borc is no longer supported. * Floats are now always encoded as 64-bit, to match the updated DAG-CBOR spec and the Go implementation. * Uint8Arrays are native through the stack now, `Buffer` isn't required for browser bundling.
PerformanceIt's been very hard to pin down a consistent performance story. Because of the different approaches taken to both encode and decode taken by borc and cborg it's very dependent on the object forms and size and even the way that the codec is used. The majority of performance differences come down to byte array handling in JavaScript and Node.js - allocations, copies, and UTF8 handling. borc's main performance key is being able to lean on Node.js' But this comes at a cost because the "heap" size needs to be known up-front. Which is where some of our headaches with borc have come in and the cborg doesn't use this approach, it just assumes a For encode the approach is totally different - borc allocates a plain array and puts bytes in it, converting it to a In narrow benchmark situations, configured for enough heap, borc is difficult to beat for different object types head-to-head. I've formed a suite of benchmarks that test against randomised IPLD-style data and the results vary a lot. For one-offs, borc is very slow. Its up-front needs are very large and its "fast" mode (that we use here) requires persisting a But, when we end up consuming it, borc becomes a lot slower. The need to deal strictly with The attempt to remove These are some benchmarks consuming
So you can see the wide variability there. I really wanted decode to be much faster than that but it's really difficult to squeeze any more without improvements in JS TypedArray performance. There are some possibilities for flattening the decode process a little more but it will cost in other ways, so I'm avoiding that for now. We also have real-world data to test against. I have a 3.5Gb CAR file containing all blocks that form a single Filecoin headers graph. I can test streaming read/decode time with both borc and cborg versions of On my machine I get numbers like this:
They result in the same output CAR file, so that helps verify consisten encode. Unfortunately it's not the same CAR file as the input file! This is because of Filecoin encoding non-UTF8 data into CBOR |
I've added a commit that disallows |
This PR's changes are mostly centered around the `package.json` file: * Switch from borc to cborg. This gives us more precise typescript types for cbor. ([Here is why js-ipfs switched](ipld/js-dag-cbor#13)) * Remove rollup. It would only apply on the `index.[es5/cjs/umd].js` files and all other modules, e.g. `fs/protocol/private/namefilter.js` (I need that for account recovery) would be unchanged from the output from `tsc`. However, we can just remove rollup by: - Adding `"browser": { crypto: false }` to signal to bundlers to ignore `require("crypto")` in dependencies (noble-ec25519 has these requires, but is careful not to actually use them in browser contexts) - Making dependencies always `import { Buffer } from 'buffer'`. - Creating our `index.min.js` (previously `index.umd.js`) file with something else * Introduce `esbuild` for creating a convenient browser bundle at `dist/index.min.js`. It's fast. Like, <1 second. (Rollup took >20 secs for just the index.umd.js) * Move library files to `lib/`. We keep `dist/index.min.js`, though. I think this distinction is nice. * Introduce export maps. `import "webnative/fs/protocol/private/namefilter"` is now possible instead of `import "webnative/dist/fs/protocol/private/namefilter"`. * Switch to github actions from travis CI * Add some utilities for better logging of errors in puppeteer. We now get console errors and unhandled exceptions that happen *in the browser into our jest output*. * Don't transpile "newer" features like async/await during `tsc`. We require node version >=15, which supports newer features like that anyway.
BREAKING CHANGE!
See cborg for additional details.
IPLD Data Model values will be accepted, all others will be rejected
(including
undefined
). No other tags than 42 will be involved in encodeor decode.
bytes than are required, as per deterministic DAG-CBOR rules.
They are also accepted for encodes, within the 64-bit range. The exotic
bignumber type from borc is no longer supported.
and the Go implementation.
Buffer
isn't required forbrowser bundling.
I will follow up with a comment about performance characteristics.
We may want to hold off on merging this PR until ipld/specs#342 is resolved because it would be good to get strictness about rejecting the IEEE754 specials into here if that's the way we're going to go because that will be a breaking change as well. I have flags in cborg already to handle this on decode and encode can be handled here.