ARROW-12549: [JS] Table and RecordBatch should not extend Vector, make JS lib smaller

This pull request addresses a number of issues that require a more substantial refactor.

The main goals are:
1. Eliminate cruft by dropping support for outdated browsers/environments.
2. Reduce total surface area by eliminating unnecessary `Vector`, `Chunked`, and `Column` classes.
3. Reduce the amount of the library pulled in when Table, RecordBatch, or Vector classes are imported.

In this pull request, we have eliminated type-specific Vector classes. There is now only one Vector class, which holds a data instance, and type-specific behavior is handled by visitors. Record batches and Tables no longer inherit from vectors, and Columns are gone. To create vectors and tables, we now have separate methods that can be easily tree-shaken.
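As a rough illustration of this design (not the actual Arrow code), the sketch below shows a single `Vector` class that wraps a data instance and delegates to a type-specific visitor, plus standalone factory functions that a bundler could drop when they are unused. All names here (`Data`, `getVisitor`, `visitGet`, `makeFloat32Vector`, `makeBoolVector`) are hypothetical and heavily simplified.

```typescript
// Hypothetical, simplified stand-ins: these are NOT the real Arrow classes.
type Float32Data = { typeId: "Float32"; length: number; values: Float32Array };
type BoolData = { typeId: "Bool"; length: number; values: Uint8Array }; // bit-packed
type Data = Float32Data | BoolData;

// The visitor holds the type-specific logic, one method per data type.
const getVisitor = {
    Float32: (data: Float32Data, index: number): number => data.values[index],
    Bool: (data: BoolData, index: number): boolean =>
        (data.values[index >> 3] & (1 << (index % 8))) !== 0,
};

// Dispatch on the type tag of the data instead of on a Vector subclass.
function visitGet(data: Data, index: number): number | boolean {
    switch (data.typeId) {
        case "Float32": return getVisitor.Float32(data, index);
        case "Bool": return getVisitor.Bool(data, index);
    }
}

// The single Vector class only wraps a Data instance and delegates to the visitor.
class Vector {
    constructor(public readonly data: Data) {}
    get length(): number { return this.data.length; }
    get(index: number): number | boolean { return visitGet(this.data, index); }
}

// Standalone factory functions (rather than static methods on Vector), so
// unused ones can be removed by tree shaking.
function makeFloat32Vector(values: Float32Array): Vector {
    return new Vector({ typeId: "Float32", length: values.length, values });
}

function makeBoolVector(bits: boolean[]): Vector {
    const values = new Uint8Array(Math.ceil(bits.length / 8));
    bits.forEach((bit, i) => { if (bit) values[i >> 3] |= 1 << (i % 8); });
    return new Vector({ typeId: "Bool", length: bits.length, values });
}

const floats = makeFloat32Vector(new Float32Array([1.5, 2.5]));
const flags = makeBoolVector([true, false, true]);
console.log(floats.get(1), flags.get(0), flags.get(1));
```

The point of the sketch is only the shape: adding a new data type means adding a visitor method and a factory function, not another `Vector` subclass.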

We also added tests for the bundles, fixed some issues with bundling in webpack, and updated dependencies (including typescript and flatbuffers). We also added memoization to dictionary vectors to reduce the overhead of decoding UTF-8 to strings.
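The idea behind the dictionary memoization can be sketched as follows. This is a minimal illustration under assumptions, not the implementation in this pull request: the `Utf8Dictionary` class, its `decodeCount` counter, and the buffer layout are made up for the example. Each dictionary entry is decoded from UTF-8 at most once, and repeated lookups of the same key return the cached string.

```typescript
// Hypothetical sketch: names are illustrative, not Arrow's actual internals.
const decoder = new TextDecoder("utf-8");

/** Dictionary of UTF-8 strings stored as one byte buffer plus offsets. */
class Utf8Dictionary {
    private cache: (string | undefined)[];
    public decodeCount = 0; // counts real decodes so the effect is observable

    constructor(private values: Uint8Array, private offsets: Int32Array) {
        this.cache = new Array(offsets.length - 1);
    }

    get(key: number): string {
        let value = this.cache[key];
        if (value === undefined) {
            // Decode only on the first access of this key, then remember it.
            this.decodeCount++;
            value = decoder.decode(
                this.values.subarray(this.offsets[key], this.offsets[key + 1])
            );
            this.cache[key] = value;
        }
        return value;
    }
}

// Build a two-entry dictionary from plain strings.
const entries = ["Seattle", "Boston"];
const encoder = new TextEncoder();
const encoded = entries.map((s) => encoder.encode(s));
const offsets = new Int32Array(entries.length + 1);
encoded.forEach((bytes, i) => { offsets[i + 1] = offsets[i] + bytes.length; });
const values = new Uint8Array(offsets[entries.length]);
encoded.forEach((bytes, i) => values.set(bytes, offsets[i]));

// Five rows reference the two dictionary entries through small integer keys.
const dictionary = new Utf8Dictionary(values, offsets);
const keys = Int8Array.from([0, 1, 0, 0, 1]);
const decoded = Array.from(keys, (k) => dictionary.get(k));
console.log(decoded, dictionary.decodeCount);
```

With a column of 1,000,000 rows and only a handful of distinct values, this kind of cache replaces most UTF-8 decodes with an array lookup.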

A quick overview of Arrow with the new API: https://observablehq.com/d/9480eccb30a21010.

Also addresses:
* [ARROW-10255](https://issues.apache.org/jira/browse/ARROW-10255)
* [ARROW-11347](https://issues.apache.org/jira/browse/ARROW-11347)
* [ARROW-12548](https://issues.apache.org/jira/browse/ARROW-12548)
* [ARROW-13514](https://issues.apache.org/jira/browse/ARROW-13514)
* [ARROW-10220](https://issues.apache.org/jira/browse/ARROW-10220)
* [ARROW-14933](https://issues.apache.org/jira/browse/ARROW-14933)
* [ARROW-12538](https://issues.apache.org/jira/browse/ARROW-12538)
* [ARROW-12536](https://issues.apache.org/jira/browse/ARROW-12536)

## Performance comparison:

### Master:
```
Prepare Data: 502.401ms
Running "Parse" suite...
dataset: tracks, function: Table.from 15,578 ops/s ±0.67%, 0.064 ms, 94 samples
dataset: tracks, function: readBatches 15,853 ops/s ±0.59%, 0.063 ms, 97 samples
dataset: tracks, function: serialize 969 ops/s ±1.8%, 1 ms, 93 samples
Running "Get values by index" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32 78 ops/s ±0.090%, 13 ms, 82 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32 79 ops/s ±0.090%, 13 ms, 70 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8> 1.59 ops/s ±25%, 563 ms, 9 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8> 1.74 ops/s ±3.2%, 576 ms, 9 samples
Running "Iterate vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32 85 ops/s ±0.14%, 12 ms, 74 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32 85 ops/s ±0.11%, 12 ms, 75 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8> 1.51 ops/s ±3.1%, 657 ms, 8 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8> 1.49 ops/s ±4.0%, 666 ms, 8 samples
Running "Slice toArray vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32 2,588 ops/s ±3.0%, 0.4 ms, 74 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32 2,345 ops/s ±1.7%, 0.43 ms, 73 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8> 1.29 ops/s ±5.3%, 760 ms, 8 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8> 1.28 ops/s ±4.1%, 784 ms, 8 samples
Running "Slice vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32 4,212,193 ops/s ±0.23%, 0 ms, 100 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32 4,400,234 ops/s ±0.80%, 0 ms, 92 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8> 4,764,651 ops/s ±0.13%, 0 ms, 101 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8> 4,763,581 ops/s ±0.050%, 0 ms, 98 samples
Running "DataFrame Iterate" suite...
dataset: tracks, length: 1,000,000 23.1 ops/s ±2.1%, 43 ms, 43 samples
Running "DataFrame Count By" suite...
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8> 535 ops/s ±0.050%, 1.9 ms, 99 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8> 535 ops/s ±0.040%, 1.9 ms, 96 samples
Running "DataFrame Filter-Scan Count" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32, test: gt, value: 0 57 ops/s ±0.090%, 18 ms, 75 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32, test: gt, value: 0 57 ops/s ±0.050%, 18 ms, 74 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle 99 ops/s ±0.060%, 10 ms, 86 samples
Running "DataFrame Filter-Iterate" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32, test: gt, value: 0 37 ops/s ±0.12%, 27 ms, 66 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32, test: gt, value: 0 37 ops/s ±0.14%, 27 ms, 66 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle 70 ops/s ±0.45%, 14 ms, 73 samples
Running "DataFrame Direct Count" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32, test: gt, value: 0 160 ops/s ±0.040%, 6.3 ms, 83 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32, test: gt, value: 0 162 ops/s ±0.12%, 6.1 ms, 85 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle 1.51 ops/s ±5.6%, 664 ms, 8 samples
```

### This branch:

```
Running "vectorFromArray" suite...
from: numbers                  106 ops/s ±1.1%,   9.3 ms, 79 samples
from: booleans                 101 ops/s ±1.4%,   9.8 ms, 76 samples
from: dictionary               105 ops/s ±4.1%,     9 ms, 78 samples
Running "Iterate Vector" suite...
from: uint8Array               896 ops/s ±0.21%,  1.1 ms, 94 samples
from: uint16Array              896 ops/s ±0.82%,  1.1 ms, 94 samples
from: uint32Array              884 ops/s ±0.39%,  1.1 ms, 95 samples
from: uint64Array              285 ops/s ±0.19%,  3.5 ms, 92 samples
from: int8Array                882 ops/s ±0.65%,  1.1 ms, 95 samples
from: int16Array               899 ops/s ±0.37%,  1.1 ms, 95 samples
from: int32Array               887 ops/s ±0.46%,  1.1 ms, 92 samples
from: int64Array               280 ops/s ±0.60%,  3.5 ms, 91 samples
from: float32Array             805 ops/s ±0.86%,  1.2 ms, 90 samples
from: float64Array             814 ops/s ±0.44%,  1.2 ms, 92 samples
from: numbers                  812 ops/s ±0.39%,  1.2 ms, 91 samples
from: booleans                 284 ops/s ±0.14%,  3.5 ms, 92 samples
from: dictionary               298 ops/s ±0.44%,  3.3 ms, 91 samples
from: string                  16.2 ops/s ±3.9%,    59 ms, 45 samples
Running "Spread Vector" suite...
from: uint8Array               360 ops/s ±1.2%,   2.7 ms, 93 samples
from: uint16Array              374 ops/s ±0.55%,  2.6 ms, 92 samples
from: uint32Array              372 ops/s ±1.1%,   2.6 ms, 91 samples
from: uint64Array              164 ops/s ±0.66%,    6 ms, 78 samples
from: int8Array                372 ops/s ±0.64%,  2.7 ms, 96 samples
from: int16Array               380 ops/s ±0.42%,  2.6 ms, 94 samples
from: int32Array               375 ops/s ±0.87%,  2.6 ms, 92 samples
from: int64Array               164 ops/s ±0.64%,  6.1 ms, 86 samples
from: float32Array             327 ops/s ±0.62%,    3 ms, 85 samples
from: float64Array             318 ops/s ±1.1%,   3.1 ms, 91 samples
from: numbers                  326 ops/s ±0.74%,    3 ms, 89 samples
from: booleans                 178 ops/s ±0.92%,  5.6 ms, 84 samples
from: dictionary               189 ops/s ±0.51%,  5.2 ms, 89 samples
from: string                  14.8 ops/s ±3.7%,    65 ms, 41 samples
Running "toArray Vector" suite...
from: uint8Array        28,488,216 ops/s ±0.22%,    0 ms, 101 samples
from: uint16Array       28,777,482 ops/s ±0.41%,    0 ms, 98 samples
from: uint32Array       28,387,333 ops/s ±0.25%,    0 ms, 97 samples
from: uint64Array       23,412,763 ops/s ±0.68%,    0 ms, 97 samples
from: int8Array         21,497,600 ops/s ±0.22%,    0 ms, 94 samples
from: int16Array        21,990,137 ops/s ±0.16%,    0 ms, 101 samples
from: int32Array        21,809,196 ops/s ±0.68%,    0 ms, 96 samples
from: int64Array        20,084,822 ops/s ±0.68%,    0 ms, 93 samples
from: float32Array      18,452,580 ops/s ±0.83%,    0 ms, 96 samples
from: float64Array      18,527,057 ops/s ±0.54%,    0 ms, 92 samples
from: numbers           18,555,045 ops/s ±0.52%,    0 ms, 99 samples
from: booleans                 178 ops/s ±0.43%,  5.6 ms, 84 samples
from: dictionary               189 ops/s ±0.61%,  5.3 ms, 89 samples
from: string                  15.8 ops/s ±0.76%,   63 ms, 43 samples
Running "get Vector" suite...
from: uint8Array               441 ops/s ±1.1%,   2.2 ms, 95 samples
from: uint16Array              441 ops/s ±0.48%,  2.2 ms, 95 samples
from: uint32Array              443 ops/s ±0.23%,  2.2 ms, 96 samples
from: uint64Array              414 ops/s ±0.68%,  2.4 ms, 93 samples
from: int8Array                439 ops/s ±0.30%,  2.3 ms, 95 samples
from: int16Array               447 ops/s ±0.35%,  2.2 ms, 96 samples
from: int32Array               439 ops/s ±0.48%,  2.3 ms, 94 samples
from: int64Array               415 ops/s ±0.17%,  2.4 ms, 97 samples
from: float32Array             472 ops/s ±0.49%,  2.1 ms, 94 samples
from: float64Array             471 ops/s ±0.26%,  2.1 ms, 97 samples
from: numbers                  473 ops/s ±0.22%,  2.1 ms, 98 samples
from: booleans                 429 ops/s ±0.25%,  2.3 ms, 97 samples
from: dictionary               464 ops/s ±0.23%,  2.1 ms, 96 samples
from: string                  17.8 ops/s ±1.3%,    56 ms, 48 samples
Running "Parse" suite...
dataset: tracks, function: read recordBatches
       12,047 ops/s ±0.77%, 0.082 ms, 100 samples
dataset: tracks, function: write recordBatches
        1,028 ops/s ±0.72%, 0.96 ms, 96 samples
Running "Get values by index" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32
           46 ops/s ±0.12%,   22 ms, 61 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32
           46 ops/s ±0.15%,   22 ms, 61 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>
         25.3 ops/s ±0.37%,   39 ms, 46 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8>
         25.1 ops/s ±0.76%,   39 ms, 46 samples
Running "Iterate vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32
           84 ops/s ±0.20%,   12 ms, 73 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32
           82 ops/s ±0.65%,   12 ms, 72 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>
           30 ops/s ±0.94%,   33 ms, 54 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8>
           30 ops/s ±0.41%,   33 ms, 54 samples
Running "Slice toArray vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32
        2,911 ops/s ±3.3%,  0.33 ms, 86 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32
        2,765 ops/s ±3.2%,  0.35 ms, 77 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>
           18 ops/s ±1.2%,    55 ms, 49 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8>
         18.2 ops/s ±0.73%,   54 ms, 50 samples
Running "Slice vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32
    4,338,570 ops/s ±0.52%,    0 ms, 94 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32
    4,341,418 ops/s ±0.41%,    0 ms, 97 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>
    3,656,243 ops/s ±0.45%,    0 ms, 101 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8>
    3,598,448 ops/s ±1.0%,     0 ms, 97 samples
Running "Spread vectors" suite...
dataset: tracks, column: lat, length: 1,000,000, type: Float32
           16 ops/s ±4.3%,    59 ms, 44 samples
dataset: tracks, column: lng, length: 1,000,000, type: Float32
         16.1 ops/s ±4.2%,    60 ms, 45 samples
dataset: tracks, column: origin, length: 1,000,000, type: Dictionary<Int8, Utf8>
         17.8 ops/s ±1.5%,    55 ms, 49 samples
dataset: tracks, column: destination, length: 1,000,000, type: Dictionary<Int8, Utf8>
         17.6 ops/s ±1.7%,    55 ms, 48 samples
Running "Table" suite...
Iterate, dataset: tracks, numRows: 1,000,000
           27 ops/s ±0.28%,   37 ms, 49 samples
Spread, dataset: tracks, numRows: 1,000,000
         8.73 ops/s ±3.7%,   111 ms, 25 samples
toArray, dataset: tracks, numRows: 1,000,000
         8.15 ops/s ±4.9%,   115 ms, 26 samples
get, dataset: tracks, numRows: 1,000,000
         17.2 ops/s ±0.31%,   58 ms, 47 samples
Running "Table Direct Count" suite...
dataset: tracks, column: lat, numRows: 1,000,000, type: Float32, test: gt, value: 0
           74 ops/s ±0.16%,   14 ms, 77 samples
dataset: tracks, column: lng, numRows: 1,000,000, type: Float32, test: gt, value: 0
           74 ops/s ±0.20%,   14 ms, 77 samples
dataset: tracks, column: origin, numRows: 1,000,000, type: Dictionary<Int8, Utf8>, test: eq, value: Seattle
          80 ops/s ±0.060%,   12 ms, 71 samples
```
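To compare the two runs, one option is to compute the ratio of ops/s for suites that appear in both listings. The snippet below is a small sketch that does this for four rows copied from the results above; it assumes both runs were made on comparable hardware, which the listings do not state. The `Result` type and variable names are illustrative only.

```typescript
// ops/s figures copied from the "Master" and "This branch" listings above.
interface Result { suite: string; column: string; master: number; branch: number }

const results: Result[] = [
    { suite: "Get values by index", column: "lat", master: 78, branch: 46 },
    { suite: "Get values by index", column: "origin", master: 1.59, branch: 25.3 },
    { suite: "Iterate vectors", column: "origin", master: 1.51, branch: 30 },
    { suite: "Slice toArray vectors", column: "origin", master: 1.29, branch: 18 },
];

// ratio > 1 means this branch is faster than master for that row.
const ratios = results.map((r) => ({ ...r, ratio: r.branch / r.master }));

for (const r of ratios) {
    console.log(`${r.suite} (${r.column}): ${r.ratio.toFixed(2)}x`);
}
```

For these rows, the dictionary-encoded `origin` column comes out more than ten times faster on this branch, while "Get values by index" on the Float32 `lat` column has a ratio below 1, meaning it is slower than on master.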

Closes apache#10371 from trxcllnt/fea/simplify

Lead-authored-by: [Paul Taylor <[email protected]>]
Co-authored-by: Dominik Moritz <[email protected]>
Co-authored-by: ptaylor <[email protected]>
Signed-off-by: Dominik Moritz <[email protected]>
trxcllnt and domoritz committed Jan 16, 2022
1 parent 7029f90 commit 20b66c2
Showing 285 changed files with 13,932 additions and 14,555 deletions.
2 changes: 1 addition & 1 deletion .env
@@ -63,7 +63,7 @@ JDK=8
KARTOTHEK=latest
LLVM=12
MAVEN=3.5.4
-NODE=14
+NODE=16
NUMPY=latest
PANDAS=latest
PYTHON=3.8
6 changes: 3 additions & 3 deletions .github/workflows/js.yml
@@ -42,7 +42,7 @@ env:
jobs:

docker:
-name: AMD64 Debian 11 NodeJS 14
+name: AMD64 Debian 11 NodeJS 16
runs-on: ubuntu-latest
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
timeout-minutes: 60
@@ -75,7 +75,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-node: [14]
+node: [16]
steps:
- name: Checkout Arrow
uses: actions/checkout@v2
@@ -99,7 +99,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-node: [14]
+node: [16]
steps:
- name: Checkout Arrow
uses: actions/checkout@v2
2 changes: 1 addition & 1 deletion ci/docker/conda-integration.dockerfile
@@ -21,7 +21,7 @@ FROM ${repo}:${arch}-conda-cpp

ARG arch=amd64
ARG maven=3.5
-ARG node=14
+ARG node=16
ARG jdk=8
ARG go=1.15

2 changes: 1 addition & 1 deletion ci/docker/debian-10-js.dockerfile
@@ -16,7 +16,7 @@
# under the License.

ARG arch=amd64
-ARG node=14
+ARG node=16
FROM ${arch}/node:${node}

ENV NODE_NO_WARNINGS=1
2 changes: 1 addition & 1 deletion ci/docker/debian-11-js.dockerfile
@@ -16,7 +16,7 @@
# under the License.

ARG arch=amd64
-ARG node=14
+ARG node=16
FROM ${arch}/node:${node}

ENV NODE_NO_WARNINGS=1
2 changes: 1 addition & 1 deletion ci/docker/linux-apt-docs.dockerfile
@@ -67,7 +67,7 @@ RUN /arrow/ci/scripts/util_download_apache.sh \
ENV PATH=/opt/apache-maven-${maven}/bin:$PATH
RUN mvn -version

-ARG node=14
+ARG node=16
RUN wget -q -O - https://deb.nodesource.com/setup_${node}.x | bash - && \
apt-get install -y nodejs && \
apt-get clean && \
1 change: 1 addition & 0 deletions ci/scripts/js_test.sh
@@ -25,5 +25,6 @@ pushd ${source_dir}

yarn lint
yarn test
+yarn test:bundle

popd
1 change: 1 addition & 0 deletions dev/release/verify-release-candidate.sh
@@ -440,6 +440,7 @@ test_js() {
yarn lint
yarn build
yarn test
+yarn test:bundle
popd
}

3 changes: 2 additions & 1 deletion js/.eslintignore
@@ -1,5 +1,6 @@
-.eslintrc.js
+.eslintrc.cjs
gulp
jest.config.js
jestconfigs
targets
+test/bundle/
27 changes: 25 additions & 2 deletions js/.eslintrc.js → js/.eslintrc.cjs
@@ -27,9 +27,10 @@ module.exports = {
sourceType: "module",
ecmaVersion: 2020,
},
-plugins: ["@typescript-eslint", "jest"],
+plugins: ["@typescript-eslint", "jest", "unicorn"],
extends: [
"eslint:recommended",
+"plugin:unicorn/recommended",
"plugin:jest/recommended",
"plugin:jest/style",
"plugin:@typescript-eslint/recommended",
@@ -82,6 +83,28 @@ module.exports = {
"no-trailing-spaces": "error",
"no-var": "error",
"no-empty": "off",
-"no-cond-assign": "off"
+"no-cond-assign": "off",
+
+"unicorn/catch-error-name": "off",
+"unicorn/no-nested-ternary": "off",
+"unicorn/no-new-array": "off",
+"unicorn/no-null": "off",
+"unicorn/empty-brace-spaces": "off",
+"unicorn/no-zero-fractions": "off",
+"unicorn/prevent-abbreviations": "off",
+"unicorn/prefer-module": "off",
+"unicorn/numeric-separators-style": "off",
+"unicorn/prefer-spread": "off",
+"unicorn/filename-case": "off",
+"unicorn/prefer-export-from": "off",
+"unicorn/prefer-switch": "off",
+"unicorn/prefer-node-protocol": "off",
+
+"unicorn/consistent-destructuring": "warn",
+"unicorn/no-array-reduce": ["warn", { "allowSimpleOperations": true }],
+"unicorn/no-await-expression-member": "warn",
+"unicorn/no-useless-undefined": "warn",
+"unicorn/consistent-function-scoping": "warn",
+"unicorn/prefer-math-trunc": "warn"
},
};
3 changes: 3 additions & 0 deletions js/.gitignore
@@ -78,6 +78,9 @@ targets
test/data/**/*.json
test/data/**/*.arrow

+# test bundles
+test/bundle/**/*-bundle.js*
+
# jest snapshots (too big)
test/__snapshots__/

1 change: 1 addition & 0 deletions js/.vscode/extensions.json
@@ -2,5 +2,6 @@
"recommendations": [
"dbaeumer.vscode-eslint",
"augustocdias.tasks-shell-input",
+"orta.vscode-jest"
]
}
36 changes: 35 additions & 1 deletion js/.vscode/launch.json
@@ -34,6 +34,16 @@
"command": "./node_modules/.bin/jest --listTests | sed -r \"s@$PWD/test/@@g\"",
}
},
+{
+"type": "command",
+"id": "BUNDLE_FILE",
+"command": "shellCommand.execute",
+"args": {
+"cwd": "${workspaceFolder}",
+"description": "Select a file to debug",
+"command": "ls test/bundle/**/*-bundle.js",
+}
+},
{
"type": "command",
"id": "TEST_RUNTIME_ARGS",
@@ -100,6 +110,29 @@
"VALIDATE"
]
},
+{
+"name": "Debug Bundle",
+"program": "${input:BUNDLE_FILE}",
+"request": "launch",
+"skipFiles": [
+"<node_internals>/**"
+],
+"type": "node"
+},
+{
+"name": "Debug Benchmarks",
+"program": "${workspaceFolder}/perf/index.ts",
+"request": "launch",
+"skipFiles": [
+"<node_internals>/**",
+"${workspaceFolder}/node_modules/**/*.js"
+],
+"runtimeArgs": [
+"--loader",
+"ts-node/esm/transpile-only"
+],
+"type": "node"
+},
{
"type": "node",
"request": "launch",
@@ -213,7 +246,8 @@
"${workspaceFolder}/bin/print-buffer-alignment.js",
"./test/data/cpp/stream/struct_example.arrow"
]
-},{
+},
+{
"type": "node",
"name": "vscode-jest-tests",
"request": "launch",
19 changes: 17 additions & 2 deletions js/.vscode/settings.json
@@ -2,6 +2,21 @@
"typescript.tsdk": "node_modules/typescript/lib",
"editor.trimAutoWhitespace": true,
"editor.codeActionsOnSave": {
-"source.fixAll.eslint": true
-}
+"source.fixAll.eslint": false
+},
+"[javascript]": {
+"editor.tabSize": 4,
+"editor.formatOnSave": true,
+"editor.formatOnSaveMode": "file",
+"editor.defaultFormatter": "vscode.typescript-language-features"
+},
+"[typescript]": {
+"editor.tabSize": 4,
+"editor.formatOnSave": true,
+"editor.formatOnSaveMode": "file",
+"editor.defaultFormatter": "vscode.typescript-language-features"
+},
+"jest.jestCommandLine": "node --experimental-vm-modules node_modules/jest/bin/jest.js --config jest.config.js",
+"jest.autoRun": {"watch": false, "onSave": "test-src-file"},
+"typescript.preferences.importModuleSpecifierEnding": "js"
}
26 changes: 11 additions & 15 deletions js/DEVELOP.md
@@ -80,6 +80,12 @@ You can run the benchmarks with `yarn perf`. To print the results to stderr as J

You can change the target you want to test by changing the imports in `perf/index.ts`. Note that you need to compile the bundles with `yarn build` before you can import them.

+# Testing Bundling
+
+The bundles use `apache-arrow`, so make sure to build it with `yarn build -t apache-arrow`. To bundle with a variety of bundlers, run `yarn test:bundle` or `yarn gulp bundle`.
+
+Run `yarn gulp bundle:webpack:analyze` to open [Webpack Bundle Analyzer](https://github.com/webpack-contrib/webpack-bundle-analyzer).
+
# Updating the Arrow format flatbuffers generated code

1. Once generated, the flatbuffers format code needs to be adjusted for our build scripts (assumes `gnu-sed`):
@@ -96,27 +102,17 @@ You can change the target you want to test by changing the imports in `perf/inde
sed -i '+s+org.apache.arrow.flatbuf.++ig' $tmp_format_dir/*.fbs

# Generate TS source from the modified Arrow flatbuffers schemas
-flatc --ts --no-ts-reexport -o ./js/src/fb $tmp_format_dir/{File,Schema,Message}.fbs
+flatc --ts -o ./js/src/fb $tmp_format_dir/{File,Schema,Message,Tensor,SparseTensor}.fbs

# Remove the tmpdir
rm -rf $tmp_format_dir
+```

-cd ./js/src/fb
-
-# Rename the existing files to <filename>.bak.ts
-mv File{,.bak}.ts && mv Schema{,.bak}.ts && mv Message{,.bak}.ts
+2. Manually fix the unused imports and add // @ts-ignore for other errors

-# Remove `_generated` from the ES6 imports of the generated files
-sed -i '+s+_generated\";+\";+ig' *_generated.ts
-# Fix all the `flatbuffers` imports
-sed -i '+s+./flatbuffers+flatbuffers+ig' *_generated.ts
-# Fix the Union createTypeIdsVector typings
-sed -i -r '+s+static createTypeIdsVector\(builder: flatbuffers.Builder, data: number\[\] \| Uint8Array+static createTypeIdsVector\(builder: flatbuffers.Builder, data: number\[\] \| Int32Array+ig' Schema_generated.ts
-# Remove "_generated" suffix from TS files
-mv File{_generated,}.ts && mv Schema{_generated,}.ts && mv Message{_generated,}.ts
-```
+3. Add `.js` to the imports. In VSCode, you can search for `^(import [^';]* from '(\./|(\.\./)+)[^';.]*)';` and replace with `$1.js';`.
-2. Execute `yarn lint` from the `js` directory to fix the linting errors
+4. Execute `yarn lint` from the `js` directory to fix the linting errors
[1]: mailto:[email protected]
[2]: https://github.com/apache/arrow/tree/master/format