Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 14 additions & 3 deletions docs/api/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -137,13 +137,24 @@ interface HttpBridgeOptions {
## Codec Utilities

```ts
import { decodeValue, decodeValueAsync, registerArrowDecoder, clearArrowDecoder } from 'tywrap';
import {
decodeValue,
decodeValueAsync,
autoRegisterArrowDecoder,
registerArrowDecoder,
clearArrowDecoder,
} from 'tywrap';

// NodeBridge auto-registers when apache-arrow is installed.
// If you're decoding outside the bridge, call autoRegisterArrowDecoder() or register manually:
const arrowReady = await autoRegisterArrowDecoder();
// if (!arrowReady) throw new Error('Install apache-arrow or enable JSON fallback');
// registerArrowDecoder(bytes => bytes);

registerArrowDecoder(bytes => bytes);
const value = await decodeValueAsync(pythonValue);
```

Arrow-encoded payloads throw unless you register a decoder or enable JSON fallback on the Python bridge.
Arrow-encoded payloads throw unless a decoder is registered or JSON fallback is enabled on the Python bridge.

## Error Types

Expand Down
23 changes: 17 additions & 6 deletions docs/codec-roadmap.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,13 @@ This is a forward-looking plan for adding codecs beyond numpy/pandas. The focus
- Keep failures explicit unless a user opts into lossy fallbacks.
- Avoid heavy implicit conversions (CPU/GPU) without clear config.

## DX Defaults (Decisions)

- Arrow is the default for ndarray/dataframe/series. The JS runtime should auto-register an Arrow decoder when `apache-arrow` is installed so users do not have to wire it manually.
- JSON fallback is opt-in only (via `TYWRAP_CODEC_FALLBACK=json`) and remains explicitly lossy for dtype/NA fidelity.
- GPU handling stays explicit: no implicit `.cpu()` or contiguous copies. Opt-in copy/transfer remains available, and GPU-native transport is a follow-up track (DLPack/Arrow CUDA).
- Large payloads should not be forced through single-line JSONL forever; add an artifact/chunked transport to keep responses reliable without silent truncation.

## Envelope Conventions

All tywrap codec envelopes share:
Expand All @@ -28,6 +35,11 @@ The subprocess JSONL transport is not streaming: large results must fit in memor

- Set `TYWRAP_CODEC_MAX_BYTES` (bytes, UTF-8) to cap the maximum serialized response size emitted by the Python bridge.
- If exceeded, the call fails with an explicit error instead of attempting a silent fallback.
- Planned: add an artifact/chunked transport path for large payloads to avoid JSONL size ceilings.

### Feature Detection

Bridge metadata should surface optional codec availability to help the JS side decide when to rely on SciPy/Torch/Sklearn codecs.

## SciPy (sparse matrices)

Expand Down Expand Up @@ -93,6 +105,7 @@ Notes:
- Default to CPU tensors; require opt-in for `.cpu()` conversion.
- Reject non-contiguous tensors unless explicitly allowed.
- Opt-in copy/transfer via `TYWRAP_TORCH_ALLOW_COPY=1`.
- Future: GPU-native transport (DLPack/Arrow CUDA) to avoid implicit device transfers.

## Sklearn (models + outputs)

Expand Down Expand Up @@ -123,9 +136,7 @@ Notes:
1. Define envelope specs and validation tests (round-trip + size limits).
2. Implement Python bridge serialization with feature detection.
3. Implement JS decoder + type mapping presets.
4. Add performance gates and CI coverage for the new codecs.

## Open Questions

- GPU handling: do we allow implicit device transfers?
- Max payload sizes: per-call limits vs global caps beyond `TYWRAP_CODEC_MAX_BYTES`?
4. Make Arrow frictionless (auto-register decoder + living app on Arrow path).
5. Add payload scaling via artifact/chunked transport (protocol versioned if needed).
6. Improve scientific codec ergonomics (explicit GPU opt-in, SciPy format expansion, safe sklearn opt-ins).
7. Add performance gates and CI coverage for the new codecs.
2 changes: 1 addition & 1 deletion docs/runtimes/browser.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ loading, rely on Pyodide directly.

## Data Transport

Arrow envelopes are supported in the browser if you register an Arrow decoder:
Arrow envelopes are supported in the browser if you register an Arrow decoder (Node auto-registers when `apache-arrow` is installed):

```ts
import { registerArrowDecoder } from 'tywrap';
Expand Down
16 changes: 6 additions & 10 deletions docs/runtimes/nodejs.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,17 +177,13 @@ npm install apache-arrow
```

```typescript
import { createRequire } from 'node:module';
import { registerArrowDecoder } from 'tywrap';
import { autoRegisterArrowDecoder, registerArrowDecoder } from 'tywrap';

// Register Arrow decoder for optimal performance
const require = createRequire(import.meta.url);
const { tableFromIPC } = require('apache-arrow');
registerArrowDecoder(bytes => tableFromIPC(bytes));

// If you don't register a decoder, Arrow-encoded payloads will throw.
// To accept raw bytes, register a passthrough decoder:
// registerArrowDecoder(bytes => bytes);
// NodeBridge auto-registers an Arrow decoder when apache-arrow is installed.
// If you need custom behavior (or you're decoding outside NodeBridge), register manually:
await autoRegisterArrowDecoder();
// const { tableFromIPC } = await import('apache-arrow');
// registerArrowDecoder(bytes => tableFromIPC(bytes));
```

### JSON Fallback
Expand Down
2 changes: 1 addition & 1 deletion examples/living-app/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ This is a small but non-trivial “living example” that exercises tywrap end-t

- TypeScript (Node) calls into Python via `NodeBridge`
- Python uses `pandas` + `numpy` for data work and `pydantic` for config validation
- Rich results (e.g. `pandas.DataFrame`) come back via Arrow IPC and are decoded in Node with `apache-arrow`
- Rich results (e.g. `pandas.DataFrame`) come back via Arrow IPC and are decoded in Node with `apache-arrow` (auto-registered by `NodeBridge`)

## What it does

Expand Down
30 changes: 6 additions & 24 deletions examples/living-app/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,9 @@ import { existsSync, mkdtempSync, rmSync } from 'node:fs';
import { dirname, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { tmpdir } from 'node:os';
import { createRequire } from 'node:module';

import { NodeBridge } from 'tywrap/node';
import { setRuntimeBridge } from 'tywrap/runtime';
import { clearArrowDecoder, registerArrowDecoder } from 'tywrap';
import { autoRegisterArrowDecoder, clearArrowDecoder } from 'tywrap';

import {
driftReport,
Expand Down Expand Up @@ -52,30 +50,14 @@ function resolveCodecMode(argv: readonly string[]): CodecMode {
* Register an Arrow decoder for this Node process.
*
* Why: `apache-arrow` is an optional dependency and tywrap should run without it in JSON mode.
* We use `require()` instead of ESM `import()` so Node/TypeScript resolve the package's "node"
* export + typings correctly (the ESM export map can otherwise select the DOM build/types).
*/
async function enableArrowDecoder(): Promise<void> {
const require = createRequire(import.meta.url);
let arrowModule: unknown;
try {
arrowModule = require('apache-arrow');
} catch (err) {
const code = (err as { code?: unknown }).code;
if (code === 'MODULE_NOT_FOUND') {
throw new Error(
"Arrow mode requires the optional dependency 'apache-arrow'. Install it with `npm install apache-arrow`."
);
}
throw err;
}
const arrow = arrowModule as {
tableFromIPC?: (bytes: Uint8Array) => { toArray?: () => unknown[] };
};
if (typeof arrow.tableFromIPC !== 'function') {
throw new Error('apache-arrow does not export tableFromIPC');
const registered = await autoRegisterArrowDecoder();
if (!registered) {
throw new Error(
"Arrow mode requires the optional dependency 'apache-arrow'. Install it with `npm install apache-arrow`."
);
}
registerArrowDecoder((bytes: Uint8Array) => arrow.tableFromIPC!(bytes));
}

/**
Expand Down
17 changes: 17 additions & 0 deletions runtime/python_bridge.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import sys
import json
import importlib
import importlib.util
import os
import traceback
import base64
Expand Down Expand Up @@ -91,6 +92,19 @@ def arrow_available():
return True


def module_available(module_name: str) -> bool:
"""
Lightweight feature detection for optional codec dependencies.

Why: exposes availability in bridge metadata without importing heavy modules or triggering
side effects, so the TS side can decide when to rely on optional codecs.
"""
try:
return importlib.util.find_spec(module_name) is not None
except Exception:
return False


def is_numpy_array(obj):
try:
import numpy as np # noqa: F401
Expand Down Expand Up @@ -555,6 +569,9 @@ def handle_meta():
'pid': os.getpid(),
'codecFallback': 'json' if FALLBACK_JSON else 'none',
'arrowAvailable': arrow_available(),
'scipyAvailable': module_available('scipy.sparse'),
'torchAvailable': module_available('torch'),
'sklearnAvailable': module_available('sklearn.base'),
'instances': len(instances),
}

Expand Down
1 change: 1 addition & 0 deletions src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ export { detectRuntime, isNodejs, isDeno, isBun, isBrowser } from './utils/runti
export {
decodeValue,
decodeValueAsync,
autoRegisterArrowDecoder,
registerArrowDecoder,
clearArrowDecoder,
} from './utils/codec.js';
Expand Down
7 changes: 6 additions & 1 deletion src/runtime/node.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,9 @@
import { existsSync } from 'node:fs';
import { delimiter, isAbsolute, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { createRequire } from 'node:module';

import { decodeValueAsync } from '../utils/codec.js';
import { autoRegisterArrowDecoder, decodeValueAsync } from '../utils/codec.js';
import { getDefaultPythonPath } from '../utils/python.js';
import { getVenvBinDir, getVenvPythonExe } from '../utils/runtime.js';
import type { BridgeInfo } from '../types/index.js';
Expand Down Expand Up @@ -273,6 +274,10 @@

private async startProcess(): Promise<void> {
try {
const require = createRequire(import.meta.url);
await autoRegisterArrowDecoder({
loader: async () => require('apache-arrow'),
});
const { spawn } = await import('child_process');
const allowedPrefixes = ['TYWRAP_'];
const allowedKeys = new Set(['PATH', 'PYTHONPATH', 'VIRTUAL_ENV', 'PYTHONHOME']);
Expand All @@ -292,10 +297,10 @@
const currentPath = env.PATH ?? process.env.PATH ?? '';
env.PATH = `${venv.binDir}${delimiter}${currentPath}`;
}
if (!env.PYTHONUTF8) {

Check warning on line 300 in src/runtime/node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONUTF8 = '1';
}
if (!env.PYTHONIOENCODING) {

Check warning on line 303 in src/runtime/node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONIOENCODING = 'UTF-8';
}
// Respect explicit request for JSON fallback only; otherwise fast-fail by default
Expand Down
8 changes: 7 additions & 1 deletion src/runtime/optimized-node.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,12 @@

import { delimiter, isAbsolute, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { createRequire } from 'node:module';
import type { ChildProcess } from 'child_process';
import { EventEmitter } from 'events';

import { globalCache } from '../utils/cache.js';
import { decodeValueAsync } from '../utils/codec.js';
import { autoRegisterArrowDecoder, decodeValueAsync } from '../utils/codec.js';
import { getDefaultPythonPath } from '../utils/python.js';
import { getVenvBinDir, getVenvPythonExe } from '../utils/runtime.js';

Expand Down Expand Up @@ -186,6 +187,11 @@
throw new Error('Bridge has been disposed');
}

const require = createRequire(import.meta.url);
await autoRegisterArrowDecoder({
loader: async () => require('apache-arrow'),
});

// Ensure minimum processes are available
while (this.processPool.length < this.options.minProcesses) {
await this.spawnProcess();
Expand Down Expand Up @@ -429,10 +435,10 @@
PYTHONUNBUFFERED: '1', // Ensure immediate output
PYTHONDONTWRITEBYTECODE: '1', // Skip .pyc files for faster startup
};
if (!env.PYTHONUTF8) {

Check warning on line 438 in src/runtime/optimized-node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONUTF8 = '1';
}
if (!env.PYTHONIOENCODING) {

Check warning on line 441 in src/runtime/optimized-node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONIOENCODING = 'UTF-8';
}
if (this.options.virtualEnv) {
Expand Down
3 changes: 3 additions & 0 deletions src/types/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,6 +317,9 @@ export interface BridgeInfo {
pid: number;
codecFallback: 'json' | 'none';
arrowAvailable: boolean;
scipyAvailable: boolean;
torchAvailable: boolean;
sklearnAvailable: boolean;
instances: number;
}

Expand Down
46 changes: 46 additions & 0 deletions src/utils/codec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,52 @@
return typeof arrowTableFrom === 'function';
}

function isNodeRuntime(): boolean {
return (
typeof process !== 'undefined' &&
typeof (process as { versions?: { node?: string } }).versions?.node === 'string'
);
}

function registerArrowDecoderFromModule(module: { tableFromIPC?: unknown }): void {
const tableFromIPC = module.tableFromIPC;
if (typeof tableFromIPC !== 'function') {
throw new Error('apache-arrow does not export tableFromIPC');
}
registerArrowDecoder((bytes: Uint8Array) => tableFromIPC(bytes));
}

export async function autoRegisterArrowDecoder(
options: { loader?: () => Promise<unknown> } = {}
): Promise<boolean> {
if (hasArrowDecoder()) {
return true;
}
const loader =
options.loader ??
(isNodeRuntime()
? async () => {

Check warning on line 155 in src/utils/codec.ts

View workflow job for this annotation

GitHub Actions / lint

Missing return type on function
try {
const nodeModule = await import('node:module');
const require = nodeModule.createRequire(import.meta.url);
return require('apache-arrow') as unknown;
} catch {
return await import('apache-arrow');
}
}
: undefined);
if (!loader) {
return false;
}
try {
const arrowModule = await loader();
registerArrowDecoderFromModule(arrowModule as { tableFromIPC?: unknown });
return true;
} catch {
return false;
}
}

function isObject(value: unknown): value is { [k: string]: unknown } {
return typeof value === 'object' && value !== null;
}
Expand Down
40 changes: 40 additions & 0 deletions test/runtime_codec.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest';
import {
decodeValueAsync,
decodeValue,
autoRegisterArrowDecoder,
registerArrowDecoder,
clearArrowDecoder,
hasArrowDecoder,
Expand Down Expand Up @@ -51,6 +52,45 @@ describe('Cross-Runtime Data Transfer Codec', () => {
expect(hasArrowDecoder()).toBe(true);
});

it('should auto-register Arrow decoder from loader', async () => {
const tableFromIPC = vi.fn().mockReturnValue({ numRows: 1, numCols: 1 });
const loader = vi.fn().mockResolvedValue({ tableFromIPC });

const registered = await autoRegisterArrowDecoder({ loader });

expect(registered).toBe(true);
expect(loader).toHaveBeenCalled();
expect(hasArrowDecoder()).toBe(true);
});

it('should skip loader when decoder already registered', async () => {
registerArrowDecoder(bytes => bytes);
const loader = vi.fn().mockImplementation(() => {
throw new Error('loader should not be called');
});

const registered = await autoRegisterArrowDecoder({ loader });

expect(registered).toBe(true);
expect(loader).not.toHaveBeenCalled();
});

it('should return false when loader lacks tableFromIPC', async () => {
const loader = vi.fn().mockResolvedValue({});

const registered = await autoRegisterArrowDecoder({ loader });

expect(registered).toBe(false);
});

it('should return false when loader throws', async () => {
const loader = vi.fn().mockRejectedValue(new Error('missing'));

const registered = await autoRegisterArrowDecoder({ loader });

expect(registered).toBe(false);
});

it('should initially have no Arrow decoder', () => {
clearArrowDecoder();
expect(hasArrowDecoder()).toBe(false);
Expand Down
3 changes: 3 additions & 0 deletions test/runtime_node.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,9 @@ describeNodeOnly('Node.js Runtime Bridge', () => {
expect(info.protocol).toBe('tywrap/1');
expect(info.protocolVersion).toBeGreaterThan(0);
expect(info.pythonVersion).toMatch(/^\d+\.\d+\.\d+$/);
expect(typeof info.scipyAvailable).toBe('boolean');
expect(typeof info.torchAvailable).toBe('boolean');
expect(typeof info.sklearnAvailable).toBe('boolean');

const before = info.instances;
const handle = await bridge.instantiate('collections', 'Counter', [[1, 2, 2]]);
Expand Down
Loading