Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 14 additions & 3 deletions docs/api/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -137,13 +137,24 @@ interface HttpBridgeOptions {
## Codec Utilities

```ts
import { decodeValue, decodeValueAsync, registerArrowDecoder, clearArrowDecoder } from 'tywrap';
import {
decodeValue,
decodeValueAsync,
autoRegisterArrowDecoder,
registerArrowDecoder,
clearArrowDecoder,
} from 'tywrap';

// NodeBridge auto-registers when apache-arrow is installed.
// If you're decoding outside the bridge, call autoRegisterArrowDecoder() or register manually:
const arrowReady = await autoRegisterArrowDecoder();
// if (!arrowReady) throw new Error('Install apache-arrow or enable JSON fallback');
// registerArrowDecoder(bytes => bytes);

registerArrowDecoder(bytes => bytes);
const value = await decodeValueAsync(pythonValue);
```

Arrow-encoded payloads throw unless you register a decoder or enable JSON fallback on the Python bridge.
Arrow-encoded payloads throw unless a decoder is registered or JSON fallback is enabled on the Python bridge.

## Error Types

Expand Down
23 changes: 17 additions & 6 deletions docs/codec-roadmap.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,13 @@ This is a forward-looking plan for adding codecs beyond numpy/pandas. The focus
- Keep failures explicit unless a user opts into lossy fallbacks.
- Avoid heavy implicit conversions (CPU/GPU) without clear config.

## DX Defaults (Decisions)

- Arrow is the default for ndarray/dataframe/series. The JS runtime should auto-register an Arrow decoder when `apache-arrow` is installed so users do not have to wire it manually.
- JSON fallback is opt-in only (via `TYWRAP_CODEC_FALLBACK=json`) and remains explicitly lossy for dtype/NA fidelity.
- GPU handling stays explicit: no implicit `.cpu()` or contiguous copies. Opt-in copy/transfer remains available, and GPU-native transport is a follow-up track (DLPack/Arrow CUDA).
- Large payloads should not be forced through single-line JSONL forever; add an artifact/chunked transport to keep responses reliable without silent truncation.

## Envelope Conventions

All tywrap codec envelopes share:
Expand All @@ -28,6 +35,11 @@ The subprocess JSONL transport is not streaming: large results must fit in memor

- Set `TYWRAP_CODEC_MAX_BYTES` (bytes, UTF-8) to cap the maximum serialized response size emitted by the Python bridge.
- If exceeded, the call fails with an explicit error instead of attempting a silent fallback.
- Planned: add an artifact/chunked transport path for large payloads to avoid JSONL size ceilings.

### Feature Detection

Bridge metadata should surface optional codec availability to help the JS side decide when to rely on SciPy/Torch/Sklearn codecs.

## SciPy (sparse matrices)

Expand Down Expand Up @@ -93,6 +105,7 @@ Notes:
- Default to CPU tensors; require opt-in for `.cpu()` conversion.
- Reject non-contiguous tensors unless explicitly allowed.
- Opt-in copy/transfer via `TYWRAP_TORCH_ALLOW_COPY=1`.
- Future: GPU-native transport (DLPack/Arrow CUDA) to avoid implicit device transfers.

## Sklearn (models + outputs)

Expand Down Expand Up @@ -123,9 +136,7 @@ Notes:
1. Define envelope specs and validation tests (round-trip + size limits).
2. Implement Python bridge serialization with feature detection.
3. Implement JS decoder + type mapping presets.
4. Add performance gates and CI coverage for the new codecs.

## Open Questions

- GPU handling: do we allow implicit device transfers?
- Max payload sizes: per-call limits vs global caps beyond `TYWRAP_CODEC_MAX_BYTES`?
4. Make Arrow frictionless (auto-register decoder + living app on Arrow path).
5. Add payload scaling via artifact/chunked transport (protocol versioned if needed).
6. Improve scientific codec ergonomics (explicit GPU opt-in, SciPy format expansion, safe sklearn opt-ins).
7. Add performance gates and CI coverage for the new codecs.
2 changes: 1 addition & 1 deletion docs/runtimes/browser.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ loading, rely on Pyodide directly.

## Data Transport

Arrow envelopes are supported in the browser if you register an Arrow decoder:
Arrow envelopes are supported in the browser if you register an Arrow decoder (Node auto-registers when `apache-arrow` is installed):

```ts
import { registerArrowDecoder } from 'tywrap';
Expand Down
17 changes: 9 additions & 8 deletions docs/runtimes/nodejs.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,17 +177,18 @@ npm install apache-arrow
```

```typescript
import { createRequire } from 'node:module';
import { autoRegisterArrowDecoder } from 'tywrap';

// Auto registration: NodeBridge does this when apache-arrow is installed.
await autoRegisterArrowDecoder();
```

```typescript
import { registerArrowDecoder } from 'tywrap';
import { tableFromIPC } from 'apache-arrow';

// Register Arrow decoder for optimal performance
const require = createRequire(import.meta.url);
const { tableFromIPC } = require('apache-arrow');
// Manual registration: customize decoding outside NodeBridge.
registerArrowDecoder(bytes => tableFromIPC(bytes));

// If you don't register a decoder, Arrow-encoded payloads will throw.
// To accept raw bytes, register a passthrough decoder:
// registerArrowDecoder(bytes => bytes);
```

### JSON Fallback
Expand Down
2 changes: 1 addition & 1 deletion examples/living-app/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ This is a small but non-trivial “living example” that exercises tywrap end-t

- TypeScript (Node) calls into Python via `NodeBridge`
- Python uses `pandas` + `numpy` for data work and `pydantic` for config validation
- Rich results (e.g. `pandas.DataFrame`) come back via Arrow IPC and are decoded in Node with `apache-arrow`
- Rich results (e.g. `pandas.DataFrame`) come back via Arrow IPC and are decoded in Node with `apache-arrow` (auto-registered by `NodeBridge`)

## What it does

Expand Down
30 changes: 6 additions & 24 deletions examples/living-app/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,9 @@ import { existsSync, mkdtempSync, rmSync } from 'node:fs';
import { dirname, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { tmpdir } from 'node:os';
import { createRequire } from 'node:module';

import { NodeBridge } from 'tywrap/node';
import { setRuntimeBridge } from 'tywrap/runtime';
import { clearArrowDecoder, registerArrowDecoder } from 'tywrap';
import { autoRegisterArrowDecoder, clearArrowDecoder } from 'tywrap';

import {
driftReport,
Expand Down Expand Up @@ -52,30 +50,14 @@ function resolveCodecMode(argv: readonly string[]): CodecMode {
* Register an Arrow decoder for this Node process.
*
* Why: `apache-arrow` is an optional dependency and tywrap should run without it in JSON mode.
* We use `require()` instead of ESM `import()` so Node/TypeScript resolve the package's "node"
* export + typings correctly (the ESM export map can otherwise select the DOM build/types).
*/
async function enableArrowDecoder(): Promise<void> {
const require = createRequire(import.meta.url);
let arrowModule: unknown;
try {
arrowModule = require('apache-arrow');
} catch (err) {
const code = (err as { code?: unknown }).code;
if (code === 'MODULE_NOT_FOUND') {
throw new Error(
"Arrow mode requires the optional dependency 'apache-arrow'. Install it with `npm install apache-arrow`."
);
}
throw err;
}
const arrow = arrowModule as {
tableFromIPC?: (bytes: Uint8Array) => { toArray?: () => unknown[] };
};
if (typeof arrow.tableFromIPC !== 'function') {
throw new Error('apache-arrow does not export tableFromIPC');
const registered = await autoRegisterArrowDecoder();
if (!registered) {
throw new Error(
"Arrow mode requires the optional dependency 'apache-arrow'. Install it with `npm install apache-arrow`."
);
}
registerArrowDecoder((bytes: Uint8Array) => arrow.tableFromIPC!(bytes));
}

/**
Expand Down
63 changes: 61 additions & 2 deletions runtime/python_bridge.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import sys
import json
import importlib
import importlib.util
import os
import traceback
import base64
Expand Down Expand Up @@ -84,14 +85,39 @@ class ProtocolError(Exception):


def arrow_available():
"""
Return True when pyarrow can be imported.

Why: advertise Arrow capability to the TS side without crashing startup when
pyarrow is optional or missing.
"""
try:
import pyarrow # noqa: F401
except Exception:
import pyarrow
except (ImportError, OSError):
return False
return True


def module_available(module_name: str) -> bool:
"""
Lightweight feature detection for optional codec dependencies.

Why: exposes availability in bridge metadata without importing heavy modules or triggering
side effects, so the TS side can decide when to rely on optional codecs.
"""
try:
return importlib.util.find_spec(module_name) is not None
except (ImportError, AttributeError, TypeError, ValueError):
# Why: guard against unusual importlib edge cases without masking other failures.
return False


def is_numpy_array(obj):
"""
Detect numpy arrays when NumPy is installed.

Why: keep NumPy optional while enabling ndarray serialization.
"""
try:
import numpy as np # noqa: F401
except Exception:
Expand All @@ -100,6 +126,11 @@ def is_numpy_array(obj):


def is_pandas_dataframe(obj):
"""
Detect pandas DataFrame instances when pandas is installed.

Why: avoid hard pandas dependency while enabling dataframe encoding.
"""
try:
import pandas as pd # noqa: F401
except Exception:
Expand All @@ -108,6 +139,11 @@ def is_pandas_dataframe(obj):


def is_pandas_series(obj):
"""
Detect pandas Series instances when pandas is installed.

Why: avoid hard pandas dependency while enabling series encoding.
"""
try:
import pandas as pd # noqa: F401
except Exception:
Expand All @@ -116,6 +152,11 @@ def is_pandas_series(obj):


def is_scipy_sparse(obj):
"""
Detect scipy sparse matrices when scipy is installed.

Why: allow sparse matrix encoding without importing scipy in all environments.
"""
try:
import scipy.sparse as sp # noqa: F401
except Exception:
Expand All @@ -127,6 +168,11 @@ def is_scipy_sparse(obj):


def is_torch_tensor(obj):
"""
Detect torch tensors when torch is installed.

Why: allow tensor encoding without a hard torch dependency.
"""
try:
import torch # noqa: F401
except Exception:
Expand All @@ -138,6 +184,11 @@ def is_torch_tensor(obj):


def is_sklearn_estimator(obj):
"""
Detect sklearn estimators for metadata-only serialization.

Why: allow feature-gated estimator metadata without importing sklearn by default.
"""
try:
from sklearn.base import BaseEstimator # noqa: F401
except Exception:
Expand Down Expand Up @@ -547,6 +598,11 @@ def handle_dispose_instance(params):


def handle_meta():
"""
Return bridge metadata for capability detection.

Why: the Node side uses this to decide whether optional codecs can be used.
"""
return {
'protocol': PROTOCOL,
'protocolVersion': PROTOCOL_VERSION,
Expand All @@ -555,6 +611,9 @@ def handle_meta():
'pid': os.getpid(),
'codecFallback': 'json' if FALLBACK_JSON else 'none',
'arrowAvailable': arrow_available(),
'scipyAvailable': module_available('scipy'),
'torchAvailable': module_available('torch'),
'sklearnAvailable': module_available('sklearn'),
'instances': len(instances),
}

Expand Down
1 change: 1 addition & 0 deletions src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ export { detectRuntime, isNodejs, isDeno, isBun, isBrowser } from './utils/runti
export {
decodeValue,
decodeValueAsync,
autoRegisterArrowDecoder,
registerArrowDecoder,
clearArrowDecoder,
} from './utils/codec.js';
Expand Down
7 changes: 6 additions & 1 deletion src/runtime/node.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,9 @@
import { existsSync } from 'node:fs';
import { delimiter, isAbsolute, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { createRequire } from 'node:module';

import { decodeValueAsync } from '../utils/codec.js';
import { autoRegisterArrowDecoder, decodeValueAsync } from '../utils/codec.js';
import { getDefaultPythonPath } from '../utils/python.js';
import { getVenvBinDir, getVenvPythonExe } from '../utils/runtime.js';
import type { BridgeInfo } from '../types/index.js';
Expand Down Expand Up @@ -273,6 +274,10 @@

private async startProcess(): Promise<void> {
try {
const require = createRequire(import.meta.url);
await autoRegisterArrowDecoder({
loader: async () => require('apache-arrow'),
});
const { spawn } = await import('child_process');
const allowedPrefixes = ['TYWRAP_'];
const allowedKeys = new Set(['PATH', 'PYTHONPATH', 'VIRTUAL_ENV', 'PYTHONHOME']);
Expand All @@ -292,10 +297,10 @@
const currentPath = env.PATH ?? process.env.PATH ?? '';
env.PATH = `${venv.binDir}${delimiter}${currentPath}`;
}
if (!env.PYTHONUTF8) {

Check warning on line 300 in src/runtime/node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONUTF8 = '1';
}
if (!env.PYTHONIOENCODING) {

Check warning on line 303 in src/runtime/node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONIOENCODING = 'UTF-8';
}
// Respect explicit request for JSON fallback only; otherwise fast-fail by default
Expand Down
8 changes: 7 additions & 1 deletion src/runtime/optimized-node.ts
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,12 @@

import { delimiter, isAbsolute, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { createRequire } from 'node:module';
import type { ChildProcess } from 'child_process';
import { EventEmitter } from 'events';

import { globalCache } from '../utils/cache.js';
import { decodeValueAsync } from '../utils/codec.js';
import { autoRegisterArrowDecoder, decodeValueAsync } from '../utils/codec.js';
import { getDefaultPythonPath } from '../utils/python.js';
import { getVenvBinDir, getVenvPythonExe } from '../utils/runtime.js';

Expand Down Expand Up @@ -186,6 +187,11 @@
throw new Error('Bridge has been disposed');
}

const require = createRequire(import.meta.url);
await autoRegisterArrowDecoder({
loader: () => require('apache-arrow'),
});

// Ensure minimum processes are available
while (this.processPool.length < this.options.minProcesses) {
await this.spawnProcess();
Expand Down Expand Up @@ -429,10 +435,10 @@
PYTHONUNBUFFERED: '1', // Ensure immediate output
PYTHONDONTWRITEBYTECODE: '1', // Skip .pyc files for faster startup
};
if (!env.PYTHONUTF8) {

Check warning on line 438 in src/runtime/optimized-node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONUTF8 = '1';
}
if (!env.PYTHONIOENCODING) {

Check warning on line 441 in src/runtime/optimized-node.ts

View workflow job for this annotation

GitHub Actions / lint

Prefer using nullish coalescing operator (`??=`) instead of an assignment expression, as it is simpler to read
env.PYTHONIOENCODING = 'UTF-8';
}
if (this.options.virtualEnv) {
Expand Down
3 changes: 3 additions & 0 deletions src/types/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,6 +317,9 @@ export interface BridgeInfo {
pid: number;
codecFallback: 'json' | 'none';
arrowAvailable: boolean;
scipyAvailable: boolean;
torchAvailable: boolean;
sklearnAvailable: boolean;
instances: number;
}

Expand Down
Loading
Loading