src: avoid copying source string in TextEncoder.encode by anonrig · Pull Request #63897 · nodejs/node

anonrig · 2026-06-13T22:25:06Z

Summary

EncodeUtf8String, which backs TextEncoder.prototype.encode(), copied the entire source string out of the V8 heap into a MaybeStackBuffer (via WriteOneByteV2/WriteV2) before encoding — and allocated on the heap for strings larger than the stack buffer (> 4096 chars). The sibling EncodeInto already avoids this by reading the flat content directly through v8::String::ValueView.

This PR reads the flat content via ValueView instead. Because ValueView carries a DisallowGarbageCollection scope, and allocating the backing store may trigger GC, the backing store cannot be allocated while the view is alive, so the view is used in two short scopes with the allocation between them:

1. Size pass: validate the encoding and compute the exact UTF-8 output length.
2. Allocate the exactly-sized backing store (GC now permitted).
3. Encode pass: re-acquire the view (flattening is cached on the string, so this is cheap) and convert directly into the backing store.
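
The size/allocate/encode sequence can be sketched in standalone C++. This is a minimal illustration, not the code in this PR: it uses `std::u16string_view` in place of `v8::String::ValueView`, a `std::vector` in place of the backing store, and plain scalar loops in place of whatever conversion routines the real implementation calls. All function names below are hypothetical, and the input is assumed to be well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Size pass (hypothetical helper): exact UTF-8 byte length of
// well-formed UTF-16 input.
size_t Utf8LengthFromUtf16(std::u16string_view s) {
  size_t n = 0;
  for (size_t i = 0; i < s.size(); ++i) {
    char16_t c = s[i];
    if (c < 0x80) {
      n += 1;
    } else if (c < 0x800) {
      n += 2;
    } else if (c >= 0xD800 && c <= 0xDBFF) {
      n += 4;  // surrogate pair: one 4-byte sequence
      ++i;     // skip the low surrogate
    } else {
      n += 3;
    }
  }
  return n;
}

// Encode pass (hypothetical helper): writes UTF-8 into `out`, which must
// hold at least Utf8LengthFromUtf16(s) bytes. Returns bytes written.
size_t EncodeUtf16ToUtf8(std::u16string_view s, uint8_t* out) {
  uint8_t* p = out;
  for (size_t i = 0; i < s.size(); ++i) {
    uint32_t cp = s[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // Combine the surrogate pair into one code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
      ++i;
    }
    if (cp < 0x80) {
      *p++ = static_cast<uint8_t>(cp);
    } else if (cp < 0x800) {
      *p++ = static_cast<uint8_t>(0xC0 | (cp >> 6));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      *p++ = static_cast<uint8_t>(0xE0 | (cp >> 12));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    } else {
      *p++ = static_cast<uint8_t>(0xF0 | (cp >> 18));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
      *p++ = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    }
  }
  return static_cast<size_t>(p - out);
}

std::vector<uint8_t> EncodeTwoPass(std::u16string_view s) {
  size_t length = 0;
  {
    // Scope 1: stands in for the first short-lived view (size pass).
    length = Utf8LengthFromUtf16(s);
  }
  // No "view" is alive here, so allocating is safe in the real setting.
  std::vector<uint8_t> out(length);
  {
    // Scope 2: stands in for the re-acquired view (encode pass).
    size_t written = EncodeUtf16ToUtf8(s, out.data());
    assert(written == length);
    (void)written;
  }
  return out;
}
```

Calling `EncodeTwoPass(u"h\u00E9\u20AC")` yields the six bytes `68 C3 A9 E2 82 AC`. The output buffer is sized exactly once, so no intermediate copy of the source and no resize of the destination is needed.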

The rare unpaired-surrogate path still copies into a mutable buffer, since to_well_formed_utf16 runs in place. Output buffer sizing and WHATWG replacement semantics are unchanged.
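
To illustrate why the unpaired-surrogate path needs a mutable copy, here is a minimal standalone sketch of an in-place fix-up that replaces each lone surrogate with U+FFFD, which is the replacement the WHATWG Encoding Standard prescribes. `ReplaceLoneSurrogates` and `ToWellFormedCopy` are hypothetical names used only for this example; the PR itself relies on `to_well_formed_utf16`, whose exact signature is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for an in-place well-formedness fix:
// every unpaired surrogate code unit becomes U+FFFD.
void ReplaceLoneSurrogates(std::u16string& s) {
  for (size_t i = 0; i < s.size(); ++i) {
    char16_t c = s[i];
    bool is_high = c >= 0xD800 && c <= 0xDBFF;
    bool is_low = c >= 0xDC00 && c <= 0xDFFF;
    if (is_high && i + 1 < s.size() &&
        s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
      ++i;  // valid pair: keep both code units
    } else if (is_high || is_low) {
      s[i] = 0xFFFD;  // lone or reversed surrogate
    }
  }
}

// The source view is read-only, so the fix-up has to run on a copy.
std::u16string ToWellFormedCopy(std::u16string_view src) {
  std::u16string copy(src);
  ReplaceLoneSurrogates(copy);
  return copy;
}
```

Because the replacement overwrites code units in place, it cannot run against the read-only string contents, which is why only this path pays for a copy.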

Benchmarks

benchmark/util/text-encoder.js (op=encode, n=1e6, 12 interleaved runs each via compare.js):

type	len=256	len=1024	len=8192
ascii	+14.1%	+23.5%	+43.9%
one-byte (latin1)	+14.4%	+22.2%	+12.3%
two-byte (utf-16)	+16.7%	+20.4%	+15.5%

len=32 uses the unchanged small-string path (within noise). As a control, the untouched encodeInto path stayed flat (−2.0%..+0.5%) across all 12 configurations, confirming the harness is unbiased.

Verification

Builds clean (full relink).
All encoding / TextEncoder / TextDecoder parallel tests pass.
Functional spot-checks across ASCII, Latin1, BMP, valid surrogate pairs, and lone/reversed surrogates match Buffer.from(s, 'utf8').

nodejs-github-bot · 2026-06-13T23:21:51Z

CI: https://ci.nodejs.org/job/node-test-pull-request/74106/

nodejs-github-bot added the c++ and needs-ci labels Jun 13, 2026

addaleax approved these changes Jun 13, 2026

addaleax added the request-ci label Jun 13, 2026

github-actions bot removed the request-ci label Jun 13, 2026

anonrig wants to merge 1 commit into nodejs:main from anonrig:src-textencoder-encode-zero-copy
