src: avoid copying source string in TextEncoder.encode #63897
Open
anonrig wants to merge 1 commit into
`EncodeUtf8String`, which backs `TextEncoder.prototype.encode()`, copied
the entire source string out of the V8 heap into a `MaybeStackBuffer`
(via `WriteOneByteV2`/`WriteV2`) before encoding, allocating on the heap
for strings larger than the stack buffer. `EncodeInto` already avoids
this by reading the flat content directly through `v8::String::ValueView`.
Read the flat content via `ValueView` instead. Because `ValueView` holds
a `DisallowGarbageCollection` scope, the backing store cannot be
allocated while it is alive, so the view is used in two short scopes:
one to validate and compute the exact UTF-8 length, and one to encode
directly into the backing store after allocation. Flattening is cached
on the string, so re-acquiring the view is cheap. The rare unpaired
surrogate path still copies into a mutable buffer for in-place
`to_well_formed_utf16`.
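
The two-scope shape described above (size first, then encode straight into an exactly sized buffer) can be illustrated in userland JavaScript. This is only an analogy, not the C++ in this PR: it uses no `ValueView`, and `utf8Length` and `encodeTwoPass` are hypothetical helper names introduced for the sketch.

```javascript
// Hypothetical helper: exact UTF-8 byte length of a JS string, counting
// each unpaired surrogate as U+FFFD (3 bytes), as TextEncoder does.
function utf8Length(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair -> one 4-byte sequence
        i++;        // skip the low surrogate we just consumed
      } else {
        bytes += 3; // lone high surrogate -> U+FFFD
      }
    } else {
      bytes += 3;   // other BMP code unit, or lone low surrogate -> U+FFFD
    }
  }
  return bytes;
}

// Pass 1 sizes the output; pass 2 encodes directly into it, so no
// intermediate copy of the source string is made by this code.
function encodeTwoPass(str) {
  const out = new Uint8Array(utf8Length(str));
  const { read, written } = new TextEncoder().encodeInto(str, out);
  if (read !== str.length || written !== out.length) {
    throw new Error('length computation and encoder disagree');
  }
  return out;
}
```

`encodeTwoPass(s)` should produce the same bytes as `new TextEncoder().encode(s)`; the sketch only makes the "compute exact length, then write once" ordering visible.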
benchmark/util/text-encoder.js (op=encode, n=1e6, 12 runs each):

| input | len=256 | len=1024 | len=8192 |
| --- | --- | --- | --- |
| ascii | +14.1% | +23.5% | +43.9% |
| one-byte (latin1) | +14.4% | +22.2% | +12.3% |
| two-byte (utf-16) | +16.7% | +20.4% | +15.5% |
len=32 uses the unchanged small-string path (~noise). The untouched
encodeInto path stayed flat (-2.0%..+0.5%) across all configurations.
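
For readers unfamiliar with the three input kinds, here is a rough standalone sketch of how such inputs could be built and timed. It is not `benchmark/util/text-encoder.js` and does not reproduce the numbers above: the representative characters, the reduced iteration count, and the `timeEncode` helper are all assumptions made for illustration.

```javascript
// Assumed inputs: one representative character per string kind.
const kinds = {
  ascii: 'a',        // U+0061, 1 UTF-8 byte
  latin1: '\u00e9',  // U+00E9, fits a one-byte V8 string, 2 UTF-8 bytes
  twoByte: '\u20ac', // U+20AC, needs a two-byte V8 string, 3 UTF-8 bytes
};

// Hypothetical helper: time n calls to encode() on one input.
function timeEncode(input, n) {
  const encoder = new TextEncoder();
  let total = 0;
  const start = performance.now();
  for (let i = 0; i < n; i++) total += encoder.encode(input).length;
  const ms = performance.now() - start;
  return { ms, bytesPerCall: total / n };
}

const results = [];
for (const [kind, ch] of Object.entries(kinds)) {
  for (const len of [32, 256, 1024, 8192]) {
    // n is kept small here; the PR's runs used n=1e6.
    const { ms, bytesPerCall } = timeEncode(ch.repeat(len), 1e3);
    results.push({ kind, len, bytesPerCall, ms });
  }
}
console.table(results);
```

A single-process loop like this is noisy; the percentages in the PR come from repeated runs compared across two builds, which this sketch does not attempt.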
addaleax
approved these changes
Jun 13, 2026
Collaborator
Summary

`EncodeUtf8String`, which backs `TextEncoder.prototype.encode()`, copied the entire source string out of the V8 heap into a `MaybeStackBuffer` (via `WriteOneByteV2`/`WriteV2`) before encoding, and allocated on the heap for strings larger than the stack buffer (> 4096 chars). The sibling `EncodeInto` already avoids this by reading the flat content directly through `v8::String::ValueView`.

This reads the flat content via `ValueView` instead. Because `ValueView` carries a `DisallowGarbageCollection` scope, the backing store cannot be allocated while the view is alive, so the view is used in two short scopes: one to validate and compute the exact UTF-8 length, and one to encode directly into the backing store after allocation.

The rare unpaired-surrogate path still copies into a mutable buffer, since `to_well_formed_utf16` runs in place. Output buffer sizing and WHATWG replacement semantics are unchanged.

Benchmarks

`benchmark/util/text-encoder.js` (`op=encode`, `n=1e6`, 12 interleaved runs each via `compare.js`); the per-length results are the ones tabulated in the commit message above.

`len=32` uses the unchanged small-string path (within noise). As a control, the untouched `encodeInto` path stayed flat (-2.0%..+0.5%) across all 12 configurations, confirming the harness is unbiased.

Verification

`TextEncoder`/`TextDecoder` parallel tests pass.
`Buffer.from(s, 'utf8')`.
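
The replacement semantics mentioned in the summary can be spot-checked from JavaScript. The sketch below is one plausible check, not necessarily the one the author ran: it shows a lone surrogate being encoded as U+FFFD and compares `encode()` output with `Buffer.from(s, 'utf8')`. `matchesBuffer` is a hypothetical helper name.

```javascript
const encoder = new TextEncoder();

// A lone surrogate is encoded as U+FFFD (EF BF BD), not as raw bytes.
const lone = encoder.encode('a\ud800b');
console.log(Array.from(lone, (b) => b.toString(16)).join(' '));
// prints "61 ef bf bd 62"

// Hypothetical check: encode() and Buffer.from(s, 'utf8') agree byte for byte.
function matchesBuffer(s) {
  const viaEncoder = Buffer.from(encoder.encode(s));
  const viaBuffer = Buffer.from(s, 'utf8');
  return Buffer.compare(viaEncoder, viaBuffer) === 0;
}

for (const s of ['', 'plain ascii', 'caf\u00e9', '\u20ac100', '\u{1F600}', 'a\ud800b']) {
  if (!matchesBuffer(s)) {
    throw new Error('mismatch for ' + JSON.stringify(s));
  }
}
```

Running this before and after the change should give identical output if, as the PR states, only the copy is removed and the encoding behavior stays the same.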