Skip to content

Conversation

michalmuskala
Copy link
Contributor

In particular, this includes the full original binary and all the references. Gathering this information requires full heap traversal, so is fairly expensive - that's why we keep both the existing binary and this new API.

Example:

1> A = <<0:(1024*8)>>.
2> <<B:550/bitstring,D:550/bits,E:550/bits,C/bitstring>> = A.
3> erlang:process_info(self(),[binary_full]).
[{binary_full,[{126812357133856,1024,1,
           <<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
           [{1100,1650},{550,1100},{1650,8192},{0,550},{0,8192}]}]}]

Original draft implementation by @garazdawi in garazdawi@d5a2c6f.

Copy link
Contributor

github-actions bot commented Oct 7, 2025

CT Test Results

    3 files    135 suites   49m 39s ⏱️
1 651 tests 1 594 ✅ 57 💤 0 ❌
2 287 runs  2 211 ✅ 76 💤 0 ❌

Results for commit 39c06bb.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@michalmuskala michalmuskala force-pushed the binary-full branch 5 times, most recently from eb25eca to 1ca8340 Compare October 7, 2025 12:07
In particular, this includes the full original binary and all the references.
Gathering this information requires full heap traversal, so is fairly expensive -
that's why we keep both the existing `binary` and this new API.

Example:

```
1> A = <<0:(1024*8)>>.
2> <<B:550/bitstring,D:550/bits,E:550/bits,C/bitstring>> = A.
3> erlang:process_info(self(),[binary_full]).
[{binary_full,[{126812357133856,1024,1,
           <<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
           [{1100,1650},{550,1100},{1650,8192},{0,550},{0,8192}]}]}]
```

Co-authored-by: Lukas Backström <[email protected]>
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Oct 13, 2025
@jhogberg jhogberg self-assigned this Oct 13, 2025
Copy link
Contributor

@jhogberg jhogberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I've left some comments below.

ErtsDynamicWStack ws;
} DebugBinary;

static void gather_binaries(DebugBinary *bins, Uint count, Eterm *start, Eterm *stop) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:

Suggested change
static void gather_binaries(DebugBinary *bins, Uint count, Eterm *start, Eterm *stop) {
static void gather_binaries(DebugBinary *bins, Uint count, const Eterm *start, const Eterm *stop) {

Comment on lines +3852 to +3884
switch (primary_tag(val)) {
case TAG_PRIMARY_IMMED1:
case TAG_PRIMARY_LIST:
case TAG_PRIMARY_BOXED:
break;
case TAG_PRIMARY_HEADER:
if (header_is_transparent(val)) {
continue;
}
switch (thing_subtag(val)) {
case SUB_BITS_SUBTAG:
{
const ErlSubBits *sb = (ErlSubBits*)(tp-1);
const BinRef *underlying = (BinRef*)boxed_val(sb->orig);
if (thing_subtag(underlying->thing_word) != HEAP_BITS_SUBTAG) {
for (Uint i = 0; i < count; i++) {
DebugBinary* b = &bins[i];
if (b->bin_ref == underlying) {
WSTACK_PUSH2(b->ws.ws, sb->start, sb->end);
break;
}
}
}
}
ERTS_FALLTHROUGH();
default:
{
tp += header_arity(val);
}
break;
}
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit complex, there's only one case with special treatment in either switch and there's fall-through on top of it, how about something like (untested)?

Suggested change
switch (primary_tag(val)) {
case TAG_PRIMARY_IMMED1:
case TAG_PRIMARY_LIST:
case TAG_PRIMARY_BOXED:
break;
case TAG_PRIMARY_HEADER:
if (header_is_transparent(val)) {
continue;
}
switch (thing_subtag(val)) {
case SUB_BITS_SUBTAG:
{
const ErlSubBits *sb = (ErlSubBits*)(tp-1);
const BinRef *underlying = (BinRef*)boxed_val(sb->orig);
if (thing_subtag(underlying->thing_word) != HEAP_BITS_SUBTAG) {
for (Uint i = 0; i < count; i++) {
DebugBinary* b = &bins[i];
if (b->bin_ref == underlying) {
WSTACK_PUSH2(b->ws.ws, sb->start, sb->end);
break;
}
}
}
}
ERTS_FALLTHROUGH();
default:
{
tp += header_arity(val);
}
break;
}
break;
}
if (primary_tag(val) == TAG_PRIMARY_HEADER &&
!header_is_transparent(val)) {
if (thing_subtag(val) == SUB_BITS_SUBTAG) {
const ErlSubBits *sb = (ErlSubBits*)(tp-1);
const BinRef *underlying = (BinRef*)boxed_val(sb->orig);
if (thing_subtag(underlying->thing_word) != HEAP_BITS_SUBTAG) {
for (Uint i = 0; i < count; i++) {
DebugBinary* b = &bins[i];
if (b->bin_ref == underlying) {
WSTACK_PUSH2(b->ws.ws, sb->start, sb->end);
continue;
}
}
}
}
tp += header_arity(val);
}

Comment on lines +3936 to +3954
mp = rp->msg_frag;
bp = rp->mbuf;

if (bp)
goto search_heap_frags;

while (mp) {

bp = erts_message_to_heap_frag(mp);
mp = mp->next;

search_heap_frags:

while (bp) {
gather_binaries(binariesp, binaries_count,
bp->mem, bp->mem + bp->used_size);
bp = bp->next;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mbuf and msg_frag need to be searched separately, as you can have heap fragments without messages (e.g. to satisfy allocations when GC is temporarily disabled). Both have next members that you should follow, and the latter should keep using erts_message_to_heap_frag like you do now.

}

for (Uint i = 0; i < binaries_count; i++) {
DebugBinary b = binariesp[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend a copy here? (Same issue in the other loop further down)

Suggested change
DebugBinary b = binariesp[i];
const DebugBinary *b = &binariesp[i];

erts_bld_uword(NULL, &sz, (UWord) b.bin_ref->val);
erts_bld_uint(NULL, &sz, b.bin_ref->val->orig_size);
sz += ERL_REFC_BITS_SIZE;
for (UWord *bits = b.ws.ws.wstart; bits < b.ws.ws.wsp; bits += 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, bits[0] is the start offset of the referenced region and bits[1] is the end offset. How about naming bits something like range_offsets or slice_bounds instead?

Comment on lines +3986 to +3990
bitstring = erts_wrap_refc_bitstring(
&hfact->off_heap->first,
&hfact->off_heap->overhead,
&hp, b.bin_ref->val, (byte*)b.bin_ref->val->orig_bytes,
0, b.bin_ref->val->orig_size * 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: prefer NBITS instead of * 8

Suggested change
bitstring = erts_wrap_refc_bitstring(
&hfact->off_heap->first,
&hfact->off_heap->overhead,
&hp, b.bin_ref->val, (byte*)b.bin_ref->val->orig_bytes,
0, b.bin_ref->val->orig_size * 8);
bitstring = erts_wrap_refc_bitstring(
&hfact->off_heap->first,
&hfact->off_heap->overhead,
&hp, b.bin_ref->val,
(byte*)b.bin_ref->val->orig_bytes,
0,
NBITS(b.bin_ref->val->orig_size));

Comment on lines +3931 to +3932
if (OLD_HEAP(rp))
gather_binaries(binariesp, binaries_count, OLD_HEAP(rp), OLD_HTOP(rp) /*OLD_HEND(p)*/);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:

Suggested change
if (OLD_HEAP(rp))
gather_binaries(binariesp, binaries_count, OLD_HEAP(rp), OLD_HTOP(rp) /*OLD_HEND(p)*/);
if (OLD_HEAP(rp)) {
gather_binaries(binariesp, binaries_count, OLD_HEAP(rp), OLD_HTOP(rp) /*OLD_HEND(p)*/);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team:VM Assigned to OTP team VM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants