
wip: add filtered L1 Bundle copycat depth-aware indexing by ID #734

Open

charmful0x wants to merge 17 commits into neo/edge from feat/bundles-copycat

Conversation

charmful0x commented Mar 6, 2026

About

performance-aware, resource-minimalist, filter-aware L1 bundles copycat indexing - status: wip

configurability

  • add owner alias (reduces address computation at every query):
Opts2 = dev_copycat_arweave:add_owner_alias(
    <<"FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw">>,
    <<"neo-bundler">>,
    Opts1
).
  • set the L1 bundle safe size cap (useful to configure the allowed memory usage per process, taking into account arweave_index_workers count * MEMORY_SAFE_CAP)
dev_copycat_arweave:set_memory_safe_cap(xxxxbytes, Opts).
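As a rough sizing check (a sketch using the defaults listed below, not a measured figure), the worst-case in-memory usage is the product of the worker count and the cap:

  %% worst-case memory budget: arweave_index_workers * MEMORY_SAFE_CAP
  %% with the defaults in this PR: 4 workers * 1 GiB
  Workers = 4,
  MemorySafeCap = 1024 * 1024 * 1024,
  WorstCase = Workers * MemorySafeCap.  %% 4294967296 bytes (4 GiB)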
  • safe max recursion depth
~copycat@1.0/arweave&depth=safe_max

In the id=... path, &depth=safe_max recurses into every nested bundle down to DEPTH_RECURSION_CAP. If no depth is provided (valid range: 1..safe_max), it defaults to safe_max.
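For example, using the same resolve call shape and TX ID exercised later in this thread, an explicit depth bounds the recursion, while omitting the parameter behaves like depth=safe_max (this is a sketch of the query shape, not additional functionality):

  %% explicit depth: recurse at most 2 levels into the bundle
  hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&depth=2">>,
    Opts1
  ).
  %% no &depth=... given: equivalent to depth=safe_max, capped at DEPTH_RECURSION_CAP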

  • defaults
-define(DEPTH_L1_OFFSETS, 1).
-define(DEPTH_RECURSION_CAP, 4).
%% 1GB in bytes
-define(MEMORY_SAFE_CAP, 1024 * 1024 * 1024).

supported paths

at the moment, the new path is &id=... plus filters. how it works:

  1. it requires the L1 ID offsets to be present in the store
  2. it fetches the L1 ID headers to retrieve the owner and tags, and validates that it is a bundle
  3. it applies the filters to the L1 TX, skipping it if a filter gate is hit
  4. if the filters pass, it downloads the L1 TX bytestream, then recurses in memory and indexes the descendant offsets
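The four steps above can be sketched roughly as follows (the helper names here are illustrative only, not the real dev_copycat_arweave API):

  %% sketch of the id=... indexing path; helper names are hypothetical
  index_by_id(TxId, Filters, Opts) ->
      %% 1. the L1 offset must already be in the local store
      {ok, Offset} = lookup_l1_offset(TxId, Opts),
      %% 2. fetch the header to get owner + tags, and validate it is a bundle
      {ok, Header} = fetch_l1_header(TxId, Opts),
      true = is_bundle(Header),
      %% 3. apply the filters; skip the TX when a filter gate is hit
      case apply_filters(Header, Filters) of
          skip ->
              {ok, #{skipped_count => 1}};
          pass ->
              %% 4. download the bytestream, recurse in memory,
              %%    and index descendant offsets down to the depth cap
              {ok, Bytes} = download_l1_bytestream(Offset, Opts),
              index_descendants(Bytes, depth_cap(Opts), Opts)
      end.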

to test the feature locally, and to simulate having the required local L1 TX offset index, index a block with depth=1:

  application:ensure_all_started(hb).
  application:ensure_all_started(inets).
  application:ensure_all_started(ssl).
  hb_http:start().

  TestStore = hb_test_utils:test_store().
  StoreOpts = #{<<"index-store">> => [TestStore]}.
  Store = [
    TestStore,
    #{
      <<"store-module">> => hb_store_arweave,
      <<"name">> => <<"cache-arweave">>,
      <<"index-store">> => [TestStore],
      <<"arweave-node">> => <<"https://arweave.net">>
    }
  ].

  Opts = #{
    store => Store,
    arweave_index_ids => true,
    arweave_index_store => StoreOpts,
    arweave_index_workers => 4,
    prometheus => false,
    http_client => httpc,
    http_retry => 1,
    http_retry_time => 200,
    http_retry_mode => constant,
    http_retry_response => [failure]
  }.

  Opts1 = dev_copycat_arweave:add_owner_alias(
    <<"FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw">>,
    <<"neo-bundler">>,
    Opts
  ).

  %% simulate already having the L1 offset index locally
  hb_ao:resolve(
    <<"~copycat@1.0/arweave&from=1870797&to=1870797&mode=write&depth=1">>,
    Opts1
  ).

for block https://aolink.ar.io/#/block/1870797

now, assuming we have the required L1 TX offsets indexed locally, we can iterate over IDs and assert filters:

  hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&depth=safe_max&include-owner-alias=neo-bundler&exclude-tag=Bundler-App-Name:Redstone">>,
    Opts1
  ).

result (truncated):
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,

if we try an L1 TX with the Redstone filter enabled: it must be a Turbo-owned L1 TX (bundle), but Redstone tags are excluded.

txid: https://viewblock.io/arweave/tx/5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY

  Opts1A = dev_copycat_arweave:add_owner_alias(
    <<"JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8">>,
    <<"turbo">>,
    Opts1
  ).
  hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY&mode=write&include-owner-alias=turbo&exclude-tag=Bundler-App-Name:Redstone">>,
    Opts1A
  ).
=== HB DEBUG ===[3273092ms in <0.1056.0> @ hb_ao:194 / hb_ao:204 / hb_ao:543 / dev_copycat_arweave:137 / dev_copycat_arweave:689]==>
arweave_tx_skipped, tx_id: [Explicit:] <<"5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY">>, reason: exclude_tag_match
{ok,#{items_count => 0,bundle_count => 0,skipped_count => 1,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/1SgyB85LEUXqwat2_18-vyaJIk2NpNKeiUKNpbrP4XA">>}}}

TODOs:

  • fallback to chunked/raw indexing when L1 data_size exceeds MEMORY_SAFE_CAP
  • make DEPTH_RECURSION_CAP configurable, with getter/setter parity to MEMORY_SAFE_CAP
  • add owner driven discovery for ~copycat@1.0/arweave&include-owner=<OWNER_ADDR>&exclude-tag=<TAG_NAME>:<TAG_VALUE>
  • cleanup & perf optimization

charmful0x (Author) commented:

added support for a comma-separated include-owner-alias filter:

  hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone">>,
    Opts2
  ).
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/sV4DMZxa1fCz2ajn144goQSnEGoJcaTSHrTaOqvhHps">>}}}

charmful0x (Author) commented:

get/set recursion cap: overrides DEPTH_RECURSION_CAP, and defaults to it if not set.

dev_copycat_arweave:set_depth_recursion_cap(5, Opts).

dev_copycat_arweave:get_depth_recursion_cap(Opts).

charmful0x (Author) commented Mar 6, 2026

new features:

hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=fFt5eteych-ppitofKFoeuzm5I_2CyY1ce4FSAGC3Ow&mode=write&load-l1-offset=true&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone&include-tag=Bundler-App-Name:ao">>,
  Opts2
).
  • &include-tag=Key:Value : requires the L1 TX header to contain that tag pair
  • &load-l1-offset=true : if the L1 TX offset is not present in the local store, fetch it from the network, write it locally, and continue with the existing id=... indexing path

charmful0x (Author) commented Mar 7, 2026

updated the MEMORY_SAFE_CAP to match the highest recorded L1 data TX size under Turbo's ao bundler:

ArDrive Turbo (data uploads stopped at block 867572): JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8

{
  "total_size_bytes": "69035238626980",
  "total_size_gb": "64294.076",
  "largest_txid": "DEk-63yOLQNt04ZjUeTYJ4GJ18ur7kDNLg_6wBsVvz0",
  "largest_tx_size": "5369655672",
  "smallest_txid": "Bmbz9xuw3m1whhBXa2hI1OZXp68Bi0Wsu-rdCV2YOwg",
  "smallest_tx_size": "3836"
}

neo-uploader (actively uploading data): FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw

latest snapshot stats:

  "total_size_bytes": "3663945608",
  "total_size_gb": "3.412",
  "largest_txid": "wzoLJaO6ahteoIU_UfjC0noJPM1PxV7XVDFH_QLM0nE",
  "largest_tx_size": "84331158",
  "smallest_txid": "eNzAdhwi6GC9HMcjfAn-MFWaD_zHOK8bFokqPeStA6E",
  "smallest_tx_size": "2231"

bucketed distribution

{
"input_file": "turbo-txs.json",
"total_entries": 198400,
"entries_with_size": 198400,
"under_100mb": 93767,
"under_100mb_pct": 47.26,
"under_250mb": 139179,
"under_250mb_pct": 70.15,
"under_500mb": 164341,
"under_500mb_pct": 82.83,
"under_1gb": 181073,
"under_1gb_pct": 91.27,
"under_2gb": 192998,
"under_2gb_pct": 97.28,
"under_3gb": 196288,
"under_3gb_pct": 98.94,
"under_4gb": 197542,
"under_4gb_pct": 99.57,
"under_5gb": 198093,
"under_5gb_pct": 99.85,
"over_6gb": 0,
"over_6gb_pct": 0,
"buckets": {
  "bucket_0_100mb": {
    "count": 93767,
    "pct": 47.26
  },
  "bucket_100_250mb": {
    "count": 45412,
    "pct": 22.89
  },
  "bucket_250_500mb": {
    "count": 25162,
    "pct": 12.68
  },
  "bucket_500mb_1gb": {
    "count": 16732,
    "pct": 8.43
  },
  "bucket_1_2gb": {
    "count": 11925,
    "pct": 6.01
  },
  "bucket_2_3gb": {
    "count": 3290,
    "pct": 1.66
  },
  "bucket_3_4gb": {
    "count": 1254,
    "pct": 0.63
  },
  "bucket_4_5gb": {
    "count": 551,
    "pct": 0.28
  },
  "bucket_5_6gb": {
    "count": 307,
    "pct": 0.15
  },
  "bucket_over_6gb": {
    "count": 0,
    "pct": 0
  }
}
}
