Alternative bytecode_pattern approach #56

Open
A60AB5450353F40E opened this issue Mar 10, 2023 · 5 comments

Comments

@A60AB5450353F40E

I was thinking about changing how the redeem script is stored. How about splitting it into 3 fields:

  • redeem script pattern: the script with each run of consecutive pushes replaced by the number of pushes in that run, encoded as a script number push
  • sequence of push sizes: the size of each push, encoded as script number pushes
  • sequence of pushes: the pushes themselves

One could then use the _eq operator on the most general pattern, which should perform better than a regex. The result could be narrowed down further with a regex on the push sizes or the pushes, but those would run only on positive matches for the general pattern.
Also, the redeem script can be reconstructed exactly from these three fields.

We could even do some more parsing and add a function that filters on the exact value of the Nth push.
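The two-stage lookup described above can be sketched in memory as follows. This is only an illustration, not the project's query layer: the `Row` type, its field names, and `find_contracts` are hypothetical, and a real deployment would presumably express stage 1 as an indexed equality filter in the database rather than a Python loop.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """Hypothetical stored record for one redeem script."""
    pattern: str       # hex of the script with pushes replaced by push counts
    pushes: List[str]  # hex payload of each push, in script order


def find_contracts(rows: List[Row], pattern: str,
                   nth: Optional[int] = None,
                   value_regex: Optional[str] = None) -> List[Row]:
    """Stage 1: exact match on the general pattern.
    Stage 2 (optional): regex on the nth push (zero-based), evaluated only
    for rows that already passed stage 1."""
    matches = []
    for row in rows:
        if row.pattern != pattern:
            continue  # cheap equality test rejects most rows
        if nth is not None and value_regex is not None:
            if nth >= len(row.pushes):
                continue
            if not re.fullmatch(value_regex, row.pushes[nth]):
                continue
        matches.append(row)
    return matches


rows = [
    Row(pattern="5287", pushes=["aa", "bb"]),
    Row(pattern="5287", pushes=["aa", "cc"]),
    Row(pattern="5187", pushes=["aa"]),
]
# All rows sharing the template, then only those whose second push is "cc".
same_template = find_contracts(rows, "5287")
narrowed = find_contracts(rows, "5287", nth=1, value_regex="cc")
```

The regex is the expensive part, so it is applied only to rows that survive the equality check, which is the ordering the proposal relies on.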

@A60AB5450353F40E
Author

A60AB5450353F40E commented Mar 14, 2023

I made a little tool to experiment with this, using these modes:

  • STRIP_PUSHES - replace each run of successive pushes with the number of pushes in the run, encoded as a script number
  • STRIP_PUSH_DATA - replace each push with payload size, encoded as a script number
  • EXTRACT_PUSHES - ignore all executable bytes, extract full pushes
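A minimal Python sketch of the STRIP_PUSHES mode, plus a payload-only variant of push extraction, is shown here. It is not the tool mentioned above: the function names are hypothetical, and it assumes that opcodes 0x00 to 0x4f and 0x51 to 0x60 count as pushes while 0x50 (OP_RESERVED) does not.

```python
def parse_script(script: bytes):
    """Yield (payload, raw) per instruction; payload is None for non-push opcodes."""
    i = 0
    while i < len(script):
        start, op = i, script[i]
        i += 1
        payload = None
        if op <= 0x4E:  # OP_0, direct pushes 0x01..0x4b, OP_PUSHDATA1/2/4
            width = {0x4C: 1, 0x4D: 2, 0x4E: 4}.get(op, 0)
            size = int.from_bytes(script[i:i + width], "little") if width else op
            i += width
            payload = script[i:i + size]
            i += size
            if i > len(script):
                raise ValueError("truncated push")
        elif op == 0x4F:  # OP_1NEGATE pushes the script number -1
            payload = b"\x81"
        elif 0x51 <= op <= 0x60:  # OP_1..OP_16 push the numbers 1..16
            payload = bytes([op - 0x50])
        yield payload, script[start:i]


def push_number(n: int) -> bytes:
    """Encode a non-negative integer as a minimal script number push."""
    if n == 0:
        return b"\x00"
    if n <= 16:
        return bytes([0x50 + n])
    body = bytearray()
    while n:
        body.append(n & 0xFF)
        n >>= 8
    if body[-1] & 0x80:  # keep the sign bit clear for positive numbers
        body.append(0x00)
    return bytes([len(body)]) + bytes(body)


def strip_pushes(script: bytes) -> bytes:
    """Replace each run of consecutive pushes with a push of the run length."""
    out, run = bytearray(), 0
    for payload, raw in parse_script(script):
        if payload is not None:
            run += 1
            continue
        if run:
            out += push_number(run)
            run = 0
        out += raw
    if run:
        out += push_number(run)
    return bytes(out)


def push_payloads(script: bytes) -> list:
    """Return the payload of every push, in script order."""
    return [p for p, _ in parse_script(script) if p is not None]


# <4-byte push> OP_2 OP_ADD OP_0 OP_EQUAL
demo = bytes.fromhex("040102030452930087")
print(strip_pushes(demo).hex())                # 52935187
print([p.hex() for p in push_payloads(demo)])  # ['01020304', '02', '']
```

Note that `push_payloads` returns payloads only, whereas the EXTRACT_PUSHES mode above keeps the full push instructions, including their opcodes.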

Example of patternizing an AnyHedge input script:

STRIP_PUSHES:    56; len=1
STRIP_PUSH_DATA: 01406001406051025401; len=10
EXTRACT_PUSHES:  40da2963cc172e7dccf9570ebd272c496d9df459f1b4d07f1961515642054db764f25c4aab947a4dbcf7793ca25bcc5a46faa83d93e7aeaa2e23a95376e386902d10020aa262ab330100863301002b45000040c0df593545220c40b8676d56388b58715b27cc0fc5de3640dfd49aa05d00e3efc5597b9d8dd783f055b0579630157599290d547e9e3c160d2170d2c300177a72103e0aa262ac3301008733010026450000514d5401043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=508

Descending into the redeem script, we get:

BYTECODE:        043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c79009c637b695c7a7cad5b7a7cad6d6d6d6d6d51675c7a519dc3519d5f7a5f795779bb5d7a5d79577abb5c79587f77547f75817600a0695c79587f77547f75818c9d5c7a547f75815b799f695b795c7f77817600a0695979a35879a45c7a547f7581765c7aa2695b7aa2785a7a8b5b7aa5919b6902220276587a537a96a47c577a527994a4c4529d00cc7b9d00cd557a8851cc9d51cd547a8777777768; len=340
STRIP_PUSHES:    5d79519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695176517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=156
STRIP_PUSH_DATA: 54545352535501210119011951012101215179519c637b69517a7cad517a7cad6d6d6d6d6d5167517a519dc3519d517a51795179bb517a5179517abb5179517f77517f75817651a0695179517f77517f75818c9d517a517f758151799f695179517f77817651a0695179a35179a4517a517f758176517aa269517aa278517a8b517aa5919b695276517a517a96a47c517a517994a4c4519d51cc7b9d51cd517a8851cc9d51cd517a8777777768; len=173
EXTRACT_PUSHES:  043e0aa262041209a26203eab30202e53303bc6d390500743ba40b2102d3c1de9d4bc77d6c3608cbe44d10138c7488e592dc2b1e10a6cf0e92c2ecb0471976a91415d54e1b90d806548f34263afe71695a8a19716388ac1976a914ec4bebcc7842bdc802880e8692d83e9ec41b95d688ac51210374059eb4b0edb9052779a1ef93c76d1715473a9f3d6634135a3c03b82561a015210396dc6749e3bde2c230fb10bb66a444d83318521c449181a6be57123c1257a6575c005c5b515c51515f5f575d5d575c5854005c58545c545b5b5c0059585c545c5b5a5b0222025853575252000055515154; len=231

@A60AB5450353F40E
Author

To better illustrate, here's an index (pattern, input_count) of contract fingerprints (STRIP_PUSHES mode) from blocks 0-780,000:

https://gist.github.com/A60AB5450353F40E/6b3e525d6e1220328217b9568968d6fc

@bitjson
Member

bitjson commented Nov 21, 2023

Thanks for looking into this @A60AB5450353F40E!

This would be a great improvement for scanning contract patterns. I'd love to take a PR introducing this feature! I won't have bandwidth to work on this myself until I make some progress on #29. (Otherwise, I'll try to implement the bytecode_pattern stuff this way when I'm working on the ClickHouse migration.)

@A60AB5450353F40E
Author

Wrote a little paper about this: https://gitlab.com/0353F40E/smart-contract-fingerprinting

and working on extending BCHN RPC with it: https://gitlab.com/0353F40E/bitcoin-cash-node/-/commits/bcpattern

If this gets merged into BCHN, you won't have to calculate it with SQL anymore; you can just read the relevant fields from the node RPC.

@A60AB5450353F40E
Author

This has been merged to BCHN!

https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/merge_requests/1921

With the next release you'll be able to get the redeem script, its pattern, and its fingerprint directly from the node RPC, along with an array of the pushed data. The example below shows the schema:


        "vin": [
          {
            "txid": "e44a4f61574e6413bb29fc6fd352fbb82c65f8fa10e85c9112a4ba1373c2d34e",
            "vout": 1,
            "scriptSig": {
              "asm": "795402691 71ab0ffa0f0a39607490087014c2669e45ad326e5809b7e3c741d41795b30ea40ab45125f71c9042433362716751015de3c1b8177d47ade21352e38acbf1eb6f[ALL|FORKID] 025e96449f9f644fff91d2f43489ca82f77090d2cb8228017132438a5887d08e27 04c3e1682f143ce4f66e21f6f33e7589e83a96f03fabf77eeca514eb7aace7526fb50fea132c1a3c7062e2c4afe2c2547a5479ad7b547a9d5279a9876303805101b2756778a9788764006968686d51",
              "hex": "04c3e1682f4171ab0ffa0f0a39607490087014c2669e45ad326e5809b7e3c741d41795b30ea40ab45125f71c9042433362716751015de3c1b8177d47ade21352e38acbf1eb6f4121025e96449f9f644fff91d2f43489ca82f77090d2cb8228017132438a5887d08e274c4f04c3e1682f143ce4f66e21f6f33e7589e83a96f03fabf77eeca514eb7aace7526fb50fea132c1a3c7062e2c4afe2c2547a5479ad7b547a9d5279a9876303805101b2756778a9788764006968686d51",
              "byteCodePattern": {
                "fingerprint": "e632b7095b0bf32c260fa4c539e9fd7b852d0de454e9be26f24d0d6f91d069d3",
                "pattern": "54",
                "patternAsm": "4",
                "data": [
                  "c3e1682f",
                  "71ab0ffa0f0a39607490087014c2669e45ad326e5809b7e3c741d41795b30ea40ab45125f71c9042433362716751015de3c1b8177d47ade21352e38acbf1eb6f41",
                  "025e96449f9f644fff91d2f43489ca82f77090d2cb8228017132438a5887d08e27",
                  "04c3e1682f143ce4f66e21f6f33e7589e83a96f03fabf77eeca514eb7aace7526fb50fea132c1a3c7062e2c4afe2c2547a5479ad7b547a9d5279a9876303805101b2756778a9788764006968686d51"
                ]
              },
              "redeemScript": {
                "asm": "795402691 3ce4f66e21f6f33e7589e83a96f03fabf77eeca5 eb7aace7526fb50fea132c1a3c7062e2c4afe2c2 4 OP_ROLL 4 OP_PICK OP_CHECKSIGVERIFY OP_ROT 4 OP_ROLL OP_NUMEQUALVERIFY 2 OP_PICK OP_HASH160 OP_EQUAL OP_IF 86400 OP_CHECKSEQUENCEVERIFY OP_DROP OP_ELSE OP_OVER OP_HASH160 OP_OVER OP_EQUAL OP_NOTIF 0 OP_VERIFY OP_ENDIF OP_ENDIF OP_2DROP 1",
                "hex": "04c3e1682f143ce4f66e21f6f33e7589e83a96f03fabf77eeca514eb7aace7526fb50fea132c1a3c7062e2c4afe2c2547a5479ad7b547a9d5279a9876303805101b2756778a9788764006968686d51",
                "byteCodePattern": {
                  "fingerprint": "6c3d207ada192debb8a108f6d404b5830cdb622650ec63a9e63b8f5b23c215d5",
                  "pattern": "547a5179ad7b517a9d5179a9876351b2756778a9788764516968686d51",
                  "patternAsm": "4 OP_ROLL 1 OP_PICK OP_CHECKSIGVERIFY OP_ROT 1 OP_ROLL OP_NUMEQUALVERIFY 1 OP_PICK OP_HASH160 OP_EQUAL OP_IF 1 OP_CHECKSEQUENCEVERIFY OP_DROP OP_ELSE OP_OVER OP_HASH160 OP_OVER OP_EQUAL OP_NOTIF 1 OP_VERIFY OP_ENDIF OP_ENDIF OP_2DROP 1",
                  "data": [
                    "c3e1682f",
                    "3ce4f66e21f6f33e7589e83a96f03fabf77eeca5",
                    "eb7aace7526fb50fea132c1a3c7062e2c4afe2c2",
                    "04",
                    "04",
                    "04",
                    "02",
                    "805101",
                    "",
                    "01"
                  ],
                  "p2shType": "p2sh20"
                }
              }
            },
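
As a rough sketch of how a client might consume these fields, the Python below groups the inputs of a decoded transaction by redeem script fingerprint and reads the Nth pushed value. It assumes only the JSON shape shown in the example above; the helper names are hypothetical, and which RPC call returns this object is not assumed here.

```python
from collections import defaultdict


def redeem_pattern(vin: dict):
    """Return the redeemScript byteCodePattern object of an input, or None."""
    redeem = vin.get("scriptSig", {}).get("redeemScript")
    return redeem.get("byteCodePattern") if redeem else None


def inputs_by_fingerprint(tx: dict) -> dict:
    """Map each redeem script fingerprint to the indexes of inputs that use it."""
    groups = defaultdict(list)
    for index, vin in enumerate(tx.get("vin", [])):
        bcp = redeem_pattern(vin)
        if bcp is not None:
            groups[bcp["fingerprint"]].append(index)
    return dict(groups)


def nth_push(vin: dict, n: int):
    """Return the hex payload of the nth push (zero-based) of the redeem script,
    or None when the input has no redeem script or fewer pushes."""
    bcp = redeem_pattern(vin)
    if bcp is None or n >= len(bcp["data"]):
        return None
    return bcp["data"][n]
```

With the input shown above, `nth_push(vin, 7)` would return `"805101"`, the little-endian encoding of the 86400 that precedes OP_CHECKSEQUENCEVERIFY in the redeem script.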
