BUIP037: Hardfork SegWit
Proposer: Amaury SÉCHET
Submitted: 2016-11-12
Status: draft
There are various problems with the current transaction format, including malleability and quadratic hashing. SegWit proposes a new transaction format that solves these problems, but does so in a way that does not allow spending existing UTXOs. As a result, SegWit doesn't deliver on its promise. FlexTrans proposes an alternative that solves these problems in a way that is compatible with existing UTXOs, which allows us to eventually weed out the old transaction format.
This BUIP proposes to adopt a strategy similar to FlexTrans but using implementation details more similar to SegWit. Doing so should allow actors in the ecosystem, who have already implemented SegWit, to support this BUIP with minimal effort.
A new transaction format is introduced. It is recognized by its use of version 7.
The binary format is similar to the existing transaction format. However, the signature script for each input is replaced by witness data, the output is limited to a version and a hash, and options and metadata fields are added for extensibility.
For compactness, all integers are represented via a variable-length encoding, except the version field. The leading byte of the representation indicates how large the representation is. The representation is big endian.
First byte | Bits | From | To |
---|---|---|---|
0xxxxxxx | 7 | 0x0000000000000000 | 0x000000000000007f |
10xxxxxx | 14 | 0x0000000000000080 | 0x000000000000407f |
110xxxxx | 21 | 0x0000000000004080 | 0x000000000020407f |
1110xxxx | 28 | 0x0000000000204080 | 0x000000001020407f |
11110xxx | 35 | 0x0000000010204080 | 0x000000081020407f |
111110xx | 42 | 0x0000000810204080 | 0x000004081020407f |
1111110x | 49 | 0x0000040810204080 | 0x000204081020407f |
11111110 | 56 | 0x0002040810204080 | 0x010204081020407f |
11111111 | 64 | 0x0102040810204080 | 0xffffffffffffffff |
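The table above can be sketched as an encoder and decoder. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and it assumes that the stored bits hold the value minus the "From" bound of its row, which is inferred from the non-overlapping ranges in the table rather than stated explicitly in this BUIP.

```python
from typing import Tuple

# OFFSETS[n] is the "From" column of the table: the smallest value encoded
# with n extra bytes after the first byte (assumption: offset encoding).
OFFSETS = [sum(1 << (7 * k) for k in range(1, n + 1)) for n in range(9)]


def encode_varuint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer following the table above."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    # Pick the row whose range contains the value.
    n = max(i for i in range(9) if OFFSETS[i] <= value)
    payload = value - OFFSETS[n]
    if n == 8:
        # First byte is all ones, followed by a 64-bit big-endian payload.
        return b"\xff" + payload.to_bytes(8, "big")
    prefix = (0xFF << (8 - n)) & 0xFF  # n leading one bits, then a zero bit
    raw = payload.to_bytes(n + 1, "big")
    return bytes([prefix | raw[0]]) + raw[1:]


def decode_varuint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer at data[pos:]; return (value, next position)."""
    first = data[pos]
    n = 0
    while n < 8 and first & (0x80 >> n):
        n += 1  # count leading one bits of the first byte
    end = pos + 1 + n
    if end > len(data):
        raise ValueError("truncated varuint")
    head = b"" if n == 8 else bytes([first & (0x7F >> n)])
    value = int.from_bytes(head + data[pos + 1:end], "big") + OFFSETS[n]
    if value >= 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return value, end
```

Under this reading, every value in range has exactly one valid encoding, because each row only covers values that the shorter rows cannot represent.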
The transaction has the following binary format:
Field Size | Description | Data type | Comments |
---|---|---|---|
1+ | version | varuint | Transaction data format version (in this case, 7) |
1+ | tx_in count | varuint | Number of Transaction inputs |
35+ | tx_in | tx_in[] | A list of 1 or more transaction inputs or sources for coins |
1+ | tx_out count | varuint | Number of Transaction outputs |
22+ | tx_out | tx_out[] | A list of 1 or more transaction outputs or destinations for coins |
1+ | option size | varuint | The size of the option field, in bytes |
? | option | metadata | Optional metadata relative to this transaction |
The transaction id (aka txid) is computed by hashing the transaction, skipping over each input's witness and metadata. See the inputs section for more details.
The option field is interpreted as follows:
The metadata field contains a set of entries. Each entry is made of one varuint tag, followed by a value whose size and representation depend on the tag. A given tag cannot appear twice.
Because the size of the metadata field is known by the parser, the parser can skip over the remaining metadata when it encounters a tag whose value it doesn't know how to interpret.
This BUIP defines the following tags for transaction metadata:
Tag | Description | Data type | Comments |
---|---|---|---|
10 | LockByBlock | varuint | lock_time support (block height) |
11 | LockByTime | varuint | lock_time support (timestamp) |
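The parsing rule described above could look like the following. This is a sketch under assumptions: `read_varuint` and `parse_metadata` are hypothetical names, the varuint decoding assumes the offset scheme implied by the integer table, and returning the unparsed tail as opaque bytes is one possible way to "skip over" unknown tags.

```python
from typing import Dict, Tuple

# Tags defined by this BUIP for transaction-level metadata.
LOCK_BY_BLOCK = 10
LOCK_BY_TIME = 11
KNOWN_VARUINT_TAGS = {LOCK_BY_BLOCK, LOCK_BY_TIME}


def read_varuint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varuint at data[pos:]; return (value, next position)."""
    first = data[pos]
    n = 0
    while n < 8 and first & (0x80 >> n):
        n += 1  # number of extra bytes = number of leading one bits
    end = pos + 1 + n
    if end > len(data):
        raise ValueError("truncated varuint")
    head = b"" if n == 8 else bytes([first & (0x7F >> n)])
    offset = sum(1 << (7 * k) for k in range(1, n + 1))  # assumed offset scheme
    return int.from_bytes(head + data[pos + 1:end], "big") + offset, end


def parse_metadata(field: bytes) -> Tuple[Dict[int, int], bytes]:
    """Parse known entries; return them plus the uninterpreted remainder."""
    entries: Dict[int, int] = {}
    pos = 0
    while pos < len(field):
        start = pos
        tag, pos = read_varuint(field, pos)
        if tag not in KNOWN_VARUINT_TAGS:
            # The value size of an unknown tag is unknown, so everything from
            # here to the end of the field is skipped as opaque data.
            return entries, field[start:]
        if tag in entries:
            raise ValueError("tag %d appears twice" % tag)
        entries[tag], pos = read_varuint(field, pos)
    return entries, b""
```

For example, `parse_metadata(b"\x0a\x7f")` yields a LockByBlock entry with value 127 and an empty remainder.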
TxIn consists of the following fields:
Field Size | Description | Data type | Comments |
---|---|---|---|
33+ | previous_output | outpoint | The previous output transaction reference, as an OutPoint structure |
1+ | witness count | varuint | Number of witness elements |
? | witness | witness | Witness data |
1+ | metadata size | varuint | The size of the metadata field, in bytes |
? | metadata | metadata | Optional metadata relative to this input |
The OutPoint structure consists of the following fields:
Field Size | Description | Data type | Comments |
---|---|---|---|
32 | hash | char[32] | The hash of the referenced transaction |
1+ | index | varuint | The index of the specific output in the transaction. The first output is 0, etc. |
Witness consists of the following fields:
Field Size | Description | Data type | Comments |
---|---|---|---|
1+ | witness length | varuint | Witness length |
? | witness | uchar[] | Witness data |
The witness field represents the elements that are on the stack before the redeem script starts to run.
This BUIP defines the following tags for input metadata:
Tag | Description | Data type | Comments |
---|---|---|---|
10 | LockByBlock | varuint | BIP68/112/113 support |
11 | LockByTime | varuint | BIP68/112/113 support |
Metadata tags are a BUIP039 extension point. They can be signaled with the InMD prefix.
The signature process computes the sighash as double_sha256(txid + previous_output + TxOut + metadata + hashType)
TxOut needs to use the new format. When spending a UTXO from an older transaction, please refer to the conversion procedure in the Outputs section.
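The sighash formula above can be sketched as follows. This assumes that `+` denotes byte concatenation and that each component has already been serialized by the caller; the serialization of hashType is not specified in this BUIP, so it is passed in as raw bytes. The function names are illustrative only.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used throughout Bitcoin."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def sighash(txid: bytes, previous_output: bytes, tx_out: bytes,
            metadata: bytes, hash_type: bytes) -> bytes:
    """Hash to be signed for one input.

    tx_out is the output being spent, in the new format (a legacy output
    must be converted first). All arguments are assumed pre-serialized.
    """
    return double_sha256(txid + previous_output + tx_out + metadata + hash_type)
```

Because the txid skips witness data, the digest does not depend on any signature, while committing to the spent output and to the input metadata.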
TxOut consists of the following fields:
Field Size | Description | Data type | Comments |
---|---|---|---|
1+ | value | varuint | Transaction Value |
1+ | version | varuint | Script versioning capabilities |
20+ | hash | uchar[] | Usually contains the public key as a Bitcoin script setting up conditions to claim this output |
The 3 least significant bits of the version field indicate the size of the output hash as follows:
Size class | Hash size |
---|---|
0 | 20B - 160bits |
1 | 24B - 192bits |
2 | 28B - 224bits |
3 | 32B - 256bits |
4 | 40B - 320bits |
5 | 48B - 384bits |
6 | 56B - 448bits |
7 | 64B - 512bits |
This BUIP defines 4 valid versions:
Version | Semantic |
---|---|
0 | P2KH - OP_HASH160 |
3 | P2KH - OP_HASH256 |
8 | P2SH - OP_HASH160 |
11 | P2SH - OP_HASH256 |
For both versions, we'll define OP_HASH as OP_HASH160 if the hash size is 20 bytes and OP_HASH256 if the hash size is 32 bytes.
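As an illustration of how the version field decomposes, the sketch below reads the size class from the 3 least significant bits and looks up the semantic from the four versions defined here. The function names and the string return values are hypothetical; only the tables above are taken from this BUIP.

```python
from typing import Tuple

# Hash size in bytes for each size class (3 least significant bits).
HASH_SIZES = (20, 24, 28, 32, 40, 48, 56, 64)

# The four output versions defined by this BUIP.
SEMANTICS = {0: "P2KH", 3: "P2KH", 8: "P2SH", 11: "P2SH"}


def hash_size(version: int) -> int:
    """Size of the output hash, in bytes, implied by the version field."""
    return HASH_SIZES[version & 0b111]


def describe_output_version(version: int) -> Tuple[str, str, int]:
    """Return (semantic, OP_HASH variant, hash size) for a defined version."""
    if version not in SEMANTICS:
        raise ValueError("version %d is not defined by this BUIP" % version)
    size = hash_size(version)
    op_hash = "OP_HASH160" if size == 20 else "OP_HASH256"
    return SEMANTICS[version], op_hash, size
```

For example, version 11 has size class 3, so it is a P2SH output with a 32-byte hash checked using OP_HASH256.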
If the output has a P2KH version, the following redeem script is executed to verify the signature:
OP_DUP OP_HASH <pk_script> OP_EQUALVERIFY OP_CHECKSIG
If the output has a P2SH version, the topmost element of the stack is popped and hashed using OP_HASH, and the result is compared to hash. If the comparison fails, the transaction is invalid. If the comparison succeeds, the popped element is executed as a script to validate the spend.
Output versions are a BUIP039 extension point. They can be signaled with the OutV prefix.
Legacy UTXO can be converted to this new format using the following procedure:
Script pattern | Version | Script |
---|---|---|
OP_DUP OP_HASH160 <pk_hash> OP_EQUALVERIFY OP_CHECKSIG | 0 | pk_hash |
OP_DUP OP_HASH256 <pk_hash> OP_EQUALVERIFY OP_CHECKSIG | 3 | pk_hash |
<pubkey> OP_CHECKSIG | 3 | double_sha256(pubkey) |
OP_HASH160 <script_hash> OP_EQUAL | 8 | script_hash |
anything | 11 | double_sha256(anything) |
NB: this may require the output script to be duplicated in the witness in order to spend a legacy UTXO. This will happen in a very small number of cases and is on purpose. Witness data can be put in cold storage, while UTXO data needs to be kept hot because normal nodes query it frequently when operating. Because of this, it is desirable to shift as much data as possible from the UTXO set to the witness data.
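The conversion table can be sketched as a pattern match on the raw legacy script. This is an illustrative sketch, not part of the proposal: `convert_legacy_output` is a hypothetical name, the rows are assumed to be tried from top to bottom, hashes and public keys are assumed to use direct push opcodes, and public keys are assumed to be 33 or 65 bytes long.

```python
import hashlib
from typing import Tuple

# Standard Bitcoin script opcode values.
OP_DUP, OP_EQUAL, OP_EQUALVERIFY = 0x76, 0x87, 0x88
OP_HASH160, OP_HASH256, OP_CHECKSIG = 0xA9, 0xAA, 0xAC


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def convert_legacy_output(script: bytes) -> Tuple[int, bytes]:
    """Map a legacy output script to a (version, hash) pair."""
    tail = bytes([OP_EQUALVERIFY, OP_CHECKSIG])
    # OP_DUP OP_HASH160 <pk_hash> OP_EQUALVERIFY OP_CHECKSIG
    if (len(script) == 25 and script[:3] == bytes([OP_DUP, OP_HASH160, 20])
            and script[23:] == tail):
        return 0, script[3:23]
    # OP_DUP OP_HASH256 <pk_hash> OP_EQUALVERIFY OP_CHECKSIG
    if (len(script) == 37 and script[:3] == bytes([OP_DUP, OP_HASH256, 32])
            and script[35:] == tail):
        return 3, script[3:35]
    # <pubkey> OP_CHECKSIG, with a 33-byte or 65-byte public key (assumption)
    if (len(script) in (35, 67) and script[0] == len(script) - 2
            and script[-1] == OP_CHECKSIG):
        return 3, double_sha256(script[1:-1])
    # OP_HASH160 <script_hash> OP_EQUAL
    if (len(script) == 23 and script[:2] == bytes([OP_HASH160, 20])
            and script[22] == OP_EQUAL):
        return 8, script[2:22]
    # Anything else: the whole script becomes the redeem script of a
    # 32-byte P2SH output and must be supplied in the witness when spending.
    return 11, double_sha256(script)
```

The last branch is the case the note above refers to: the legacy script itself has to be provided as witness data to spend the output.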
- Signature malleability is prohibited as per BIP146.
- The OP_CHECKMULTISIG and OP_CHECKMULTISIGVERIFY dummy argument must be a null vector as per BIP147.
This BUIP keeps things close to SegWit to minimize sunk cost for actors supporting it. It also reuses the tag system from FlexTrans to ensure the format is extensible, but limits this to the metadata and option fields in order to enable BUIP039.
In conclusion, this proposal combines the best parts of SegWit (extensibility and privacy using a version/hash pair as output) and FlexTrans (extensibility, compatible UTXO) and allows for BUIP039. In addition, it limits the sunk cost for actors who prepared for SegWit.
EDIT: Revision, removing the lock_time field and reintroducing the sequence one. More revisions to come as this converges toward FlexTrans.
EDIT2: Revision.
Output scripts are now just a hash. This is also an idea from SegWit and allows shifting data from the UTXO set to the witness data. Because this is a hard fork, it can be done in a way that is backward compatible. Inputs now have a metadata field containing optional data identified via a tag. This is a natural extension point for future extensions and an ideal place to store optional data such as the sequence field.
EDIT3: Use BUIP039 to extend this transaction format.
EDIT4: Add a global option field which respects the metadata. Leverage it to implement lock_time.
EDIT5: Rework the Out structure to be more compact. Add a rationale section to compare to FlexTrans and SegWit.
EDIT6: Remove comparison to FT and SW as post length is limited.
EDIT7: Use variable size encoding for int all over the place. Define the format.