You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@jridgewell and I brainstormed a bit how we could make the "Scopes" encoding forward compatible without making it too complex or heavy handed. I'll implement some of these in the coming days to get some numbers to compare.
Future-proof "Scopes" encoding
The current "Scopes" encoding is not ideal w.r.t. to future extension:
Adding new fields to OriginalScope and GeneratedRange in a backwards compatible way is impossible. Any tool implementing the current proposal would break once we add new optional fields to either data structure.
The encoding uses the , and ; characters on top of base64 encoded VLQ numbers. Moving to a future binary source map format will require a different encoding for "Scopes" to account for , and ;.
We should aim for an encoding that is both forwards-compatible and is purely VLQ based: So the only difference between the current JSON source map format and a potential future binary format is how VLQs are encoded.
The crux of the issue is to find the right balance between
retaining some flexibility for future extensions without going overboard (e.g DWARF-style encoding),
encoding/decoding complexity,
and encoded size.
This document proposes some potential "Scopes" encodings that keep both goals in mind while aiming for a healthy balance.
Grammar
The encoding formats are presented in a EBNF-like grammar with:
there is only one terminal: a VLQ. Each terminal is labelled and we denote them with uppercase (e.g. TERMINAL is a VLQ with the label 'TERMINAL').
non-terminals denoted with snake case (e.g. non_term).
symbol* means zero or more repetitions of symbol.
symbol? means zero or one symbol.
symbol[N] means N occurrences of symbol.
Option A - Prefix items with their length
original_scopes = (LENGTH original_item)*
original_item = original_start_item | original_end_item
original_start_item =
LINE
COLUMN
FLAGS
NAME? // present if FLAGS<0> is set
KIND? // present if FLAGS<1> is set
VARIABLE_COUNT
VARIABLE[VARIABLE_COUNT]
original_end_item =
LINE
COLUMN
generated_ranges = (LENGTH generated_item)*
generated_item = generated_start_item | generated_end_item
generated_start_item =
COLUMN // the actual value is COLUMN<1:n>.
LINE? // if COLUMN<0> is set.
FLAGS
DEFINITION_SOURCE_OFFSET? // present if FLAGS<0> is set
DEFINITION_ITEM_OFFSET? // present if FLAGS<0> is set
CALL_SITE_SOURCE? // present if FLAGS<1> is set
CALL_SITE_LINE? // present if FLAGS<1> is set
CALL_SITE_COLUMN? // present if FLAGS<1> is set
BINDING_COUNT
binding[BINDING_COUNT]
binding =
EXPR_OR_SUB_RANGE_LENGTH // -1 = not available, >=0 offset into "names"
EXPR_0? // present if EXPR_OR_SUBRANGE_LENGTH < -1.
sub_range_binding[-EXPR_OR_SUBRANGE_LENGTH - 1]
sub_range_binding =
LINE
COLUMN
EXPR
generated_end_item =
COLUMN // the actual value is COLUMN<1:n>.
LINE? // if COLUMN<0> is set.
This is identical to the current proposal modulo:
Each item is prefixed with the number of VLQs in the item
Variables in OriginalScope and bindings in GeneratedRange are prefixed with their length
columns in the generated range encode whether a line VLQ is present or not
original_start_item and original_end_item are distinguished by their length: A "end" item always has 2 VLQs while a "start" item has at least 3. generated_start_item and generated_end_item are distinguished by their length: A "end" item has 1 or 2 VLQs while a "start" item has at least 3.
Option B - Add "remaining" count in the presence of unknown flags
Similar to Option A, so we'll list only the changed productions:
original_scopes = original_item*
original_start_item =
LINE
COLUMN
FLAGS
NAME? // present if FLAGS<0> is set
KIND? // present if FLAGS<1> is set
VARIABLE_COUNT
VARIABLE[VARIABLE_COUNT]
REMAINING? // present if FLAGS<n:3> is not zero.
REST[REMAINING]
generated_ranges = generated_item*
generated_start_item =
COLUMN // the actual value is COLUMN<1:n>.
LINE? // if COLUMN<0> is set.
FLAGS
DEFINITION_SOURCE_OFFSET? // present if FLAGS<0> is set
DEFINITION_ITEM_OFFSET? // present if FLAGS<0> is set
CALL_SITE_SOURCE? // present if FLAGS<1> is set
CALL_SITE_LINE? // present if FLAGS<1> is set
CALL_SITE_COLUMN? // present if FLAGS<1> is set
BINDING_COUNT
binding[BINDING_COUNT]
REMAINING? // present if FLAGS<n:4> is not zero.
REST[REMAINING]
Advantages over Option A:
We only pay the price of encoding the item length once we actually add new fields
Variables/bindings are not included, so REMAINING stays small even for scopes/ranges with lots of variables
Quirks:
Adding new marker flags to FLAGS (not new fields) requires generators to emit a REMAINING value of 0.
Option C - Tag-Length-Value
Similar to Option A but we prefix each item not only with it's length but a tag as well. The advantages are:
We can encode scopes and ranges in one blob. That is the JSON could have a single "scopes" field containing the combination of "originalScopes" and "generatedRanges".
Start/end items can be distinguished by their tag.
We keep the door open for not only extending original_start_item and generated_start_item, but adding new item types all-together.
Since it's similar to option A, we'll list only the changed productions:
This is a variant to Option C. Instead of using original_start_item and original_end_item, we combine both into a original_item. Similar to DWARF, nesting is achieved by using a special tag to denote the end of a item's children.
Example of nested scopes (tags only): [0x1, ...<length + content>, 0x1, ...<length + content>, 0x0, 0x0].
This comes with some special rules if we don't want to lose the efficiency of relative line/column numbers for start and end locations:
A scopes or ranges' start location is relative to the preceding siblings' end location, or the parents' start location if it's the first child.
A scopes or ranges' end location is relative to it's last child's end location, or it's start location if it does not have any children.
There is also the question of START_LINE, and END_LINE in generated_item. We could encode it's presence in FLAGS or use the LSB of the respective *_COLUMN.
The text was updated successfully, but these errors were encountered:
@jridgewell and I brainstormed a bit how we could make the "Scopes" encoding forward compatible without making it too complex or heavy handed. I'll implement some of these in the coming days to get some numbers to compare.
Future-proof "Scopes" encoding
The current "Scopes" encoding is not ideal w.r.t. to future extension:
Adding new fields to
OriginalScope
andGeneratedRange
in a backwards compatible way is impossible. Any tool implementing the current proposal would break once we add new optional fields to either data structure.The encoding uses the
,
and;
characters on top of base64 encoded VLQ numbers. Moving to a future binary source map format will require a different encoding for "Scopes" to account for,
and;
.We should aim for an encoding that is both forwards-compatible and is purely VLQ based: So the only difference between the current JSON source map format and a potential future binary format is how VLQs are encoded.
The crux of the issue is to find the right balance between
This document proposes some potential "Scopes" encodings that keep both goals in mind while aiming for a healthy balance.
Grammar
The encoding formats are presented in a EBNF-like grammar with:
TERMINAL
is a VLQ with the label 'TERMINAL').non_term
).symbol*
means zero or more repetitions ofsymbol
.symbol?
means zero or onesymbol
.symbol[N]
means N occurrences ofsymbol
.Option A - Prefix items with their length
This is identical to the current proposal modulo:
OriginalScope
and bindings inGeneratedRange
are prefixed with their lengthoriginal_start_item
andoriginal_end_item
are distinguished by their length: A "end" item always has 2 VLQs while a "start" item has at least 3.generated_start_item
andgenerated_end_item
are distinguished by their length: A "end" item has 1 or 2 VLQs while a "start" item has at least 3.Option B - Add "remaining" count in the presence of unknown flags
Similar to Option A, so we'll list only the changed productions:
Advantages over Option A:
Quirks:
REMAINING
value of 0.Option C - Tag-Length-Value
Similar to Option A but we prefix each item not only with it's length but a tag as well. The advantages are:
original_start_item
andgenerated_start_item
, but adding new item types all-together.Since it's similar to option A, we'll list only the changed productions:
Option C2 - DWARF-style zero entries
This is a variant to Option C. Instead of using
original_start_item
andoriginal_end_item
, we combine both into aoriginal_item
. Similar to DWARF, nesting is achieved by using a special tag to denote the end of a item's children.Example of nested scopes (tags only):
[0x1, ...<length + content>, 0x1, ...<length + content>, 0x0, 0x0]
.This comes with some special rules if we don't want to lose the efficiency of relative line/column numbers for start and end locations:
There is also the question of
START_LINE
, andEND_LINE
ingenerated_item
. We could encode it's presence in FLAGS or use the LSB of the respective*_COLUMN
.The text was updated successfully, but these errors were encountered: