Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mappings v2: More Efficient Encoding #155

Open
jridgewell opened this issue Nov 20, 2024 · 1 comment
Open

Mappings v2: More Efficient Encoding #155

jridgewell opened this issue Nov 20, 2024 · 1 comment

Comments

@jridgewell
Copy link
Member

jridgewell commented Nov 20, 2024

In the last Scopes meeting, I mentioned that I had been working on a project that reduced reduced Google's module graph encoding by ~30% by using a packed VLQ encoding, essentially removing any separators like , or ;. Applying this to our mappings encoding, I think we can remove ~30% (or ~50% if we switch to an 8-bit VLQ and binary encoding).

Removing Separators

In order to remove separators, we first need to know exactly how many lines are present in the map, and how many mappings are present on each line. That should allow us to do a simple loop (ignore the relative deltas, this is just psuedocode):

const lines = readInt();
for (let i = 0; i < lines; i++) {
  const mappings = readInt();
  for (let j = 0; j < mappings; j++) {
    readMapping()
  }
}

The problem is that each mapping has a variable number of fields (either 1, 4, or 5). Without a , separator, we don't know when to stop reading the fields for the current mapping. So we also need to encode the length of each mapping. It's easy to do this with a field before each mapping:

function readMapping() {
  const length = readInt();
  const genCol = readInt();
  if (length === 1) return [genCol];

  const index = readInt();
  const line = readInt();
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}

This alone is pretty good, but we can still do better. genColumn is frequently very small, just a few bits of data. Instead of wasting 8 bits to encode the length of the mapping, we can use the low bits of genColumn:

function readMapping() {
  const data = readInt();
  const length = data & 0b11 === 0b11 ? 5 : data & 0b11 === 0b01 ? 4 : 1;
  const genCol = data >>> 2;
  //...
}

We can still do better. genColumn is never negative in practice. Instead of using zigzag encoding, we could just encode it as a positive int.

function readMapping() {
  const data = readPosInt();
  //...
}

Just eliminating separators can save us ~10-15%.

Omitting sourcesIndex and sourceLine

The next thing that I've noticed is that sourcesIndex rarely changes between mappings, and the same with sourceLine. This makes a lot of sense, if we're transpiling we'll be outputting a lot of mappings that are on the same line as the previous one.

If the sourcesIndex delta or the sourceLine delta are 0, we could omit them from the encoding. This just requires 2 more bits, bringing our total to 4 bits of data packing. We can still encode this pretty easily in genColumn:

function readMapping() {
  const data = readPosInt();
  const length = data & 0b0101; // 1, 4, or 5
  const sourcesIndexPresent = data & 0b0010;
  const sourceLinePresent = data & 0b1000;
  
  const genCol = data >>> 4;
  if (length === 1) return [genCol];

  const index = sourcesIndexPresent ? readInt() : lastIdx;
  const line = sourceLinePresent ? readInt() : lastLine;
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}

This saves us ~25-35%


Analysis sheet, code

This is a highlight from Google Search's internal source map:

Bytes Spec 8-bit VLQ No Sep (6-bit VLQ) No Sep (8-bit VLQ) Flags (6-bit VLQ) Flags (8-bit VLQ)
raw 2,790,581 2,556,308 2,376,350 2,122,263 1,815,680 1,447,719
gzip 6 896,546 885,869 883,010 873,562 842,970 822,952
brotli 6 841,365 815,145 826,303 804,323 794,121 774,462
Percent Spec 8-bit VLQ No Sep (6-bit VLQ) No Sep (8-bit VLQ) Flags (6-bit VLQ) Flags (8-bit VLQ)
raw 0.00% -8.40% -14.84% -23.95% -34.94% -48.12%
gzip 6 -62.29% -1.19% -1.51% -2.56% -5.98% -8.21%
brotli 6 -64.53% -3.12% -1.79% -4.40% -5.62% -7.95%
@jridgewell
Copy link
Member Author

jridgewell commented Nov 20, 2024

Updated the gist and analysis with 2 new modifications to the bit flags design:

Bytes Spec No Sep (6-bit VLQ) Flags (6-bit VLQ) Flags (Omit Names, 6-bit) Flags (4/5 tuples, 6-bit) 8-bit VLQ No Sep (8-bit VLQ) Flags (8-bit VLQ) Flags (Omit Names, 8-bit) Flags (4/5 tuples, 8-bit)
raw 2,790,581 2,376,350 1,815,680 1,709,584 1,729,639 2,556,308 2,122,263 1,447,719 1,341,623 1,427,680
gzip 6 896,546 883,010 842,970 833,045 826,475 885,869 873,562 822,952 812,541 810,704
brotli 6 841,365 826,303 794,121 785,213 781,740 815,145 804,323 774,462 768,376 761,421
Percent Spec No Sep (6-bit VLQ) Flags (6-bit VLQ) Flags (Omit Names, 6-bit) Flags (4/5 tuples, 6-bit) 8-bit VLQ No Sep (8-bit VLQ) Flags (8-bit VLQ) Flags (Omit Names, 8-bit) Flags (4/5 tuples, 8-bit)
raw 0.0% -14.8% -34.9% -38.7% -38.0% -8.4% -23.9% -48.1% -51.9% -48.8%
gzip 6 -67.9% -1.5% -6.0% -7.1% -7.8% -1.2% -2.6% -8.2% -9.4% -9.6%
brotli 6 -69.8% -1.8% -5.6% -6.7% -7.1% -3.1% -4.4% -8.0% -8.7% -9.5%

Omit Names

In the bit flags design, we're using 2 bits to represent 3 mapping lengths, leaving us with a valid 4th value. We can use that 4th value to signal when a 5-length mapping has a 0 delta in the namesIndex field.

function readMapping() {
  const data = readPosInt();
  const length = data & 0b0101; // 0, 1, 4, or 5
  const sourcesIndexPresent = data & 0b0010;
  const sourceLinePresent = data & 0b1000;
  const namesIndexPresent = length === 0;
  
  const genCol = data >>> 4;
  if (length === 1) return [genCol];

  const index = sourcesIndexPresent ? readInt() : lastIdx;
  const line = sourceLinePresent ? readInt() : lastLine;
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  const name = namesIndexPresent ? readInt() : lastName;
  return [genCol, index, line, col, name];
}

This saves an additional 1%

Remove sourceless (1-length) mappings

The other option is to remove sourceless segments entirely. 99.999999999% of mappings have some original source location, but we're using 2 bits to encode the 3 mapping lengths. If we remove sourceless segments, then we only need 1 bit to represent 4/5 length mappings.

function readMapping() {
  const data = readPosInt();
  const length = 4 | (data & 0b001)
  const sourcesIndexPresent = data & 0b010;
  const sourceLinePresent = data & 0b100;
  
  const genCol = data >>> 3;
  const index = sourcesIndexPresent ? readInt() : lastIdx;
  const line = sourceLinePresent ? readInt() : lastLine;
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}

This saves an additional 6% (in everything but the Closure source maps used by Google)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant