Assertion failure: assert next_code_length >= code_length when code_length overflows #440
Comments
So, we have too many numbers in the table of numbers? |
It tries to assign a 21-bit length code for 1 and 2 symbols. We'll need to define the fallback algorithm for this kind of case. (IMO, this doesn't happen in the usual wild case, so just falling back to assigning the same length code to all items should be fine.) |
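A flat-code fallback like the one suggested above could be sketched as follows (a hypothetical illustration, not anything from the spec; `flat_code_lengths` is an invented name):

```python
import math

def flat_code_lengths(symbols):
    """Fallback: give every symbol the same code length,
    ceil(log2(n)) bits (minimum 1), ignoring frequencies.
    Equal lengths always satisfy the Kraft inequality, so a
    valid prefix code exists for these lengths."""
    n = len(symbols)
    bits = math.ceil(math.log2(n)) if n > 1 else 1
    return {s: bits for s in symbols}

# With 3 symbols, every code is 2 bits long:
print(flat_code_lengths(["a", "b", "c"]))  # {'a': 2, 'b': 2, 'c': 2}
```

This is deliberately suboptimal: it trades compression for the guarantee that no code length can ever overflow the cap.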
That algorithm is a non-normative illustration of assigning codes; don't take it as spec. We should add a note about the code length issue.
We can and should leave code-length-limiting algorithms as an implementation detail of the encoder, because there are slow, accurate algorithms and fast, approximate algorithms, and depending on the circumstances either could be preferable. We don't want to bake one into the spec, and the decoder doesn't care.
If you need one to crib from, you can dig up papers on length-limited Huffman codes for the accurate algorithm, or zstd has a fast approximation (probably many other Huffman compressors do too). Facebook does something like zstd in our Hack encoder.
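For illustration, a fast approximate length-limiting pass might look like this. It is a sketch in the general clamp-then-repair spirit of heuristics like zstd's, not zstd's actual code; `limit_lengths` and its interface are invented for this example:

```python
def limit_lengths(lengths, max_len):
    """Clamp Huffman code lengths to max_len, then repair the
    Kraft inequality (sum of 2**-len must be <= 1) by lengthening
    the longest not-yet-maxed codes. Fast heuristic, not optimal."""
    lens = {s: min(l, max_len) for s, l in lengths.items()}
    # Work in units of 2**-max_len so everything stays an integer.
    budget = 1 << max_len
    kraft = sum(1 << (max_len - l) for l in lens.values())
    while kraft > budget:
        # Lengthening the longest eligible code sacrifices the
        # least Kraft weight per extra bit spent.
        s = max((s for s in lens if lens[s] < max_len),
                key=lambda s: lens[s])
        kraft -= 1 << (max_len - lens[s] - 1)  # weight is halved
        lens[s] += 1
    return lens
```

The accurate alternative (package-merge) finds the truly optimal length-limited code but is slower; as noted above, either choice can be left to the encoder.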
|
Ah, good to know. We're using it as spec all over SpiderMonkey for the time being :) |
Ok. A correct encoder should not output lengths > 20 bits, and SpiderMonkey should reject files with 21+ bit length codes.
BTW, 20 bits was an arbitrary choice and may be something to revisit. Some people may need a million strings... I hope not, but I would not be surprised if there's a file out there that big.
FWIW, Facebook does have files where the *optimal* code has lengths > 20 bits, and hence we use that heuristic to limit code lengths.
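A standard way to see how an optimal code can blow past 20 bits is Fibonacci-distributed frequencies, which make the optimal Huffman tree degenerate into a chain. The sketch below (invented helper names, not the project's encoder) needs only 24 symbols to reach a 23-bit code:

```python
import heapq
import itertools

def huffman_lengths(freqs):
    """Optimal (depth-unlimited) Huffman code lengths via a min-heap."""
    tie = itertools.count()  # tie-breaker so heap tuples never compare dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        heapq.heappush(heap, (f1 + f2, next(tie),
                              {s: l + 1 for s, l in {**a, **b}.items()}))
    return heap[0][2]

# Fibonacci frequencies are the classic worst case: the tree is a chain,
# so the rarest symbol's code length is (number of symbols - 1).
fib = [1, 1]
while len(fib) < 24:
    fib.append(fib[-1] + fib[-2])
lengths = huffman_lengths({i: f for i, f in enumerate(fib)})
print(max(lengths.values()))  # 23: already past a 20-bit cap
```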
|
Oh, got it: 20 is part of the spec, but the algorithm isn't. So, given that there are real-world files where 20 bits isn't sufficient for an optimal code, do we want to keep 20 as a hard limit? Or do we expect to become suboptimal for huge files (and possibly issue a warning telling developers that their file is simply too large)? |
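If 20 bits does stay a hard limit, the decoder side reduces to validating the declared lengths table. A minimal sketch, assuming lengths arrive as a simple list (names are invented for illustration):

```python
MAX_CODE_LENGTH = 20  # the spec's current (arbitrary) cap

def validate_lengths(lengths):
    """Reject a file whose Huffman table declares any code longer than
    MAX_CODE_LENGTH, or whose lengths cannot form a prefix code."""
    if any(l < 1 or l > MAX_CODE_LENGTH for l in lengths):
        raise ValueError("code length outside 1..%d" % MAX_CODE_LENGTH)
    # Kraft inequality: a prefix code with these lengths must exist.
    if sum(1 << (MAX_CODE_LENGTH - l) for l in lengths) > (1 << MAX_CODE_LENGTH):
        raise ValueError("lengths violate the Kraft inequality")
```

This is all the decoder ever needs to care about, which is the point made above: the length-limiting strategy itself can stay an encoder detail.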
Steps to reproduce:
(the result is a 1.1 MB JS file)
Actual result:
crash at binjs-ref/fbssdc/model.py, line 472, at commit 5cdddbc