Requirements for user-provided dictionaries #418
**Requirement: Avoiding side-channel attacks.** We do not want dictionaries to enable cross-origin side-channel attacks.

**Mechanism.** Most web browsers are moving towards a mechanism that isolates caches across origins to avoid such side-channel attacks. I believe this means any attempt to use a dictionary for e.g. the same framework across all origins would be counter-productive. This isolation mechanism should be necessary and sufficient for our needs.

**Conclusion.** No blocker, as long as we concentrate on site-specific dictionaries.

**Open questions.**
On security, there's an attack specific to compression where an attacker observes/infers the compressed length of a transfer and uses that to infer something about the content being transferred; see HEIST. With a dictionary, the disclosure could be of content compressed with the dictionary, or of the content of the dictionary itself. One simple mitigation would be to use a credential-less fetch for the dictionary, which should discourage sites from putting sensitive strings in the dictionary.

I think there are a few practical issues to consider:

- The dictionary to use needs to be specified with the file. I think the best choice is an HTTP header indicating the URL to retrieve the dictionary from. This makes more sense to me than baking URLs into files, because it decouples producing files and dictionaries from deploying them. That lines up with the status quo on the web—I can get a minified threejs from GitHub and deploy it on my server without changing the content of the file.
- However dictionary loading ends up working, we should work out advice to developers about how to use push/prefetch/etc. correctly.
- Should files have one or multiple external dictionaries? I expect string change frequency to follow some exponential distribution, so it might make sense to have a couple of dictionaries: one for very slowly changing strings, another for shared but recently changed strings, etc. From a format perspective this is straightforward. It would complicate how indicating external dictionaries works, though.
- The developer needs an easy way to debug files being paired with the wrong dictionary. The format doesn't have any mechanism to check that it's being decompressed with the correct dictionary. I think this is separate from the security-sensitive problem of dictionary-index out-of-bounds issues; rather, it's a developer ergonomics issue. If a BinAST file fails to decode as expected because it was paired with the wrong dictionary, it would be nice to give the developer a message on the devtools console. I think having a few bytes in the dictionary and a few bytes in the format which the browser matches should be sufficient (see the sketch below). It is tempting to hash the dictionary, but this prevents scenarios like localisation and sending debug dictionaries with verbose error messages, and it complicates content-addressable stores which want to hash static resources in a batch.
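A minimal sketch of that matching idea, assuming a hypothetical `BinAST-Dictionary` response header and a short identifier stored at the start of both the file and the dictionary (the header name, offsets, and ID length are illustrative assumptions, not part of any spec):

```typescript
// Hypothetical sketch: match a short dictionary ID embedded in both
// the compressed file and the dictionary. Header name, offsets, and
// ID length are illustrative assumptions, not part of any spec.
const ID_LENGTH = 4;

async function loadWithDictionary(fileUrl: string): Promise<ArrayBuffer> {
  const fileResponse = await fetch(fileUrl);
  // Hypothetical header naming the dictionary URL.
  const dictUrl = fileResponse.headers.get("BinAST-Dictionary");
  const file = await fileResponse.arrayBuffer();
  if (dictUrl === null) {
    return file; // No external dictionary declared.
  }
  // Credential-less fetch discourages sensitive strings in dictionaries.
  const dictResponse = await fetch(dictUrl, { credentials: "omit" });
  const dict = await dictResponse.arrayBuffer();

  // Compare the ID bytes; assume both place them at the start.
  const fileId = new Uint8Array(file, 0, ID_LENGTH);
  const dictId = new Uint8Array(dict, 0, ID_LENGTH);
  if (!fileId.every((byte, i) => byte === dictId[i])) {
    // In a browser this would surface as a devtools console warning.
    console.warn(`Dictionary ${dictUrl} does not match file ${fileUrl}`);
  }
  return file;
}
```

Unlike a full hash comparison, a short ID like this still permits swapping in localised or debug dictionaries that deliberately differ in content but declare the same ID.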
I don't think this would work well because JavaScript is usually a static resource, and static resources are often served from CDNs or cookieless domains—and hence they are cross-origin resources. Requiring the dictionary to be served from the same origin would complicate that kind of serving.
There are a lot of ways to measure performance when talking about shared dictionaries:
Getting compression data from more sites would be good. Those measurements should use samples of JavaScript collected over time. Having data from a browser vendor about cache performance is a critical input to a model.
Yes, that's one of the reasons I started https://github.com/Yoric/real-js-samples .
What data do you need?
(ok, I need to do some more reading about HEIST)
That strikes me as hard/impossible to enforce. Edit: I had initially written "not hard"; this was a typo, I meant the opposite.
Agreed.
Agreed.
Agreed on both counts. Waiting for feedback from others.
Agreed. Waiting for @RReverser's input on this point.
Ok, let's discuss this in another issue :)
Ok, did some reading on HEIST. In general, I agree that we want to avoid storing confidential data in the dictionary, but I don't really see how to enforce this; it is pretty much equivalent to disallowing storing confidential data in JS code, right? In the specific case of HEIST, wouldn't it be better protection if we added (or allowed) the addition of a random number of random bytes at the end of the dictionary? Something like the sketch below.
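A minimal sketch of that padding idea, assuming the dictionary is just a byte buffer and the padding is appended after the real content (the padding bounds here are arbitrary illustrations):

```typescript
import { randomBytes, randomInt } from "node:crypto";

// Append a random number of random bytes to a dictionary so that its
// transferred length leaks less information about its content.
// The bounds are arbitrary illustrations, not a recommendation.
function padDictionary(dictionary: Buffer,
                       minPad = 32,
                       maxPad = 256): Buffer {
  // randomInt's upper bound is exclusive, hence the + 1.
  const padLength = randomInt(minPad, maxPad + 1);
  return Buffer.concat([dictionary, randomBytes(padLength)]);
}
```

This assumes the format either records the unpadded length or tolerates trailing bytes in a dictionary, which is itself an open design question.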
Ok, I just had a conversation with @martinthomson (Mozilla security) about HEIST. As far as I can tell, since we're not compressing substrings, BinAST itself should not create HEIST issues that do not already exist. What can create issues, on the other hand, is the brotli post-compression, if it is applied to a file that contains both user-controlled data and confidential data, whether it's a/ a dictionary; b/ a compressed file.

a/ I have difficulty imagining webdevs creating a dictionary that contains either user-controlled data or confidential data, much less both.

b/ Similarly, I have difficulty imagining webdevs compressing a JS file that contains either user-controlled data or confidential data, much less both.

On the other hand, once we have a fast encoder, it is quite possible that webdevs could use BinAST to compress a JSON file that contains both. While we could side-step the issue by refusing to compress JSON, that would probably just cause webdevs to hide the JSON as JS, which would be even worse.

A suggestion by @martinthomson would be to add an encoder command-line flag to let the webdev specify whether the file contains user-controlled data and whether the file contains confidential data. If both are specified, we may still encode with BinAST, but not with brotli (see the sketch below). To discourage the webdev from applying brotli regardless, we may wish to move brotli compression inside the file.

Regardless, as you mention, @dominiccooney, we should make clear to webdevs that mixing user-controlled and confidential data is a bad idea.
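A minimal sketch of that decision logic, assuming hypothetical flag names (`--user-controlled`, `--confidential`) that are not actual encoder options:

```typescript
// Hypothetical sketch: decide whether to apply brotli post-compression.
// The flag names and this policy are illustrative assumptions, not
// actual binjs_encode options.
interface EncodeOptions {
  userControlled: boolean; // set via a hypothetical --user-controlled flag
  confidential: boolean;   // set via a hypothetical --confidential flag
}

function shouldApplyBrotli(options: EncodeOptions): boolean {
  // HEIST-style compression side channels need an attacker-controlled
  // string and a secret in the same compressed stream, so only the
  // combination of both disables post-compression.
  return !(options.userControlled && options.confidential);
}
```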
Let's talk about seeding the dictionary (i.e. the first fetch). Depending on network performance/contention, I believe that there may be two cases.
Case 1 is fairly easy to specify. Case 2 is more complicated, as it may require additional push-style HTTP headers and/or something like an extension to […]. I believe that we should concentrate on case 1 for the moment.
With the current context 0.1 format (reference implementation), we already achieve a compression level ~5% better than Brotli on Facebook, assuming a site-specific dictionary.
On the assumption that we are heading towards user-provided dictionaries, I'm opening this issue to discuss the requirements for such dictionaries.
Stuff that I believe everybody agrees on
(will be updated)
Open questions