Using env in import causing a huge performance hit #1878
Thanks for making a proper report, @PierreR! :) Since this seems to be a perf issue with the Haskell implementation, I'll move it to
Just to clarify: using an environment variable import per se is not the problem. The issue is that if you wrap an import that has an integrity check inside another import that does not, performance slows down. This is because any import without an integrity check has to be type-checked.
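As a minimal sketch of the slow pattern, consider a hypothetical wrapper file `./oc.dhall` whose entire content is one import protected by an integrity check. The URL and the all-zero hash are placeholders, not the project's real values:

```dhall
-- ./oc.dhall (hypothetical): the inner import is pinned by a hash, so it
-- can be served from the cache. A client that imports ./oc.dhall itself
-- without a hash still forces this file to be type-checked.
https://example.com/openshift/package.dhall
  sha256:0000000000000000000000000000000000000000000000000000000000000000
```

A client that writes `let oc = ./oc.dhall in oc` goes through the unprotected wrapper, while a client that writes the hashed URL directly does not.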
@sjakobi Thanks for moving the issue (I didn't think twice before submitting it). I'd like to stress that this is a big deal for us: the overall performance we face while working on our attempt to provide a more user-friendly, high-level OpenShift Dhall package is already problematic. Because this repo will be used in many other Dhall files, forcing us to provide the whole import with its integrity hash in each of these files is a major operational burden. I also believe this change in performance behaviour is quite difficult to understand from the end user's point of view (at least it is for me ;-)).
The tentative solution I have in mind is to add a special case to the import resolution logic that skips type-checking if the import is just a trivial re-export. I need to actually try it, though, before I can say for sure whether that solution will work.
Fixes #1878. This change skips the type-checking step if an import trivially refers to another import.
The fix is up here: #1879
* Accelerate performance of trivial re-exports

  Fixes #1878

  This change skips the type-checking step if an import trivially refers to another import.

* Don't normalize a second time

  ... as suggested by @sjakobi

  I also noticed an inconsistency in returning `substitutedExpr` vs. `resolvedExpr`, which I fixed along the way.

* Correctly disable flaky test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
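To illustrate what "trivially refers to another import" means, the hypothetical file below is not a trivial re-export: it selects a field instead of returning the import unchanged, so presumably the shortcut from #1879 does not apply and the file is still type-checked. `Deployment` is a made-up field name, and the URL and hash are placeholders:

```dhall
-- Hypothetical wrapper that is NOT a trivial re-export: it does work on the
-- imported value (a field selection) rather than being just an import.
-- `Deployment` is an illustrative field name; URL and hash are placeholders.
let oc =
      https://example.com/openshift/package.dhall
        sha256:0000000000000000000000000000000000000000000000000000000000000000

in  oc.Deployment
```

By contrast, a file consisting of nothing but the hashed import is exactly the case the fix targets.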
While trying to find workarounds for our

As a reminder, given:
Here is the benchmark for a file that uses such an import:
or, using `let oc = ../../../../../../bric/cicd/dhall/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695`:
The difference between the two is consistently significant. Is there anything I could try to avoid repeating
It looks like my best option is to do:
and use `dhall freeze` to update all the

@Gabriel439 I don't quite understand why, but is this correct?
@PierreR: I think it will be better for me to focus on the other import-related optimizations rather than try to solve this specific issue, since they will have a larger impact on performance.
@Gabriel439 Thanks! Actually, running `dhall freeze` across all files exhausts my 8 GB of RAM very quickly ;-) (In the meantime I can always do a
That sounds like a separate perf issue. Could you open an issue, ideally with instructions on how to reproduce the problem?
@sjakobi I am pretty sure the RAM explosion is a consequence of the performance issue discussed in #1890. I was using

I can file another issue, but I think it is known that, given the current performance profile, adding parallelism to the mix is not the best idea. Unfortunately, parallelism sometimes feels so natural that you don't immediately realize it is the source of the explosion. Sorry for the confusion; I should have been more precise.
@Gabriel439 Could we re-open this issue, even if it is going to be fixed later by the new import-related optimizations? Exposing sha256 hashes in all files is far from ideal for us because:
@PierreR: I'd need to know more details, but I believe you do not need
@Gabriel439 Thanks for the suggestion. For some reason I hadn't thought of it... I have just tried it and got surprising figures, performance-wise.

Situation 1: use an env variable

OC is an env variable with this value: http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184

Situation 2: use a normal import indirection
oc.dhall has this content:
There is a constant factor of 2 between the two setups:
Situation 2 (which solves my issue) is twice as slow as situation 1. Given the current performance issue, I can't afford to double the generation time ;-)
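A sketch of the client file for situation 1, assuming OC has been exported with the URL above as its value. `env:OC` imports the variable's value as Dhall source, and since that value is itself a URL, resolving it triggers a second, remote import:

```dhall
-- Situation 1 (sketch): the package is reached through the OC environment
-- variable, whose value is the Dhall source text
--   http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184
-- Situation 2 replaces `env:OC` below with `./oc.dhall`.
let oc = env:OC

in  oc
```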
@PierreR: Yeah, this is a case where I'm pretty sure I understand what is going on. The issue here is caused by the "semi-semantic" cache, which caches intermediate imports automatically (even ones without integrity checks). So what's happening is that the large expression stored in
I still believe this case would be fixed by the import-related optimizations I'm working on. In particular, one of the optimizations I'm testing is to not store imports inline within the cache (which causes them to be decoded multiple times) and instead store references to separate cache entries, deduplicating the decoding work.
As discussed on Discourse, benchmarking some examples from https://github.com/PierreR/dhall-packages reveals that replacing an env variable used to represent an import with its value gives a performance boost of nearly 50% (from ~35 s to ~18 s):
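A minimal sketch of that substitution. The URL is a placeholder rather than one of the benchmarked dhall-packages examples:

```dhall
-- Before (env-variable indirection): resolution first reads OC and then
-- the import its value contains:
--   let oc = env:OC in oc
-- After: the env variable is replaced by its value, so the import is
-- written directly in the file. The URL below is a placeholder.
let oc = https://example.com/openshift/package.dhall

in  oc
```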
This has been tested with
Related to dhall-lang/dhall-lang#975