Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using env in import causing a huge performance hit #1878

Closed
PierreR opened this issue Jun 23, 2020 · 16 comments · Fixed by #1879
Closed

Using env in import causing a huge performance hit #1878

PierreR opened this issue Jun 23, 2020 · 16 comments · Fixed by #1879
Labels
performance All the slow and inefficient things

Comments

@PierreR
Copy link
Contributor

PierreR commented Jun 23, 2020

As discussed in discourse , benchmarking some examples from https://github.com/PierreR/dhall-packages reveals that replacing an env variable used to represents an import by its value causes a performance boost of nearly 50% (from ~35s to ~18s) :

- let oc = env:OC -- OC='../package.dhall sha256:088....'
+ let oc = ../package.dhall sha256:088.....

This has been tested with

bench 'dhall-to-yaml --file ./openshift/examples/application1.dhall --output ./openshift/examples/application1.yaml'

Related to dhall-lang/dhall-lang#975

@PierreR
Copy link
Contributor Author

PierreR commented Jun 23, 2020

In discourse @lisael provides a nice explanation together with a suggestion to mitigate the problem.

@sjakobi
Copy link
Collaborator

sjakobi commented Jun 23, 2020

Thanks for making a proper report, @PierreR! :)

Since this seems to be a perf issue with the Haskell implementation, I'll move it to dhall-haskell.

@sjakobi sjakobi transferred this issue from dhall-lang/dhall-lang Jun 23, 2020
@sjakobi sjakobi added the performance All the slow and inefficient things label Jun 23, 2020
@Gabriella439
Copy link
Collaborator

Just to clarify, it's not using an environment variable import per se that is the problem. The issue is that if you wrap an import with an integrity check inside of another import without an integrity check then performance slows down. The reason why is because any import without an integrity check needs to be type-checked.

@PierreR
Copy link
Contributor Author

PierreR commented Jun 23, 2020

@sjakobi Thanks for moving the issue (I didn't think twice before submitting it).

Would like to say this is a big deal for us because the overall performance that we face while working with our attempt to provide a more user-friendly high-level openshift dhall package is well problematic.

Because this repo will be use in many other dhall files, forcing us to provide the whole import with its integrity hash in each of these files represents a major operational burden.

I also believe this change of performance behaviour is quite difficult to understand from the point of view of the end user (at least for me it is ;-)

@Gabriella439
Copy link
Collaborator

The tentative solution I have in mind for this is to add a special case to the import resolution logic to skip type-checking if the import is just a trivial re-export. I need to actually try it, though, before I can say for sure if that solution will work

Gabriella439 added a commit that referenced this issue Jun 24, 2020
Fixes #1878

This change skips the type-checking step if an import trivially
refers to another import
@Gabriella439
Copy link
Collaborator

The fix is up here: #1879

@mergify mergify bot closed this as completed in #1879 Jun 25, 2020
mergify bot added a commit that referenced this issue Jun 25, 2020
* Accelerate performance of trivial re-exports

Fixes #1878

This change skips the type-checking step if an import trivially
refers to another import

* Don't normalize a second time

... as suggested by @sjakobi

I also noticed an inconsistency in returning `substitutedExpr` vs.
`resolvedExpr`, which I fixed along the way

* Correctly disable flaky test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@PierreR
Copy link
Contributor Author

PierreR commented Sep 11, 2020

While trying to find out workarounds around our dhall-kubernetes performance problems, I have realized that this issue is not fixed for me (dhall v1.34 or the latest v1.35.0)

As a reminder, given:

→ echo $OC
http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695

Here is the benchmark for a file that uses such an import:

  1. Using let oc = env:OC
dhall type --file its/application/prd/prometheus/application.dhall --quiet  33.22s user 1.13s system 99% cpu 34.540 total
  1. Using let oc = http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695
dhall type --file its/application/prd/prometheus/application.dhall --quiet  18.22s user 0.77s system 99% cpu 19.066 total

or

Using let oc = ../../../../../../bric/cicd/dhall/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695

dhall type --file its/application/prd/prometheus/application.dhall --quiet  17.14s user 0.62s system 99% cpu 17.793 total

The difference between the 2 is consistently significant.

Is there anything I could try to avoid repeating http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695 in every files apart from writing a script that would update such information anytime we release a new version of our package.dhall ?

@PierreR
Copy link
Contributor Author

PierreR commented Sep 11, 2020

It looks like my best option is to do:

let oc =
      env:OC sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695

and use dhall freeze to update all the sha256 when an updates is pushed.

@Gabriel439 I don't quite understand why but is this correct ?

@Gabriella439
Copy link
Collaborator

@PierreR: I think it will be better for me to focus on the other import-related optimizations rather than try to solve this specific issue, since they will have a larger impact on performance

@PierreR
Copy link
Contributor Author

PierreR commented Sep 11, 2020

@Gabriel439 thanks ! Actually doing a dhall freeze across all files explodes my 8G Ram very quickly ;-) (In the meanwhile I can always do a sed )

@sjakobi
Copy link
Collaborator

sjakobi commented Sep 11, 2020

@Gabriel439 thanks ! Actually doing a dhall freeze across all files explodes my 8G Ram very quickly ;-) (In the meanwhile I can always do a sed )

That sounds like a separate perf issue. Could you open an issue, ideally with instructions how to reproduce the problem?

@PierreR
Copy link
Contributor Author

PierreR commented Sep 11, 2020

@sjakobi I am pretty sure the explosion of RAM is a consequence of the performance issue discussed in #1890. I was using fd to do a dhall freeze across multiple files which means it was happening in parallel. I have similar issues with shake if I use a shakeThreads above 1.

I can fill another issue but I think it is known that given the current performance profile trying to put parallelism in the mix is not the best idea.

Unfortunately sometimes parallelism is so natural that you don't realize immediately that it is the source of the explosion. Sorry for the confusion. I should have been more precise.

@PierreR
Copy link
Contributor Author

PierreR commented Oct 5, 2020

@Gabriel439 Could we re-open this issue ? Even if it is going to be fix later on by the new import-related optimizations ?

Exposing sha256 in all files is far from ideal for us because:

  • We are exposing internal optimizations to our non technical clients. Purely on a image/selling point of view that's not good because we can't build a non technical abstraction for our user.
  • Updating the sha256 is now a PR of hundred files instead of just in one single clean place

@Gabriella439
Copy link
Collaborator

@PierreR: I'd need to know more details, but I believe you do not need sha256 hashes on all internal imports. I'm guessing you probably only need to protect top-level imports with integrity checks.

@PierreR
Copy link
Contributor Author

PierreR commented Oct 8, 2020

@Gabriel439 Thanks for the suggestion. For some reason I haven't thought about it ...

I have just tried it and I have got surprising figures, performance wise.

Situation 1

use an env variable

let oc =
      env:OC sha256:1afc2ce9652ef7d281dff47594dea75b32009a7149db88cfc6927e87696be693

OC is an env variable with this value: http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184

Situation 2

use a normal import indirection

let oc = ../../../oc.dhall

oc.dhall has this content:

http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184 sha256:1afc2ce9652ef7d281dff47594dea75b32009a7149db88cfc6927e87696be693

I have a constant factor of 2 between both setup :

# situation 1
→ time dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhall --output ./its/application/cicd-docs/application.yaml
dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhal  10.79s user 0.41s system 99% cpu 11.296 total

# situation 2
→ time dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhall --output ./its/application/cicd-docs/application.yaml
dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhal  20.03s user 0.63s system 99% cpu 20.695 total

Situation 2 (that solves my issue) is twice slower than situation 1. Given the current performance issue, I can't afford to double the generation time ;-)

@Gabriella439
Copy link
Collaborator

@PierreR: Yeah, this is a case where I'm pretty sure I understand what is going on. The issue here is being caused by the "semi-semantic" cache, which caches intermediate imports automatically (even ones without integrity checks). So what's happening is that the large expression stored in ./oc.dhall is being decoded from the cache twice:

  • Once when importing http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184 sha256:1afc2ce9652ef7d281dff47594dea75b32009a7149db88cfc6927e87696be693
  • Another time when importing ./oc.dhall (which is decoded from the local cache)

This is still a case where I believe it would be fixed by the import-related optimizations that I'm working on. In particular, one of the optimizations I'm testing is to not store imports inline within the cache (to avoid them being decoded multiple times) and instead storing references to separate cache entries to deduplicate decoding work.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance All the slow and inefficient things
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants