Benchmark on a bigger example #4

We need to benchmark the supercompiler on some bigger example than those in the examples/ folder, to see which parts take the most time. The usual suspects are substitution and homeomorphic embedding. But before that, Mazeppa needs to be smart enough to emit reasonable code for all programs (see #2); it makes no sense to benchmark an algorithm that still needs conceptual changes.

Note: all the benchmarks below were conducted on an AMD Ryzen 9 5900HX.
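For reference, here is a minimal OCaml sketch of the kind of data those two suspects operate on: an illustrative first-order term type and a naive substitution over it. The type and function names are assumptions for illustration, not Mazeppa's actual definitions.

```ocaml
(* An illustrative first-order term: a variable or a function/constructor
   call applied to arguments (not Mazeppa's actual representation). *)
type term = Var of string | Call of string * term list

(* Naive simultaneous substitution: replace every variable bound in [env]
   with its image. Terms are first-order, so no capture can occur, but
   note that each call rebuilds the entire spine of the term. *)
let rec subst (env : (string * term) list) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x env with Some u -> u | None -> t)
  | Call (f, args) -> Call (f, List.map (subst env) args)
```

Rebuilding the whole term on every substitution step is exactly the kind of cost that only shows up on bigger examples.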
It seems that the self-interpreter example is the best candidate for benchmarking so far. The input list length can be set to 10 or some higher value. After I'm done with a few implementation changes, I'll post benchmarking data here, along with musings about what improvements can be made.
I'm now pretty confident that homeomorphic embedding is what slows down the supercompilation process the most (unsurprisingly). Take this example:
The results are:
However, if I change the implementation of
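For context, the homeomorphic embedding check (the "whistle") is standardly defined by two mutually recursive rules, coupling and diving. Here is a sketch over the term type from the snippet above; it illustrates the textbook definition, not Mazeppa's implementation:

```ocaml
(* Homeomorphic embedding: [embeds t1 t2] holds if [t1] is embedded in [t2]. *)
let rec embeds t1 t2 = coupling t1 t2 || diving t1 t2

(* Coupling: the roots match and the arguments embed pairwise. *)
and coupling t1 t2 =
  match (t1, t2) with
  | Var _, Var _ -> true
  | Call (f, args1), Call (g, args2) ->
      f = g
      && List.length args1 = List.length args2
      && List.for_all2 embeds args1 args2
  | _ -> false

(* Diving: [t1] embeds into some argument of [t2]. *)
and diving t1 t2 =
  match t2 with
  | Call (_, args) -> List.exists (fun u -> embeds t1 u) args
  | Var _ -> false
```

Both rules are tried at every node, and each recurses into every argument, so the naive check repeats a great deal of work on large terms; it is no surprise that it dominates the running time.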
After commit e791c38 on the above example (
A self-interpreter with the following
Before:
After:
Benchmarking a self-interpreter with the following
Before commit 26a88fc (now using
After:
Also, it turns out that both caches, the global size cache and the local result cache, improve performance a lot: if we remove either of them, performance gets much worse.
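A sketch of what such caches can look like, reusing `term` and `embeds` from the snippets above. The names and the exact role of the size cache are my assumptions; one standard trick is to use sizes as a cheap negative filter, since `t1` embedded in `t2` implies `size t1 <= size t2`.

```ocaml
(* Illustrative caches (not Mazeppa's actual code). *)

(* A global size cache: term sizes are requested over and over, so memoize
   them for the lifetime of the whole run. *)
let size_cache : (term, int) Hashtbl.t = Hashtbl.create 1024

let rec size (t : term) : int =
  match Hashtbl.find_opt size_cache t with
  | Some n -> n
  | None ->
      let n =
        match t with
        | Var _ -> 1
        | Call (_, args) -> List.fold_left (fun acc u -> acc + size u) 1 args
      in
      Hashtbl.add size_cache t n;
      n

(* A local result cache: memoize embedding checks so that identical
   (t1, t2) pairs are answered in O(1) on a hit. The size comparison
   rejects impossible embeddings before the expensive recursive check. *)
let make_cached_embeds () =
  let cache : (term * term, bool) Hashtbl.t = Hashtbl.create 1024 in
  fun t1 t2 ->
    if size t1 > size t2 then false
    else
      match Hashtbl.find_opt cache (t1, t2) with
      | Some b -> b
      | None ->
          let b = embeds t1 t2 in
          Hashtbl.add cache (t1, t2) b;
          b
```

One plausible reading of the global/local distinction above: the size cache persists across whistle checks, while a fresh result cache is created per pass (here, per call to `make_cached_embeds`).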
Also, I should note that the above contrived example now takes around 32 seconds to supercompile (before adding the result cache, it used to take around 15 seconds). The reasons are:
However, I think that self-interpretation exhibits a more "real-life" scenario of supercompilation, which is far more important to us. Therefore, I'll keep the result cache anyway.
This is a self-interpretation memory benchmark (yes, now using
That is, Mazeppa has allocated a total of ~782.8 megabytes:
However, only ~13.0 megabytes (
What does this data tell us? Probably that the OCaml GC is doing its best: it collects many short-lived values on the minor heap while keeping major-heap usage to a minimum.
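For reproducibility, figures like these can be read off OCaml's own `Gc` module; a sketch (I'm assuming these are the counters of interest, not that this is the tool that produced the original numbers):

```ocaml
(* Print total allocation and major-heap allocation, in megabytes.
   [Gc.stat] reports counters in words; [minor_words +. major_words
   -. promoted_words] is the total amount ever allocated (promoted
   words would otherwise be counted twice). *)
let report_allocations () =
  let s = Gc.stat () in
  let mb words = words *. float_of_int (Sys.word_size / 8) /. 1_048_576. in
  Printf.printf "total allocated: ~%.1f MB\n"
    (mb (s.Gc.minor_words +. s.Gc.major_words -. s.Gc.promoted_words));
  Printf.printf "major heap:      ~%.1f MB\n" (mb s.Gc.major_words)
```

A large `minor_words` count with few promotions is exactly the "many short-lived values" pattern described above.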
A self-interpreter with the following
Without Flambda (OCaml version
With Flambda (OCaml version
So without any fine-tuning, the performance with Flambda is slightly better. It's probably worth experimenting with different Flambda options for more aggressive optimization.
The same self-interpreter invocation as in the message above:
After commit 192ad81 (with Flambda):
With regard to memory usage, it's about 1.5x better (755.1 / 486.3 ≈ 1.55): previously, Mazeppa allocated ~755.1 MB on the major heap; now it's ~486.3 MB. That is, both speed and memory usage have improved.
After commit 41871ff:
Running time remains the same after commit b3fc70b. Memory usage is reduced by a few megabytes (~484.4 MB on the major heap, compared to ~486.3 MB previously).