sequence() was the bottleneck in my program #451
Optimizations:

* cache sequence(). See PDLPorters/pdl#451
* Use scalars when comparing 1,1-piddles
* Only evaluate slices when necessary. For example, bestPos is only used when a better fit is found ($sfit < $bestfit)
* Remove ->copy from temp variable, there is no re-assignment.
* Group scalar math together with parens _before_ applying that result to a PDL.

Possible performance side-effects:

* When logging is enabled, some PDL operations happen more than once, but I bet the printf takes longer than the PDL op.
* getBestFit() creates a PDL on return, so maybe a touch slower, but it drastically speeds up internals with scalar comparisons.
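A loose Perl sketch of those changes, with illustrative names ($pos, $fit, $bestfit, $w, $dt and the particle count are stand-ins, not the optimizer's actual code):

```perl
use PDL;

# Illustrative stand-ins for the optimizer's state:
my $nparticles = 50;
my $pos     = random(3, $nparticles);   # particle positions
my $fit     = random($nparticles);      # per-particle fitness
my $bestfit = pdl(1e9);                 # one-element piddle holding the best fitness so far
my $bestPos = zeroes(3);

# Cache the index ndarray once instead of rebuilding it every iteration
# (the sequence() bottleneck reported in this issue).
my $seq = sequence($nparticles);

for my $i (0 .. $nparticles - 1) {
    my $sfit = $fit->slice("($i)");
    # Compare one-element piddles as Perl scalars rather than as ndarrays...
    if ( $sfit->sclr < $bestfit->sclr ) {
        # ...and only evaluate the position slice when a better fit is found.
        $bestPos = $pos->slice(",($i)");
        $bestfit = $sfit;
    }
}

# Group plain-scalar math with parens so it is computed once in Perl,
# then applied to the ndarray in a single operation.
my ( $w, $dt ) = ( 0.7, 0.05 );
my $vel = random(3, $nparticles);
$vel = $vel * ( $w * $dt );
```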
Notes on profiling C/XS code: https://www.perlmonks.org/?node_id=791611

With the above,

Ironically, the
With the changes linked above, on my machine it's now about twice as quick (450ms) as when I started.
With the above-linked commit, this is now taking about 380ms here. Notes on identifying this and probably similar performance bottlenecks, using
The first two are the top-two expensive functions, the most so being the unexpected
That confirmed it was indeed coming from
This is how long it takes afterwards, about 20% less time:
With that commit, the 10m
For more performance issues, see the benchmark on nrdvana/perl-Math-3Space#8. The stripped-down timings:
Here, PDL::LinearAlgebra does make things quicker.
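A minimal sketch of the kind of head-to-head timing behind that observation, assuming PDL::MatrixOps's inv and PDL::LinearAlgebra's minv as the two inverse implementations (the matrix size is arbitrary, and this is not the benchmark from the linked issue):

```perl
use PDL;
use PDL::MatrixOps;       # inv(): stock PDL matrix inverse
use PDL::LinearAlgebra;   # minv(): LAPACK-backed inverse (assumed export)
use Benchmark qw(cmpthese);

my $m = random(200, 200);   # arbitrary square matrix for illustration

cmpthese( -2, {
    'MatrixOps inv'      => sub { my $i = inv($m) },
    'LinearAlgebra minv' => sub { my $i = minv($m) },
} );
```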
It appears that in the most common case of a PDL method being called on an existing PDL instance, this code runs (lines 39 to 41 in fc7e9b6):
This means every single call to any pdl method will perform an "sv_derived_from" check, which isn't cheap. (I mean it's not terrible, but it's more than walking a few quick pointers.) If that logic could be changed to lead with

I can't easily judge whether you could put the magic directly onto the hashref without rearranging a bunch of other assumptions in the code. I see it doing interesting things like checking if
@nrdvana It's great to have your thoughts on this! Can you confirm one thing: line 39 above constrains that bit to only non-refs, i.e. strings. Therefore, this section will only get run for class methods (i.e. not every single call of instance methods; that was one of the optimisations I made, referred to earlier), which is different from your thought above?
@mohawk2 Indeed, I misread what was going on. But the end conclusion is the same, because it calls sv_derived_from lower down (lines 113 to 118 in fc7e9b6).
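Since that cost is incurred on every method call, a Perl-level micro-benchmark on a method that does almost no real work gives a feel for the per-call overhead (a sketch, not a direct measurement of sv_derived_from itself):

```perl
use PDL;
use Benchmark qw(cmpthese);

my $x      = sequence(10);
my $cached = $x->nelem;

cmpthese( -1, {
    # nelem is trivial, so this mostly times per-call dispatch and
    # SV-to-ndarray conversion overhead.
    'call nelem each time' => sub { my $n = $x->nelem },
    'reuse a cached value' => sub { my $n = $cached },
} );
```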
Another idea I just had is that you could attach new PDL magic to the ndarray, and still leave the pointer stored in the SV and/or HV->{PDL} to provide backward compatibility. The detection of the magic would go first in this function (make the common case fast) and then fall back to the other stuff.
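For context on the layout being kept compatible: a hash-based PDL subclass stores its ndarray under the {PDL} key, as in this sketch based on the PDL::Objects scheme (My::NDArray and its fields are made-up names):

```perl
package My::NDArray;
use PDL;
our @ISA = ('PDL');

# Hash-based subclass: the real ndarray lives under the {PDL} key,
# which is the pointer location the backward-compatibility path keeps working.
sub new {
    my ( $class, $data ) = @_;
    return bless { PDL => PDL->pdl($data), name => 'demo' }, $class;
}

# PDL calls initialize() when it needs a fresh object of this class.
sub initialize {
    my $class = shift;
    return bless { PDL => PDL->null }, ref $class || $class;
}

package main;
my $obj = My::NDArray->new( [ 1, 2, 3 ] );
print $obj->sum, "\n";    # PDL methods find the ndarray via $obj->{PDL}
```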
That does sound like the natural conclusion from your XS guide. And an obvious place to insert that would be in |
@mohawk2 I didn't get far, but this change doesn't break any tests. Want to benchmark it? |
@nrdvana Before applying your change, I got these results:
After:
A little over 50% faster for a simple-case method call, very nice! I've cherry-picked it and tweaked it so it doesn't give compiler warnings. I've removed the |
This links the pdl to the SV unambiguously, so that looking up the pdl from an SV can skip checking the inheritance hierarchy. More functionality could be moved into the extension magic later, but this is just a preliminary test to see if it speeds things up.
I was surprised to find that the sequence() function was the bottleneck in a Particle Swarm optimization. Ultimately I cached $sequence = sequence(@foo) instead of generating it every time it was used. Not sure if there is room to optimize sequence or not, but here is the nytprof output.

Before caching:
After caching:
Caching fixed it for me in my own program, but FYI in case it helps others, or if it can be addressed somehow.
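For anyone reproducing this, the nytprof output above comes from Devel::NYTProf (run the script as perl -d:NYTProf script.pl, then generate a report with nytprofhtml). The effect of caching can also be seen with a quick stand-alone comparison; the dimensions below are placeholders, not the ones from the real optimizer:

```perl
use PDL;
use Benchmark qw(cmpthese);

my @dims   = ( 30, 100 );        # placeholder sizes
my $cached = sequence(@dims);

cmpthese( -1, {
    'sequence() every time' => sub { my $s = sequence(@dims); $s->sum },
    'cached sequence()'     => sub { my $s = $cached;          $s->sum },
} );
```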