Conversation

@NimaSarajpoor
Collaborator

This PR addresses issue #22. The proposed pyfftw-based sdp will first be added as challenger_sdp.py so that we can compare it against the existing pyfftw_sdp. Once we are confident that there are no remaining concerns, we will move it to pyfftw_sdp.
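
For context, the FFT-based sliding dot product is essentially a cross-correlation of Q and T computed in the frequency domain. A minimal numpy-only sketch of the idea (illustrative only; the actual challenger_sdp.py uses pyfftw plans rather than numpy.fft):

import numpy as np

def sliding_dot_product(Q, T):
    # Illustrative only: returns QT with QT[i] == np.dot(Q, T[i : i + len(Q)]).
    m, n = len(Q), len(T)
    fft_len = n + m - 1  # full linear convolution length (a "fast" length could be used instead)
    # Correlating Q against T is the same as convolving T with the reversed query.
    QT = np.fft.irfft(np.fft.rfft(T, fft_len) * np.fft.rfft(Q[::-1], fft_len), fft_len)
    return QT[m - 1 : n]  # keep only the fully-overlapping ("valid") windows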

@NimaSarajpoor
Collaborator Author

timing.sh:

#!/bin/bash

rm -rf sdp/__pycache__
./timing.py -timeout 5.0 -pmin 2 -pmax 24 -pdiff 100 pyfftw challenger > timing.csv
rm -rf sdp/__pycache__

The performance of challenger relative to the existing pyfftw_sdp:

[figure: challenger-performance]

Observation
The proposed challenger is approximately 1.25x-2x faster than pyfftw_sdp for large T, where len(T) >= 2^15.

@NimaSarajpoor
Collaborator Author

NimaSarajpoor commented Dec 12, 2025

@seanlaw
If there are no particular concerns about the logic, please let me know and I will move the script to pyfftw_sdp.

@seanlaw
Contributor

seanlaw commented Dec 12, 2025

@NimaSarajpoor Please go ahead!

@NimaSarajpoor
Collaborator Author

NimaSarajpoor commented Dec 12, 2025

The script for pyfftw_sdp has been updated to reflect the proposal above. Given this change to our baseline performance, it is worth re-evaluating the performance of the other modules relative to this new baseline. The following figure shows the performance of pocketfft_r2c_c2r (blue plot) and scipy_oaconvolve (orange plot) benchmarked against the new baseline, pyfftw_sdp.

#!/bin/bash

rm -rf sdp/__pycache__
./timing.py -timeout 5.0 -pmin 6 -pmax 24 pyfftw pocketfft_r2c_c2r scipy_oaconvolve > timing.csv
rm -rf sdp/__pycache__

[figure: pyfftw-vs-others-sdp]

Observations

  • In all cases, pyfftw_sdp outperforms pocketfft_r2c_c2r_sdp.
  • For len(Q) < 2^15, scipy_oaconvolve_sdp outperforms the (single-threaded) pyfftw_sdp when len(T) >> len(Q) (a sketch of the oaconvolve-based idea follows this list).
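
For reference, the oaconvolve-based approach boils down to a "valid"-mode overlap-add convolution of T with the reversed query. A rough sketch of the idea (not the exact scipy_oaconvolve_sdp code):

from scipy.signal import oaconvolve

def sliding_dot_product_oa(Q, T):
    # Correlation expressed as convolution with the reversed query; "valid"
    # keeps only the windows where Q fully overlaps T.
    return oaconvolve(T, Q[::-1], mode="valid")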

Conclusion
For len(Q) >= 2^6, our focus can be on the following two modules:

  • scipy_oaconvolve_sdp
  • pyfftw_sdp

I think it is worth checking whether a multi-threaded pyfftw_sdp can outperform scipy_oaconvolve_sdp.

@NimaSarajpoor
Collaborator Author

I think it is worth evaluating the performance of multi-threaded pyfftw_sdp to see whether it can outperform scipy_oaconvolve_sdp.

The baseline implementation of pyfftw_sdp is single-threaded. In what follows, I explore multi-threaded variants of pyfftw_sdp, where the number of threads (self.threads, see the line below) is set to 2, 4, and 8:

self.threads = 1
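
In pyfftw, the thread count is a parameter of the plan itself. A minimal sketch of how such a knob is typically wired (hypothetical helper, not the exact pyfftw_sdp.py code):

import pyfftw

def build_rfft_plan(fft_len, threads=1):
    # Hypothetical helper: builds a reusable real-to-complex FFT plan of
    # length `fft_len`; `threads` is the knob varied below (2, 4, and 8).
    buf = pyfftw.empty_aligned(fft_len, dtype="float64")
    return pyfftw.builders.rfft(buf, n=fft_len, threads=threads)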

The benchmark was run using the following timing.sh script:

#!/bin/bash

rm -rf sdp/__pycache__
./timing.py -timeout 5.0 -pmin 6 -pmax 24 pyfftw challenger_2threads challenger_4threads challenger_8threads scipy_oaconvolve > timing.csv
rm -rf sdp/__pycache__

Results

[figure: pyfftwThreads-vs-oaconvolve]

Observations

  • Multi-threaded pyfftw_sdp does not consistently outperform the single-threaded baseline. In fact, for shorter input arrays, the single-threaded version often performs better.
  • pyfftw_sdp with 4 or 8 threads outperforms scipy_oaconvolve.

Side note
It is important to clarify a key assumption underlying this comparison: the FFT planning time in pyfftw is excluded. This assumes that the FFT is applied repeatedly to inputs of the same size, in which case the one-time planning cost (which can also be paid in advance) can reasonably be ignored.
Under this assumption, single-threaded / multi-threaded pyfftw_sdp seems to be the solution we are looking for. However, if the application requires FFTs on new, unforeseen large input sizes, then scipy_oaconvolve may be the better choice, as it does not spend time on a "planning" phase.
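
As a rough illustration of that plan-once / execute-many assumption (hypothetical code, not pyfftw_sdp.py itself):

import numpy as np
import pyfftw

n = 2**20
buf = pyfftw.empty_aligned(n, dtype="float64")

# One-time planning cost; this can also be paid in advance and cached.
plan = pyfftw.builders.rfft(buf, n=n)

# Repeated executions on same-sized inputs reuse the plan; this is what the
# benchmark above measures.
for _ in range(10):
    buf[:] = np.random.rand(n)
    spectrum = plan()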

@NimaSarajpoor
Collaborator Author

@seanlaw
Do you think this PR is ready to be merged?

@seanlaw
Contributor

seanlaw commented Dec 14, 2025

Do you think this PR is ready to be merged?

Please give me some time to review it more thoroughly

Contributor

@seanlaw seanlaw left a comment

@NimaSarajpoor I basically have one primary suggestion for you to consider but I am not married to it (see below)

Contributor

@seanlaw seanlaw left a comment

@NimaSarajpoor I've left some comments for you to consider

@NimaSarajpoor
Collaborator Author

@seanlaw
I've addressed the comments. I've also improved the comments and docstrings in pyfftw_sdp.py. Additionally, I improved the comment in the new test function that was recently added to test.py.

Contributor

@seanlaw seanlaw left a comment

@NimaSarajpoor I think this is ready to be merged. I left a comment for you to consider but feel free to ignore if you disagree as I do not feel strongly about it

Excerpt under review:

Sliding dot product between `Q` and `T`.
"""
m = Q.shape[0]
if self.n != T.shape[0]:
Contributor

I don't think this if statement is needed. The lines below can still be executed regardless of whether the length has changed.

Collaborator Author

Right... the only reason for having that if statement is to avoid calling pyfftw.next_fast_len when possible. It should be fine to remove it, though, as doing so results in only a slight (0%-10%) drop in performance for lengths 2^18 to 2^21.
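
For context, pyfftw.next_fast_len returns the next "fast" transform length that is at least as large as its argument, so the if statement is just caching its result (a tiny self-contained illustration):

import pyfftw

n = 2**18 + 1
fft_len = pyfftw.next_fast_len(n)  # next "fast" FFT length >= n
# Caching this value between calls (the role of the if statement above) avoids
# repeating the lookup whenever the length of T has not changed.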
