diff --git a/CHANGELOG b/CHANGELOG
index 000e03b6f..e7b302c48 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,37 +1,46 @@
 List of features / changes made / release notes, in reverse chronological order.
 If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
 
-V 2.3.0beta (7/21/24)
-
-* ES kernel rescaled to max value 1, reduced horner degrees for upsampfac=1.25
-  (fixes fp32 overflow issue #454).
-* Major acceleration of spread/interp kernels using XSIMD header-only lib,
+V 2.3.0beta (7/24/24)
+
+* python build modernized to pyproject.toml (both CPU and GPU).
+  PRs 507 (Anden, Lu, Barbone)
+* switchable FFT: either FFTW or DUCC0 (latter need no plan stage; also it is
+  used to exploit sparsity pattern to achieve FFT speedups 1-3x in 2D and 3D).
+  PR463, Martin Reinecke.
+* ES kernel rescaled to max value 1, reduced poly degrees for upsampfac=1.25,
+  cleaner Horner coefficient generation PR499 (fixes fp32 overflow issue #454).
+* Major manual acceleration of spread/interp kernels via XSIMD header-only lib,
   kernel evaluation, templating by ns with AVX-width-dependent decisions.
   Up to 80% faster, dep on compiler. (Marco Barbone with help from Libin Lu).
-  NOTE: introduces new dependency (XSIMD), added to cMake and makefile.
-* new test/finufft3dkernel_test checks kerevalmeth=0,1 same to tol (M Barbone).
+  PRs 459, 471, 502.
+  NOTE: introduces new dependency (XSIMD), added to cMake and makefile.
+* Exploiting even/odd symmetry for 10% faster xsimd-accel kernel poly eval
+  Libin Lu based on idea of Martin Reinecke (PR477,492,493).
+* new test/finufft3dkernel_test checks kerevalmeth=0 and 1 agree to tolerance
+  PR 473 (M Barbone).
 * new perftest/compare_spreads.jl compares two spreadinterp libs (A Barnett).
 * new benchmarker perftest/spreadtestndall sweeps all kernel widths (M Barbone).
 * cufinufft now supports modeord(type 1,2 only): 0 CMCL-style increasing mode
-  order, 1 FFT-style mode order.
-* New doc page: migration guide from NFFT3 (2d1 case only).
+  order, 1 FFT-style mode order. PR447,446 (Libin Lu, Joakim Anden).
+* New doc page: migration guide from NFFT3 (2d1 case only), Barnett.
 * New foldrescale, removes [-3pi,3pi) restriction on NU points, and slight
   speedup at large tols. Deprecates both opts.chkbnds and error code
-  FINUFFT_ERR_SPREAD_PTS_OUT_RANGE. Also inlined kernel eval code, increases
-  compile of spreadinterp.cpp to 10s. PR #440 (Marco Barbone + Martin Reinecke)
+  FINUFFT_ERR_SPREAD_PTS_OUT_RANGE. Also inlined kernel eval code (increases
+  compile of spreadinterp.cpp to 10s). PR440 Marco Barbone + Martin Reinecke.
 * CPU plan stage allows any # threads, warns if > omp_get_max_threads(); or
   if single-threaded fixes nthr=1 and warns opts.nthreads>1 attempt.
   Sort now respects spread_opts.sort_threads not nthreads. Supercedes PR 431.
 * new docs troubleshooting accuracy limitations due to condition number of the
-  NUFFT problem.
+  NUFFT problem (Barnett).
 * new sanity check on nj and nk (<0 or too big); new err code, tester, doc.
 * MAX_NF increased from 1e11 to 1e12, since machines grow.
 * improved GPU python docs: migration guide; usage from cupy, numba, torch,
-  pycuda. PyPI pkg still at 2.2.0beta.
+  pycuda. Docs for all GPU options. PyPI pkg still at 2.2.0beta.
 * Added a clang-format pre-commit hook to ensure consistent code style.
   Created a .clang-format file to define a style similar to the existing style.
   Applied clang-format to all cmake, C, C++, and CUDA code. Ignored the blame
-  using .git-blame-ignore-revs. Added a contributing.md for developers.
+  using .git-blame-ignore-revs. contributing.md for devs. PR450,455, Barbone.
 * cuFINUFFT interface update: number of nonuniform points M is now a 64-bit int
   as opposed to 32-bit. While this does modify the ABI, most code will just
   need to recompile against the new library as compilers will silently upcast