
primecount-6.1

@kimwalisch kimwalisch released this 12 Sep 16:03

The main focus of this release was polishing the code and improving the documentation. I also tried many approaches to improve scaling on servers with a large number of CPU cores, but I only achieved minor speedups. The only meaningful improvement is that the same threads are now reused throughout the entire AC computation, which improves scaling for small to medium-sized computations up to 10^20. GCC benefits most from this change, whereas Clang performance is mostly unchanged.
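
To illustrate the idea of reusing threads across phases, here is a minimal OpenMP sketch. It is not the code from AC.cpp: the function name `two_phases_one_team` and both loops are hypothetical stand-ins. It only shows the general pattern of running two work-sharing loops inside a single parallel region, so that one thread team executes both phases instead of a new parallel region being opened for each.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example, not taken from primecount.
// Phase 1 sums i, phase 2 sums 2 * i, for i in [0, n).
// Both loops run inside ONE parallel region, so the same
// thread team is used for both phases.
std::int64_t two_phases_one_team(std::int64_t n)
{
    assert(n >= 0);
    std::int64_t sum1 = 0;
    std::int64_t sum2 = 0;

    #pragma omp parallel
    {
        // Phase 1: work is split among the threads of the team.
        #pragma omp for reduction(+: sum1)
        for (std::int64_t i = 0; i < n; i++)
            sum1 += i;

        // The implicit barrier at the end of the loop above
        // ensures phase 1 has finished before phase 2 starts.

        // Phase 2: executed by the same team.
        #pragma omp for reduction(+: sum2)
        for (std::int64_t i = 0; i < n; i++)
            sum2 += 2 * i;
    }

    return sum1 + sum2;
}
```

When compiled without OpenMP support the pragmas are ignored and the function runs single-threaded with the same result.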

  • Xavier Gourdon's algorithm has been distributed using MPI so that computations can now run on HPC clusters.
  • CMakeLists.txt: New WITH_JEMALLOC option (default OFF).
  • AC.cpp: Reuse the same threads throughout the computation.
  • AC.cpp: Improve upper bound of C2 formula.
  • AC.cpp: Avoid branch inside hot loop of A formula.
  • SegmentedPiTable.cpp: Reuse threads from AC.cpp.
  • LoadBalancerP2.cpp: New load balancer for P2 & B formulas.
  • phi.cpp: Reduce caching for tiny numbers.
  • generate_phi.hpp: Reduce caching for tiny numbers.
  • pod_vector.hpp: Like std::vector, but without default initialization (useful when allocating 100s of GiB).
  • PiTable.cpp: Multi-threaded initialization.
  • Status.cpp: Avoid thread synchronization when printing in order to improve scaling of AC and S2_easy.
  • Status.cpp: Improve S2_hard & D status accuracy.
  • StatusAC.cpp: More accurate status for AC formula.
  • cmdoptions.cpp: Add -B & -D options.
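
The motivation behind pod_vector.hpp can be sketched as follows. This is not primecount's pod_vector, just a minimal illustration under the assumption that the element type is trivial; the class name `uninit_buffer` is hypothetical. `std::vector<T>(n)` value-initializes its elements, which for integers means writing zeros to the whole allocation, whereas `new T[n]` default-initializes them, which for trivial types leaves the memory untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical example, not taken from primecount.
// Fixed-size array whose elements are NOT zeroed on allocation.
template <typename T>
class uninit_buffer
{
    static_assert(std::is_trivial<T>::value,
                  "uninit_buffer only supports trivial types");
public:
    // new T[n] default-initializes: for trivial types nothing is
    // written, so the element values are indeterminate until assigned.
    explicit uninit_buffer(std::size_t n)
        : data_(new T[n]), size_(n)
    { }

    std::size_t size() const { return size_; }

    T& operator[](std::size_t i)
    {
        assert(i < size_);
        return data_[i];
    }

    const T& operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }

private:
    std::unique_ptr<T[]> data_;
    std::size_t size_;
};
```

Elements must be written before they are read. Skipping the initial zero fill matters most when the buffer is very large and is about to be fully overwritten anyway, for example by a multi-threaded initialization.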