PrimeHaskell/solution_2/README.md (+25 -12)
@@ -7,6 +7,8 @@
It is often said that functional languages such as Haskell must be slower than imperative ones; this implementation tries to dispel that notion.
+
There is little point in using a multi-threaded solution to show which language is fastest. Multi-threaded results mostly reflect CPU throttling caused by the increased power usage of multiple cores and the effect of sharing resources, especially "Hyper-Threading" (HT)/"Simultaneous Multi-Threading" (SMT) threads sharing the execution unit resources of a common core, and their ratio to single-threaded results will be consistent across languages. A multi-threaded solution is nevertheless provided in order to be competitive. When all available threads are used, the work done per thread for HT/SMT threads drops by almost a factor of two, plus the thermal throttling factor, so some implementations have used fewer than the maximum number of threads to gain an apparent advantage on the multi-threading leaderboard: one precedent uses 4 threads, and some force 16 threads to gain an advantage on the main test machine, which has 32 threads on 16 cores using HT/SMT. Tailoring the thread count to this specific CPU seems objectionable, so this implementation uses four threads, which should be available on all test machines. This will provide an advantage on the 16-core test machine through less thermal throttling and less sharing of compute engine resources, but no more than the advantage of the other accepted implementation that uses four threads. As implied above, the multi-threading contest rules should really be modified so that all available threads must be used for a "maximum total work done" implementation.
+
The first three techniques used in this Haskell solution are implemented in an imperative style using `forM_` so that the core algorithm remains recognizable. Unlike the earlier solution, this solution does not use imported libraries to accomplish the task, and so is `faithful to base`. The number representation is one bit per odd number.
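As a rough illustration of the imperative `forM_` style with one bit per odd number, here is a minimal sketch. It is not the code from this solution: the names `sieveOdds` and `primesUpTo` are hypothetical, and it assumes the `array` package that ships with GHC, relying on GHC's unboxed `Bool` arrays being bit-packed.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, assocs)

-- Hypothetical sketch: index i stands for the odd number 2*i + 3,
-- and True means "composite". Only call with limit >= 3.
sieveOdds :: Int -> UArray Int Bool
sieveOdds limit = runSTUArray $ do
  let top = (limit - 3) `div` 2 -- index of the largest odd number <= limit
  cmpsts <- newArray (0, top) False
  -- Outer loop: scan for base primes p with p*p <= limit.
  forM_ (takeWhile (\i -> (2 * i + 3) * (2 * i + 3) <= limit) [0 ..]) $ \i -> do
    isComposite <- readArray cmpsts i
    when (not isComposite) $ do
      let p = 2 * i + 3
          start = (p * p - 3) `div` 2 -- index of p*p
      -- Inner loop: mark every p-th index from p*p onward.
      forM_ [start, start + p .. top] $ \j -> writeArray cmpsts j True
  return cmpsts

-- Hypothetical helper: list the primes up to the limit.
primesUpTo :: Int -> [Int]
primesUpTo limit
  | limit < 2 = []
  | limit < 3 = [2]
  | otherwise = 2 : [2 * i + 3 | (i, False) <- assocs (sieveOdds limit)]
```

Loaded into GHCi, `primesUpTo 30` should list the ten primes up to 30. The sketch favors readability over speed; a tuned version would likely avoid the intermediate lists and bounds checks.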
The "stride8" techniques use an algorithm similar to the Rust "striped" algorithm, but instead of changing the order of bits within the sieve buffer, they leave the order as normal and cull/mark the bits by "strides" in place, and so are also `faithful base`. The actual loops are very simple, and thus no separate storage implementation is used. The outer loop searches for the base prime values as required. The next inner loop level has a limit set so that it never runs more than eight times; each pass just sets up the constant mask value and starting byte index to be used in the innermost actual marking loops. The boolean deliverable array is returned after masking off all values above the given range in the above two lines, as those values may not have been processed and aren't desired in the output listing.
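The three loop levels described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the solution's actual code: the names `sieveStride8` and `primesStride8` are hypothetical, the buffer is a byte array from GHC's bundled `array` package, and instead of masking off the unused high bits it simply ignores them when reading. Because each base prime `p` is odd, the first eight culling positions fall on eight distinct bit positions, and each position then repeats every `p` bytes with the same mask.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, (!))
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical sketch: bit i of the buffer stands for the odd number
-- 2*i + 3, and a set bit means "composite". Only call with limit >= 3.
sieveStride8 :: Int -> UArray Int Word8
sieveStride8 limit = runSTUArray $ do
  let topBit = (limit - 3) `div` 2
      topByte = topBit `shiftR` 3
  buf <- newArray (0, topByte) 0
  -- Outer loop: scan for base primes p with p*p <= limit.
  forM_ (takeWhile (\i -> (2 * i + 3) * (2 * i + 3) <= limit) [0 ..]) $ \i -> do
    byte <- readArray buf (i `shiftR` 3)
    when (not (testBit byte (i .&. 7))) $ do
      let p = 2 * i + 3
          start = (p * p - 3) `div` 2 -- bit index of p*p
      -- Middle loop: at most eight passes, one per starting bit position.
      forM_ (take 8 [start, start + p ..]) $ \s -> do
        let mask = 1 `shiftL` (s .&. 7) :: Word8 -- constant for this pass
        -- Innermost loop: same mask, byte index advancing by p each step.
        forM_ [s `shiftR` 3, (s `shiftR` 3) + p .. topByte] $ \b -> do
          old <- readArray buf b
          writeArray buf b (old .|. mask)
  return buf

-- Hypothetical helper: read back the primes, ignoring bits above topBit.
primesStride8 :: Int -> [Int]
primesStride8 limit
  | limit < 2 = []
  | limit < 3 = [2]
  | otherwise =
      let buf = sieveStride8 limit
          topBit = (limit - 3) `div` 2
       in 2 : [ 2 * i + 3
              | i <- [0 .. topBit]
              , not (testBit (buf ! (i `shiftR` 3)) (i .&. 7))
              ]
```

The innermost loop touches only one bit position per pass, so its mask is loop-invariant and the only changing value is the byte index; that is the property the "stride8" description relies on.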
@@ -44,6 +46,7 @@ docker run --rm primes
## Output
+
The following outputs haven't been updated to show multi-threading results, as the final Docker image shows that multi-threaded performance on this CPU simply scales with the effect of thermal throttling from 3.6 GHz down to 3.2 GHz, since this CPU has no HT/SMT threads:
- Intel SkyLake i5-6500, GHC Haskell version 8.10.7, no LLVM
```
@@ -84,19 +87,29 @@ Intel SkyLake i5-6500, GHC Haskell version 8.10.7, with LLVM (version 12) and 25