no-std support #124

Open
wants to merge 20 commits into
base: master

piotrfila

This PR replaces all std imports in the scalar implementation with their equivalents from core and alloc, and makes std an optional feature that is enabled by default.
The HashMap implementation is taken from hashbrown (which std uses internally).
Relevant issues: #116, #122.
I ran the benchmarks overnight. Although the full suite takes longer to run (on my x86-64 Linux machine: median 869 s vs 767 s before), the individual results are roughly balanced: 68 tests run more than 20% slower and 67 tests run more than 20% faster.
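
As a minimal sketch of what this kind of change looks like (the `Scratch` type is hypothetical and not taken from the PR), code that only needs heap allocation and operator traits can import from `alloc` and `core` instead of `std`. In a real crate root this would sit under `#![cfg_attr(not(feature = "std"), no_std)]`, assuming the feature is called `std` as described above.

```rust
// In a real crate root this would be preceded by
// `#![cfg_attr(not(feature = "std"), no_std)]` (assumed feature name: `std`).
extern crate alloc;

use alloc::vec::Vec;
use core::ops::{Deref, DerefMut};

/// Hypothetical scratch buffer, used only to illustrate the imports.
pub struct Scratch {
    data: Vec<f32>,
}

impl Scratch {
    /// Allocates `len` zeroed samples through `alloc`, without touching `std`.
    pub fn zeroed(len: usize) -> Self {
        let mut data = Vec::new();
        data.resize(len, 0.0);
        Scratch { data }
    }
}

impl Deref for Scratch {
    type Target = [f32];
    fn deref(&self) -> &[f32] {
        &self.data
    }
}

impl DerefMut for Scratch {
    fn deref_mut(&mut self) -> &mut [f32] {
        &mut self.data
    }
}
```

Because `core::ops::Deref` and `std::ops::Deref` are the same trait, callers that build with `std` should see no difference.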

Here are the test results. The number on each line is the median time taken by the new implementation divided by the median time taken by the old implementation, so values below 1 mean the new version is faster. Each version was run 15 times.
I am not sure where the performance difference comes from.

0.4615 bench_from2to1024_f32_7
0.4662 bench_from2to1024_f32_42
0.4674 bench_from2to1024_f32_39
0.4684 bench_from2to1024_f32_38
0.4688 bench_power3_planned_scalar_f64_0000009
0.4689 bench_from2to1024_f32_26
0.4694 bench_from2to1024_f32_45
0.4696 bench_from2to1024_f32_40
0.4699 bench_from2to1024_f32_21
0.4706 bench_from2to1024_f64_5
0.4709 bench_from2to1024_f64_88
0.4734 bench_from2to1024_f32_24
0.4735 bench_from2to1024_f64_92
0.475 bench_from2to1024_f32_41
0.4756 bench_from2to1024_f32_94
0.4761 bench_from2to1024_f32_102
0.4778 bench_from2to1024_f32_95
0.478 bench_from2to1024_f64_41
0.4789 bench_from2to1024_f32_22
0.4789 bench_from2to1024_f64_125
0.480 bench_from2to1024_f64_141
0.4811 bench_from2to1024_f64_17
0.4815 bench_from2to1024_f64_22
0.4816 bench_from2to1024_f32_167
0.4817 bench_from2to1024_f32_43
0.4826 bench_from2to1024_f32_103
0.4832 bench_from2to1024_f32_104
0.4834 bench_power3_radix3_scalar_f64_0019683
0.4836 bench_from2to1024_f64_26
0.4845 bench_from2to1024_f64_23
0.4845 bench_power3_radix3_scalar_f32_0002187
0.4845 bench_power3_radix3_scalar_f64_0006561
0.4848 bench_from2to1024_f32_69
0.4849 bench_power3_radix3_scalar_f32_0006561
0.4852 bench_from2to1024_f64_169
0.4857 bench_from2to1024_f64_142
0.4857 bench_from2to1024_f64_25
0.4859 bench_from2to1024_f64_73
0.4862 bench_from2to1024_f32_23
0.4877 bench_from2to1024_f64_155
0.4891 bench_from2to1024_f64_150
0.4892 bench_power3_radix3_scalar_f32_0000729
0.4893 bench_from2to1024_f64_71
0.4896 bench_from2to1024_f64_109
0.4899 bench_from2to1024_f64_72
0.4911 bench_from2to1024_f64_158
0.4914 bench_from2to1024_f64_74
0.4932 bench_from2to1024_f64_40
0.4941 bench_from2to1024_f32_25
0.4941 bench_from2to1024_f64_49
0.4946 bench_power3_planned_scalar_f32_0000081
0.4956 bench_from2to1024_f64_124
0.4977 bench_from2to1024_f64_127
0.4979 bench_from2to1024_f32_100
0.500 bench_from2to1024_f32_4
0.500 bench_from2to1024_f64_21
0.501 bench_from2to1024_f64_154
0.5031 bench_from2to1024_f32_97
0.5084 bench_from2to1024_f64_126
0.5085 bench_from2to1024_f64_15
0.5154 bench_from2to1024_f64_132
0.5163 bench_from2to1024_f64_75
0.5223 bench_from2to1024_f32_96
0.5375 bench_from2to1024_f64_50
0.5476 bench_from2to1024_f64_168
0.5957 bench_from2to1024_f64_20
0.7429 bench_power3_planned_scalar_f64_0531441
0.8375 bench_from2to1024_f32_168
0.8737 bench_from2to1024_f32_31
0.875 bench_from2to1024_f32_126
0.8932 bench_from2to1024_f32_58
0.9057 bench_from2to1024_f32_54
0.9146 bench_from2to1024_f32_140
0.9251 bench_from2to1024_f32_108
0.9337 bench_from2to1024_f32_49
0.9392 bench_from2to1024_f32_182
0.9402 bench_from2to1024_f32_154
0.9419 bench_from2to1024_f32_87
0.943 bench_from2to1024_f32_105
0.9441 bench_power3_radix3_scalar_f32_0000243
0.9446 bench_from2to1024_f64_38
0.9493 bench_from2to1024_f32_91
0.9525 bench_from2to1024_f32_135
0.9531 bench_from2to1024_f32_48
0.9534 bench_from2to1024_f32_33
0.9543 bench_from2to1024_f32_189
0.9547 bench_from2to1024_f64_69
0.9569 bench_from2to1024_f32_75
0.9586 bench_from2to1024_f32_115
0.9592 bench_from2to1024_f32_44
0.9596 bench_from2to1024_f32_110
0.9602 bench_from2to1024_f32_51
0.9602 bench_from2to1024_f32_66
0.9603 bench_from2to1024_f32_187
0.9615 bench_from2to1024_f32_117
0.9623 bench_from2to1024_f32_165
0.9636 bench_from2to1024_f32_57
0.9637 bench_from2to1024_f32_88
0.9641 bench_from2to1024_f32_78
0.9647 bench_from2to1024_f32_65
0.9649 bench_from2to1024_f32_15
0.9649 bench_from2to1024_f32_68
0.9653 bench_from2to1024_f32_28
0.9657 bench_from2to1024_f32_174
0.9659 bench_from2to1024_f32_12
0.9668 bench_from2to1024_f32_112
0.9668 bench_from2to1024_f32_175
0.9668 bench_from2to1024_f32_72
0.9669 bench_from2to1024_f32_180
0.9678 bench_from2to1024_f32_52
0.968 bench_from2to1024_f32_63
0.9681 bench_from2to1024_f32_176
0.9682 bench_from2to1024_f32_77
0.969 bench_from2to1024_f64_14
0.9694 bench_from2to1024_f32_99
0.9696 bench_from2to1024_f32_114
0.9707 bench_from2to1024_f32_170
0.9714 bench_from2to1024_f32_161
0.9729 bench_from2to1024_f32_116
0.9729 bench_from2to1024_f32_70
0.9733 bench_from2to1024_f32_55
0.9738 bench_from2to1024_f32_119
0.975 bench_from2to1024_f64_36
0.9752 bench_from2to1024_f32_80
0.9755 bench_from2to1024_f32_130
0.9763 bench_from2to1024_f32_76
0.9769 bench_from2to1024_f32_183
0.977 bench_from2to1024_f32_109
0.9773 bench_from2to1024_f32_144
0.9773 bench_from2to1024_f32_34
0.9777 bench_from2to1024_f32_71
0.9785 bench_from2to1024_f32_142
0.9791 bench_from2to1024_f64_104
0.9799 bench_from2to1024_f32_125
0.980 bench_power3_planned_scalar_f64_4782969
0.9805 bench_from2to1024_f32_18
0.9806 bench_from2to1024_f32_134
0.9813 bench_from2to1024_f32_84
0.9814 bench_from2to1024_f32_156
0.9815 bench_from2to1024_f32_146
0.982 bench_from2to1024_f32_111
0.9821 bench_from2to1024_f32_113
0.9824 bench_from2to1024_f32_141
0.9825 bench_from2to1024_f32_145
0.9825 bench_from2to1024_f32_157
0.9829 bench_from2to1024_f32_132
0.983 bench_from2to1024_f32_199
0.983 bench_from2to1024_f32_79
0.9831 bench_from2to1024_f32_159
0.9839 bench_from2to1024_f64_176
0.9843 bench_from2to1024_f32_133
0.9844 bench_from2to1024_f32_20
0.9846 bench_from2to1024_f32_138
0.9846 bench_from2to1024_f64_98
0.9846 bench_power3_radix3_scalar_f32_0000009
0.9847 bench_from2to1024_f32_131
0.9847 bench_from2to1024_f32_36
0.9851 bench_from2to1024_f64_162
0.9853 bench_from2to1024_f32_60
0.9855 bench_from2to1024_f32_122
0.9855 bench_from2to1024_f32_53
0.9868 bench_from2to1024_f64_117
0.9869 bench_from2to1024_f64_76
0.9872 bench_from2to1024_f32_27
0.9875 bench_from2to1024_f32_181
0.9877 bench_from2to1024_f32_136
0.988 bench_from2to1024_f64_166
0.9883 bench_from2to1024_f32_150
0.9883 bench_from2to1024_f32_67
0.9889 bench_from2to1024_f64_10
0.9892 bench_from2to1024_f32_188
0.9899 bench_from2to1024_f64_134
0.9899 bench_from2to1024_f64_160
0.9902 bench_from2to1024_f64_93
0.9903 bench_from2to1024_f64_171
0.9904 bench_from2to1024_f64_180
0.9912 bench_power3_planned_scalar_f64_0000243
0.9914 bench_from2to1024_f32_164
0.9916 bench_from2to1024_f64_163
0.9916 bench_from2to1024_f64_178
0.9917 bench_from2to1024_f32_178
0.992 bench_from2to1024_f32_127
0.9924 bench_from2to1024_f32_139
0.9924 bench_from2to1024_f32_37
0.9924 bench_power3_planned_scalar_f64_0000081
0.9925 bench_from2to1024_f32_61
0.993 bench_from2to1024_f32_147
0.993 bench_from2to1024_f64_46
0.9937 bench_from2to1024_f64_81
0.9938 bench_from2to1024_f64_27
0.9944 bench_from2to1024_f32_158
0.9949 bench_from2to1024_f64_34
0.9951 bench_from2to1024_f64_99
0.9953 bench_power3_planned_scalar_f32_0177147
0.9954 bench_from2to1024_f64_47
0.9957 bench_from2to1024_f64_167
0.9958 bench_from2to1024_f32_123
0.9959 bench_from2to1024_f64_91
0.996 bench_power3_radix3_scalar_f64_0177147
0.996 bench_power3_radix3_scalar_f64_1594323
0.9963 bench_from2to1024_f32_184
0.9967 bench_from2to1024_f64_153
0.9969 bench_from2to1024_f64_95
0.997 bench_from2to1024_f32_137
0.997 bench_from2to1024_f64_149
0.997 bench_from2to1024_f64_187
0.997 bench_power3_radix3_scalar_f64_0059049
0.9975 bench_power3_planned_scalar_f32_0002187
0.9976 bench_power3_radix3_scalar_f32_0531441
0.9979 bench_power3_planned_scalar_f32_0059049
0.998 bench_from2to1024_f32_173
0.9981 bench_power3_radix3_scalar_f64_0002187
0.9982 bench_from2to1024_f32_179
0.9983 bench_from2to1024_f64_161
0.9984 bench_from2to1024_f32_107
0.9984 bench_from2to1024_f32_151
0.9986 bench_power3_planned_scalar_f64_1594323
0.9987 bench_from2to1024_f32_149
0.9989 bench_from2to1024_f32_129
0.9989 bench_power3_planned_scalar_f64_0000729
0.999 bench_from2to1024_f32_82
0.9991 bench_power3_planned_scalar_f32_4782969
0.9992 bench_from2to1024_f64_128
0.9992 bench_power3_planned_scalar_f32_1594323
0.9997 bench_power3_planned_scalar_f32_0531441
0.9997 bench_power3_radix3_scalar_f32_0019683
1.000 bench_from2to1024_f32_106
1.000 bench_from2to1024_f32_11
1.000 bench_from2to1024_f32_13
1.000 bench_from2to1024_f32_16
1.000 bench_from2to1024_f32_166
1.000 bench_from2to1024_f32_29
1.000 bench_from2to1024_f32_3
1.000 bench_from2to1024_f32_5
1.000 bench_from2to1024_f32_6
1.000 bench_from2to1024_f32_8
1.000 bench_from2to1024_f64_11
1.000 bench_from2to1024_f64_13
1.000 bench_from2to1024_f64_16
1.000 bench_from2to1024_f64_179
1.000 bench_from2to1024_f64_19
1.000 bench_from2to1024_f64_2
1.000 bench_from2to1024_f64_29
1.000 bench_from2to1024_f64_3
1.000 bench_from2to1024_f64_31
1.000 bench_from2to1024_f64_32
1.000 bench_from2to1024_f64_7
1.000 bench_power3_planned_scalar_f32_0000003
1.000 bench_power3_planned_scalar_f32_0000009
1.000 bench_power3_planned_scalar_f32_0000027
1.000 bench_power3_planned_scalar_f64_0000003
1.000 bench_power3_planned_scalar_f64_0000027
1.000 bench_power3_planned_scalar_f64_0002187
1.000 bench_power3_radix3_scalar_f32_0059049
1.001 bench_from2to1024_f32_128
1.001 bench_from2to1024_f64_101
1.001 bench_from2to1024_f64_118
1.001 bench_from2to1024_f64_144
1.001 bench_power3_planned_scalar_f32_0000729
1.001 bench_power3_planned_scalar_f32_0006561
1.001 bench_power3_planned_scalar_f32_0019683
1.002 bench_from2to1024_f32_177
1.002 bench_from2to1024_f32_83
1.002 bench_from2to1024_f32_92
1.002 bench_from2to1024_f64_138
1.002 bench_from2to1024_f64_143
1.002 bench_from2to1024_f64_164
1.002 bench_from2to1024_f64_173
1.002 bench_from2to1024_f64_174
1.002 bench_power3_planned_scalar_f64_0059049
1.002 bench_power3_radix3_scalar_f64_0000729
1.003 bench_from2to1024_f64_148
1.003 bench_from2to1024_f64_188
1.003 bench_power3_radix3_scalar_f32_0177147
1.004 bench_from2to1024_f64_139
1.004 bench_from2to1024_f64_175
1.004 bench_from2to1024_f64_185
1.004 bench_from2to1024_f64_64
1.004 bench_from2to1024_f64_94
1.004 bench_power3_planned_scalar_f32_0000243
1.004 bench_power3_radix3_scalar_f64_0000243
1.004 bench_power3_radix3_scalar_f64_0531441
1.005 bench_from2to1024_f32_148
1.005 bench_from2to1024_f32_30
1.005 bench_from2to1024_f64_186
1.005 bench_from2to1024_f64_24
1.005 bench_from2to1024_f64_83
1.005 bench_from2to1024_f64_96
1.006 bench_from2to1024_f32_32
1.006 bench_from2to1024_f32_64
1.006 bench_from2to1024_f64_52
1.006 bench_from2to1024_f64_53
1.006 bench_from2to1024_f64_65
1.007 bench_from2to1024_f32_47
1.007 bench_from2to1024_f64_184
1.007 bench_from2to1024_f64_199
1.007 bench_from2to1024_f64_70
1.008 bench_from2to1024_f64_133
1.008 bench_from2to1024_f64_177
1.008 bench_from2to1024_f64_87
1.009 bench_from2to1024_f32_185
1.009 bench_from2to1024_f64_123
1.009 bench_from2to1024_f64_145
1.009 bench_from2to1024_f64_193
1.009 bench_from2to1024_f64_43
1.010 bench_from2to1024_f32_101
1.010 bench_from2to1024_f64_116
1.010 bench_from2to1024_f64_68
1.011 bench_from2to1024_f32_118
1.011 bench_from2to1024_f64_110
1.011 bench_from2to1024_f64_129
1.012 bench_from2to1024_f64_103
1.012 bench_from2to1024_f64_152
1.012 bench_from2to1024_f64_191
1.012 bench_from2to1024_f64_51
1.013 bench_from2to1024_f32_56
1.013 bench_from2to1024_f64_151
1.013 bench_from2to1024_f64_172
1.013 bench_from2to1024_f64_42
1.013 bench_from2to1024_f64_44
1.013 bench_from2to1024_f64_56
1.014 bench_from2to1024_f32_155
1.014 bench_from2to1024_f64_100
1.014 bench_from2to1024_f64_131
1.014 bench_from2to1024_f64_28
1.014 bench_from2to1024_f64_35
1.014 bench_from2to1024_f64_37
1.015 bench_from2to1024_f64_102
1.015 bench_from2to1024_f64_189
1.015 bench_from2to1024_f64_30
1.015 bench_from2to1024_f64_39
1.015 bench_from2to1024_f64_63
1.016 bench_from2to1024_f32_98
1.016 bench_from2to1024_f64_190
1.016 bench_from2to1024_f64_55
1.017 bench_from2to1024_f32_163
1.017 bench_from2to1024_f64_48
1.017 bench_from2to1024_f64_77
1.018 bench_from2to1024_f32_19
1.019 bench_from2to1024_f64_130
1.019 bench_from2to1024_f64_165
1.019 bench_from2to1024_f64_192
1.020 bench_from2to1024_f64_120
1.020 bench_from2to1024_f64_33
1.020 bench_from2to1024_f64_45
1.020 bench_from2to1024_f64_79
1.021 bench_from2to1024_f32_46
1.021 bench_from2to1024_f64_147
1.022 bench_from2to1024_f32_160
1.022 bench_from2to1024_f64_12
1.023 bench_from2to1024_f32_81
1.023 bench_power3_radix3_scalar_f32_0000081
1.023 bench_power3_radix3_scalar_f64_0000081
1.025 bench_from2to1024_f32_62
1.025 bench_from2to1024_f32_74
1.025 bench_from2to1024_f64_105
1.025 bench_from2to1024_f64_140
1.025 bench_from2to1024_f64_82
1.026 bench_from2to1024_f32_169
1.026 bench_from2to1024_f64_137
1.028 bench_from2to1024_f64_170
1.028 bench_from2to1024_f64_57
1.028 bench_from2to1024_f64_62
1.029 bench_from2to1024_f32_162
1.029 bench_from2to1024_f64_157
1.029 bench_from2to1024_f64_18
1.033 bench_from2to1024_f64_86
1.034 bench_from2to1024_f64_146
1.036 bench_from2to1024_f32_93
1.037 bench_from2to1024_f64_159
1.043 bench_from2to1024_f32_17
1.050 bench_from2to1024_f32_10
1.050 bench_from2to1024_f64_156
1.051 bench_from2to1024_f64_121
1.052 bench_from2to1024_f32_124
1.056 bench_from2to1024_f64_119
1.057 bench_from2to1024_f64_122
1.061 bench_from2to1024_f64_61
1.064 bench_from2to1024_f32_186
1.074 bench_from2to1024_f32_50
1.082 bench_from2to1024_f64_85
1.084 bench_from2to1024_f64_84
1.136 bench_from2to1024_f64_54
1.193 bench_power3_radix3_scalar_f64_4782969
1.231 bench_power3_radix3_scalar_f32_0000003
1.303 bench_power3_radix3_scalar_f32_4782969
1.333 bench_from2to1024_f64_8
1.530 bench_from2to1024_f32_90
1.702 bench_power3_radix3_scalar_f32_1594323
1.721 bench_power3_planned_scalar_f64_0177147
1.778 bench_from2to1024_f64_9
1.847 bench_from2to1024_f32_120
1.921 bench_from2to1024_f32_85
1.927 bench_from2to1024_f32_190
1.970 bench_from2to1024_f64_115
1.971 bench_from2to1024_f32_143
1.974 bench_from2to1024_f32_153
1.980 bench_from2to1024_f64_135
1.983 bench_from2to1024_f32_195
1.994 bench_from2to1024_f32_196
1.997 bench_from2to1024_f32_59
1.999 bench_from2to1024_f32_171
1.999 bench_from2to1024_f32_197
2.000 bench_from2to1024_f32_14
2.002 bench_from2to1024_f64_112
2.007 bench_from2to1024_f32_191
2.008 bench_from2to1024_f32_89
2.012 bench_from2to1024_f32_152
2.021 bench_from2to1024_f32_198
2.023 bench_from2to1024_f64_80
2.030 bench_from2to1024_f64_89
2.031 bench_power3_radix3_scalar_f64_0000009
2.033 bench_power3_radix3_scalar_f64_0000027
2.034 bench_from2to1024_f64_181
2.039 bench_from2to1024_f64_97
2.042 bench_from2to1024_f64_196
2.042 bench_from2to1024_f64_197
2.043 bench_from2to1024_f64_113
2.044 bench_from2to1024_f32_172
2.049 bench_from2to1024_f64_66
2.052 bench_from2to1024_f64_114
2.052 bench_power3_planned_scalar_f64_0006561
2.054 bench_from2to1024_f32_73
2.055 bench_power3_planned_scalar_f64_0019683
2.055 bench_power3_radix3_scalar_f32_0000027
2.057 bench_from2to1024_f32_192
2.060 bench_from2to1024_f64_194
2.067 bench_from2to1024_f64_67
2.071 bench_from2to1024_f64_107
2.073 bench_from2to1024_f64_111
2.077 bench_from2to1024_f64_59
2.081 bench_from2to1024_f32_193
2.085 bench_from2to1024_f32_86
2.087 bench_from2to1024_f64_182
2.092 bench_from2to1024_f64_198
2.093 bench_from2to1024_f64_58
2.094 bench_from2to1024_f64_78
2.095 bench_from2to1024_f64_195
2.130 bench_from2to1024_f32_194
2.133 bench_from2to1024_f32_35
2.133 bench_power3_radix3_scalar_f64_0000003
2.136 bench_from2to1024_f32_121
2.136 bench_from2to1024_f32_9
2.143 bench_from2to1024_f64_106
2.143 bench_from2to1024_f64_6
2.156 bench_from2to1024_f64_136
2.200 bench_from2to1024_f64_4
2.203 bench_from2to1024_f64_183
2.204 bench_from2to1024_f64_108
2.215 bench_from2to1024_f64_90
2.297 bench_from2to1024_f64_60
2.333 bench_from2to1024_f32_2
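
Ratios like the ones above can be reproduced from raw timings roughly as follows. This is a sketch only: the function names are hypothetical, and the 0.8 and 1.2 thresholds are one reading of "more than 20% faster/slower", not something taken from the PR's tooling.

```rust
/// Median of a set of timings; `None` for an empty slice.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        // Even count: average the two middle values.
        0.5 * (sorted[mid - 1] + sorted[mid])
    })
}

/// Median time of the new implementation divided by that of the old one.
pub fn median_ratio(new: &[f64], old: &[f64]) -> Option<f64> {
    Some(median(new)? / median(old)?)
}

/// Buckets a ratio using the assumed 20% thresholds.
pub fn classify(ratio: f64) -> &'static str {
    if ratio > 1.2 {
        "slower"
    } else if ratio < 0.8 {
        "faster"
    } else {
        "similar"
    }
}
```

With 15 runs per version, the median is the 8th value of the sorted timings, which makes it insensitive to a few outlier runs.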

Contributor

@WalterSmuts WalterSmuts left a comment

Nice work! Would it be possible to add some documentation on the no_std feature in the README and perhaps a regression test? Not sure if doctests let you specify the features, but something like that would be amazing!

@@ -1,6 +1,8 @@
 use crate::Complex;
 use crate::FftNum;
-use std::ops::{Deref, DerefMut};
+use core::ops::{Deref, DerefMut};
+#[allow(unused_imports)]
Contributor

Why is this necessary? Can't you feature gate the import?

Author

Yeah, my bad. Fixed in a6f4c2e.
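
For illustration, feature-gating an import instead of silencing the lint could look like the sketch below. The `zeros` helper is hypothetical and the feature name `std` is assumed; this is not the code from the commit.

```rust
// Assumed feature name: `std`. With the feature off, `Vec` is not in the
// prelude, so it is imported from `alloc`; with it on, the import is skipped
// entirely and no `#[allow(unused_imports)]` is needed.
#[cfg(not(feature = "std"))]
extern crate alloc;
#[cfg(not(feature = "std"))]
use alloc::vec::Vec;

/// Hypothetical helper that needs `Vec` in both configurations.
pub fn zeros(len: usize) -> Vec<f32> {
    let mut out = Vec::new();
    out.resize(len, 0.0);
    out
}
```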

Cargo.toml Outdated
Comment on lines 27 to 30
avx = ["std"]
sse = ["std"]
neon = ["std"]
wasm_simd = ["std"]
Contributor

Is there no way to get these on no_std?

Author

Yes, fixed by b0e38ef, 2017988 and 68d0966.
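
One way SIMD code can avoid std, which is not necessarily what the commits above do: the intrinsics themselves live in `core::arch`, while runtime detection macros such as `is_x86_feature_detected!` are std-only, so a no_std build can pick a code path at compile time with `cfg(target_feature)`. The `add4` function below is a hypothetical example, not part of RustFFT.

```rust
/// Adds two 4-lane f32 vectors with SSE intrinsics from `core::arch`.
/// Selected at compile time, so no std-only runtime detection is involved.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    // SAFETY: SSE is statically enabled for this build, and each pointer
    // refers to exactly four contiguous f32 values (unaligned access is allowed).
    unsafe {
        let sum = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Scalar fallback for every other target or feature set.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

The trade-off is that compile-time selection cannot adapt to the CPU the binary eventually runs on, which is what the runtime detection in std provides.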

@WalterSmuts
Contributor

So I gave a bash at actually using this crate from a no_std binary. You quickly realise that it still tries linking in std. After a bit of fiddling with the dependencies I got all the std features removed. Required a bunch of default-features = false:

+num-complex = { version = "0.4", default-features = false }
+num-traits = { version = "0.2", default-features = false }
+num-integer = { version = "^0.1.40", default-features = false }
+strength_reduce = { version = "0.2.4", default-features = false }
+transpose = { version = "0.2", default-features = false }
+primal-check = { path = "/home/walter/Development/primal/primal-check", default-features = false }

Note the local version of primal-check. I don't see why upstream wouldn't accept my changes for no_std support.

Once you weed out all the std stuff you end up with errors regarding math operations. The core variants of f32 and f64 don't implement sqrt, cos and the like, so you need to manually change them to use libm. Silly that the Rust compiler just happily ignores that when some of your dependencies have std linked in.
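
A rough sketch of that kind of dispatch is shown below. To keep it free of dependencies, a small hypothetical `soft_math` module stands in for libm; a real crate would call something like `libm::sqrt` in the no_std branch instead. The feature name `std` is also an assumption.

```rust
/// Stand-in for a software math library such as `libm`; hypothetical and
/// deliberately simple so that the sketch has no dependencies.
#[cfg(not(feature = "std"))]
mod soft_math {
    /// Square root by Newton's iteration, using only operations found in `core`.
    pub fn sqrt(x: f64) -> f64 {
        if x.is_nan() || x < 0.0 {
            return f64::NAN;
        }
        if x == 0.0 || x == f64::INFINITY {
            return x;
        }
        // Start at or above the root, then descend until the estimate
        // stops decreasing.
        let mut guess = if x > 1.0 { x } else { 1.0 };
        loop {
            let next = 0.5 * (guess + x / guess);
            if next >= guess {
                return guess;
            }
            guess = next;
        }
    }
}

/// With the (assumed) `std` feature, the inherent float method is used.
#[cfg(feature = "std")]
pub fn sqrt(x: f64) -> f64 {
    x.sqrt()
}

/// Without it, the call is routed to the software implementation.
#[cfg(not(feature = "std"))]
pub fn sqrt(x: f64) -> f64 {
    soft_math::sqrt(x)
}
```

Routing every math call through one module like this keeps the `cfg` switches in a single place rather than scattered across the algorithms.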

I eventually got something that seems like it's building correctly on my machine. I'm not really very familiar with no_std stuff but AFAICT it actually works! The real test would be to use it on a real embedded system. Anyway, this compiles and runs:

#![feature(
    lang_items,
    start,
    core_intrinsics,
    rustc_private,
    panic_unwind,
    rustc_attrs
)]
#![allow(internal_features)]
#![no_std]

extern crate unwind;

use core::alloc::Layout;
use core::cell::UnsafeCell;
use core::panic::PanicInfo;
use core::ptr::null_mut;
use core::sync::atomic::AtomicUsize;
use core::sync::atomic::Ordering::SeqCst;
use core::{alloc::GlobalAlloc, intrinsics};
use rustfft::{num_complex::Complex, FftPlanner};

const ARENA_SIZE: usize = 128 * 1024;
const MAX_SUPPORTED_ALIGN: usize = 4096;
#[repr(C, align(4096))] // 4096 == MAX_SUPPORTED_ALIGN
struct SimpleAllocator {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    remaining: AtomicUsize, // we allocate from the top, counting down
}

#[global_allocator]
static ALLOCATOR: SimpleAllocator = SimpleAllocator {
    arena: UnsafeCell::new([0x55; ARENA_SIZE]),
    remaining: AtomicUsize::new(ARENA_SIZE),
};
unsafe impl Sync for SimpleAllocator {}

unsafe impl GlobalAlloc for SimpleAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        let align = layout.align();

        // `Layout` contract forbids making a `Layout` with align=0, or align not power of 2.
        // So we can safely use a mask to ensure alignment without worrying about UB.
        let align_mask_to_round_down = !(align - 1);

        if align > MAX_SUPPORTED_ALIGN {
            return null_mut();
        }

        let mut allocated = 0;
        if self
            .remaining
            .fetch_update(SeqCst, SeqCst, |mut remaining| {
                if size > remaining {
                    return None;
                }
                remaining -= size;
                remaining &= align_mask_to_round_down;
                allocated = remaining;
                Some(remaining)
            })
            .is_err()
        {
            return null_mut();
        };
        self.arena.get().cast::<u8>().add(allocated)
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}

#[start]
fn main(_argc: isize, _argv: *const *const u8) -> isize {
    let mut planner = FftPlanner::new();
    let fft = planner.plan_fft_forward(1234);

    let mut buffer = [Complex {
        re: 0.0f32,
        im: 0.0f32,
    }; 1234];
    fft.process(&mut buffer);
    0
}

#[lang = "eh_personality"]
fn rust_eh_personality() {}

#[panic_handler]
fn panic_handler(_info: &PanicInfo) -> ! {
    intrinsics::abort()
}

So to actually get no_std support we need to:

  • Submit a change to primal-check to add no_std support.
  • Update the dependencies to use default-features = false. I'm not sure whether it's better to do this only for the no_std build or always; there could be a performance regression with the core-only variants.
  • Add libm support. The same question applies: use it by default, or only when the std feature is disabled? I strongly suspect it will cause a performance regression, though.
  • Benchmark.
  • Validate that it works in a real no_std environment (with someone more familiar with no_std).

@piotrfila
Author

piotrfila commented Dec 17, 2023

Thanks a lot! The crate now compiles on thumbv7em-none-eabihf with the libm feature enabled. I decided to move libm into its own feature because as you said it is likely to cause a performance regression. For now, I forked primal to make it work with no-std, but it would be nice to merge the changes there upstream.

@WalterSmuts
Contributor

WalterSmuts commented Dec 17, 2023

Nice work! :)

Think your primal-check change may be incorrect:

error[E0599]: no method named `powf` found for type `f64` in the current scope
  --> /home/walter/.cargo/git/checkouts/primal-4a737820cfafafb5/53d0fdf/src/perfect_power.rs:47:25
   |
47 |         let factor = x_.powf(1.0/expn as f64).round() as u64;
   |                         ^^^^ method not found in `f64`

I think you need to ensure either std or libm is enabled. Or explicitly enable the libm feature in RustFFT.

README.md Outdated
Comment on lines 67 to 70
### std

The scalar implementation (no avx, sse, neon or wasm SIMD) is available for no-std targets. To build the crate as no-std simply disable default features.

Contributor

Currently you also need to enable the libm feature. It would be cool if default-features = false implied libm or something. The same goes for the primal-check crate. I'm not really sure what the generally accepted approach is in Rust, but if it's not possible to express "either std or libm needs to be enabled", then you should add documentation on enabling libm here.

@piotrfila
Author

piotrfila commented Dec 17, 2023

My primal-check fix was just meant to get the crate to compile on no-std, so yeah it's not great 🙃.
I see now that this is quite a bit more work than I originally anticipated... I am going to keep working on this though, it has been a fun project so far.

I have an example working on an STM32F411 Nucleo board. The library is really big though: it takes up 220 KiB of flash (which takes several seconds to upload to the chip), and computing an FFT of size 1234 needs 48 KiB of heap.

EDIT: Example is now in the repo.

Constructing an FFT dynamically might not be the best approach on embedded. It would be much better to construct the FFTs at compile time and only include the relevant algorithms in the binary (though this would make it impossible to construct dynamically-sized FFTs, so there would need to be a way to do both). Giving the option to store the twiddles in flash rather than sram would be nice, too.

Statically-sized FFTs would also be a nice feature for std applications, as this would likely reduce binary size and could be faster by not requiring allocations. I'm not sure how to implement this, though. Nightly features would probably be required, since complicated const functions are hard to write in stable Rust in my experience.
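
To make the idea of a statically-sized transform concrete, here is a minimal sketch using const generics on stable Rust. It is a naive O(N²) DFT, not one of RustFFT's algorithms, and the `C32` type and `dft_static` function are hypothetical. It only shows that a size known at compile time allows fixed-size arrays and no heap allocation; the twiddle factors are still computed at run time, so it does not solve the const-evaluation problem mentioned above. On a no_std target the `sin_cos` call would additionally need a libm-style replacement.

```rust
/// Minimal complex number so the sketch has no dependencies (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C32 {
    pub re: f32,
    pub im: f32,
}

/// Naive forward DFT over a fixed-size array. `N` is a compile-time constant,
/// so input and output live in arrays and nothing is allocated on the heap.
pub fn dft_static<const N: usize>(input: &[C32; N]) -> [C32; N] {
    let mut out = [C32 { re: 0.0, im: 0.0 }; N];
    for (k, bin) in out.iter_mut().enumerate() {
        for (n, x) in input.iter().enumerate() {
            // Reduce k*n modulo N before converting to float to keep the angle small.
            let angle = -2.0 * core::f32::consts::PI * ((k * n) % N) as f32 / N as f32;
            let (s, c) = angle.sin_cos();
            // Multiply x by the twiddle factor (c + i*s) and accumulate.
            bin.re += x.re * c - x.im * s;
            bin.im += x.re * s + x.im * c;
        }
    }
    out
}
```

A unit impulse as input should produce a flat spectrum, which makes for a quick sanity check of the indexing and sign conventions.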

@piotrfila
Author

piotrfila commented Dec 17, 2023

Another issue I found when compiling for thumbv6m-none-eabi (Raspberry Pi Pico board, Arm Cortex-M0+) is that not all targets support Arc. This can be mitigated by using portable-atomic, though I could not get an example to compile without modifying portable-atomic-util (it does not re-export the critical-section feature of portable-atomic).
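
The target dependence can be sketched with the stable `target_has_atomic` cfg, which, as far as I know, is the condition under which `alloc::sync::Arc` is available. The `Shared` alias and `share_twice` function are hypothetical. Note that the fallback here is `Rc`, chosen purely so that the sketch is self-contained; it is not thread-safe and is not the portable-atomic approach mentioned above.

```rust
extern crate alloc;

/// Targets with pointer-sized atomic operations get the usual `Arc`.
#[cfg(target_has_atomic = "ptr")]
pub type Shared<T> = alloc::sync::Arc<T>;

/// Stand-in for targets without them. A real crate might use an `Arc`
/// built on portable-atomic here instead of `Rc`.
#[cfg(not(target_has_atomic = "ptr"))]
pub type Shared<T> = alloc::rc::Rc<T>;

/// Hypothetical helper: creates a shared value and a second handle to it.
pub fn share_twice(value: u32) -> (Shared<u32>, Shared<u32>) {
    let first = Shared::new(value);
    let second = Shared::clone(&first);
    (first, second)
}
```

Hiding the pointer type behind one alias means the rest of the crate does not need to know which implementation was selected.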
