Add bounds reasoning comments to AVX2 backend #560
jammychiou1 commented on Oct 26, 2025
- Resolves #528 (Add bounds reasoning comments to AVX2 backend)
Branch updated from 1fc1bf6 to 1b76629.
Branch updated from ca27600 to da40590.
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46245 cycles | 46243 cycles | 1.00 |
| ML-DSA-44 sign | 132702 cycles | 132735 cycles | 1.00 |
| ML-DSA-44 verify | 47876 cycles | 47881 cycles | 1.00 |
| ML-DSA-65 keypair | 81159 cycles | 81166 cycles | 1.00 |
| ML-DSA-65 sign | 219247 cycles | 219290 cycles | 1.00 |
| ML-DSA-65 verify | 80130 cycles | 80129 cycles | 1.00 |
| ML-DSA-87 keypair | 132357 cycles | 132350 cycles | 1.00 |
| ML-DSA-87 sign | 280984 cycles | 280937 cycles | 1.00 |
| ML-DSA-87 verify | 130424 cycles | 130406 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115064 cycles | 115058 cycles | 1.00 |
| ML-DSA-44 sign | 431635 cycles | 431665 cycles | 1.00 |
| ML-DSA-44 verify | 122206 cycles | 122172 cycles | 1.00 |
| ML-DSA-65 keypair | 197112 cycles | 197083 cycles | 1.00 |
| ML-DSA-65 sign | 701011 cycles | 701034 cycles | 1.00 |
| ML-DSA-65 verify | 197688 cycles | 197688 cycles | 1.00 |
| ML-DSA-87 keypair | 325227 cycles | 325219 cycles | 1.00 |
| ML-DSA-87 sign | 884685 cycles | 884692 cycles | 1.00 |
| ML-DSA-87 verify | 328848 cycles | 328850 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115004 cycles | 114982 cycles | 1.00 |
| ML-DSA-44 sign | 377271 cycles | 377314 cycles | 1.00 |
| ML-DSA-44 verify | 120313 cycles | 120175 cycles | 1.00 |
| ML-DSA-65 keypair | 199250 cycles | 199171 cycles | 1.00 |
| ML-DSA-65 sign | 622635 cycles | 622821 cycles | 1.00 |
| ML-DSA-65 verify | 198187 cycles | 198196 cycles | 1.00 |
| ML-DSA-87 keypair | 326349 cycles | 325598 cycles | 1.00 |
| ML-DSA-87 sign | 790980 cycles | 790006 cycles | 1.00 |
| ML-DSA-87 verify | 325253 cycles | 324398 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34962 cycles | 35293 cycles | 0.99 |
| ML-DSA-44 sign | 121049 cycles | 120017 cycles | 1.01 |
| ML-DSA-44 verify | 38221 cycles | 38192 cycles | 1.00 |
| ML-DSA-65 keypair | 61665 cycles | 61551 cycles | 1.00 |
| ML-DSA-65 sign | 199507 cycles | 199134 cycles | 1.00 |
| ML-DSA-65 verify | 62246 cycles | 62382 cycles | 1.00 |
| ML-DSA-87 keypair | 95126 cycles | 93721 cycles | 1.01 |
| ML-DSA-87 sign | 235196 cycles | 229923 cycles | 1.02 |
| ML-DSA-87 verify | 93916 cycles | 94024 cycles | 1.00 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 95254 cycles | 94967 cycles | 1.00 |
| ML-DSA-44 sign | 349037 cycles | 348918 cycles | 1.00 |
| ML-DSA-44 verify | 100789 cycles | 100735 cycles | 1.00 |
| ML-DSA-65 keypair | 164263 cycles | 164543 cycles | 1.00 |
| ML-DSA-65 sign | 567230 cycles | 567551 cycles | 1.00 |
| ML-DSA-65 verify | 165474 cycles | 165398 cycles | 1.00 |
| ML-DSA-87 keypair | 266932 cycles | 267562 cycles | 1.00 |
| ML-DSA-87 sign | 722097 cycles | 722682 cycles | 1.00 |
| ML-DSA-87 verify | 272046 cycles | 271670 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213333 cycles | 213108 cycles | 1.00 |
| ML-DSA-44 sign | 782029 cycles | 782015 cycles | 1.00 |
| ML-DSA-44 verify | 230092 cycles | 230320 cycles | 1.00 |
| ML-DSA-65 keypair | 384054 cycles | 383982 cycles | 1.00 |
| ML-DSA-65 sign | 1326768 cycles | 1313471 cycles | 1.01 |
| ML-DSA-65 verify | 375377 cycles | 375490 cycles | 1.00 |
| ML-DSA-87 keypair | 605449 cycles | 605206 cycles | 1.00 |
| ML-DSA-87 sign | 1621496 cycles | 1622880 cycles | 1.00 |
| ML-DSA-87 verify | 617407 cycles | 617415 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 229297 cycles | 225291 cycles | 1.02 |
| ML-DSA-44 sign | 680601 cycles | 674064 cycles | 1.01 |
| ML-DSA-44 verify | 230038 cycles | 228253 cycles | 1.01 |
| ML-DSA-65 keypair | 392762 cycles | 399283 cycles | 0.98 |
| ML-DSA-65 sign | 1120149 cycles | 1102258 cycles | 1.02 |
| ML-DSA-65 verify | 383658 cycles | 383981 cycles | 1.00 |
| ML-DSA-87 keypair | 663306 cycles | 645046 cycles | 1.03 |
| ML-DSA-87 sign | 1465499 cycles | 1407349 cycles | 1.04 |
| ML-DSA-87 verify | 649376 cycles | 625913 cycles | 1.04 |
Graviton4
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69406 cycles | 69297 cycles | 1.00 |
| ML-DSA-44 sign | 215066 cycles | 215394 cycles | 1.00 |
| ML-DSA-44 verify | 72872 cycles | 72803 cycles | 1.00 |
| ML-DSA-65 keypair | 123156 cycles | 123048 cycles | 1.00 |
| ML-DSA-65 sign | 353712 cycles | 354049 cycles | 1.00 |
| ML-DSA-65 verify | 120786 cycles | 120878 cycles | 1.00 |
| ML-DSA-87 keypair | 201134 cycles | 200545 cycles | 1.00 |
| ML-DSA-87 sign | 451336 cycles | 451914 cycles | 1.00 |
| ML-DSA-87 verify | 198201 cycles | 198487 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69683 cycles | 69592 cycles | 1.00 |
| ML-DSA-44 sign | 184860 cycles | 185506 cycles | 1.00 |
| ML-DSA-44 verify | 69154 cycles | 69226 cycles | 1.00 |
| ML-DSA-65 keypair | 119441 cycles | 120798 cycles | 0.99 |
| ML-DSA-65 sign | 295459 cycles | 298349 cycles | 0.99 |
| ML-DSA-65 verify | 115470 cycles | 116590 cycles | 0.99 |
| ML-DSA-87 keypair | 201494 cycles | 201311 cycles | 1.00 |
| ML-DSA-87 sign | 385746 cycles | 386042 cycles | 1.00 |
| ML-DSA-87 verify | 193805 cycles | 193658 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57175 cycles | 56967 cycles | 1.00 |
| ML-DSA-44 sign | 180395 cycles | 180312 cycles | 1.00 |
| ML-DSA-44 verify | 61138 cycles | 61223 cycles | 1.00 |
| ML-DSA-65 keypair | 99517 cycles | 99149 cycles | 1.00 |
| ML-DSA-65 sign | 295948 cycles | 296299 cycles | 1.00 |
| ML-DSA-65 verify | 100305 cycles | 100113 cycles | 1.00 |
| ML-DSA-87 keypair | 153336 cycles | 153114 cycles | 1.00 |
| ML-DSA-87 sign | 352913 cycles | 353081 cycles | 1.00 |
| ML-DSA-87 verify | 152140 cycles | 151972 cycles | 1.00 |
Graviton2
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115166 cycles | 115181 cycles | 1.00 |
| ML-DSA-44 sign | 377510 cycles | 377683 cycles | 1.00 |
| ML-DSA-44 verify | 120392 cycles | 120343 cycles | 1.00 |
| ML-DSA-65 keypair | 199370 cycles | 199283 cycles | 1.00 |
| ML-DSA-65 sign | 623245 cycles | 623012 cycles | 1.00 |
| ML-DSA-65 verify | 198354 cycles | 198353 cycles | 1.00 |
| ML-DSA-87 keypair | 326701 cycles | 326259 cycles | 1.00 |
| ML-DSA-87 sign | 791828 cycles | 790809 cycles | 1.00 |
| ML-DSA-87 verify | 325412 cycles | 324916 cycles | 1.00 |
Graviton3
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 73849 cycles | 73827 cycles | 1.00 |
| ML-DSA-44 sign | 228459 cycles | 228653 cycles | 1.00 |
| ML-DSA-44 verify | 78255 cycles | 78142 cycles | 1.00 |
| ML-DSA-65 keypair | 129816 cycles | 129734 cycles | 1.00 |
| ML-DSA-65 sign | 378404 cycles | 378349 cycles | 1.00 |
| ML-DSA-65 verify | 129311 cycles | 129160 cycles | 1.00 |
| ML-DSA-87 keypair | 208622 cycles | 210617 cycles | 0.99 |
| ML-DSA-87 sign | 479146 cycles | 479575 cycles | 1.00 |
| ML-DSA-87 verify | 208503 cycles | 210205 cycles | 0.99 |
Graviton4 (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 132648 cycles | 132784 cycles | 1.00 |
| ML-DSA-44 sign | 498274 cycles | 498151 cycles | 1.00 |
| ML-DSA-44 verify | 144830 cycles | 144894 cycles | 1.00 |
| ML-DSA-65 keypair | 226547 cycles | 226211 cycles | 1.00 |
| ML-DSA-65 sign | 813539 cycles | 812397 cycles | 1.00 |
| ML-DSA-65 verify | 231077 cycles | 231596 cycles | 1.00 |
| ML-DSA-87 keypair | 374183 cycles | 374501 cycles | 1.00 |
| ML-DSA-87 sign | 1020719 cycles | 1020931 cycles | 1.00 |
| ML-DSA-87 verify | 383566 cycles | 383519 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135440 cycles | 134965 cycles | 1.00 |
| ML-DSA-44 sign | 539993 cycles | 540181 cycles | 1.00 |
| ML-DSA-44 verify | 148184 cycles | 148325 cycles | 1.00 |
| ML-DSA-65 keypair | 228063 cycles | 227974 cycles | 1.00 |
| ML-DSA-65 sign | 889220 cycles | 893643 cycles | 1.00 |
| ML-DSA-65 verify | 237929 cycles | 237974 cycles | 1.00 |
| ML-DSA-87 keypair | 372971 cycles | 372801 cycles | 1.00 |
| ML-DSA-87 sign | 1106723 cycles | 1105358 cycles | 1.00 |
| ML-DSA-87 verify | 386855 cycles | 387678 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157988 cycles | 157813 cycles | 1.00 |
| ML-DSA-44 sign | 565277 cycles | 564791 cycles | 1.00 |
| ML-DSA-44 verify | 169766 cycles | 169751 cycles | 1.00 |
| ML-DSA-65 keypair | 270290 cycles | 270498 cycles | 1.00 |
| ML-DSA-65 sign | 925971 cycles | 926242 cycles | 1.00 |
| ML-DSA-65 verify | 276263 cycles | 275901 cycles | 1.00 |
| ML-DSA-87 keypair | 452768 cycles | 453347 cycles | 1.00 |
| ML-DSA-87 sign | 1188324 cycles | 1186402 cycles | 1.00 |
| ML-DSA-87 verify | 461505 cycles | 461198 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42686 cycles | 42979 cycles | 0.99 |
| ML-DSA-44 sign | 130695 cycles | 130913 cycles | 1.00 |
| ML-DSA-44 verify | 44216 cycles | 44413 cycles | 1.00 |
| ML-DSA-65 keypair | 72672 cycles | 72353 cycles | 1.00 |
| ML-DSA-65 sign | 210616 cycles | 212806 cycles | 0.99 |
| ML-DSA-65 verify | 73367 cycles | 72915 cycles | 1.01 |
| ML-DSA-87 keypair | 109463 cycles | 109750 cycles | 1.00 |
| ML-DSA-87 sign | 249730 cycles | 248355 cycles | 1.01 |
| ML-DSA-87 verify | 110219 cycles | 111517 cycles | 0.99 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 288553 cycles | 288273 cycles | 1.00 |
| ML-DSA-44 sign | 927585 cycles | 936164 cycles | 0.99 |
| ML-DSA-44 verify | 295250 cycles | 292179 cycles | 1.01 |
| ML-DSA-65 keypair | 487983 cycles | 488081 cycles | 1.00 |
| ML-DSA-65 sign | 1530864 cycles | 1529760 cycles | 1.00 |
| ML-DSA-65 verify | 482983 cycles | 475632 cycles | 1.02 |
| ML-DSA-87 keypair | 831824 cycles | 841156 cycles | 0.99 |
| ML-DSA-87 sign | 2087730 cycles | 2121320 cycles | 0.98 |
| ML-DSA-87 verify | 815092 cycles | 826714 cycles | 0.99 |
Graviton3 (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138369 cycles | 138318 cycles | 1.00 |
| ML-DSA-44 sign | 492999 cycles | 493648 cycles | 1.00 |
| ML-DSA-44 verify | 148347 cycles | 148347 cycles | 1.00 |
| ML-DSA-65 keypair | 241732 cycles | 241461 cycles | 1.00 |
| ML-DSA-65 sign | 809768 cycles | 809767 cycles | 1.00 |
| ML-DSA-65 verify | 240679 cycles | 240584 cycles | 1.00 |
| ML-DSA-87 keypair | 395821 cycles | 395729 cycles | 1.00 |
| ML-DSA-87 sign | 1027366 cycles | 1027294 cycles | 1.00 |
| ML-DSA-87 verify | 401516 cycles | 401299 cycles | 1.00 |
Graviton2 (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213829 cycles | 213522 cycles | 1.00 |
| ML-DSA-44 sign | 782940 cycles | 794930 cycles | 0.98 |
| ML-DSA-44 verify | 230437 cycles | 230022 cycles | 1.00 |
| ML-DSA-65 keypair | 384543 cycles | 384988 cycles | 1.00 |
| ML-DSA-65 sign | 1310640 cycles | 1307299 cycles | 1.00 |
| ML-DSA-65 verify | 375817 cycles | 376399 cycles | 1.00 |
| ML-DSA-87 keypair | 606688 cycles | 605922 cycles | 1.00 |
| ML-DSA-87 sign | 1626018 cycles | 1626375 cycles | 1.00 |
| ML-DSA-87 verify | 618278 cycles | 617623 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120170 cycles | 120288 cycles | 1.00 |
| ML-DSA-44 sign | 454518 cycles | 453859 cycles | 1.00 |
| ML-DSA-44 verify | 130051 cycles | 130271 cycles | 1.00 |
| ML-DSA-65 keypair | 204928 cycles | 205208 cycles | 1.00 |
| ML-DSA-65 sign | 736614 cycles | 736112 cycles | 1.00 |
| ML-DSA-65 verify | 209739 cycles | 209715 cycles | 1.00 |
| ML-DSA-87 keypair | 337171 cycles | 337055 cycles | 1.00 |
| ML-DSA-87 sign | 928686 cycles | 926156 cycles | 1.00 |
| ML-DSA-87 verify | 345798 cycles | 345516 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 317365 cycles | 318223 cycles | 1.00 |
| ML-DSA-44 sign | 1224779 cycles | 1227916 cycles | 1.00 |
| ML-DSA-44 verify | 344907 cycles | 339753 cycles | 1.02 |
| ML-DSA-65 keypair | 552815 cycles | 559842 cycles | 0.99 |
| ML-DSA-65 sign | 1909394 cycles | 1942017 cycles | 0.98 |
| ML-DSA-65 verify | 519791 cycles | 529374 cycles | 0.98 |
| ML-DSA-87 keypair | 872331 cycles | 863765 cycles | 1.01 |
| ML-DSA-87 sign | 2478204 cycles | 2444597 cycles | 1.01 |
| ML-DSA-87 verify | 883699 cycles | 863122 cycles | 1.02 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 466124 cycles | 464693 cycles | 1.00 |
| ML-DSA-44 sign | 2213126 cycles | 2221355 cycles | 1.00 |
| ML-DSA-44 verify | 550344 cycles | 545927 cycles | 1.01 |
| ML-DSA-65 keypair | 777048 cycles | 776812 cycles | 1.00 |
| ML-DSA-65 sign | 3645043 cycles | 3636149 cycles | 1.00 |
| ML-DSA-65 verify | 849253 cycles | 849524 cycles | 1.00 |
| ML-DSA-87 keypair | 1254730 cycles | 1270549 cycles | 0.99 |
| ML-DSA-87 sign | 4489453 cycles | 4519665 cycles | 0.99 |
| ML-DSA-87 verify | 1370350 cycles | 1380875 cycles | 0.99 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 822519 cycles | 821707 cycles | 1.00 |
| ML-DSA-44 sign | 3332513 cycles | 3334149 cycles | 1.00 |
| ML-DSA-44 verify | 919075 cycles | 918864 cycles | 1.00 |
| ML-DSA-65 keypair | 1397964 cycles | 1396572 cycles | 1.00 |
| ML-DSA-65 sign | 5447376 cycles | 5443674 cycles | 1.00 |
| ML-DSA-65 verify | 1465026 cycles | 1464453 cycles | 1.00 |
| ML-DSA-87 keypair | 2301172 cycles | 2300947 cycles | 1.00 |
| ML-DSA-87 sign | 6820350 cycles | 6813578 cycles | 1.00 |
| ML-DSA-87 verify | 2402933 cycles | 2397029 cycles | 1.00 |
Branch updated from da40590 to 6ca67e1.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 6ca67e1 | Previous: ee7e1bf | Ratio |
|---|---|---|---|
| ML-DSA-87 sign | 1465499 cycles | 1407349 cycles | 1.04 |
| ML-DSA-87 verify | 649376 cycles | 625913 cycles | 1.04 |
Branch updated from 06248ba to 8bf430b.
Signed-off-by: jammychiou1
Signed-off-by: jammychiou1
The new approach is adapted from our Neon implementation. See #411 (comment) for more information on the idea. Bounds reasoning comments are also added.
Signed-off-by: jammychiou1
Edit some comments while we're at it.
Signed-off-by: jammychiou1
Branch updated from 8bf430b to 37040a9.
Thanks @jammychiou1. I checked the bounds tracking for NTT, iNTT, and basemul, and everything makes sense to me. Please include some reasoning for the 3q/4 bounds.
It would also be great if you could extend https://github.com/pq-code-package/mlkem-native/blob/main/test/test_bounds.py to demonstrate the 3q/4 bound.
Let's move the decompose changes to a separate follow-up PR, please.
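The linked test_bounds.py is a Python script in mlkem-native and is not reproduced here. As a minimal C sketch in the same spirit, assuming a signed Montgomery reduction with R = 2^32 as in the Dilithium reference code (q = 8380417, q^-1 mod 2^32 = 58728449), the block below spot-checks that Montgomery multiplication of an arbitrary int32 value by a constant reduced to |c| <= (q - 1)/2 stays strictly below 3q/4 in absolute value. `montgomery_reduce`, `montmul_const`, and the `MLD_` macros are hypothetical names, not the project's API, and this is not the AVX2 code.

```c
#include <assert.h>
#include <stdint.h>

/* ML-DSA modulus and q^-1 mod 2^32, as in the Dilithium reference code. */
#define MLD_Q 8380417
#define MLD_QINV 58728449u

/* Signed Montgomery reduction with R = 2^32 (illustrative only).
 * Returns r with r * 2^32 == a (mod q) and |r| <= |a| / 2^32 + q / 2. */
static int32_t montgomery_reduce(int64_t a)
{
  /* t = a * q^-1 mod 2^32, taken as the centered representative in [-2^31, 2^31). */
  uint32_t u = (uint32_t)a * MLD_QINV;
  int64_t t = (u >= 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
  /* a - t * q is a multiple of 2^32 by construction, so the division is exact. */
  return (int32_t)((a - t * MLD_Q) / 0x100000000LL);
}

/* Montgomery multiplication of an arbitrary int32 x by a constant c.
 * Requires |c| <= (q - 1) / 2; then |result| <= (q - 1) / 4 + q / 2 < 3q/4. */
static int32_t montmul_const(int32_t x, int32_t c)
{
  assert(c >= -(MLD_Q - 1) / 2 && c <= (MLD_Q - 1) / 2);
  return montgomery_reduce((int64_t)x * c);
}
```

For contrast, reducing INT32_MIN * INT32_MIN (a product of two arbitrary operands) returns 2^30, because the product is already a multiple of 2^32 and t is 0; that is far above 3q/4, so the bound relies on one operand being a centered constant.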
/*
 * Compute l + h, montmul(h - l, zh) then store the results back to l, h
 * respectively.
 *
 * The general abs bound of Montgomery multiplication is 3q/4.
As discussed offline: please include reasoning for the 3q/4 bound. You actually mean Montgomery multiplication by a constant, not general Montgomery multiplication.
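One way to write down the reasoning asked for here, assuming signed Montgomery reduction with R = 2^32, an arbitrary 32-bit input x (so |x| <= R/2), and a constant c reduced to |c| <= (q - 1)/2:

```math
\begin{aligned}
t &\equiv x c \, q^{-1} \pmod{R}, \qquad -\tfrac{R}{2} \le t < \tfrac{R}{2}, \\
r &= \frac{x c - t q}{R} \equiv x c \, R^{-1} \pmod{q}, \\
|r| &\le \frac{|x|\,|c|}{R} + \frac{|t|\,q}{R}
      \le \frac{(R/2)\,(q-1)/2}{R} + \frac{q}{2}
      = \frac{q-1}{4} + \frac{q}{2} < \frac{3q}{4}.
\end{aligned}
```

The bound holds whatever the size of x. If both operands are arbitrary 32-bit values, the first term is only bounded by R/4, giving |r| <= R/4 + q/2, which is why 3q/4 applies to multiplication by a constant rather than to general Montgomery multiplication.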
 * Although the general abs bound of Montgomery multiplication is 3q/4, we use
 * the more convenient bound q here.
Rephrase this, and reference intt.S for the 3q/4 bound.
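A minimal scalar sketch of the butterfly described in the diff comment above (l + h, then montmul(h - l, zh)), assuming signed Montgomery reduction with R = 2^32 and a twiddle zh reduced to |zh| <= (q - 1)/2. It illustrates why the bounds tracking treats the two outputs differently: l + h doubles the input bound, while the Montgomery product brings h back below 3q/4 regardless of the input bound, as long as h - l fits in 32 bits. `gs_butterfly` and the other names are hypothetical, and this is not the contents of intt.S.

```c
#include <assert.h>
#include <stdint.h>

#define MLD_Q 8380417
#define MLD_QINV 58728449u /* q^-1 mod 2^32 */

/* Signed Montgomery reduction, R = 2^32 (illustrative):
 * returns r with r * 2^32 == a (mod q) and |r| <= |a| / 2^32 + q / 2. */
static int32_t montgomery_reduce(int64_t a)
{
  uint32_t u = (uint32_t)a * MLD_QINV; /* a * q^-1 mod 2^32 */
  int64_t t = (u >= 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
  return (int32_t)((a - t * MLD_Q) / 0x100000000LL); /* exact division */
}

/* Hypothetical scalar model of one inverse-NTT (Gentleman-Sande) butterfly:
 *   (l, h) <- (l + h, montmul(h - l, zh)).
 * Preconditions: |l|, |h| <= B with 2B < 2^31 (so neither l + h nor h - l
 * overflows), and |zh| <= (q - 1) / 2.
 * Postconditions: |l| <= 2B, and |h| < 3q/4 independently of B. */
static void gs_butterfly(int32_t *l, int32_t *h, int32_t zh)
{
  assert(zh >= -(MLD_Q - 1) / 2 && zh <= (MLD_Q - 1) / 2);
  int32_t sum = *l + *h;
  int32_t diff = *h - *l;
  *l = sum;
  *h = montgomery_reduce((int64_t)diff * zh);
}
```

Using q instead of 3q/4 for the h output, as the revised comment does, is a looser but still valid bound for the same arithmetic.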