Owner

@forfudan forfudan commented Mar 9, 2025

This PR optimizes divide() by replacing the string-based approach with an integer-based one. Depending on the case, this makes divide() roughly 100x to 10,000x faster.

  1. Advanced Overflow Prevention - Precise coefficient bounds checking with proper truncation to maintain accurate results within the 96-bit limitation, particularly in division operations

  2. Comprehensive Edge Case Handling - Special case optimizations for common scenarios (zero values, powers of 10, coefficient of 1) with proper error messages for impossible operations

  3. Sophisticated Scale Management - Intelligent adjustment of decimal scales during arithmetic operations, preserving maximum precision while preventing overflow

  4. Performance Optimization - Efficient binary exponentiation for power operations, specialized division algorithms, and direct bit manipulation for improved computational efficiency
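To illustrate items 1 to 3, here is a minimal Python sketch of integer-based division on (coefficient, scale) pairs. It is not the DeciMojo implementation: the names `divide_coefficients`, `MAX_COEFFICIENT`, and `MAX_SCALE` are hypothetical, the 96-bit bound and the maximum scale of 28 are taken from the description and benchmark header, and truncating (rather than rounding) the last digit is an assumption.

```python
# Hypothetical sketch: a decimal is (coefficient, scale), value = coefficient / 10**scale.
MAX_COEFFICIENT = 2**96 - 1  # assumed 96-bit coefficient limit
MAX_SCALE = 28               # assumed maximum number of decimal places


def divide_coefficients(a_coef, a_scale, b_coef, b_scale):
    """Divide a by b using only integer arithmetic; returns (coefficient, scale)."""
    if b_coef == 0:
        raise ZeroDivisionError("division by zero")
    if a_coef == 0:
        return 0, 0  # special case: zero dividend

    negative = (a_coef < 0) != (b_coef < 0)
    dividend, divisor = abs(a_coef), abs(b_coef)
    scale = a_scale - b_scale
    quotient, remainder = divmod(dividend, divisor)

    # Append one decimal digit at a time while the result is inexact or the
    # scale is still negative, stopping at the scale or coefficient limit.
    while (remainder != 0 or scale < 0) and scale < MAX_SCALE:
        digit, next_remainder = divmod(remainder * 10, divisor)
        candidate = quotient * 10 + digit
        if candidate > MAX_COEFFICIENT:
            break  # truncate here instead of overflowing
        quotient, remainder = candidate, next_remainder
        scale += 1

    if scale < 0 or quotient > MAX_COEFFICIENT:
        raise OverflowError("quotient does not fit in the assumed 96-bit coefficient")
    return (-quotient if negative else quotient), scale
```

For example, `divide_coefficients(105, 1, 25, 1)` returns `(42, 1)`, i.e. 10.5 / 2.5 = 4.2, without building or parsing any strings.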

These improvements reflect progress in the "make it right" and "make it fast" phases, focusing on robustness, correctness, performance, and edge case handling while maintaining the framework for future performance optimizations.
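The binary exponentiation mentioned in item 4 can be sketched in the same (coefficient, scale) representation. This is an illustrative Python version, not the PR's code: the function name `power` is hypothetical, only non-negative integer exponents are handled, and because Python integers are unbounded the sketch omits the overflow checks a fixed-width implementation would need.

```python
def power(coef, scale, exponent):
    """Raise coef / 10**scale to a non-negative integer power by repeated squaring."""
    if exponent < 0:
        raise ValueError("this sketch only handles non-negative exponents")

    result_coef, result_scale = 1, 0
    base_coef, base_scale = coef, scale
    while exponent > 0:
        if exponent & 1:
            # Multiply the current power of the base into the result.
            result_coef *= base_coef
            result_scale += base_scale
        exponent >>= 1
        if exponent:
            # Square the base for the next bit of the exponent.
            base_coef *= base_coef
            base_scale *= 2
    return result_coef, result_scale
```

The loop runs once per bit of the exponent, so it needs O(log n) multiplications instead of the n - 1 required by naive repeated multiplication.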

@forfudan forfudan merged commit 3501645 into main Mar 9, 2025
2 checks passed
@forfudan forfudan deleted the work branch March 9, 2025 20:18
@forfudan
Owner Author

Bench before changes:

=== DeciMojo Division Benchmark ===
Time: 2025-03-08T12:14:12.450519
System: Darwin 24.3.0
Processor: arm
Python version: 3.12.9
Python decimal precision: 28
Mojo decimal precision: 28

Running division benchmarks with 1000 iterations each

Benchmark:       Integer division (no remainder)
Mojo result:     25
Python result:   25
Mojo Decimal:    24344.0 ns per iteration
Python Decimal:  3712.0 ns per iteration
Speedup factor:  0.152481104173513

Benchmark:       Simple decimal division
Mojo result:     4.2
Python result:   4.2
Mojo Decimal:    27989.0 ns per iteration
Python Decimal:  3760.0 ns per iteration
Speedup factor:  0.134338490121119

Benchmark:       Division with repeating decimal
Mojo result:     3.3333333333333333333333333333
Python result:   3.333333333333333333333333333
Mojo Decimal:    92164.0 ns per iteration
Python Decimal:  3617.0 ns per iteration
Speedup factor:  0.03924525845232412

Benchmark:       Division by one
Mojo result:     123.45
Python result:   123.45
Mojo Decimal:    24866.0 ns per iteration
Python Decimal:  3692.0 ns per iteration
Speedup factor:  0.14847583045121854

Benchmark:       Division of zero
Mojo result:     0
Python result:   0E+2
Mojo Decimal:    2.0 ns per iteration
Python Decimal:  3641.0 ns per iteration
Speedup factor:  1820.5

Benchmark:       Division with negative numbers
Mojo result:     -61.725
Python result:   -61.725
Mojo Decimal:    31470.0 ns per iteration
Python Decimal:  3601.0 ns per iteration
Speedup factor:  0.1144264378773435

Benchmark:       Division by very small number
Mojo result:     10000
Python result:   1E+4
Mojo Decimal:    17362.0 ns per iteration
Python Decimal:  3633.0 ns per iteration
Speedup factor:  0.20925008639557655

Benchmark:       High precision division
Mojo result:     1.2499999886093750002689453113
Python result:   1.249999988609375000268945311
Mojo Decimal:    752919.0 ns per iteration
Python Decimal:  3656.0 ns per iteration
Speedup factor:  0.004855768017542392

Benchmark:       Division resulting in power of 10
Mojo result:     10000
Python result:   1.0E+4
Mojo Decimal:    17962.0 ns per iteration
Python Decimal:  3683.0 ns per iteration
Speedup factor:  0.20504398173922725

Benchmark:       Division of very large numbers
Mojo result:     10000000001
Python result:   10000000001
Mojo Decimal:    59169.0 ns per iteration
Python Decimal:  3581.0 ns per iteration
Speedup factor:  0.06052155689634775

=== Division Benchmark Summary ===
Benchmarked:      10 different division cases
Each case ran:    1000 iterations
Performance:      See detailed results above for each case

@forfudan
Owner Author

Bench after changes:

=== DeciMojo Division Benchmark ===
Time: 2025-03-09T21:03:40.370413
System: Darwin 24.3.0
Processor: arm
Python version: 3.12.9
Python decimal precision: 28
Mojo decimal precision: 28

Running division benchmarks with 1000 iterations each

Benchmark:       Integer division (no remainder)
Decimals:        100 / 4
Mojo result:     25
Python result:   25
Mojo Decimal:    7.0 ns per iteration
Python Decimal:  3719.0 ns per iteration
Speedup factor:  531.2857142857143

Benchmark:       Simple decimal division
Decimals:        10.5 / 2.5
Mojo result:     4.2
Python result:   4.2
Mojo Decimal:    24.0 ns per iteration
Python Decimal:  3794.0 ns per iteration
Speedup factor:  158.08333333333334

Benchmark:       Division with repeating decimal
Decimals:        10 / 3
Mojo result:     3.3333333333333333333333333333
Python result:   3.333333333333333333333333333
Mojo Decimal:    433.0 ns per iteration
Python Decimal:  3811.0 ns per iteration
Speedup factor:  8.801385681293302

Benchmark:       Division by one
Decimals:        123.45 / 1
Mojo result:     123.45
Python result:   123.45
Mojo Decimal:    3.0 ns per iteration
Python Decimal:  3767.0 ns per iteration
Speedup factor:  1255.6666666666667

Benchmark:       Division of zero
Decimals:        0 / 123.45
Mojo result:     0
Python result:   0E+2
Mojo Decimal:    2.0 ns per iteration
Python Decimal:  3789.0 ns per iteration
Speedup factor:  1894.5

Benchmark:       Division with negative numbers
Decimals:        123.45 / -2
Mojo result:     -61.725
Python result:   -61.725
Mojo Decimal:    34.0 ns per iteration
Python Decimal:  3628.0 ns per iteration
Speedup factor:  106.70588235294117

Benchmark:       Division by very small number
Decimals:        1 / 0.0001
Mojo result:     10000
Python result:   1E+4
Mojo Decimal:    6.0 ns per iteration
Python Decimal:  3779.0 ns per iteration
Speedup factor:  629.8333333333334

Benchmark:       High precision division
Decimals:        0.1234567890123456789 / 0.0987654321098765432
Mojo result:     1.2499999886093750002689453113
Python result:   1.249999988609375000268945311
Mojo Decimal:    520.0 ns per iteration
Python Decimal:  3742.0 ns per iteration
Speedup factor:  7.196153846153846

Benchmark:       Division resulting in power of 10
Decimals:        10 / 0.001
Mojo result:     10000
Python result:   1.0E+4
Mojo Decimal:    7.0 ns per iteration
Python Decimal:  3987.0 ns per iteration
Speedup factor:  569.5714285714286

Benchmark:       Division of very large numbers
Decimals:        99999999999999999999 / 9999999999
Mojo result:     10000000001
Python result:   10000000001
Mojo Decimal:    6.0 ns per iteration
Python Decimal:  3811.0 ns per iteration
Speedup factor:  635.1666666666666

=== Division Benchmark Summary ===
Benchmarked:      10 different division cases
Each case ran:    1000 iterations
Performance:      See detailed results above for each case
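To compare the two runs directly, the short Python sketch below recomputes the ratios from the timings reported above. The helper name `speedup` and the dictionary layout are only illustrative; the numbers are copied from the two benchmark outputs. The "Speedup factor" line in each report appears to be Python time divided by Mojo time (for example, 3719.0 / 7.0 for the integer case after the change).

```python
def speedup(slow_ns, fast_ns):
    """Return how many times faster fast_ns is than slow_ns."""
    return slow_ns / fast_ns


# Mojo Decimal timings in ns per iteration: (before, after), copied from the reports.
mojo_timings = {
    "Integer division (no remainder)": (24344.0, 7.0),
    "Simple decimal division": (27989.0, 24.0),
    "Division with repeating decimal": (92164.0, 433.0),
    "Division by one": (24866.0, 3.0),
    "Division of zero": (2.0, 2.0),
    "Division with negative numbers": (31470.0, 34.0),
    "Division by very small number": (17362.0, 6.0),
    "High precision division": (752919.0, 520.0),
    "Division resulting in power of 10": (17962.0, 7.0),
    "Division of very large numbers": (59169.0, 6.0),
}

# Before/after improvement of the Mojo implementation for each case.
improvement = {name: speedup(before, after) for name, (before, after) in mojo_timings.items()}

for name, factor in improvement.items():
    print(f"{name:<36} {factor:10.1f}x")
```

Division of zero was already a fast path before the change, so its ratio is 1. The single-digit nanosecond timings are likely close to the timer's resolution, so the largest ratios should be read as approximate.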
