@forfudan
Owner

This pull request optimizes the sqrt() function by improving the initial guess. The speed increases significantly and is now comparable to, and in most benchmarked cases faster than, Python's Decimal.sqrt() method from the decimal module.

This pull request also updates README.md for clarity, accuracy, and completeness, and revises the benchmark files for functionality and correctness, including a new bench_sqrt benchmark.
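
The description above does not show how the initial guess was changed. As a rough illustration of why the starting point matters, here is a minimal Python sketch that assumes a Newton iteration and compares two hypothetical seeds: the input itself and a hardware float square root. The names `newton_sqrt`, `naive_guess`, and `float_guess` are made up for this example and are not taken from DeciMojo.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 28  # same precision as the benchmark runs


def newton_sqrt(x: Decimal, guess: Decimal, max_iter: int = 500):
    """Return (approximate sqrt(x), iterations used) via Newton's method."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return Decimal(0), 0
    prev = None
    for i in range(1, max_iter + 1):
        nxt = (guess + x / guess) / 2
        # Stop when the value repeats or flips between two neighbours.
        if nxt == guess or nxt == prev:
            return nxt, i
        prev, guess = guess, nxt
    return guess, max_iter


def naive_guess(x: Decimal) -> Decimal:
    # Hypothetical baseline: start from x itself (or 1 when x < 1).
    return x if x >= 1 else Decimal(1)


def float_guess(x: Decimal) -> Decimal:
    # Hypothetical improvement: seed with the hardware float square root.
    return Decimal(math.sqrt(float(x)))


x = Decimal("100000000000000000000")
root_naive, iters_naive = newton_sqrt(x, naive_guess(x))
root_float, iters_float = newton_sqrt(x, float_guess(x))
print("naive seed:", root_naive, "after", iters_naive, "iterations")
print("float seed:", root_float, "after", iters_float, "iterations")
```

With a distant seed, each early Newton step only roughly halves the estimate, so large inputs need many steps before quadratic convergence starts. A seed that is already accurate to about 16 digits skips that phase.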

@forfudan
Owner Author

Benchmark against Python for v0.1.0:

=== DeciMojo Square Root Benchmark ===
Time: 2025-03-09T21:34:44.747649
System: Darwin 24.3.0
Processor: arm
Python version: 3.12.9
Python decimal precision: 28
Mojo decimal precision: 28

Running square root benchmarks with 100 iterations each

Benchmark:       Perfect square (small)
Decimal:         16
Mojo result:     4
Python result:   4.0
Mojo Decimal:    9298290.0 ns per iteration
Python Decimal:  6690.0 ns per iteration
Speedup factor:  0.000719487131504825

Benchmark:       Perfect square (large)
Decimal:         1000000
Mojo result:     1000
Python result:   1000.0
Mojo Decimal:    19905740.0 ns per iteration
Python Decimal:  6730.0 ns per iteration
Speedup factor:  0.0003380934343561204

Benchmark:       Non-perfect square (small irrational)
Decimal:         2
Mojo result:     1.4142135623730950488016887242
Python result:   1.4142135623730951
Mojo Decimal:    5960670.0 ns per iteration
Python Decimal:  6870.0 ns per iteration
Speedup factor:  0.0011525549980119685

Benchmark:       Non-perfect square (medium)
Decimal:         123.456
Mojo result:     11.111075555498666484621494041
Python result:   11.111075555498667
Mojo Decimal:    14347030.0 ns per iteration
Python Decimal:  7100.0 ns per iteration
Speedup factor:  0.0004948759429651991

Benchmark:       Very small number
Decimal:         0.0000001
Mojo result:     0.0003162277660168379331998893
Python result:   0.00031622776601683794
Mojo Decimal:    7109560.0 ns per iteration
Python Decimal:  7140.0 ns per iteration
Speedup factor:  0.0010042815589150383

Benchmark:       Very large number
Decimal:         100000000000000000000
Mojo result:     10000000000
Python result:   10000000000.0
Mojo Decimal:    51093230.0 ns per iteration
Python Decimal:  7660.0 ns per iteration
Speedup factor:  0.00014992201510845957

Benchmark:       Number just above 1
Decimal:         1.0000001
Mojo result:     1.0000000499999987500000624999
Python result:   1.0000000499999988
Mojo Decimal:    19152230.0 ns per iteration
Python Decimal:  7140.0 ns per iteration
Speedup factor:  0.00037280254048745236

Benchmark:       Number just below 1
Decimal:         0.9999999
Mojo result:     0.9999999499999987499999374999
Python result:   0.9999999499999987
Mojo Decimal:    22279610.0 ns per iteration
Python Decimal:  6760.0 ns per iteration
Speedup factor:  0.00030341644220881783

Benchmark:       High precision value
Decimal:         1.23456789012345678901234567
Mojo result:     1.1111111061111110993611110541
Python result:   1.111111106111111
Mojo Decimal:    66541070.0 ns per iteration
Python Decimal:  6900.0 ns per iteration
Speedup factor:  0.00010369535686757065

Benchmark:       Number with exact square root
Decimal:         0.04
Mojo result:     0.2
Python result:   0.2
Mojo Decimal:    8387550.0 ns per iteration
Python Decimal:  6910.0 ns per iteration
Speedup factor:  0.0008238400963332559

Benchmark:       Number close to a perfect square
Decimal:         99.99
Mojo result:     9.99949998749937496093476542
Python result:   9.999499987499375
Mojo Decimal:    11958540.0 ns per iteration
Python Decimal:  7130.0 ns per iteration
Speedup factor:  0.000596226629672184

Benchmark:       Very large perfect square
Decimal:         1000000000
Mojo result:     31622.776601683793319988935445
Python result:   31622.776601683792
Mojo Decimal:    26154480.0 ns per iteration
Python Decimal:  7260.0 ns per iteration
Speedup factor:  0.00027758150802462904

Benchmark:       Number with repeating pattern in result
Decimal:         3
Mojo result:     1.7320508075688772935274463415
Python result:   1.7320508075688772
Mojo Decimal:    5329160.0 ns per iteration
Python Decimal:  7010.0 ns per iteration
Speedup factor:  0.0013154043038677765

Benchmark:       Number with trailing zeros
Decimal:         144.0000
Mojo result:     12
Python result:   12.0
Mojo Decimal:    13151590.0 ns per iteration
Python Decimal:  6940.0 ns per iteration
Speedup factor:  0.0005276928493056733

Benchmark:       Slightly larger than perfect square
Decimal:         4.0001
Mojo result:     2.0000249998437519530944829559
Python result:   2.000024999843752
Mojo Decimal:    11769880.0 ns per iteration
Python Decimal:  7090.0 ns per iteration
Speedup factor:  0.0006023850710457541

Benchmark:       Slightly smaller than perfect square
Decimal:         15.9999
Mojo result:     3.9999874999804686889646053305
Python result:   3.9999874999804685
Mojo Decimal:    14801160.0 ns per iteration
Python Decimal:  7050.0 ns per iteration
Speedup factor:  0.0004763140186309722

Benchmark:       Number with many decimal places
Decimal:         0.12345678901234567890
Mojo result:     0.3513641828820144253093654172
Python result:   0.35136418288201443
Mojo Decimal:    46002980.0 ns per iteration
Python Decimal:  7220.0 ns per iteration
Speedup factor:  0.0001569463543448707

Benchmark:       Number close to maximum value
Decimal:         79228162514264337593543950334
Mojo result:     281474976710656.00000000000000
Python result:   281474976710656.0
Mojo Decimal:    86764900.0 ns per iteration
Python Decimal:  6790.0 ns per iteration
Speedup factor:  7.825745203417511e-05

Benchmark:       Very tiny positive number
Decimal:         0.0000000000000000000000000001
Mojo result:     0.00000000000001
Python result:   1e-14
Mojo Decimal:    4719410.0 ns per iteration
Python Decimal:  7080.0 ns per iteration
Speedup factor:  0.0015001875234404302

Benchmark:       Number requiring many iterations
Decimal:         987654321.123456789
Mojo result:     31426.968054896049564603619131
Python result:   31426.96805489605
Mojo Decimal:    45269640.0 ns per iteration
Python Decimal:  7230.0 ns per iteration
Speedup factor:  0.0001597096862267957

=== Square Root Benchmark Summary ===
Benchmarked:      20 different square root cases
Each case ran:    100 iterations
Performance:      See detailed results above for each case
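
The reported figures are consistent with the speedup factor being the Python time divided by the Mojo time, so values below 1 mean Mojo is slower. The sketch below shows how such per-iteration timings and ratios could be produced on the Python side. It assumes the Python measurement calls `Decimal.sqrt()`; the helper names `ns_per_iteration` and `speedup_factor` are hypothetical and not taken from the benchmark files.

```python
import timeit
from decimal import Decimal, getcontext

getcontext().prec = 28  # matches "Python decimal precision: 28"


def ns_per_iteration(func, iterations: int = 100) -> float:
    """Average wall-clock time of func() in nanoseconds."""
    total_seconds = timeit.timeit(func, number=iterations)
    return total_seconds / iterations * 1e9


def speedup_factor(python_ns: float, mojo_ns: float) -> float:
    """Values above 1 mean the Mojo implementation is faster."""
    return python_ns / mojo_ns


value = Decimal("16")
python_ns = ns_per_iteration(lambda: value.sqrt())
print(f"Python Decimal: {python_ns:.1f} ns per iteration")

# Timings reported above for "Perfect square (small)".
print("Speedup factor:", speedup_factor(6690.0, 9298290.0))
```

With only 100 iterations per case, individual timings are noisy, so small differences between cases should be read with caution.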

@forfudan
Owner Author

Benchmark against Python after PR #21:

=== DeciMojo Square Root Benchmark ===
Time: 2025-03-09T21:36:41.831592
System: Darwin 24.3.0
Processor: arm
Python version: 3.12.9
Python decimal precision: 28
Mojo decimal precision: 28

Running square root benchmarks with 100 iterations each

Benchmark:       Perfect square (small)
Decimal:         16
Mojo result:     4
Python result:   4.0
Mojo Decimal:    59610.0 ns per iteration
Python Decimal:  7240.0 ns per iteration
Speedup factor:  0.12145613152155679

Benchmark:       Perfect square (large)
Decimal:         1000000
Mojo result:     1000
Python result:   1000.0
Mojo Decimal:    140140.0 ns per iteration
Python Decimal:  6060.0 ns per iteration
Speedup factor:  0.04324247181390038

Benchmark:       Non-perfect square (small irrational)
Decimal:         2
Mojo result:     1.4142135623730950488016887242
Python result:   1.4142135623730951
Mojo Decimal:    11050.0 ns per iteration
Python Decimal:  6340.0 ns per iteration
Speedup factor:  0.5737556561085972

Benchmark:       Non-perfect square (medium)
Decimal:         123.456
Mojo result:     11.111075555498666484621494041
Python result:   11.111075555498667
Mojo Decimal:    95590.0 ns per iteration
Python Decimal:  7300.0 ns per iteration
Speedup factor:  0.07636782090176797

Benchmark:       Very small number
Decimal:         0.0000001
Mojo result:     0.0003162277660168379331998894
Python result:   0.00031622776601683794
Mojo Decimal:    17110.0 ns per iteration
Python Decimal:  6500.0 ns per iteration
Speedup factor:  0.3798947983635301

Benchmark:       Very large number
Decimal:         100000000000000000000
Mojo result:     10000000000
Python result:   10000000000.0
Mojo Decimal:    581690.0 ns per iteration
Python Decimal:  7130.0 ns per iteration
Speedup factor:  0.012257387955784009

Benchmark:       Number just above 1
Decimal:         1.0000001
Mojo result:     1.0000000499999987500000625000
Python result:   1.0000000499999988
Mojo Decimal:    95770.0 ns per iteration
Python Decimal:  6710.0 ns per iteration
Speedup factor:  0.07006369426751592

Benchmark:       Number just below 1
Decimal:         0.9999999
Mojo result:     0.9999999499999987499999375000
Python result:   0.9999999499999987
Mojo Decimal:    107780.0 ns per iteration
Python Decimal:  6530.0 ns per iteration
Speedup factor:  0.060586379662275

Benchmark:       High precision value
Decimal:         1.23456789012345678901234567
Mojo result:     1.1111111061111110993611110542
Python result:   1.111111106111111
Mojo Decimal:    583710.0 ns per iteration
Python Decimal:  7000.0 ns per iteration
Speedup factor:  0.011992256428706036

Benchmark:       Number with exact square root
Decimal:         0.04
Mojo result:     0.2
Python result:   0.2
Mojo Decimal:    14950.0 ns per iteration
Python Decimal:  6420.0 ns per iteration
Speedup factor:  0.4294314381270903

Benchmark:       Number close to a perfect square
Decimal:         99.99
Mojo result:     9.999499987499374960934765420
Python result:   9.999499987499375
Mojo Decimal:    47650.0 ns per iteration
Python Decimal:  6720.0 ns per iteration
Speedup factor:  0.1410283315844701

Benchmark:       Very large perfect square
Decimal:         1000000000
Mojo result:     31622.776601683793319988935444
Python result:   31622.776601683792
Mojo Decimal:    217410.0 ns per iteration
Python Decimal:  6460.0 ns per iteration
Speedup factor:  0.029713444643760637

Benchmark:       Number with repeating pattern in result
Decimal:         3
Mojo result:     1.7320508075688772935274463415
Python result:   1.7320508075688772
Mojo Decimal:    12340.0 ns per iteration
Python Decimal:  6590.0 ns per iteration
Speedup factor:  0.534035656401945

Benchmark:       Number with trailing zeros
Decimal:         144.0000
Mojo result:     12.000000000000000000000000000
Python result:   12.0
Mojo Decimal:    89750.0 ns per iteration
Python Decimal:  6900.0 ns per iteration
Speedup factor:  0.07688022284122563

Benchmark:       Slightly larger than perfect square
Decimal:         4.0001
Mojo result:     2.0000249998437519530944829559
Python result:   2.000024999843752
Mojo Decimal:    38490.0 ns per iteration
Python Decimal:  6550.0 ns per iteration
Speedup factor:  0.1701740711873214

Benchmark:       Slightly smaller than perfect square
Decimal:         15.9999
Mojo result:     3.9999874999804686889646053305
Python result:   3.9999874999804685
Mojo Decimal:    104540.0 ns per iteration
Python Decimal:  6480.0 ns per iteration
Speedup factor:  0.0619858427396212

Benchmark:       Number with many decimal places
Decimal:         0.12345678901234567890
Mojo result:     0.3513641828820144253093654172
Python result:   0.35136418288201443
Mojo Decimal:    355050.0 ns per iteration
Python Decimal:  6960.0 ns per iteration
Speedup factor:  0.019602872834811998

Benchmark:       Number close to maximum value
Decimal:         79228162514264337593543950334
Mojo result:     281474976710656.00000000000000
Python result:   281474976710656.0
Mojo Decimal:    1007610.0 ns per iteration
Python Decimal:  6840.0 ns per iteration
Speedup factor:  0.00678834072706702

Benchmark:       Very tiny positive number
Decimal:         0.0000000000000000000000000001
Mojo result:     0.00000000000001
Python result:   1e-14
Mojo Decimal:    17590.0 ns per iteration
Python Decimal:  6810.0 ns per iteration
Speedup factor:  0.3871517907902217

Benchmark:       Number requiring many iterations
Decimal:         987654321.123456789
Mojo result:     31426.968054896049564603619130
Python result:   31426.96805489605
Mojo Decimal:    450110.0 ns per iteration
Python Decimal:  6460.0 ns per iteration
Speedup factor:  0.014352047277332207

=== Square Root Benchmark Summary ===
Benchmarked:      20 different square root cases
Each case ran:    100 iterations
Performance:      See detailed results above for each case

@forfudan
Owner Author

Benchmark against Python after this PR:

=== DeciMojo Square Root Benchmark ===
Time: 2025-03-10T18:14:02.201642
System: Darwin 24.3.0
Processor: arm
Python version: 3.12.9
Python decimal precision: 28
Mojo decimal precision: 28

Running square root benchmarks with 100 iterations each

Benchmark:       Perfect square (small)
Decimal:         16
Mojo result:     4
Python result:   4.0
Mojo Decimal:    80.0 ns per iteration
Python Decimal:  6880.0 ns per iteration
Speedup factor:  86.0

Benchmark:       Perfect square (large)
Decimal:         1000000
Mojo result:     1000
Python result:   1000.0
Mojo Decimal:    140.0 ns per iteration
Python Decimal:  7400.0 ns per iteration
Speedup factor:  52.857142857142854

Benchmark:       Non-perfect square (small irrational)
Decimal:         2
Mojo result:     1.4142135623730950488016887242
Python result:   1.4142135623730951
Mojo Decimal:    4400.0 ns per iteration
Python Decimal:  6670.0 ns per iteration
Speedup factor:  1.5159090909090909

Benchmark:       Non-perfect square (medium)
Decimal:         123.456
Mojo result:     11.111075555498666484621494041
Python result:   11.111075555498667
Mojo Decimal:    3870.0 ns per iteration
Python Decimal:  7270.0 ns per iteration
Speedup factor:  1.8785529715762275

Benchmark:       Very small number
Decimal:         0.0000001
Mojo result:     0.0003162277660168379331998894
Python result:   0.00031622776601683794
Mojo Decimal:    5830.0 ns per iteration
Python Decimal:  6680.0 ns per iteration
Speedup factor:  1.1457975986277873

Benchmark:       Very large number
Decimal:         100000000000000000000
Mojo result:     10000000000
Python result:   10000000000.0
Mojo Decimal:    220.0 ns per iteration
Python Decimal:  7020.0 ns per iteration
Speedup factor:  31.90909090909091

Benchmark:       Number just above 1
Decimal:         1.0000001
Mojo result:     1.0000000499999987500000625000
Python result:   1.0000000499999988
Mojo Decimal:    4500.0 ns per iteration
Python Decimal:  7060.0 ns per iteration
Speedup factor:  1.568888888888889

Benchmark:       Number just below 1
Decimal:         0.9999999
Mojo result:     0.9999999499999987499999375000
Python result:   0.9999999499999987
Mojo Decimal:    4650.0 ns per iteration
Python Decimal:  6380.0 ns per iteration
Speedup factor:  1.3720430107526882

Benchmark:       High precision value
Decimal:         1.23456789012345678901234567
Mojo result:     1.1111111061111110993611110542
Python result:   1.111111106111111
Mojo Decimal:    1720.0 ns per iteration
Python Decimal:  7180.0 ns per iteration
Speedup factor:  4.174418604651163

Benchmark:       Number with exact square root
Decimal:         0.04
Mojo result:     0.2
Python result:   0.2
Mojo Decimal:    90.0 ns per iteration
Python Decimal:  7070.0 ns per iteration
Speedup factor:  78.55555555555556

Benchmark:       Number close to a perfect square
Decimal:         99.99
Mojo result:     9.999499987499374960934765420
Python result:   9.999499987499375
Mojo Decimal:    2510.0 ns per iteration
Python Decimal:  6870.0 ns per iteration
Speedup factor:  2.737051792828685

Benchmark:       Very large perfect square
Decimal:         1000000000
Mojo result:     31622.776601683793319988935444
Python result:   31622.776601683792
Mojo Decimal:    4340.0 ns per iteration
Python Decimal:  7470.0 ns per iteration
Speedup factor:  1.7211981566820276

Benchmark:       Number with repeating pattern in result
Decimal:         3
Mojo result:     1.7320508075688772935274463415
Python result:   1.7320508075688772
Mojo Decimal:    4720.0 ns per iteration
Python Decimal:  7150.0 ns per iteration
Speedup factor:  1.5148305084745763

Benchmark:       Number with trailing zeros
Decimal:         144.0000
Mojo result:     12.00
Python result:   12.0
Mojo Decimal:    170.0 ns per iteration
Python Decimal:  6390.0 ns per iteration
Speedup factor:  37.588235294117645

Benchmark:       Slightly larger than perfect square
Decimal:         4.0001
Mojo result:     2.0000249998437519530944829559
Python result:   2.000024999843752
Mojo Decimal:    2050.0 ns per iteration
Python Decimal:  6140.0 ns per iteration
Speedup factor:  2.995121951219512

Benchmark:       Slightly smaller than perfect square
Decimal:         15.9999
Mojo result:     3.9999874999804686889646053305
Python result:   3.9999874999804685
Mojo Decimal:    13320.0 ns per iteration
Python Decimal:  6120.0 ns per iteration
Speedup factor:  0.4594594594594595

Benchmark:       Number with many decimal places
Decimal:         0.12345678901234567890
Mojo result:     0.3513641828820144253093654172
Python result:   0.35136418288201443
Mojo Decimal:    2910.0 ns per iteration
Python Decimal:  6190.0 ns per iteration
Speedup factor:  2.127147766323024

Benchmark:       Number close to maximum value
Decimal:         79228162514264337593543950334
Mojo result:     281474976710656.00000000000000
Python result:   281474976710656.0
Mojo Decimal:    1200.0 ns per iteration
Python Decimal:  6290.0 ns per iteration
Speedup factor:  5.241666666666666

Benchmark:       Very tiny positive number
Decimal:         0.0000000000000000000000000001
Mojo result:     0.00000000000001
Python result:   1e-14
Mojo Decimal:    70.0 ns per iteration
Python Decimal:  6180.0 ns per iteration
Speedup factor:  88.28571428571429

Benchmark:       Number requiring many iterations
Decimal:         987654321.123456789
Mojo result:     31426.968054896049564603619130
Python result:   31426.96805489605
Mojo Decimal:    4950.0 ns per iteration
Python Decimal:  6180.0 ns per iteration
Speedup factor:  1.2484848484848485

=== Square Root Benchmark Summary ===
Benchmarked:      20 different square root cases
Each case ran:    100 iterations
Performance:      See detailed results above for each case
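
The summary above leaves the overall picture to the reader. A short script like the following, with the (Mojo, Python) timings copied from this run in order, is one way to condense the 20 cases. The choice of statistics is illustrative only.

```python
from statistics import median

# (Mojo ns, Python ns) per case, copied from the run above in order.
results = [
    (80, 6880), (140, 7400), (4400, 6670), (3870, 7270), (5830, 6680),
    (220, 7020), (4500, 7060), (4650, 6380), (1720, 7180), (90, 7070),
    (2510, 6870), (4340, 7470), (4720, 7150), (170, 6390), (2050, 6140),
    (13320, 6120), (2910, 6190), (1200, 6290), (70, 6180), (4950, 6180),
]

# Speedup factor as reported in the logs: Python time / Mojo time.
speedups = [python_ns / mojo_ns for mojo_ns, python_ns in results]
faster = sum(1 for s in speedups if s > 1)

print(f"Mojo faster in {faster} of {len(speedups)} cases")
# → Mojo faster in 19 of 20 cases
print(f"lowest: {min(speedups):.2f}x, highest: {max(speedups):.2f}x")
print(f"median: {median(speedups):.2f}x")
```

The one case below 1 is "Slightly smaller than perfect square" (15.9999), where Mojo takes 13320 ns against 6120 ns for Python.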

@forfudan merged commit 7dcb7eb into main on Mar 10, 2025 (2 checks passed)
@forfudan deleted the sqrt branch on March 10, 2025 17:21