Issue: Significant Performance Slowdown with toolz.curry
I encountered a performance slowdown where toolz.curry made my script over 50x slower in some cases.
Initially, I had the following code:
return pipe(
    content,
    process_paragraph,
    map(lambda content: (content, str([size, line.strip(), path]))),
)
I was working with nearly 10 million records, and the process was extremely slow. After waiting for about 30 minutes, I had to stop the program.
After investigating, I found that modifying the code to the following version significantly improved performance:
return map(
    lambda content: (content, str([size, line.strip(), path])),
    process_paragraph(content),
)
At first, I thought the issue was with toolz.pipe. I reviewed its implementation and noticed it's just a for loop. I then tried using the pipe functions from the expression and returns libraries, but both were still slow.
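For reference, a pipe of this kind only threads a value through a sequence of functions, so it adds very little work per call. The following is a minimal sketch of that idea, not the toolz source; the name `simple_pipe` is hypothetical.

```python
def simple_pipe(data, *funcs):
    """Pass data through each function in order and return the final value."""
    for func in funcs:
        data = func(data)
    return data


# Equivalent to len(str.split("a b c"))
result = simple_pipe("a b c", str.split, len)
print(result)
```

Because the loop body is a single call per stage, any large slowdown is more likely to come from the functions passed in than from the pipe itself.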
Next, I suspected the map function might be causing the slowdown. After further investigation, I discovered that the real issue was with toolz.curry, which was responsible for the drastic performance drop.
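One way to narrow down a slowdown like this is to run the hot path under a profiler and look at the functions with the highest cumulative time. The sketch below uses only the standard library; `profile_top` and `workload` are hypothetical names, and the lambda-based curry is a stand-in so the example runs without toolz installed.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=5):
    """Run func(*args) under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def workload(n):
    # Stand-in for the real hot loop: curry map, then apply it to a small list.
    curried_map = lambda f: lambda xs: map(f, xs)
    return [list(curried_map(str)([i])) for i in range(n)]


report = profile_top(workload, 10_000)
print(report)
```

Swapping the stand-in for the curried function under suspicion should show whether the time is spent inside the currying machinery or in the mapped function.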
Simplified Test Case
To isolate the issue, I simplified my original script to the bare minimum for testing. Below is the test code:
from builtins import map, filter
import toolz
import cytoolz
import funcy
import expression
import pymonad.tools
import pydash
import returns.curry
import time


def process_something(num, curried_fn, lambda_fn):
    def process_iter(i):
        a = [str(i)]
        # The curried function is partially applied on every iteration.
        return list(curried_fn(lambda_fn)(a))

    return list(map(process_iter, range(num)))


def test_curry(flag, curried_fn, lambda_fn):
    t1 = time.perf_counter()
    num = 100_000
    rslt = process_something(num, curried_fn, lambda_fn)
    t2 = time.perf_counter() - t1  # elapsed seconds
    print(f"{flag:<25} {t2:.5f}")
    return [flag, t2, rslt]


# Select the builtin under test; the last assignment wins.
fn = filter
fn = map
fn_str = fn.__name__
lambda_fn = lambda x: x
args_num = 2
f2 = lambda fn, a, b: fn(a, b)

# Each entry takes the function first and the iterable second.
curry_list = {
    "toolz.curried": getattr(toolz.curried, fn_str),
    "cytoolz.curried": getattr(cytoolz.curried, fn_str),
    "toolz.curry": toolz.functoolz.curry(fn),
    "cytoolz.curry": cytoolz.functoolz.curry(fn),
    "lambda_curry": lambda x: lambda y: fn(x, y),
    "funcy.curry.seqs": funcy.curry(getattr(funcy.seqs, fn_str)),
    "funcy.curry": funcy.curry(fn),
    "funcy.autocurry": funcy.autocurry(fn),
    "funcy.autocurry.seqs": funcy.autocurry(getattr(funcy.seqs, fn_str)),
    "expression.seq": getattr(expression.collections.seq, fn_str),
    "expression.curry": expression.curry(args_num - 1)(fn),
    "pymonad.tools.curry": pymonad.tools.curry(args_num)(fn),
    "pydash.functions.curry": pydash.functions.curry(f2)(fn),
    "returns.curry": returns.curry.curry(f2)(fn),
}

print(f"{fn_str:<25} time")
t = curry_list
rslt = []
for i in t:
    r = test_curry(i, t[i], lambda_fn)
    rslt.append(r[2])

# Every variant must produce the same result.
rslt_set = set(str(i) for i in rslt)
assert len(rslt_set) == 1
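A single `perf_counter` measurement can be noisy, so it may help to repeat each run and keep the fastest time. The following sketch does that with `timeit.repeat`, using only standard-library stand-ins for the curried functions; the names `run` and `candidates` are hypothetical, and the third-party variants above could be added to the dictionary in the same way.

```python
import timeit
from functools import partial


def run(curried_fn, fn, num=10_000):
    """Same shape as the benchmark above: partially apply on every iteration."""
    return [list(curried_fn(fn)([str(i)])) for i in range(num)]


# Standard-library stand-ins; both take the function first, the iterable second.
candidates = {
    "lambda_curry": lambda f: lambda xs: map(f, xs),
    "functools.partial": lambda f: partial(map, f),
}

identity = lambda x: x
timings = {}
for name, curried_fn in candidates.items():
    # Take the minimum of several repeats to reduce scheduling noise.
    samples = timeit.repeat(lambda: run(curried_fn, identity), number=1, repeat=3)
    timings[name] = min(samples)
    print(f"{name:<25} {timings[name]:.5f}")
```

Taking the minimum rather than the mean is a common choice for microbenchmarks, since interference from other processes can only make a run slower.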
Test Results
fn = map
fn_str = fn.__name__
lambda_fn = lambda x: x
map                       time
toolz.curried             11.18867
cytoolz.curried           9.66267
toolz.curry               11.77486
cytoolz.curry             9.98406
lambda_curry              0.15762
funcy.curry.seqs          0.51989
funcy.curry               0.16449
funcy.autocurry           1.64497
funcy.autocurry.seqs      0.95207
expression.seq            0.87715
expression.curry          0.26653
pymonad.tools.curry       0.33536
pydash.functions.curry    0.93974
returns.curry             3.67666
fn = filter
fn_str = fn.__name__
lambda_fn = lambda x: x is None
filter                    time
toolz.curried             7.93439
cytoolz.curried           7.21578
toolz.curry               7.50556
cytoolz.curry             6.93008
lambda_curry              0.20139
funcy.curry.seqs          0.22452
funcy.curry               0.10085
funcy.autocurry           0.82936
funcy.autocurry.seqs      0.49786
expression.seq            0.34209
expression.curry          0.15820
pymonad.tools.curry       0.20793
pydash.functions.curry    0.43155
returns.curry             2.11528
Additional Note
Calling toolz.curried.map(fn, iter) with both arguments at once does not affect performance. The performance issue only occurs with the partially applied form, toolz.curried.map(fn)(iter).
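Since only the partially applied form appears to be affected, one possible workaround is to apply the first argument once, outside the hot loop, and reuse the result. The sketch below shows that restructuring with a lambda-based stand-in for the curried map; `process_something_hoisted` is a hypothetical name, and whether this removes the slowdown with toolz.curried.map itself would need to be measured.

```python
def lambda_curry(fn):
    """Stand-in for a curried map: function first, iterable second."""
    return lambda iterable: map(fn, iterable)


def process_something(num, curried_fn, lambda_fn):
    # Condensed original shape: curried_fn(lambda_fn) is re-applied every iteration.
    return [list(curried_fn(lambda_fn)([str(i)])) for i in range(num)]


def process_something_hoisted(num, curried_fn, lambda_fn):
    # Hypothetical variant: apply the first argument once and reuse it.
    mapper = curried_fn(lambda_fn)
    return [list(mapper([str(i)])) for i in range(num)]


identity = lambda x: x
original = process_something(1000, lambda_curry, identity)
hoisted = process_something_hoisted(1000, lambda_curry, identity)
print(original == hoisted)
```

Both functions return the same lists, so the hoisted form can replace the original wherever the mapped function does not change between iterations.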
System Information
╰─ uname -a
Linux localhost 4.19.273-VK-X-g0ec5bda45854 #327 SMP PREEMPT Fri Jun 28 14:28:49 CST 2024 aarch64 aarch64 aarch64 GNU/Linux
╰─ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
╰─ python -V
Python 3.12.4
╰─ pip show toolz cytoolz
Name: toolz
Version: 0.12.1
Summary: List processing tools and functional utilities
Home-page: https://github.com/pytoolz/toolz/
Author: https://raw.github.com/pytoolz/toolz/master/AUTHORS.md
Author-email:
License: BSD
Location: /root/.pyenv/versions/3.12.4/envs/daily/lib/python3.12/site-packages
Requires:
Required-by: cytoolz
---
Name: cytoolz
Version: 0.12.3
Summary: Cython implementation of Toolz: High performance functional utilities
Home-page: https://github.com/pytoolz/cytoolz
Author: https://raw.github.com/pytoolz/cytoolz/master/AUTHORS.md
Author-email: [email protected]
License: BSD
Location: /root/.pyenv/versions/3.12.4/envs/daily/lib/python3.12/site-packages
Requires: toolz
Required-by: