
Memory and parallelism tuning #230

Open
jamessmith123456 opened this issue Apr 14, 2023 · 4 comments

Comments

@jamessmith123456

(1) It seems that memory issues cannot be avoided when there is a large amount of data.
(2) If the parallelism is 20, will the original data be copied 20 times?
(3) How should I balance memory and CPU to choose optimal parameters, please?

@nalepae
Owner

nalepae commented May 10, 2023

(1): Pandarallel basically doubles the amount of memory needed, as stated in the documentation:

pandarallel gets around this limitation by using all cores of your computer. But, in return, pandarallel needs twice the memory that a standard pandas operation would normally use.

(2): No, the original data will be copied only once, whatever the parallelism.

(3): There is no coordination relationship between CPU and memory (cf (2))
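Putting points (1) and (2) together: peak memory is roughly twice the dataframe's size, regardless of the worker count. A minimal sketch of that rule of thumb (the helper name `estimated_peak_gb` is hypothetical, not part of pandarallel's API):

```python
def estimated_peak_gb(df_gb: float, nb_workers: int = 20) -> float:
    """Rough peak-memory estimate when using pandarallel.

    Per the documentation quoted above, pandarallel needs about
    twice the memory of a plain pandas operation. The data is
    copied only once, so nb_workers does not enter the estimate.
    """
    return 2.0 * df_gb

# A 100 GB dataframe needs roughly 200 GB, whether you use 4 or 20 workers.
print(estimated_peak_gb(100.0, nb_workers=4))   # 200.0
print(estimated_peak_gb(100.0, nb_workers=20))  # 200.0
```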

@SysuJayce

Hi @nalepae, if the amount of data is quite large, how can we speed up the preparation before apply()?

If I have 100 GB of data loaded in memory, I have to wait a long time before the apply starts.

@nalepae
Owner

nalepae commented Jan 23, 2024

Pandaral·lel is looking for a maintainer!
If you are interested, please open a GitHub issue.

@shermansiu

@SysuJayce, what do you mean by "boosting the preparation"?

If you are memory-bound, I would suggest breaking up your dataframe into smaller shards and applying your function to each shard.
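The sharding idea above can be sketched with plain pandas and NumPy (the helper `apply_in_shards` is hypothetical, just an illustration of the approach, not part of pandarallel):

```python
import numpy as np
import pandas as pd

def apply_in_shards(df: pd.DataFrame, func, n_shards: int = 10) -> pd.DataFrame:
    """Apply `func` to each shard separately and reassemble the result.

    Only one shard (plus its copy) needs to be in flight at a time,
    which keeps peak memory well below processing the whole frame at once.
    """
    shards = np.array_split(df, n_shards)
    return pd.concat(func(shard) for shard in shards)

df = pd.DataFrame({"x": range(100)})
result = apply_in_shards(df, lambda s: s.assign(y=s["x"] * 2), n_shards=4)
```

Inside `func` you could still call pandarallel's parallel_apply on each shard, trading some scheduling overhead for a much smaller memory footprint per step.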

Do you have any other problems? If not, I would like to close this issue.
