-
Hi @Arthfael, there are certainly parts that could be parallelized. Let's put aside that I doubt it's worth creating many formatted worksheets: the row limit in OOXML is something like 1 million and the column limit about 16 thousand, which is a lot of cells per worksheet, and I can tell you, I don't see anything if I'm simply buried in data. But that was not the point of the discussion.

Internally, given that one has enough memory, the construction of the worksheet XMLs could be parallelized. What cannot be parallelized and takes a lot of memory is the construction step: `wb <- wb_workbook()$add_worksheet()$add_data(x = ..., dims = ...)`. Afterwards you can get `wb$worksheets[[1]]$sheet_data$cc`. Construct a few workbook objects, collect ...

But you are definitely on your own in this endeavor, as I have no further interest in this (writing this response already took 30 minutes, working on this ...). I would push the data you want into some database, maybe ...
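The suggestion above (build several workbook objects, then collect the `cc` from each) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used by either poster: `build_cc`, the toy data frame, the chunk count, and the choice to start data at worksheet row 2 without a header are all hypothetical. It only uses the calls quoted above plus the base `parallel` package, and it stops at the combined cell table; putting that table back into a single workbook is not shown.

```r
library(openxlsx2)
library(parallel)

# Hypothetical helper: build a throwaway workbook for one chunk of rows
# and return only its cell table (`cc`), which can be sent back from a worker.
build_cc <- function(chunk, first_row) {
  wb <- openxlsx2::wb_workbook()$add_worksheet()$add_data(
    x = chunk,
    dims = paste0("A", first_row),  # top-left cell for this chunk
    col_names = FALSE               # assumption: header row is handled separately
  )
  wb$worksheets[[1]]$sheet_data$cc
}

# Toy data standing in for the real table (assumption).
df <- data.frame(id = 1:1000, value = rnorm(1000))

# Split the row indices into contiguous chunks, one per worker.
n_chunks <- 4
idx <- split(seq_len(nrow(df)), cut(seq_len(nrow(df)), n_chunks, labels = FALSE))

cl <- makeCluster(n_chunks)
cc_list <- parLapply(cl, idx, function(rows, df, build_cc) {
  # Data row i is placed on worksheet row i + 1, leaving row 1 free for a header.
  build_cc(df[rows, , drop = FALSE], first_row = rows[1] + 1)
}, df = df, build_cc = build_cc)
stopCluster(cl)

# Collect the per-chunk cell tables into one.
cc_all <- do.call(rbind, cc_list)
```

Each worker returns a plain data frame rather than the workbook object itself, which keeps the amount of data copied between processes down. Whether the combined `cc` can simply be assigned back into one worksheet depends on openxlsx2 internals that are not covered here.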
-
Thank you, I do agree that this may not be ideal... but what can I say, these reports are expected of me, and an external database is currently not an option. I appreciate the 30 minutes ^^ I have played a bit with the idea and can provide, for any poor soul who stumbles on this in the future and wonders whether this can be made to work, the following code:
A first benchmark suggests significant time gains when writing the table into the workbook: about 4x faster. That is well short of the maximum expected improvement (I have 55 threads in my cluster, and memory isn't an issue), but it is to be expected because of the overheads, and it is still a nice boost. So I think this will prove very useful for me. Thanks for the support.
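As a rough way to read that number, one can ask which parallel fraction would produce a 4x speedup on 55 workers under Amdahl's law. This is only a back-of-the-envelope sketch: it assumes the idealized model holds and ignores communication and serialization overhead, and the function names are made up for illustration.

```r
# Amdahl's law: speedup on n workers when a fraction p of the work is parallel.
amdahl_speedup <- function(p, n) 1 / ((1 - p) + p / n)

# Inverted: the parallel fraction p implied by an observed speedup s on n workers.
implied_parallel_fraction <- function(s, n) (1 - 1 / s) / (1 - 1 / n)

p <- implied_parallel_fraction(s = 4, n = 55)
round(p, 2)                      # 0.76
round(amdahl_speedup(p, 55), 1)  # 4
```

Under this model, roughly a quarter of the run time behaving as serial work is already enough to cap the gain at about 4x, no matter how many threads are available.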
-
I apologize in advance if this idea seems "out there". I don't know at all how relevant this would be for other people, or how feasible it is given the underlying code.
I have large amounts of proteomics data, which I am trying to write with formatting. Using a single core, this is slow (> 1 h). I am wondering whether there might be a way to make this work using parallelization, considering that I have N > 50 vCPUs at my disposal.
The idea would be to:
The whole strategy hinges on two points: