Streaming generation of large PDFs #392

itimofeev · 2024-01-26T08:12:24Z

Is your feature request related to a problem? Please describe.
Hello,

I'm currently working at a company that provides trading services to clients. One of our essential requirements is to generate monthly reports containing all the deals for each client. In some cases, a single client can have as many as 100,000 deals within a month. The issue we are facing is that the Maroto library can only render the entire PDF with all the data at once. Consequently, our services consume a significant amount of memory as they need to load all the deals into memory, build the PDF in memory, and only then generate a compressed PDF, which consumes far less memory than all the temporary data structures combined.

Describe the solution you'd like
Upon reviewing the current code, I couldn't find support for streaming generation. Could you please advise if there are any options or methods to optimize memory consumption in this scenario?

Describe alternatives you've considered
One approach we have considered is rendering each page separately and then utilizing a third-party Golang library to merge these individual pages into a single PDF file. We would greatly appreciate any guidance or suggestions on how to address this memory consumption issue effectively. Thank you.
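As a rough illustration of the render-then-merge idea, here is a minimal Go sketch that pulls deals in fixed-size batches and renders each batch on its own, so only one batch is in memory at a time. Everything in it is an assumption made for illustration: `Deal`, `renderChunk`, `generateInChunks`, and `dealSource` are hypothetical names, and `renderChunk` returns placeholder bytes instead of calling any PDF library.

```go
package main

import "fmt"

// Deal is a hypothetical stand-in for one row of the monthly report.
type Deal struct {
	ID     int
	Amount float64
}

// renderChunk is a placeholder for "build one small PDF from a batch of deals".
// It returns fake bytes so the sketch runs without any PDF library.
func renderChunk(deals []Deal) []byte {
	return []byte(fmt.Sprintf("[pdf part: %d deals]", len(deals)))
}

// generateInChunks pulls deals from next in batches of at most chunkSize,
// renders each batch separately, and passes the result to emit (for example,
// a function that writes the part to disk or merges it into the output).
// It returns the number of parts emitted.
func generateInChunks(next func() (Deal, bool), chunkSize int, emit func(part []byte) error) (int, error) {
	if chunkSize <= 0 {
		return 0, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	batch := make([]Deal, 0, chunkSize)
	parts := 0
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := emit(renderChunk(batch)); err != nil {
			return err
		}
		parts++
		batch = batch[:0] // reuse the backing array for the next batch
		return nil
	}
	for {
		d, ok := next()
		if !ok {
			break
		}
		batch = append(batch, d)
		if len(batch) == chunkSize {
			if err := flush(); err != nil {
				return parts, err
			}
		}
	}
	if err := flush(); err != nil { // render the final, possibly smaller batch
		return parts, err
	}
	return parts, nil
}

// dealSource simulates a database cursor that yields n deals one by one.
func dealSource(n int) func() (Deal, bool) {
	i := 0
	return func() (Deal, bool) {
		if i >= n {
			return Deal{}, false
		}
		i++
		return Deal{ID: i, Amount: float64(i)}, true
	}
}

func main() {
	parts, err := generateInChunks(dealSource(100000), 1000, func(part []byte) error {
		return nil // a real emit would merge or write the part
	})
	fmt.Println(parts, err)
}
```

In this shape, memory use is bounded by the chunk size plus whatever the merge step keeps around, so the merge step is where the remaining cost would need to be measured.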

johnfercher · 2024-01-26T18:47:32Z

Hello @itimofeev, how are you? I have some questions about your issue.

Are you using v2?
Maroto now has a feature to merge PDFs.
- You could generate the files separately and merge them, but I see a problem with page number counting that we would have to deal with.
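To make the page-numbering concern concrete, the sketch below works out which page numbers each separately generated part would need to print. It assumes a fixed number of rows per page and that every part starts on a fresh page. `PartNumbering`, `pagesFor`, and `planPageNumbers` are hypothetical helpers, and how such numbers would be passed to Maroto is not covered here.

```go
package main

import "fmt"

// PartNumbering describes the page numbers one separately rendered part should print.
type PartNumbering struct {
	FirstPage  int
	LastPage   int
	TotalPages int
}

// pagesFor returns how many pages are needed for rows when each page
// holds rowsPerPage rows (ceiling division).
func pagesFor(rows, rowsPerPage int) int {
	return (rows + rowsPerPage - 1) / rowsPerPage
}

// planPageNumbers splits totalRows into parts of at most chunkRows rows and
// computes the page range of each part in the merged document.
func planPageNumbers(totalRows, chunkRows, rowsPerPage int) []PartNumbering {
	if totalRows <= 0 || chunkRows <= 0 || rowsPerPage <= 0 {
		return nil
	}
	var plan []PartNumbering
	next := 1 // page number the next part starts on
	for remaining := totalRows; remaining > 0; remaining -= chunkRows {
		rows := chunkRows
		if remaining < chunkRows {
			rows = remaining // the last part may be smaller
		}
		pages := pagesFor(rows, rowsPerPage)
		plan = append(plan, PartNumbering{FirstPage: next, LastPage: next + pages - 1})
		next += pages
	}
	for i := range plan {
		plan[i].TotalPages = next - 1
	}
	return plan
}

func main() {
	for i, p := range planPageNumbers(2500, 1000, 40) {
		fmt.Printf("part %d: pages %d-%d of %d\n", i+1, p.FirstPage, p.LastPage, p.TotalPages)
	}
}
```

With 2,500 rows, parts of 1,000 rows, and 40 rows per page, the parts cover pages 1-25, 26-50, and 51-63. If row heights vary, the page count per part is only known after rendering, which is the harder version of the problem.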

With that said, I will look for a way to improve this. @F-Amaral, such a nice challenge here.

johnfercher · 2024-01-26T18:52:16Z

I've been thinking about this. Maybe you could try the parallelism feature: with it, small PDFs are created and then merged. Since Maroto now has a clear division between the declaration phase and the computing phase, I think this can help you.

itimofeev · 2024-01-29T06:14:14Z

Hello @johnfercher,

Thank you for your prompt response! We have recently upgraded and are now using Maroto v2. I appreciate your pointing out the PDF merge feature; we hadn’t noticed it before. We will look into the page numbering issue more closely as we explore this feature.

Regarding the parallelism feature, I must admit I'm not entirely clear on how to implement it effectively. It seems we might need to put in some additional effort to understand and utilize this feature properly.

I want to take a moment to express my gratitude for your work on Maroto. It’s an excellent library that demonstrates high coding standards, and it has been instrumental in our projects.

Also, if you're thinking about adding page streaming to Maroto, I'd love to be a part of that. I'm ready and willing to help out with the coding if you need it.

Thanks again for your dedication and support in maintaining Maroto.

Best regards,
Ilia

johnfercher · 2024-01-29T12:44:15Z

First of all, thank you :D

To use parallel generation, you only need to set WithWorkerPoolSize() in the builder. It is possible that it will use less memory; if not, you could try generating different documents and merging them.
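For readers unfamiliar with the pattern behind a worker pool size setting, here is a generic Go sketch of a bounded pool that renders chunks concurrently and keeps the parts in input order for a later merge. It is not Maroto's implementation: `renderPart` and `renderWithPool` are hypothetical names, and the "rendering" is a placeholder.

```go
package main

import (
	"fmt"
	"sync"
)

// renderPart is a placeholder for rendering one chunk of rows into PDF bytes.
func renderPart(chunk []int) []byte {
	return []byte(fmt.Sprintf("part(%d rows)", len(chunk)))
}

// renderWithPool renders chunks using at most workers goroutines and returns
// the parts in the same order as the input chunks.
func renderWithPool(chunks [][]int, workers int) [][]byte {
	if workers < 1 {
		workers = 1
	}
	parts := make([][]byte, len(chunks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so no lock is needed.
				parts[i] = renderPart(chunks[i])
			}
		}()
	}
	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return parts
}

func main() {
	chunks := [][]int{{1, 2, 3}, {4, 5}, {6}}
	for _, p := range renderWithPool(chunks, 2) {
		fmt.Println(string(p))
	}
}
```

Note that in this sketch every rendered part stays in memory until all workers finish, so a pool like this mainly shortens wall-clock time. Whether it lowers peak memory depends on how large the intermediate structures are compared with the finished parts.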

If you follow the path of generating different documents and merging them, please let me know. This may be an easy way to implement an algorithm that consumes less memory. To achieve this, we would only need to run this part sequentially instead of concurrently.

lordofscripts · 2024-05-11T20:06:48Z

When I read the Maroto docs I realized it wouldn't scale well memory-wise, especially for large data.
In my project I deal with a large JSON file. First I pre-parse it to create & resolve relationships, and in the 2nd pass I process it piecewise, generating it on the go.

johnfercher · 2024-05-18T13:50:29Z

We achieved a low memory mode which keeps memory allocation lower: it makes 13% fewer allocations, and they don't increase over time. However, we should focus more on this.

johnfercher assigned johnfercher and F-Amaral Jan 26, 2024

johnfercher added the new feature, help wanted, hard to solve, in analysis, and v2 labels Jan 26, 2024

johnfercher mentioned this issue May 17, 2024: low memory mode #427 (merged, 11 tasks)
