Streaming generation of large PDFs #392

Open
itimofeev opened this issue Jan 26, 2024 · 6 comments
Labels: hard to solve (Not food for newcomers), help wanted (Extra attention is needed), in analysis (Analyzing if should be implemented), new feature (New feature or request), v2 (will be solved in v2)

Comments

@itimofeev

Is your feature request related to a problem? Please describe.
Hello,

I'm currently working at a company that provides trading services to clients. One of our essential requirements is to generate monthly reports containing all the deals for each client. In some cases, a single client can have as many as 100,000 deals within a month. The issue we are facing is that the Maroto library can only render the entire PDF with all the data at once. Consequently, our services consume a significant amount of memory as they need to load all the deals into memory, build the PDF in memory, and only then generate a compressed PDF, which consumes far less memory than all the temporary data structures combined.

Describe the solution you'd like
Upon reviewing the current code, I couldn't find support for streaming generation. Could you please advise if there are any options or methods to optimize memory consumption in this scenario?

Describe alternatives you've considered
One approach we have considered is rendering each page separately and then utilizing a third-party Golang library to merge these individual pages into a single PDF file. We would greatly appreciate any guidance or suggestions on how to address this memory consumption issue effectively. Thank you.
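A minimal sketch of the batching half of this idea, assuming the deals can be read one at a time from a cursor-like source. `Deal`, `forEachBatch`, and `countingSource` are hypothetical names for illustration, not part of Maroto; the `emit` callback is where each batch would be rendered to its own PDF file before a later merge step.

```go
package main

import "fmt"

// Deal is a hypothetical record type; the real fields depend on the report.
type Deal struct {
	ID     int
	Amount float64
}

// forEachBatch pulls deals from next one at a time and hands them to emit in
// slices of at most size elements, so only one batch is held in memory.
// The slice passed to emit is reused between calls; emit must not retain it.
func forEachBatch(next func() (Deal, bool), size int, emit func([]Deal) error) error {
	if size <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", size)
	}
	batch := make([]Deal, 0, size)
	for {
		d, ok := next()
		if !ok {
			break
		}
		batch = append(batch, d)
		if len(batch) == size {
			if err := emit(batch); err != nil {
				return err
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		return emit(batch)
	}
	return nil
}

// countingSource returns an iterator that yields n synthetic deals,
// standing in for a database cursor.
func countingSource(n int) func() (Deal, bool) {
	i := 0
	return func() (Deal, bool) {
		if i >= n {
			return Deal{}, false
		}
		i++
		return Deal{ID: i, Amount: float64(i)}, true
	}
}

func main() {
	part := 0
	err := forEachBatch(countingSource(10), 4, func(b []Deal) error {
		part++
		// A real emit would render b into its own PDF file here.
		fmt.Printf("part %d: %d deals\n", part, len(b))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape, peak memory for the input data is bounded by the batch size rather than by the number of deals in the month; the cost of the merge step itself is a separate question.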

@johnfercher added the labels new feature, help wanted, hard to solve, in analysis, and v2 on Jan 26, 2024
@johnfercher
Owner

johnfercher commented Jan 26, 2024

Hello @itimofeev, how are you? I have some questions about your issue.

  1. Are you using v2?
  2. Maroto now has a feature to merge PDFs.
    • You could generate the files separately and merge them, but I see a problem with page number counting that we would have to deal with.
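The page-numbering problem mentioned in the last point can be illustrated with simple arithmetic: each separately generated part has to know the page number it starts at, and a "page X of Y" footer also needs the total. The sketch below is only an illustration; `startPages` is a hypothetical helper, not a Maroto function.

```go
package main

import "fmt"

// startPages returns the 1-based number of the first page of each part,
// given how many pages each separately generated part contains.
// It also returns the total page count of the merged document.
func startPages(pagesPerPart []int) (starts []int, total int) {
	starts = make([]int, len(pagesPerPart))
	next := 1
	for i, n := range pagesPerPart {
		starts[i] = next
		next += n
	}
	return starts, next - 1
}

func main() {
	starts, total := startPages([]int{3, 5, 2})
	fmt.Println(starts, total) // prints "[1 4 9] 10"
}
```

The total is only known once every part has been counted, so a footer that shows it would need either a first pass that counts pages or a post-processing step after the merge.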

With that said, I will look for a way to improve this. @F-Amaral, this is a nice challenge.

@johnfercher
Owner

Thinking about this further: maybe you could try the parallelism feature. With it, small PDFs are created and then merged. Since Maroto now has a clear division between the declaration phase and the computing phase, I think this can help you.
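The general pattern described here (render small pieces with a bounded pool, then merge them in order) can be sketched as follows. This is not Maroto's internal implementation: `renderFunc` and `mergeFunc` are hypothetical stand-ins for whatever renders one chunk and whatever concatenates PDFs, and the demo uses plain strings instead of real PDF bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// renderFunc and mergeFunc are hypothetical stand-ins, not Maroto APIs.
type renderFunc func(chunk []string) ([]byte, error)
type mergeFunc func(parts [][]byte) ([]byte, error)

// renderChunks renders every chunk using at most workers goroutines and
// merges the results in the original chunk order.
func renderChunks(chunks [][]string, workers int, render renderFunc, merge mergeFunc) ([]byte, error) {
	if workers < 1 {
		workers = 1
	}
	parts := make([][]byte, len(chunks))
	errs := make([]error, len(chunks))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handled by exactly one worker, so writes
			// to parts[i] and errs[i] never race.
			for i := range jobs {
				parts[i], errs[i] = render(chunks[i])
			}
		}()
	}
	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return merge(parts)
}

func main() {
	chunks := [][]string{{"a", "b"}, {"c"}, {"d", "e"}}
	render := func(c []string) ([]byte, error) {
		return []byte(strings.Join(c, "")), nil
	}
	merge := func(p [][]byte) ([]byte, error) {
		return bytes.Join(p, []byte("|")), nil
	}
	out, err := renderChunks(chunks, 2, render, merge)
	fmt.Println(string(out), err) // prints "ab|c|de <nil>"
}
```

Note that this sketch still holds every rendered part in memory until the merge, so on its own it bounds concurrency rather than peak memory; writing parts to disk as they finish would be one way to address that.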

@itimofeev
Author

Hello @johnfercher,

Thank you for your prompt response! We have recently upgraded and are now using Maroto v2. I appreciate your pointing out the PDF merge feature; we hadn’t noticed it before. We will look into the page numbering issue more closely as we explore this feature.

Regarding the parallelism feature, I must admit I'm not entirely clear on how to implement it effectively. It seems we might need to put in some additional effort to understand and utilize this feature properly.

I want to take a moment to express my gratitude for your work on Maroto. It’s an excellent library that demonstrates high coding standards, and it has been instrumental in our projects.

Also, if you're thinking about adding page streaming to Maroto, I'd love to be a part of that. I'm ready and willing to help out with the coding if you need it.

Thanks again for your dedication and support in maintaining Maroto.

Best regards,
Ilia

@johnfercher
Owner

First of all, thank you :D

To use parallel generation, you only need to set WithWorkerPoolSize() in the builder. It is possible that this will use less memory; if not, you could try generating separate documents and merging them.

If you take the path of generating separate documents and merging them, please let me know. This may be an easy way to implement a lower-memory algorithm: to achieve it, we would only need to run this part sequentially instead of concurrently.

@lordofscripts

When I read the Maroto docs I realized it wouldn't scale well memory-wise, especially for large data.
In my project I deal with a large JSON file. First I pre-parse it to create and resolve relationships, and in the second pass I process it piecewise, generating the output on the go.
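One way to do the piecewise pass in Go is to decode a top-level JSON array element by element with `encoding/json`, so the whole file is never held in memory. This is a sketch under assumptions, not the commenter's actual code: the `Deal` shape, `streamDeals`, and `sumAmounts` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Deal is a hypothetical element type for the JSON array.
type Deal struct {
	ID     int     `json:"id"`
	Amount float64 `json:"amount"`
}

// streamDeals decodes a top-level JSON array one element at a time and
// passes each decoded element to handle.
func streamDeals(r io.Reader, handle func(Deal) error) error {
	dec := json.NewDecoder(r)

	// Consume the opening '[' of the array.
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if delim, ok := tok.(json.Delim); !ok || delim != '[' {
		return fmt.Errorf("expected JSON array, got %v", tok)
	}

	// Decode elements until the array ends.
	for dec.More() {
		var d Deal
		if err := dec.Decode(&d); err != nil {
			return err
		}
		if err := handle(d); err != nil {
			return err
		}
	}

	// Consume the closing ']'.
	_, err = dec.Token()
	return err
}

// sumAmounts is a small demo consumer: it counts the deals and totals
// their amounts without keeping the deals themselves.
func sumAmounts(jsonText string) (count int, total float64, err error) {
	err = streamDeals(strings.NewReader(jsonText), func(d Deal) error {
		count++
		total += d.Amount
		return nil
	})
	return count, total, err
}

func main() {
	count, total, err := sumAmounts(`[{"id":1,"amount":2.5},{"id":2,"amount":4}]`)
	fmt.Println(count, total, err) // prints "2 6.5 <nil>"
}
```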

@johnfercher mentioned this issue on May 17, 2024 (11 tasks)
@johnfercher
Owner

We achieved a low-memory mode that keeps memory allocation lower: it makes 13% fewer allocations, and they do not grow over time. However, we should focus more on this.
