
Performance enhancements #22

Open
anselor opened this issue Aug 2, 2018 · 1 comment
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@anselor
Contributor

anselor commented Aug 2, 2018

When generating large tables the performance drops significantly.
Much of this is likely because tableformatter makes multiple passes over every field of every row: once to generate the display text from the data, again to measure that text, and again to format/wrap it to fit the column width and alignment.

Some of the following could be used to improve performance:

  • Limit the number of rows analyzed to determine column widths (configurable limit)
  • Once all columns have reached the maximum allowable width, stop measuring rows
  • If all columns have pre-defined fixed widths, skip the measuring step
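The first two ideas above can be combined in a single measuring pass. As a rough sketch (hypothetical function, not tableformatter's actual API), a configurable row limit and an early-exit check once every column is capped might look like:

```python
def measure_widths(rows, num_cols, max_width, max_rows=1000):
    """Hypothetical sketch: measure column widths over at most `max_rows`
    rows, stopping early once every column hits the maximum allowable width."""
    widths = [0] * num_cols
    for i, row in enumerate(rows):
        if i >= max_rows:
            break  # configurable analysis limit reached
        for col, field in enumerate(row):
            widths[col] = min(max(widths[col], len(str(field))), max_width)
        if all(w >= max_width for w in widths):
            break  # every column is already capped; further rows change nothing
    return widths
```

With pre-defined fixed widths for all columns, this step could be skipped entirely.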

Also, tableformatter currently generates a full in-memory model of the entire table before rendering to a string. This can use a lot of memory. We can reduce the memory usage by only building the in-memory model up until the maximum analysis depth and then process/render the remaining rows on-demand.
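A minimal sketch of that idea, assuming a hypothetical `render_table` helper (not tableformatter's real API): buffer only the rows needed for width analysis, then stream everything through a generator rather than materializing the full table.

```python
import itertools

def render_table(rows, analysis_depth=1000):
    """Hypothetical sketch: buffer only the first `analysis_depth` rows to
    measure column widths, then yield formatted lines one row at a time
    (buffered rows first, remaining rows on demand)."""
    it = iter(rows)
    head = list(itertools.islice(it, analysis_depth))  # bounded in-memory model
    widths = [max(len(str(f)) for f in col) for col in zip(*head)]
    for row in itertools.chain(head, it):
        yield ' | '.join(str(f).ljust(w) for f, w in zip(row, widths))
```

Peak memory then scales with `analysis_depth` rather than the total row count; rows past the analysis depth wider than the measured widths would simply wrap or truncate per the existing column rules.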

@tleonhardt
Member

Regardless of which approach we take, we should probably add a section at the bottom of the README called something like "Performance considerations" where we explain what types of things are likely to be slow and why.

You say we are seeking through all of the fields and all of the rows multiple times. Is that more than twice? I can see why we may need to parse through all of it twice, but I can't see why it would ever need to be more than that.

Perhaps we could add an API function that accepts the data along with an integer specifying the maximum number of lines of text and/or rows of data to return at a time, and returns a generator yielding the next N rows?
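The chunking part of that suggestion could be sketched roughly like this (`paginate` is a hypothetical name, not an existing tableformatter function):

```python
def paginate(rows, n):
    """Hypothetical sketch: yield the input data N rows at a time, so the
    caller can format and display one chunk before pulling the next."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == n:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial chunk
```

Each yielded chunk could then be handed to the formatter independently, keeping at most N rows in memory at once.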

Alternatively, I like the concept of limiting the number of rows analyzed for performance reasons.

@tleonhardt added the enhancement (New feature or request) and question (Further information is requested) labels Aug 10, 2018
@anselor added this to the 0.2.0 milestone Aug 12, 2019