Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release query result after materialization & transformation #1027

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

toppyy
Copy link
Contributor

@toppyy toppyy commented Jan 25, 2025

Releases the query result after it is no longer needed to reduce memory footprint. See tidyverse/duckplyr#434

TODO: Needs tests that show decreased memory footprint (how to?) & validate logic

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 12, 2025

Thanks. This is certainly better than the status quo.

This now has conflicts. We also want to coordinate with another planned change -- removal of the allow_materialization argument, and replacing it with n_cells == 0 . Only in this class, not in the API. Could be one PR, or one after another.

I wonder if column counting and object lifetime could be made local to the AltrepRelationWrapper class so that the vectors only notify that a column now has been computed, and the bookkeeping takes part in that class only.

@toppyy
Copy link
Contributor Author

toppyy commented Feb 15, 2025

Thanks! Resolved the conflicts and moved the column counting and related logic under AltrepRelationWrapper (good idea).

Copy link
Collaborator

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Whenever a free function uses a pointer or reference (like rownames_wrapper->) in almost every statement, it's an indication that this should better be a member function. Do you want to tackle this refactoring too? The essence of this PR is the res.reset(); that could be added in a separate small PR, to separate refactorings from features.

@@ -404,6 +433,8 @@ size_t DoubleToSize(double d) {
auto relation_wrapper = make_shared_ptr<AltrepRelationWrapper>(rel, allow_materialization, DoubleToSize(n_rows),
DoubleToSize(n_cells));

relation_wrapper->ncols = drel->Columns().size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to update twice (here and below)?

@@ -306,16 +324,27 @@ const void *RelToAltrep::RownamesDataptrOrNull(SEXP x) {

void *RelToAltrep::DoRownamesDataptrGet(SEXP x) {
auto rownames_wrapper = AltrepRownamesWrapper::Get(x);

// the query has been materialized, return the rowcount
// (and void recomputing the query if it's been reset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// (and void recomputing the query if it's been reset)
// (and avoid recomputing the query if it's been reset)

Comment on lines +30 to +31
R_xlen_t rowcount;
bool rowcount_retrieved;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth to initialize with -1 to avoid rowcount_retrieved ? I honestly don't know.

Comment on lines +33 to +34
size_t ncols;
size_t cols_transformed;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the same token, use a single n_cols_to_retrieve that counts down to zero?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants