-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release query result after materialization & transformation #1027
base: main
Are you sure you want to change the base?
Conversation
Thanks. This is certainly better than the status quo. This now has conflicts. We also want to coordinate with another planned change -- removal of the I wonder if column counting and object lifetime could be made local to the |
Thanks! Resolved the conflicts and moved the column counting and related logic under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Whenever a free function uses a pointer or reference (like rownames_wrapper->
) in almost every statement, it's an indication that this should better be a member function. Do you want to tackle this refactoring too? The essence of this PR is the res.reset();
that could be added in a separate small PR, to separate refactorings from features.
@@ -404,6 +433,8 @@ size_t DoubleToSize(double d) { | |||
auto relation_wrapper = make_shared_ptr<AltrepRelationWrapper>(rel, allow_materialization, DoubleToSize(n_rows), | |||
DoubleToSize(n_cells)); | |||
|
|||
relation_wrapper->ncols = drel->Columns().size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to update twice (here and below)?
@@ -306,16 +324,27 @@ const void *RelToAltrep::RownamesDataptrOrNull(SEXP x) { | |||
|
|||
void *RelToAltrep::DoRownamesDataptrGet(SEXP x) { | |||
auto rownames_wrapper = AltrepRownamesWrapper::Get(x); | |||
|
|||
// the query has been materialized, return the rowcount | |||
// (and void recomputing the query if it's been reset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// (and void recomputing the query if it's been reset) | |
// (and avoid recomputing the query if it's been reset) |
R_xlen_t rowcount; | ||
bool rowcount_retrieved; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to initialize with -1
to avoid rowcount_retrieved
? I honestly don't know.
size_t ncols; | ||
size_t cols_transformed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the same token, use a single n_cols_to_retrieve
that counts down to zero?
Releases the query result after it is no longer needed to reduce memory footprint. See tidyverse/duckplyr#434
TODO: Needs tests that show decreased memory footprint (how to?) & validate logic