Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Implement tbl_subassign_col() in C #1368

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Sep 3, 2022

For #1353.

@krlmlr krlmlr requested a review from lionel- September 3, 2022 13:51
Copy link
Member

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is missing benchmarks to describe what is gained from moving all this code to C.

What would be needed from vctrs to be able to delegate subsetting to it?

// x_names: names of x
// n_old: number of elements with some property
// j_max: maximum value of j
// j_/j: SEXP and native version of bare vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use j_ffi vs j? This way it's immediately clear which does what.


// Compute j from j_, converting 1-based j_ to zero-based j
R_xlen_t n_j = Rf_xlength(j_);
R_xlen_t* j = (R_xlen_t*)R_alloc(n_j + 1, sizeof(R_xlen_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use a raw vector here? Just asking because we usually renounce to manage memory only in very specific cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark for allocs below. Another reason to manage memory manually is that it makes it clearer when things are not needed anymore, so this has a self-documenting value.

}

// Allocate output vector and output names with final size
SEXP xo = Rf_allocVector(VECSXP, n_xo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROTECT

SEXP xo = Rf_allocVector(VECSXP, n_xo);
Rf_copyMostAttrib(x, xo);

SEXP xo_names = Rf_allocVector(STRSXP, n_xo);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROTECT


// Compute j from j_, converting 1-based j_ to zero-based j
R_xlen_t n_j = Rf_xlength(j_);
R_xlen_t* j = (R_xlen_t*)R_alloc(n_j + 1, sizeof(R_xlen_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you have used vec_as_location() upstream to avoid having to deal with double vectors? I imagine tibble doesn't need support for long vectors in the column dimension.

// Add sentinel value
j[n_j] = -1;

// Compute j_max and n_old
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is n_old?


// Set output row names
// FIXME: Reuse original vector?
SEXP new_row_names_ = Rf_allocVector(INTSXP, 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROTECT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants