Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent Error Handling for setnames() with NA_character_ #6240

Closed
wants to merge 4 commits into from

Conversation

Nj221102
Copy link
Contributor

closes #6236

Summary:
This pull request addresses the inconsistency in the setnames() function when handling NA_character_ in column renaming. Previously, setnames() would error on partial assignments with NA_character_ but silently ignore the same in full assignments. This update ensures consistent error handling for both cases, providing a clear error message when NA_character_ is used as a new column name.

Changes Made:

  1. Modified setnames() to error if NA_character_ is provided as a name in both partial and full assignments.
  2. Added tests to ensure consistent behavior and proper error handling.

Copy link

github-actions bot commented Jul 11, 2024

Comparison Plot

Generated via commit 1968855

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 12 minutes and 13 seconds

Time taken to run atime::atime_pkg on the tests: 3 minutes and 43 seconds

@tdhock
Copy link
Member

tdhock commented Jul 11, 2024

I am not sure about adding another error here, since actually setnames(DT, vector_with_NA) can be interpreted as a feature: rename only the columns which are not NA.

We should also add documentation for how missing values are treated in setattr.Rd
Current documentation of old is

     old: When ‘new’ is provided, character names or numeric positions
          of column names to change. When ‘new’ is not provided, a
          function or the new column names (i.e., it's implicitly
          treated as ‘new’; excluding ‘old’ and explicitly naming ‘new’
          is equivalent). If a function, it will be called with the
          current column names and is supposed to return the new column
          names. The new column names must be the same length as the
          number of columns. See examples.

my reading of the above is that setnames(DT, my_names) is the same as setnames(DT, new=my_names) and that my_names must be the same length as the number of columns.

Current doc for new arg is

     new: Optional. It can be a function or the new column names. If a
          function, it will be called with ‘old’ and expected to return
          the new column names. The new column names must be the same
          length as columns provided to ‘old’ argument.

I would keep the current behavior for back-compatibility and change the doc for new arg to below
The new column names must be the same length as columns provided to ‘old’ argument. Missing values in 'new' mean to not rename that column, and are only allowed when 'old' is not provided.

@@ -2619,6 +2619,7 @@ setnames = function(x,old,new,skip_absent=FALSE) {
} else {
w = which(names(x) != old | (Encoding(names(x)) != Encoding(old)))
}
if (anyNA(old)) stopf("NA in 'new' at positions %s", brackify(which(is.na(old))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code has variabled named old but the error says 'new' which is confusing, please clarify

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this code also add a new error when both old and new are specified?

Copy link
Contributor Author

@Nj221102 Nj221102 Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this code also add a new error when both old and new are specified?

The thing is that this condition can only be reached when either old or new is missing, so it doesn't make changes to the condition when both old and new are specified.

This code has a variable named old, but the error message refers to new, which is confusing. Please clarify

When we do something like this:

setnames(DT, c('c', NA_character_))

we are inputting the new names in the old argument hence old variable carries the new names .

And when we do:

setnames(DT, new = c('c', NA_character_))

there is a condition that checks if old is missing. If old is missing, that means we are inputting the new names through the new argument. In this condition, the value of new is assigned to old, and new is set to NULL, so new names end up in the old variable.

hence, to check for NAs in new names, we have to check the old variable. This causes the error and variable mismatching, but since we are actually finding NA's in new names i think the error is correct but we can do something about the variable though...

We can fix this by modifying the code a bit, but that might cause some complications in some other places.

@tdhock
Copy link
Member

tdhock commented Jul 15, 2024

based on our discussion in #6236 (comment) I think this Pr should be closed and we should instead just add that sentence to the docs, is that OK?

@Nj221102
Copy link
Contributor Author

based on our discussion in #6236 (comment) I think this Pr should be closed and we should instead just add that sentence to the docs, is that OK?

sure

@Nj221102 Nj221102 closed this Jul 15, 2024
@Nj221102 Nj221102 deleted the setnames branch July 15, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

setnames() errors for NA 'new' when 'old' provided, silently ignored in full assignment (no 'old')
2 participants