Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setnames() errors for NA 'new' when 'old' provided, silently ignored in full assignment (no 'old') #6236

Closed
MichaelChirico opened this issue Jul 10, 2024 · 7 comments · Fixed by #6273

Comments

@MichaelChirico
Copy link
Member

DT=data.table(a=1,b=2)
setnames(DT, 'b', NA_character_)
# Error: NA in 'new' at positions [1]
setnames(DT, c('c', NA_character_))
names(DT)
# [1] "c" "b"

Seems odd.

@tdhock
Copy link
Member

tdhock commented Jul 11, 2024

I'm not sure this is a bug in behavior, but definitely docs could be clarified. Here is my interpretation

# means to rename the 'b' column to missing, does not make sense so error
setnames(DT, 'b', NA_character_)
# Error: NA in 'new' at positions [1]

# means to set the first column name to 'c' and skip rename of second column name
setnames(DT, c('c', NA_character_))

@MichaelChirico
Copy link
Member Author

I think it's clearer in the simple, explicit case setnames(DT, 'b', NA_character_) that an error can be OK.

It's harder for me to see why we have different behavior based on missing(new), e.g.

DT = data.table(a=1)
setnames(DT, NA_character_)
setnames(DT, 'a', NA_character_)

I guess I'm struggling to see the use case where we'd come up with such a vector of column names with missings. Here's a workaround, but I wonder why we'd have new_with_missings in the first place instead of being able to start directly from 'indices to replace, replacements'.

setnames(DT, new_with_missings)
local({
  idx <- na.omit(new_with_missings)
  setnames(DT, -attr(idx, "na.action"), idx)
})

@tdhock
Copy link
Member

tdhock commented Jul 12, 2024

you are right it is weird, and un-documented, (but maybe used in the wild, I'm not sure) so I think we could go either way (continue supporting this or error)
Based on the "stable code base" principle https://github.com/Rdatatable/data.table/blob/master/GOVERNANCE.md#the-r-package I think we should prefer to continue supporting this rather than change to an error.
That being said we can try merging the error into master and if no revdeps are affected, that change would be fine with me.

@Nj221102
Copy link
Contributor

Nj221102 commented Jul 12, 2024

That being said we can try merging the error into master and if no revdeps are affected, that change would be fine with me.

Yeah we can try this as well, if any problems occur we can just revert it and i will file a PR with the documentation instead.

@MichaelChirico
Copy link
Member Author

I think we should prefer to continue supporting this rather than change to an error.

Definitely. But should we allow setnames(DT, old, new) with anyNA(new) meaning skip in this case as well?

@tdhock
Copy link
Member

tdhock commented Jul 13, 2024

should we allow setnames(DT, old, new) with anyNA(new) meaning skip

You are right, for consistency, you could make an argument for skip in the old and new case too. This is tricky.

My suggestion for a way forward in #6240 was updating the docs. Adding one sentence "Missing values in 'new' mean to not rename that column, and are only allowed when 'old' is not provided." seems to fix the issue for me. Maybe somebody else can read the docs and see if that fixes this? I feel there is some relationship to the skip_absent arg?

skip_absent: Skip items in ‘old’ that are missing (i.e. absent) in
          `names(x)`. Default ‘FALSE’ halts with error if any are
          missing.

BTW the name of the issue I think could be clarified--- for me the difference is not partial vs full, it is rename by name (old+new) vs by order/index.
When old+new provided, we do rename by name (old names to new), and error for NA.
When setnames(DT,new_names) or setnames(DT,new=new_names) case, we rename by position/index (all names become new_names except NA entries).
For me a full assignment is when all names are supposed to change. So this is an example of a full assignment which actually does error (NA not silently ignored)

> setnames(DT,c("b","a"),c("y",NA_character_))
Error: NA in 'new' at positions [2]

@MichaelChirico MichaelChirico changed the title setnames() errors for missing 'new' in partial assignment, silently ignored in full assignment setnames() errors for NA 'new' when 'old' provided, silently ignored in full assignment (no 'old') Jul 13, 2024
@MichaelChirico
Copy link
Member Author

I feel there is some relationship to the skip_absent arg?

Good callout. But I don't think that's the case (maybe it should though?):

DT=data.table(a=1,b=2)
setnames(DT,c(NA, 'c')) # a->a, b->c
setnames(DT, c('a', 'c'), c('d', NA))
# Error in stopf("NA in 'new' at positions %s", brackify(which(is.na(new)))) : 
#   NA in 'new' at positions [2]
setnames(DT, c(NA, 'c'), c('d', 'e'))
# Error in stopf("NA (or out of bounds) in 'old' at positions %s", brackify(which(is.na(old)))) : 
#   NA (or out of bounds) in 'old' at positions [1]

setnames(DT, c('a', 'c'), c('d', NA), skip_absent=TRUE)
# Error in stopf("NA in 'new' at positions %s", brackify(which(is.na(new)))) : 
#   NA in 'new' at positions [2]
setnames(DT, c(NA, 'c'), c('d', 'e'), skip_absent=TRUE)
# Error in stopf("NA (or out of bounds) in 'old' at positions %s", brackify(which(is.na(old)))) : 
#   NA (or out of bounds) in 'old' at positions [1]

The case can be made for the last case behavior should be a->a, c->e -- that's not at odds with the documentation:

Skip items in old that are missing (i.e. absent) in names(x).

Adding one sentence "Missing values in 'new' mean to not rename that column, and are only allowed when 'old' is not provided." seems to fix the issue for me.

I agree that's a good fix. We can revisit if there's a user request to support a different behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants