Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify Usage of .BY in Special Symbols Documentation #6193

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

Nj221102
Copy link
Contributor

closes #5389

Description:

This PR updates the special-symbols documentation to make it clear how to use .BY correctly in data.table. The main goal is to help users understand how to avoid issues when using .BY outside of its immediate context, as discussed in GitHub Issue #5389.

Changes:

  • Added a Note in the details section to include advice on using copy(.BY) when extracting .BY outside its immediate context.
  • Added an example to show the correct usage of copy(.BY).

@@ -66,5 +68,25 @@ DT[{cat(sprintf('in i, .N is \%d\n', .N)); a < .N/2},
# .I can be different in j and by, enabling rowwise operations in by
DT[, .(.I, min(.SD[,-1]))]
DT[, .(min(.SD[,-1])), by=.I]

#These examples illustrate the importance of using copy(.BY) when capturing .BY values in a list outside the data.table context to avoid unexpected behavior.
dt <- data.table(l = sample(letters[1:5], 100, replace = TRUE), b = rnorm(100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, this is a good start, but how about a simpler example, with fewer than 100 rows?
how about 4 rows, data.table(letter=c('a','a','b','b'),number=1:4)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

man/special-symbols.Rd Outdated Show resolved Hide resolved
@MichaelChirico
Copy link
Member

Hi @Nj221102, FYI, there's no need to do the "Update" step for each individual PR every time master updates, it creates a lot of e-mail churn. If there are merge conflicts to resolve, go ahead and do so, otherwise your reviewer can do the update step easily when they review.

@Nj221102
Copy link
Contributor Author

Hi @Nj221102, FYI, there's no need to do the "Update" step for each individual PR every time master updates, it creates a lot of e-mail churn. If there are merge conflicts to resolve, go ahead and do so, otherwise your reviewer can do the update step easily when they review.

Sure will keep that in mind from now on , sorry for the inconvenience caused

@Nj221102
Copy link
Contributor Author

Nj221102 commented Jul 2, 2024

@tdhock plz check this out as well.

@@ -66,5 +68,25 @@ DT[{cat(sprintf('in i, .N is \%d\n', .N)); a < .N/2},
# .I can be different in j and by, enabling rowwise operations in by
DT[, .(.I, min(.SD[,-1]))]
DT[, .(min(.SD[,-1])), by=.I]

# These examples illustrate the importance of using copy(.BY) when capturing .BY values in a list outside the data.table context to avoid unexpected behavior.
dt <- data.table(l = letters[1:5], b = rnorm(5))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again please use simpler data (1:4 instead of rnorm) as I mentioned in my previous comment

Comment on lines +79 to +86
byg <<- append(byg, .BY) # Incorrect: Missing copy(.BY)
mean(b)
}, by = l]
str(byg) # All elements of 'byg' will contain the .BY value of the last group

## Correct usage
byg <- list()
dt[, m := {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these examples are perhaps too long.
If I understand correctly, the important different is .BY versus copy(.BY) right?
So can you please delete as much other code (:= append, mean) as possible to make the example simpler? ideally one line instead of 6 for both incorrect/correct usage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected behavior in .BY
4 participants