Can we use sym() with a vector of strings? #321
Comments
just as an idea, not sure if you have tried this - but you could maybe look into
|
Thanks David. I did consider this - the problem is that we aren't applying the same function to the same column, so I think you'd end up needing to call summarise_at for each of the column pairs. My aim was to get the summary done with as few dplyr calls as possible (to avoid copying the tibble), whilst keeping the functions easy to unit test in smaller, reproducible chunks (hence the abstraction of the summary functions from the summarising action). I did realise the somewhat obvious after posting: I can of course make a list of strings and refer to them with d1ki_aggr_value_names[["kibrainimagingwithin12hrs"]]. It does, however, seem syntactically clumsy to have a list of strings rather than a vector of strings. |
I think this question would be better posted on stackoverflow or community.rstudio.com. Also I would suggest presenting the problem as succinctly as possible. You can use
There is no overhead on tibble copying. Columns will only be copied individually when at least one element changes. |
Thanks lionel-. Sorry - my query was more about the action and return values of sym rather than the problem itself (which was described just to demonstrate the use case and to make sure it wasn't due to an error in the way I was using sym first). Apologies for not compressing the example down more. If you play around with a function like paste0, which behaves the way I would have expected, you get the following:
test <- c('foo', 'bar')
paste0('Hello ', test)
#> (returns a character vector: 'Hello foo', 'Hello bar')
sym(test)
#> Error: Only strings can be converted to symbols
(i.e. sym, singular, can't be used on a vector.) You can use syms(test), but then you get a list, not a vector:
[[1]]
foo
[[2]]
bar
Edit: The above was with rlang 0.1.4, just in case the very latest work not on CRAN differs. |
I am not sure what you expect |
Shouldn't it be able to return a vector of syms rather than a list? |
There are no atomic vectors of symbols in R ;) |
Aha! OK - that explains it! Thanks very much! |
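For readers following along, the point above can be checked with a minimal sketch. It assumes only that rlang is installed; the variable names `symbols` and `back` are illustrative and not from the thread.

```r
library(rlang)

test <- c("foo", "bar")

# syms() converts each string separately and returns a plain list,
# because a symbol cannot be an element of an atomic vector.
symbols <- syms(test)

is.list(symbols)          # the result is a list
is_symbol(symbols[[1]])   # each element is a single symbol

# Converting back to strings has to be done element by element as well.
back <- vapply(symbols, as.character, character(1))
identical(back, test)
```

The list is the natural container here: each element is extracted with `[[ ]]` and can then be unquoted individually.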
You can still use |
Yeah I've actually done that in the code last night (after posting) and that does work. It just appeared odd in the syntax as I was mentally expecting to still be using a vector and suddenly had a list. |
First time I use |
I get an error related to comments here. I have no problem if I use the instruction
But, if I try to avoid mentioning the data frame
I understand that Why does this happen? Could you help me avoid this issue? Thank you very much! |
We've been contemplating not unquoting beyond calls to |
To avoid this, just create the function outside of |
oops, now that I read your code again, I see this won't work. Something along the lines of:
anscombe %>%
  mutate(sum_x = rowSums(map_dfc(1:4, ~ anscombe[[paste0("x", .)]] > 10), na.rm = TRUE))
Note that I refer to
In general, I'd suggest reviewing your approach because this map_dfc + rowSums + mutate pattern is quite complicated to follow and get right. |
Thank you, Lionel. When I wrote
In fact, I was looking for an (enlightening) alternative to reshaping the data frame to long format, applying the condition of interest, and returning to the original data frame. I was trying to shorten it as much as possible, and this approach was the best working one I found. Thank you anyway, and I'll follow issue #845 closely. |
You could use |
Hi,
I've been trying to solve a problem using tidyeval and found that sym can be used with either a string or a list of strings, but weirdly not with a vector of strings (or, more precisely, a named vector of strings). I think this would be a useful addition, using the same syntax as sym().
It may help to describe the problem I've been trying to solve, both to explain why I'm using vectors of strings at all and in case there is a much better way of solving it that makes the need for sym with a vector invalid.
I'm working on a tibble of approximately 300,000 rows and 300 columns looking at stroke care (each row is a stroke patient admission). We start with a large dataset, for which we then produce a series of calculated fields (i.e. summations or other calculations on the raw data). We then run a set of aggregated summary calculations to produce Key Indicators, which are used in reports, with the data grouped by team and time period (e.g. quarterly or monthly, depending on the report). Each Key Indicator is measured in two ways: a 'Patient Centred - PC' result and a 'Team Centred - TC' result. A patient's stroke care can be delivered by multiple teams, so the team measure covers only the care delivered by that organisation, whereas the patient measure records care delivered to any patient who comes into contact with a team, regardless of whether that team delivered it. So PC and TC fields are paired - we usually want either a summary of just the TC fields, or of both the TC and PC fields for other reports.
Some of the calculations are complex and need to be right, so I've structured the code to make unit testing easier: sets of indicators are grouped into domains. I've then used the quos command to construct a list of functions for each domain, which we join together to carry out one giant mutate (for calculated fields) or summarise (for the aggregated values) in one go.
So for one of the domains, here are the function definitions and the variable names set as strings (bear in mind there will be another 8 or 9 files with further function definitions - many more complex than these):
d1ki_aggr_value_names <- c(
  kiclockstarttobrainimagingmins = "KIClockStartToBrainImagingMins",
  kibrainimagingwithin1hr = "KIBrainImagingWithin1hr",
  kibrainimagingwithin12hrs = "KIBrainImagingWithin12hrs")
d1ki_aggr_value_functions <- rlang::quos(
  KIMedianBrainImagingTime = median(
    !!d1ki_aggr_value_names["kiclockstarttobrainimagingmins"],
    na.rm = TRUE),
  KIPCScannedIn1Hr = nonNApercentage(
    !!d1ki_aggr_value_names["kibrainimagingwithin1hr"]),
  KIPCScannedIn12Hrs = nonNApercentage(
    !!d1ki_aggr_value_names["kibrainimagingwithin12hrs"]))
We join d1ki_aggr_value_functions to similar lists of functions, and perform a dplyr::summarise operation on the source data. For the unit tests, you create a minimal test set and perform just the one set of functions on the test set. So far so good.
Obviously the names start life here as a vector of named strings. Each of the strings starting with 'KI' is actually the suffix of a pair of column names: TCKIClockStartToBrainImagingMins and PCKIClockStartToBrainImagingMins, for example. My cunning plan was to prepend the prefix to the strings using paste0, turn all the strings into symbols, merge the quos lists of functions together for all the domains, and execute the summary for one or both of the pair of results. The purpose of having the vector of strings is hopefully therefore obvious from the example.
Of course then sym doesn't work on vectors so we'd have to loop through each one individually.
Firstly, is there a reason sym doesn't work on vectors of strings (or could it be something which could be implemented)? Secondly - are there any better suggestions for the implementation above?
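One way the plan described above might look, using syms() and a named list rather than a vector, is sketched below. This is only an illustration under stated assumptions: `prefixed_syms` is a hypothetical helper, the two-entry `value_names` vector is a shortened copy of the one in the post, and the small data frame is invented test data. It uses eval_tidy() instead of dplyr::summarise so that it depends on rlang alone.

```r
library(rlang)

# Suffixes shared by each TC/PC column pair (shortened from the post above).
value_names <- c(
  kiclockstarttobrainimagingmins = "KIClockStartToBrainImagingMins",
  kibrainimagingwithin1hr = "KIBrainImagingWithin1hr"
)

# Hypothetical helper: prepend a prefix to every suffix, convert each
# string to a symbol, and keep the original lookup keys as list names.
prefixed_syms <- function(prefix, names_vec) {
  setNames(syms(paste0(prefix, names_vec)), names(names_vec))
}

tc <- prefixed_syms("TC", value_names)

# A single symbol is extracted with [[ ]] and unquoted with !!.
summaries <- quos(
  KIMedianBrainImagingTime = median(
    !!tc[["kiclockstarttobrainimagingmins"]],
    na.rm = TRUE)
)

# Invented test data, standing in for the real tibble.
test_data <- data.frame(
  TCKIClockStartToBrainImagingMins = c(10, 20, NA, 40)
)

# Evaluate the quosure against the test data, as a unit test might.
result <- eval_tidy(summaries[[1]], data = test_data)
result  # 20
```

Calling `prefixed_syms("PC", value_names)` would give the matching patient-centred symbols, so the same lookup keys serve both halves of each pair.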