Function to Mutate a subset of dataset, and reattach it to the dataset #27

Closed
oobianom opened this issue Jun 13, 2024 · 8 comments

Comments

@oobianom
Owner

Hi Brice, does such a function already exist?

Basically, with dplyr, I can filter and then run all the downstream steps such as group_by, mutate, and so on. But here I need the filtered portion to remain part of the entire dataset after the subset has been manipulated.

Let me know if you understand. Else, I can rephrase.

@brichard1638

I believe I understand what you are asking but if you could rephrase your inquiry with a sample dataset, it would be more helpful.

@oobianom
Owner Author

oobianom commented Jun 19, 2024

To make it easier for you to assess, I put together a rough draft of the function and updated this repository.

Here is an example:
dt = mtcars

I want to subset to "mpg == 21.0 & cyl == 6", then mutate various columns within that subset while leaving the other rows intact.

With base R, this is how I would approach it:

# Compute the row selection once: after cyl is overwritten, re-testing cyl == 6 would match nothing
idx = dt$mpg == 21.0 & dt$cyl == 6
dt[idx,]$cyl = 1000
dt[idx,]$hp = 2000
dt[idx,]$vs = dt[idx,]$hp*2

With the new function, this is how I would do it:

mutate_filter(dt,mpg == 21.0 & cyl == 6, cyl=1000,hp=2000,vs=hp*2)
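
One way a function with this call shape could be written in base R is sketched below. It is an illustration only, not the draft committed to the repository; the name mutate_filter_inplace and its internals are assumptions.

```r
# Hypothetical sketch: mutate only the rows matching `condition`,
# and return the full data frame with all other rows untouched.
mutate_filter_inplace <- function(data, condition, ...) {
  # Capture the filter and the named mutations without evaluating them
  cond  <- substitute(condition)
  exprs <- as.list(substitute(list(...)))[-1]

  # Evaluate the filter against the columns of `data`; treat NA as no match
  rows <- eval(cond, data, parent.frame())
  rows <- !is.na(rows) & rows

  # Apply the mutations in order, so later ones see earlier results
  for (nm in names(exprs)) {
    value <- eval(exprs[[nm]], data[rows, , drop = FALSE], parent.frame())
    if (!nm %in% names(data)) data[[nm]] <- NA  # new column: NA outside the subset
    data[[nm]][rows] <- value
  }
  data
}

dt  <- mtcars
out <- mutate_filter_inplace(dt, mpg == 21.0 & cyl == 6,
                             cyl = 1000, hp = 2000, vs = hp * 2)
out[c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"), c("cyl", "hp", "vs")]
```

Because the mutations are applied one after another, vs = hp * 2 uses the updated hp value for the two matching rows.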

@brichard1638

brichard1638 commented Jun 21, 2024

The proposed function you have described does not exist, at least in the way you have described it, in R. Given the additional information you have provided, I have crafted what I believe is a function that meets the requirements you have laid out.

FUNCTION NAME:
mutate_filter

TOTAL NUMBER OF FUNCTION ARGUMENTS:
6

ARGUMENT NAMES:

  • data
  • f_arg1
  • f_arg2
  • mutcolx
  • mutcoly
  • expr

ARGUMENT SUMMARY DESCRIPTION:

  • data: The data frame object to be passed for mutation processing
  • f_arg1: The first of two arguments used to filter the data frame passed as data
  • f_arg2: A secondary filtering argument
  • mutcolx: A target variable in the data frame against which to mutate
  • mutcoly: A secondary variable in the data frame against which to mutate
  • expr: A mathematical expression over two or more variables that produces a new calculated variable called 'calc_fld'

OPTIONALITY:
Only two arguments are required to execute the function. These arguments are data and f_arg1.

FUNCTION STRUCTURE:
mutate_filter <- function(data, f_arg1, f_arg2, mutcolx, mutcoly, expr) {
  # Filter on one or two conditions supplied as strings
  if (missing(f_arg2)) {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)))
  } else {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)), eval(parse(text = f_arg2)))
  }

  # Apply the optional assignments, e.g. mutcolx = "cyl = 1000" runs d1$cyl = 1000
  if (!missing(mutcolx)) {
    eval(parse(text = paste0("d1$", mutcolx)))
  }
  if (!missing(mutcoly)) {
    eval(parse(text = paste0("d1$", mutcoly)))
  }

  if (!missing(expr)) {
    # Evaluate the expression within the data frame context
    calc_fld <- eval(parse(text = expr), envir = d1)
    # Add the new field to the data frame
    d1$calc_fld <- calc_fld
  }
  return(d1)
}
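
The expr argument relies on evaluating a string inside the filtered data frame. A minimal base R sketch of that single step, using mtcars and one of the example expressions; the variable names are illustrative only:

```r
# Rows that a filter such as "cyl == 8" would keep
d1 <- mtcars[mtcars$cyl == 8, ]

# parse() turns the string into an expression; eval() resolves
# vs, am, gear and carb as columns of d1
calc_fld <- eval(parse(text = "vs+am+gear+carb"), envir = d1)

# One value per filtered row, ready to attach as a new column
d1$calc_fld <- calc_fld
head(d1[, c("vs", "am", "gear", "carb", "calc_fld")])
```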

FUNCTION TESTING STATUS:
The function has been tested but not extensively. If the function meets the expectations provided by the previous explanation and requirements as outlined in this issue, additional testing should be conducted.

If the function does not work as presented, especially consistent with the examples provided, please reach out and I will send the function syntax again. It is possible that the conversion from the R application to this medium did not capture the code syntax correctly.

FUNCTIONAL UTILITY:
It is not understood what the value proposition is for the arguments in the function called mutcolx and mutcoly. Consistent with the requirements provided, they were included. However, mutating an entire data field with a single value does not seem to be useful or provide a high level of utility. Adding a second argument that replicates this functionality is also questionable. Unless a compelling reason exists for the inclusion of these arguments, it is strongly recommended that they be removed from the function. The function would then contain a total of (4) arguments, collectively providing what is believed to be an extraordinary value proposition.

One way to improve the utility of the mutate_filter function would be to replace one of the mutcol arguments with an argument that can control the removal of contiguous or non-contiguous variables from the data frame object in the mutated output.

CODE EXAMPLES:
library(DescTools)
data("d.pizza")

data("mtcars")
data("quakes")

mutate_filter(mtcars, f_arg1 = "mpg == 21.0", f_arg2 = "cyl == 6", mutcolx = "cyl = 1000", mutcoly = "hp = 2000", expr = "hp*2")
mutate_filter(d.pizza[,1:10], f_arg1 = "driver == 'Taylor'", f_arg2 = "area == 'Camden'", expr = "count*price")
mutate_filter(mtcars, f_arg1 = "cyl == 8", expr = "vs+am+gear+carb")
mutate_filter(quakes, f_arg1 = "stations == 10", expr = "round(mag/depth,3)")

@oobianom
Owner Author

Thanks Brice. Actually, I don't think we need the secondary arguments, since one can easily combine conditions into a single string such as "mpg == 21 & cyl < 3".

@brichard1638

brichard1638 commented Jun 21, 2024 via email

@brichard1638

brichard1638 commented Jun 22, 2024

Based on your latest feedback, I've re-constructed the mutate_filter function in the following ways:

  • Removed both the mutcolx and mutcoly arguments
  • Replaced the mutcol arguments with rem and srtfld
  • rem removes the variables from the original passed dataset so that they are not included in the output
  • srtfld provides an ascending sort on one or more variables contained in the output

NEW FUNCTION STRUCTURE:
mutate_filter <- function(data, f_arg1, f_arg2, rem = NULL, srtfld = NULL, expr) {
  # Filter on one or two conditions supplied as strings
  if (missing(f_arg2)) {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)))
  } else {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)), eval(parse(text = f_arg2)))
  }

  # Drop columns by position; drop = FALSE keeps a data frame even if one column remains
  if (!is.null(rem)) {
    d1 <- d1[, -c(rem), drop = FALSE]
  }

  if (!missing(expr)) {
    # Evaluate the expression within the data frame context
    calc_fld <- eval(parse(text = expr), envir = d1)
    # Add the new field to the data frame
    d1$calc_fld <- calc_fld
  }

  # Sort ascending by every field named in srtfld
  # (evaluating several parsed names at once would sort by the last one only)
  if (!is.null(srtfld)) {
    d1 <- d1[do.call(order, unname(as.list(d1[srtfld]))), , drop = FALSE]
  }

  return(d1)
}

It is believed that this version of the mutate_filter function possesses a much higher value proposition than its predecessor. As a result, this modified function should be the one selected for inclusion in the quickcode package.

FUNCTION TESTING STATUS:
The updated function has been tested but not extensively. If the functional output meets the expectations of the requirements previously outlined in this issue, additional testing should be conducted.

ADDITIONAL NOTES:

  • The rem argument does not accept field names as string literals
  • The rem argument follows a specific encoding syntax that must be observed or the function will crash
  • The rem argument does not accept quotation marks
  • The rem argument is encoded using vector syntax
  • Encoding syntax for the rem argument for a contiguous removal of variables: c(4,5,6), where 4 is the fourth variable in the dataset counting from left to right starting with 1; an alternative encoding is c(4:6)
  • Encoding syntax for the rem argument for a non-contiguous removal of variables: c(3, 6:9, 15), presuming the passed dataset contains at least 15 variables
  • The srtfld argument can accept multiple field names
  • Each variable passed to the srtfld argument should be enclosed in quotation marks
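
The positional encoding described for rem is ordinary negative indexing. A short base R sketch on mtcars, which has 11 columns; the match() step is a hypothetical convenience for looking up positions from names and is not part of the function:

```r
# Column positions to remove, written the way the notes describe
rem <- c(2:5, 11)
trimmed <- mtcars[, -rem, drop = FALSE]
names(trimmed)
# "mpg" "wt" "qsec" "vs" "am" "gear"

# Since rem does not take names, positions can be looked up first
rem_by_name <- match(c("cyl", "carb"), names(mtcars))
rem_by_name
# 2 11
```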

CODE EXAMPLES:
library(DescTools)
data("d.pizza")
data("mtcars")

mutate_filter(mtcars, "mpg == 21.0", "cyl == 6", expr = "hp*2")
mutate_filter(d.pizza[,1:10], f_arg1 = "driver == 'Taylor'", f_arg2 = "area == 'Camden'", expr = "count*price")
mutate_filter(mtcars, f_arg1 = "cyl == 8", expr = "vs+am+gear+carb")
mutate_filter(airquality, f_arg1 = "Month == 5", rem = c(3:4), expr = "Ozone/Solar.R")
mutate_filter(d.pizza, f_arg1 = "area == 'Camden'", rem = c(1:4, 15,16), srtfld = "price", expr = "round(count*price,2)")
mutate_filter(mtcars, f_arg1 = "vs == 1", rem = c(2:5, 11), srtfld = "mpg")
mutate_filter(mtcars, f_arg1 = "mpg > 20", rem = 11)
mutate_filter(d.pizza[5:10], f_arg1 = "area == 'Westminster'", srtfld = c("driver", "price"))

CONCLUSION
The only thing missing from the mutate_filter code is support for unnamed arguments: at present each argument must be named explicitly in the call or the function will crash. I'm not sure what changes need to be made to the code, but argument names should be optional when using the function.
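
One possible direction, sketched under assumptions rather than offered as the package's implementation: merge the two filter arguments into a single string (as suggested earlier in the thread) and give every optional argument a NULL default, so the same call works with or without argument names. The name mutate_filter_opt and the argument order are hypothetical, and the sketch uses base R only.

```r
# Hypothetical variant: one filter string, optional arguments default to NULL
mutate_filter_opt <- function(data, filter, expr = NULL, rem = NULL, srtfld = NULL) {
  # Evaluate the filter string against the columns of `data`
  keep <- eval(parse(text = filter), envir = data)
  d1 <- data[!is.na(keep) & keep, , drop = FALSE]

  # Drop columns by position
  if (!is.null(rem)) d1 <- d1[, -rem, drop = FALSE]

  # Add the calculated field
  if (!is.null(expr)) d1$calc_fld <- eval(parse(text = expr), envir = d1)

  # Sort ascending by each named field
  if (!is.null(srtfld)) {
    d1 <- d1[do.call(order, unname(as.list(d1[srtfld]))), , drop = FALSE]
  }
  d1
}

# Positional call
a <- mutate_filter_opt(mtcars, "mpg == 21 & cyl == 6", "hp*2")
# Named call, arguments in any order
b <- mutate_filter_opt(mtcars, srtfld = "mpg", filter = "vs == 1", rem = c(2:5, 11))
```

Testing with is.null() instead of missing() means the optional arguments can also be skipped by passing NULL explicitly.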

@oobianom
Owner Author

Thanks Brice!

@brichard1638

brichard1638 commented Jun 28, 2024 via email
