Create selecting() to generate code to select exactly the columns in a data frame. #7

Open
joethorley opened this issue Aug 9, 2022 · 3 comments

Comments

@joethorley
Member

This would be a great way to select only the informative columns and then convert that selection to code that no longer depends on the values (to avoid breakage).

@joethorley changed the title from "Create function to generate code to select exactly the columns in a data frame." to "Create selecting() to generate code to select exactly the columns in a data frame." on Dec 21, 2022
@joethorley
Member Author

selecting <- function(.data, ..., .fix = NULL)

selecting(data)

"select(col1, somethingelse, daa, daa2)"

selecting(data, .fix = "'")

"select('col1', 'somethingelse', 'daa', 'daa2')"

selecting(data, matches("^d"), .fix = '"')

'select("daa", "daa2")'
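A minimal sketch of how the proposed signature might behave, assuming dplyr is available and that `.fix` is simply pasted on both sides of each column name. The function body is hypothetical and only reproduces the three example outputs above; it is not the package's implementation.

```r
library(dplyr)

# Hypothetical sketch of the proposed selecting().
selecting <- function(.data, ..., .fix = NULL) {
  stopifnot(is.data.frame(.data))
  # Narrow the columns only when tidyselect expressions are supplied.
  if (...length() > 0) {
    .data <- dplyr::select(.data, ...)
  }
  nms <- names(.data)
  # Assumption: .fix wraps each name on both sides (e.g. a quote character).
  if (!is.null(.fix)) {
    nms <- paste0(.fix, nms, .fix)
  }
  paste0("select(", paste(nms, collapse = ", "), ")")
}

data <- data.frame(col1 = 1, somethingelse = 2, daa = 3, daa2 = 4)

selecting(data)
selecting(data, .fix = "'")
selecting(data, matches("^d"), .fix = '"')
```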

@aylapear
Member

What types of options should be passed?

  • have a type = argument that defaults to "all" but also accepts "drop_uninformative" and "keep_uninformative", since these align with the naming and utilities we already have in the package
  • have a regex = argument so you can pass regular expressions
  • the function would first select the columns based on the type argument and could then subset them further using the regex argument
  • the .fix = argument formats the column names in the output, so they come out bare, wrapped in double quotes ", wrapped in backticks `, or even with a prefix or suffix appended

@joethorley
Member Author

I don't think the drop uninformative and keep uninformative options are necessary, as the user could just do this at the start before calling selecting(). It may be better to call the function selected(), as it's really just describing what has been selected at this point.

Note, however, that I think drop_uninformative_columns() and keep_uninformative_columns() should write the equivalent selected() call to the console unless the user silences it.

I also think that the regex argument is unnecessary as the user could just call select(matches(x)) before calling selected().

What we are looking for is code that can be embedded in the script to select exactly the columns that were in the data frame at the time that selected() was called. So I think whether or not column names are backticked should just be determined by whether it is necessary.
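The simplified idea could be sketched in base R as follows. This is only an illustration under stated assumptions: the body of `selected()` is hypothetical, and a name is treated as needing backticks when `make.names()` would change it (which also covers reserved words).

```r
# Hypothetical sketch of selected(); not the package's implementation.
selected <- function(.data) {
  stopifnot(is.data.frame(.data))
  nms <- names(.data)
  # Assumption: a name is syntactic if make.names() leaves it unchanged.
  syntactic <- nms == make.names(nms)
  # Backtick only the names that would not parse as bare symbols.
  nms[!syntactic] <- paste0("`", nms[!syntactic], "`")
  paste0("select(", paste(nms, collapse = ", "), ")")
}

# check.names = FALSE keeps the non-syntactic name as given.
data <- data.frame(col1 = 1, `some thing` = 2, daa = 3, check.names = FALSE)

selected(data)
# "select(col1, `some thing`, daa)"
```

The returned string can be pasted into a script, so the selection no longer depends on the values in the data. Names that themselves contain backticks are not handled by this sketch.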
