Implement R tidylog package style of output #13

Open
kdpsingh opened this issue Feb 14, 2023 · 4 comments
@kdpsingh
Member

R has a wonderful tidylog package that prints a log of how each operation modified a data frame (e.g., "filter: 300 rows were removed (10%) of the data, with 2,700 rows remaining").

I would like to implement this capability. I don't think that using TableMetadataTools.jl is necessarily the approach I want to take because this metadata should be printed (using @info or println) but does not need to be permanently stored as part of the data frame.

This will probably be implemented either using @aside or simply by wrapping the DataFrames.jl functions with a tidylog function that captures the state of the data frame before and after the operation and prints out the difference.
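A minimal sketch of the wrapping approach, assuming DataFrames.jl is installed. The names `filter_message` and `logged_filter` are hypothetical and exist only for illustration; the message wording is likewise illustrative, not the final output format.

```julia
using DataFrames
using Printf

# Hypothetical helper: build a tidylog-style message from the row counts
# observed before and after a filter.
function filter_message(n_before::Integer, n_after::Integer)
    removed = n_before - n_after
    pct = n_before == 0 ? 0.0 : 100 * removed / n_before
    return @sprintf("filter: removed %d rows (%.0f%%), %d rows remaining",
                    removed, pct, n_after)
end

# Hypothetical wrapper: run DataFrames.filter, then report the difference
# with @info. Nothing is stored on the data frame itself.
function logged_filter(pred, df::AbstractDataFrame)
    result = filter(pred, df)
    @info filter_message(nrow(df), nrow(result))
    return result
end

df = DataFrame(x = 1:10)
small = logged_filter(:x => >(3), df)
```

Because the message goes through `@info`, it can be silenced or redirected with the standard logging machinery without changing the wrapper.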

@bkamins

bkamins commented Feb 14, 2023

Yes, if you do not want to store it in metadata, then it is easier to just do logging. However, you may want to consider logging the changes in metadata as an opt-in feature, since some users might find it useful when doing lineage analysis.

@kdpsingh
Member Author

This is a great point. I may consider adding this later. In my mental model, the logging is tied to operations rather than data frames. For example, a join is a single operation and it's not clear that either data frame would "own" that metadata.

I may first implement this in a logging style and then think through the implications of storing some or all of the results as metadata.

@bkamins

bkamins commented Feb 14, 2023

> a join is a single operation and it's not clear that either data frame would "own" that metadata.

I was thinking about it. The produced data frame "owns" the metadata as you need to know how it got created. Of course this is just food for thought for the future.
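As a rough illustration of the opt-in idea, the log line could be attached to the produced data frame as table-level metadata. This is only a sketch: it assumes DataFrames.jl 1.4 or later (which provides `metadata!` and `metadata`), and the function name `filter_with_lineage`, the `store` keyword, and the `"tidylog"` key are all made up for this example.

```julia
using DataFrames

# Hypothetical helper: filter, log the change, and optionally record the
# same message as metadata on the resulting data frame.
function filter_with_lineage(pred, df::AbstractDataFrame; store::Bool = false)
    result = filter(pred, df)
    msg = "filter: $(nrow(df) - nrow(result)) rows removed, " *
          "$(nrow(result)) rows remaining"
    @info msg
    if store
        # "tidylog" is an arbitrary key chosen for this sketch.
        metadata!(result, "tidylog", msg; style = :note)
    end
    return result
end

df = DataFrame(x = 1:10)
out = filter_with_lineage(:x => >(3), df; store = true)
metadata(out, "tidylog")
```

Here the result of the operation carries the record, which sidesteps the question of which input would own it in the case of a join.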

@kdpsingh kdpsingh self-assigned this Feb 21, 2023
@kdpsingh kdpsingh added the enhancement New feature or request label Feb 21, 2023
@kdpsingh
Member Author

Confirmed that tidylog is MIT-licensed: elbersb/tidylog#61

Will aim for a mostly line-by-line translation of tidylog in R.

While we could consider autodetecting changes in the data frames (and treating all verbs the same), I think the tidylog approach of customizing the output for each verb feels more natural and is probably more efficient.

@kdpsingh kdpsingh transferred this issue from TidierOrg/Tidier.jl Jul 31, 2023