-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated Basic Usage of Manipulation Functions #3360
base: main
Are you sure you want to change the base?
Updated Basic Usage of Manipulation Functions #3360
Conversation
Friendly bump 🙂 |
Agreed. But first we need to make a decision on #3361. I will have a look at it and comment there. |
docs/src/man/basics.md
Outdated
| ------------ | -------------------------------- | -------------------------------------------- | ------------------------------------------------- | | ||
| `transform` | Creates a new data frame. | Retains both source and manipulated columns. | Retains same number of rows as source data frame. | | ||
| `transform!` | Modifies an existing data frame. | Retains both source and manipulated columns. | Retains same number of rows as source data frame. | | ||
| `select` | Creates a new data frame. | Retains only manipulated columns. | Retains same number of rows as source data frame. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"manipulated" or "created"?
docs/src/man/basics.md
Outdated
| `transform!` | Modifies an existing data frame. | Retains both source and manipulated columns. | Retains same number of rows as source data frame. | | ||
| `select` | Creates a new data frame. | Retains only manipulated columns. | Retains same number of rows as source data frame. | | ||
| `select!` | Modifies an existing data frame. | Retains only manipulated columns. | Retains same number of rows as source data frame. | | ||
| `subset` | Creates a new data frame. | Retains only source columns. | Number of rows is determined by the manipulation. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better say that by condition (to differentiate it from combine
)
docs/src/man/basics.md
Outdated
### Constructing Operation Pairs | ||
All of the functions above use the same syntax which is commonly | ||
`manipulation_function(dataframe, operation)`. | ||
The `operation` argument is a `Pair` which defines the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add that Pair
is constructed using =>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - it does not have to be the pair, so maybe say it is usually a pair?
(your first example is not a pair below)
docs/src/man/basics.md
Outdated
These rules are typically called transformation mini-language. | ||
|
||
Let us move to the examples of application of these rules | ||
## Basic Usage of Manipulation Functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A larger question - maybe create a separate page for this tutorial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be better. Maybe "Manipulation Functions" under "User Guide" before "Split-apply-combine"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented below - I would put it as a "top level" with a name something along "A gentle introduction to manipulation functions" (so that we clearly signal that this material is less formal than the rest of the manual).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you think of making the new Top Level section something like "Beginner's Guide" or "User's Guide for Beginners" and then placing "Manipulation Functions" at a second level under that? I'm not volunteering to rewrite the entire User's Guide, but it could leave room for others to add similar "gentle" content to the documentation. It would also make the sidebar look cleaner by splitting up the current long name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. The other section that could go there is https://dataframes.juliadata.org/stable/man/basics/ as it has the same objective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but Basics contains the existing "Basic Usage of Manipulation Functions". I don't know how to differentiate it from this new section if they live next to each other.
I initially intended to just clarify some topics within that section, but now the scope has grown.
I can maybe overwrite that section if I add these topics that I don't currently cover:
- "Note that this time we use string column selectors because some of the column names have spaces in them."
- "The benefit of
select
orcombine
over indexing is that it is easier to get the union of several column selectors."- "It is important to note that
select
always returns a data frame, even if a single column selected as opposed to indexing syntax."- "By default
select
copies columns of a passed source data frame. In order to avoid copying, pass thecopycols=false
keyword argument."
The other sections under Basics use the German dataset, but I think it is easier to understand what is going on with smaller data frames where you know all the data values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can maybe overwrite that section
Yes - I think it is OK just to expand that section (especially that it is top-level now already)
I think it is easier to understand what is going on with smaller data frames
Agreed. just please use different variable names than these already used there so that using different dataframes does not lead to confusion.
Thank you! (sorry for so many comments, but - unfortunately - writing documentation is hard)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (sort of). I did not use new variables names though. I had been and continued to just overwrite the definition of df
. My data frames are so small and frequent that coming up with a new name each time would be a pain.
docs/src/man/basics.md
Outdated
!!! Note | ||
The Julia parser sometimes prevents `:` from being used by itself. | ||
`ERROR: syntax: whitespace not allowed after ":" used for quoting` | ||
means your `:` must be wrapped in either `(:)` or `Cols(:)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically recommend All()
which is easy to understand I think.
docs/src/man/basics.md
Outdated
These rules are typically called transformation mini-language. | ||
|
||
Let us move to the examples of application of these rules | ||
These functions and their methods are explained in more detail in the section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add that not only more detail, but with more "slow paced" and informal approach :).
@@ -0,0 +1,1345 @@ | |||
# Data Frame Manipulation Functions | |||
|
|||
The seven functions below can be used to manipulate data frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, as I have commented - maybe add a sentence at the top explaining the teaching approach of this material. Thank you!
I've never used Documenter.jl before. I just tried to edit the |
I don't know why the cross references aren't working. It looks to me like I am doing it right. https://documenter.juliadocs.org/dev/man/syntax/#@ref-link |
Need to see if documenter tests pass this time. There is also a subsection I would like at the end on performance, but I am hoping someone else can write it, even if that is as a separate PR. |
I think that the documenter error is due to the fact that your references are on level three, i.e. @mortenpi - how could @nathanrboyer reference e.g. this section |
Sorry, missed this earlier. This actually looks like a bug. The fact that it's level 3 heading shouldn't matter. As a workaround, you might be able to use |
I think I fixed my errors, but I am still getting some errors below which seem unrelated to me. I get this error on main with
I get this error on this branch with
|
If this is using Documenter 1.0+, then that keyword was removed (and The doctest failures seem to be due to different printing between Julia versions (i.e. make sure you use the same Julia version locally as is being used for the docs CI). |
I thought that I already was in sync from this: 7614fc3 I am still getting the second error, and I am using the juliaup release branch which seems to match the ci.yml.
|
Replaces PR #2907 since it was too out of sync with the main branch.