table_one and summarytable sort argument: order by row option #39

Open
nicolasfoss opened this issue Sep 30, 2024 · 4 comments

@nicolasfoss

Would it be possible to have an argument that a user could leverage to sort the rows as well? At present, the sort keyword only sorts the columns, so this would likely need to be a separate keyword argument.

With multiple row arguments that could be tricky, but I have a use case with a longer table where I wanted to display days of the week on the rows and could not order them using summarytable or table_one.

Thanks for taking time to read this!

@jkrumbiegel
Collaborator

jkrumbiegel commented Oct 1, 2024

I'm not sure I understand the request for table_one: the rows there are the variables you specify, so you control that order directly. For summarytable, I assume your problem was that "Tuesday" would be sorted alphabetically after "Sunday"? You could make the days an ordered categorical variable with CategoricalArrays, or you could presort your dataset and then set sort = false in summarytable so that it picks up the values in the order they appear in the dataset.

That would be the difference between:

df = DataFrame(
    days = repeat(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"], inner = 10),
    data = randn(10 * 7),
)

summarytable(df, :data, rows = :days, summary = [sum, minimum, maximum])
image

and

summarytable(df, :data, sort = false, rows = :days, summary = [sum, minimum, maximum])
image
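
For comparison, the first option mentioned above (an ordered categorical variable) might look roughly like the following. This is a minimal sketch, not taken from the thread: it assumes CategoricalArrays is installed, the name `day_names` is made up for illustration, and the rows are deliberately stored in reverse order to show that the level order, rather than the data order, should drive the default sorting.

```julia
using DataFrames, CategoricalArrays, SummaryTables

# Hypothetical helper: the desired display order of the days.
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

df = DataFrame(
    # Stored in reverse order on purpose, so the data order is not already correct.
    days = repeat(reverse(day_names), inner = 10),
    data = randn(10 * 7),
)

# Convert to an ordered categorical whose levels follow the calendar order.
df.days = categorical(df.days; levels = day_names, ordered = true)

# With the default sorting, the rows are expected to follow the level order
# instead of the alphabetical order (assumption based on the comment above).
summarytable(df, :data, rows = :days, summary = [sum, minimum, maximum])
```

The advantage over presorting is that the order travels with the column, so it does not depend on how the DataFrame happens to be arranged.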

@nicolasfoss
Author

nicolasfoss commented Oct 1, 2024

Thanks for your comment! It might help if I provide a reprex.

Setup

using SummaryTables, DataFrames

Create the necessary columns for Race and Day of the Week

race = vcat(fill("BLACK", 7), fill("TOM", 2), fill("WHITE", 22))

day_of_week = vcat(
    # BLACK
    "MON", "TUE", "TUE", "WED", "THU", "THU", "FRI",
    # TOM
    "FRI", "SAT",
    # WHITE
    "MON", "MON", "TUE", "TUE", "WED", "WED", "THU", "FRI", "FRI", "FRI",
    "SAT", "SAT", "SAT", "SUN", "SUN", "MON", "TUE", "THU", "THU", "TUE",
    "WED", "SAT"
)

Create a DataFrame

faux_data = DataFrame(RACE = race, DAY_OF_WEEK = day_of_week)

Create a dictionary mapping days to their numeric order for sorting

day_order = Dict("MON" => 1, "TUE" => 2, "WED" => 3, "THU" => 4, "FRI" => 5, "SAT" => 6, "SUN" => 7)

Create the summary statistics using table_one --> ordering/levels do work on cols

faux_data_stats = table_one(
    sort(faux_data, :DAY_OF_WEEK, by = x -> day_order[x]),
    :RACE => "Race",
    groupby = :DAY_OF_WEEK => "Day",
    show_n = true;
    sort = false
)

image

Create the summary statistics using table_one --> ordering/levels do not work on rows

faux_data_stats = table_one(
    sort(faux_data, :DAY_OF_WEEK, by = x -> day_order[x]),
    :DAY_OF_WEEK => "Day",
    groupby = :RACE => "Race",
    show_n = true;
    sort = false
)

image

I should have provided the reprex in the first place, sorry about that.

I hope you can see what I am referring to. This seems to be expected behavior and I am suggesting it would be a great enhancement to be able to order those rows based on a Dict object or another method. Maybe I should have tried running categorical() on my day variable, instead?
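
As a rough sketch of what the categorical() route could look like for this case: the levels can be derived from a Dict like day_order and attached to the column before calling table_one. The variable `day_levels` and the shortened dataset are made up for illustration, CategoricalArrays is assumed to be installed, and whether table_one lists the rows in level order is an assumption based on the collaborator's remark, not something verified here.

```julia
using DataFrames, CategoricalArrays, SummaryTables

day_order = Dict("MON" => 1, "TUE" => 2, "WED" => 3, "THU" => 4, "FRI" => 5, "SAT" => 6, "SUN" => 7)

# Hypothetical helper: turn the Dict into a vector of levels in calendar order.
day_levels = sort(collect(keys(day_order)); by = k -> day_order[k])

# Shortened stand-in for the reprex data, not the original 31 rows.
faux_data = DataFrame(
    RACE = ["BLACK", "BLACK", "TOM", "WHITE", "WHITE", "WHITE", "WHITE"],
    DAY_OF_WEEK = ["TUE", "MON", "SAT", "SUN", "WED", "FRI", "THU"],
)

# Attach the level order to the column itself instead of presorting the rows.
faux_data.DAY_OF_WEEK = categorical(faux_data.DAY_OF_WEEK; levels = day_levels, ordered = true)

# Assumption: the categories under "Day" are then listed in level order.
table_one(
    faux_data,
    :DAY_OF_WEEK => "Day",
    groupby = :RACE => "Race",
    show_n = true,
)
```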

@jkrumbiegel
Collaborator

Ahh, now I see: you mean the sorting within a "categorical variable" analysis. I will have to think a little about possible APIs for this (aside from making categorical values, which sort as expected on their own).

@nicolasfoss
Author

@jkrumbiegel Thanks for your time and review! I hope I have been helpful.

In any case, SummaryTables is a wonderful package and a great benefit to the community. Have a great week!
