Suggestion: change default value of condense to FALSE in print.col_spec() #1478

Closed
presnell opened this issue Mar 3, 2023 · 3 comments

Comments

@presnell

presnell commented Mar 3, 2023

I find the default behavior pretty surprising/confusing for a print method. See the example below to see what I mean. Would it break anything else if this were changed?

library(readr)

csv <- "
v,w,x,y,z
1,,,,2
"
df <- read_csv(csv)
col_spec <- spec(df)
col_spec$default <- col_double()
col_spec
print(col_spec, n = 3)                          # Not at all what I expected.
print(col_spec, n = 3, condense = FALSE)        # This is more like it.
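To make the surprise concrete, here is a minimal sketch that calls cols_condense() directly on a hand-built spec with the same column types as the example above. It assumes that the condensed print path applies cols_condense() to the spec before formatting, and that the spec exposes its columns and default as `$cols` and `$default`; both are assumptions about readr's internals rather than documented guarantees.

```r
library(readr)

# Same types readr would guess for the CSV above: v and z are doubles,
# while the empty columns w, x, y are guessed as logical.
# The default is set to col_double(), as in the example.
full_spec <- cols(
  v = col_double(),
  w = col_logical(),
  x = col_logical(),
  y = col_logical(),
  z = col_double(),
  .default = col_double()
)

# Condensing replaces the default with the most common column type and
# drops every column of that type from the explicit list.
condensed <- cols_condense(full_spec)

names(condensed$cols)          # columns still listed explicitly
class(condensed$default)[[1]]  # class of the default after condensing
```

If the assumptions hold, the user-supplied col_double() default is replaced by the logical type, and only v and z remain listed, which matches the output that looked wrong.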
@presnell
Author

presnell commented Mar 3, 2023

On a related matter, I also think that cols_condense() would be safer (?) and more useful if there were an option to specify the default column type. In this case, only columns of that type would be "condensed out", rather than the most common column type. Maybe something like this?

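cols_condense <- function(x, .default = NULL) {
  # First class of each column collector, e.g. "collector_double"
  types <- vapply(x$cols, function(xx) class(xx)[[1]], character(1))
  if (is.null(.default)) {
    # Current behavior: condense around the most common column type
    counts <- table(types)
    most_common <- names(counts)[counts == max(counts)][[1]]
    x$default <- x$cols[types == most_common][[1]]
    x$cols <- x$cols[types != most_common]
  } else {
    # Proposed behavior: condense only columns matching the supplied default
    x$default <- .default
    x$cols <- x$cols[types != class(.default)[[1]]]
  }
  x
}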

@hadley
Member

hadley commented Jul 31, 2023

This might be surprising to you, but it's been this way for many years now, so changing it at this point is likely to create more surprise than it removes.

hadley closed this as completed Jul 31, 2023
@presnell
Author

presnell commented Oct 13, 2023

How about the second suggestion? The initial motivation for me was looking at data sets with many columns, many of which contain mostly missing data. The readr/vroom guess for such columns is likely to be logical and those are exactly the ones I want to manually specify. So if I could set the default in cols_condense to the most common non-logical column type, then it leaves me with a more manageable list of the columns that I really need to deal with.

I'm not sure if I have explained that well, but in general it seems like a useful feature to me that shouldn't break anyone's pre-existing code.
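The workflow described above could be sketched roughly as follows. `condense_non_logical()` is a hypothetical helper, not part of readr, and the column names are made up for illustration. Like the earlier proposal, it assumes the spec stores its columns in `$cols` and its default in `$default`.

```r
library(readr)

# Hypothetical helper (not part of readr): condense a spec around the most
# common non-logical column type, so guessed-logical columns stay listed.
condense_non_logical <- function(x) {
  types <- vapply(x$cols, function(col) class(col)[[1]], character(1))
  candidates <- types[types != "collector_logical"]
  if (length(candidates) == 0) {
    return(x)  # only logical columns: nothing to condense around
  }
  counts <- table(candidates)
  most_common <- names(counts)[which.max(counts)]
  x$default <- x$cols[types == most_common][[1]]
  x$cols <- x$cols[types != most_common]
  x
}

# Illustrative spec: three doubles, one character, and two columns that
# were guessed as logical because they are mostly missing.
full <- cols(
  id = col_double(),
  score = col_double(),
  weight = col_double(),
  note = col_character(),
  flag_a = col_logical(),
  flag_b = col_logical()
)

remaining <- condense_non_logical(full)
names(remaining$cols)  # the columns left to specify by hand
```

Under these assumptions the doubles fold into the default, and the remaining list holds the character column plus the two guessed-logical columns, which are the ones that need manual attention.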
