read_fwf doesn't support using the first row as column names #1393

Closed
rgzn opened this issue Mar 25, 2022 · 7 comments
Labels
feature a feature request or enhancement read 📖

Comments

@rgzn

rgzn commented Mar 25, 2022

With most of the readr rectangular data reading functions, the col_names argument can be set to TRUE to interpret the first row as column names. This feature does not exist in read_fwf().

I am not sure if this is a deliberate design decision, but it would be very nice to have the option of using the first row as column names.

Here is an example of reading the same data in csv or fwf format, and the column names being interpreted differently:

> read_csv(I(c("col1,col2,col3", "first,middle,last")))
Rows: 1 Columns: 3                                                                                                                                                  
-- Column specification ---------------------------------------------------------------------------------------------------------
Delimiter: ","
chr (3): col1, col2, col3

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 x 3
  col1  col2   col3 
  <chr> <chr>  <chr>
1 first middle last 

Using defaults for read_fwf:

> read_fwf(I(c(" col1   col2 col3", "first middle last")))
Rows: 2 Columns: 3
-- Column specification ---------------------------------------------------------------------------------------------------------

chr (3): X1, X2, X3

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 2 x 3
  X1    X2     X3   
  <chr> <chr>  <chr>
1 col1  col2   col3 
2 first middle last 
@rgzn changed the title from "read_fwf doesn't support using the first row as column names, why?" to "read_fwf doesn't support using the first row as column names" on Mar 25, 2022
@sbearrows added the "feature" label (a feature request or enhancement) on Apr 7, 2022
@ilikegitlab

Adding my vote, as I spent 15 min looking for this since all the other readr functions have it.

@rgzn
Author

rgzn commented Jun 12, 2022

Yeah I spent a lot longer than that! Unfortunately after looking at the code, I think it would take a while for me to figure this out enough to submit a pull request.

It's very unexpected behavior though and I don't really understand why it's not there.

@hadley
Member

hadley commented Jul 31, 2023

That looks more like a white-space delimited file than what you normally see in a FWF. And read_table() does work:

readr::read_table(I(c(" col1   col2 col3", "first middle last")))
#> # A tibble: 1 × 3
#>   col1  col2   col3 
#>   <chr> <chr>  <chr>
#> 1 first middle last

Created on 2023-07-31 with reprex v2.0.2

@hadley closed this as completed on Jul 31, 2023
@rgzn
Author

rgzn commented Aug 1, 2023

> That looks more like a white-space delimited file than what you normally see in a FWF. And read_table() does work:

Well, I just used that example to illustrate the issue. The original data where I encountered this problem (animal tracking collar) was fixed width, not white space (there was white space in the data). I still don't understand the reasoning for not including the col_names argument.
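
For a file like that, where the values themselves contain spaces, column names can be supplied by hand through the col_positions helpers (fwf_widths(), fwf_positions(), fwf_cols()) while skipping the header row. A minimal sketch, assuming made-up tracking-style data; the widths and names below are hypothetical and would have to match the real file:

```r
library(readr)

# Hypothetical fixed-width data: a 10-character name field (containing spaces)
# followed by a 3-character id field. The first line is a header row.
x <- I(c(
  "name      id ",
  "Red fox   001",
  "Grey wolf 002"
))

dat <- read_fwf(
  x,
  # Widths and names are given explicitly; nothing is taken from the header.
  col_positions = fwf_widths(c(10, 3), col_names = c("name", "id")),
  col_types = "cc",  # keep both columns as character so "001" is not parsed as a number
  skip = 1           # skip the header row, since the names are supplied above
)

dat
```

This does not read the names from the first row; it only avoids ending up with X1, X2, ... when the layout is known in advance.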

@hadley
Member

hadley commented Aug 1, 2023

Because it's very rare for fwf columns to have reasonable column names, since column widths are typically very small.

@ilikegitlab

That is not true.

I have data from commercial dataloggers that produce tons of the stuff.

@hugomflavio

hm... this shows closed as completed, but was this functionality added? I have a similar "datalogger produces a weird format output" situation, but couldn't find an argument to get read_fwf to use the first row as column headers. read_table() does not like the file because one of the columns has no data. read_fwf can parse it correctly though (except for the column names).

read_fwf example: Gets the column names (and therefore data types) wrong, but the data right.

> readr::read_fwf(I(c(" col1   col2 col3", "first        last")))
Rows: 2 Columns: 3                                                                                                 
── Column specification ───────────────────────────────────────────────────────────────────────────

chr (3): X1, X2, X3

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 2 × 3
  X1    X2    X3   
  <chr> <chr> <chr>
1 col1  col2  col3 
2 first NA    last 

read_table example: Gets the columns right, but the data wrong.

> readr::read_table(I(c(" col1   col2 col3", "first        last")))
Warning: 1 parsing failure.
row col  expected    actual         file
  1  -- 3 columns 2 columns literal data

# A tibble: 1 × 3
  col1  col2  col3 
  <chr> <chr> <chr>
1 first last  NA  
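
One possible workaround for the case above, since read_fwf() gets the data right: read every column as character, promote the first row to column names, and then re-guess the types with type_convert(). This is only a sketch and not a built-in readr feature; it assumes the guessed column boundaries are correct for the header row as well, and that no header field is blank.

```r
library(readr)

x <- I(c(" col1   col2 col3", "first        last"))

# Read everything as character so the header row does not affect type guessing.
raw <- read_fwf(
  x,
  col_types = cols(.default = col_character()),
  show_col_types = FALSE
)

# Use the first row as the column names, then drop it from the data.
names(raw) <- as.character(unlist(raw[1, ], use.names = FALSE))
dat <- raw[-1, ]

# Re-run type guessing on the remaining rows (prints a column specification).
dat <- type_convert(dat)

dat
```

The empty middle field stays NA in the col2 column rather than shifting the later values to the left, which is the behaviour read_table() could not provide here.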


6 participants