Warning message with build_index #1

wgmmaas · 2024-04-02T10:28:55Z

Hi @lecy et al.,

Thanks for your work on this package. I get a warning message that I did not get before:

> index <- build_index(tax.years = 2019)

Warning message:
One or more parsing issues, call problems() on your data frame for
details, e.g.:
dat <- vroom(...)
problems(dat)

What could be the reason for the warning? And is it safe to ignore, given that I end up with an index of 523,999 observations (only two short of the 524,001 the README says it should find for 2019)?
Thanks, Wim


lecy · 2024-04-03T05:07:16Z

I am not familiar with the warning, but I suspect it is from the readr package and probably related to data types.

See: tidyverse/readr#1477

Or it could come from dplyr when the disaggregated data frames are stacked.

I suspect it's harmless: for example, integers and doubles being mixed, which affects how values are represented in memory in R but would not change how the data appear once written to a CSV file.

But please let me know if you discover otherwise.

wgmmaas · 2024-04-03T13:19:17Z

Thanks Jesse, you are correct. It is a parsing problem in readr: it guesses the wrong type for the "LegalDomicileCountry" column (column 13), expecting logical values (1/0/T/F/TRUE/FALSE) and then failing on country codes such as CA (see below). As this does not affect the rest of my application, I will ignore it. Thanks.

URL <- paste0("https://nccs-efile.s3.us-east-1.amazonaws.com/index/data-commons-efile-index-", 2019, ".csv")
# Let readr guess the column types (the default), then inspect any parse failures
d <- readr::read_csv(URL, show_col_types = FALSE)
parsing_problems <- readr::problems(d)
if (nrow(parsing_problems) > 0) {
  print(parsing_problems)
}


> print(parsing_problems)
# A tibble: 181 x 5
     row   col expected           actual file 
   <int> <int> <chr>              <chr>  <chr>
 1  1819    13 1/0/T/F/TRUE/FALSE CA     ""   
 2  3225    13 1/0/T/F/TRUE/FALSE NI     ""   
 3  5076    13 1/0/T/F/TRUE/FALSE CA     ""   
 4  5078    13 1/0/T/F/TRUE/FALSE CJ     ""   
 5  5502    13 1/0/T/F/TRUE/FALSE CA     ""   
 6  7666    13 1/0/T/F/TRUE/FALSE HO     ""   
 7  8408    13 1/0/T/F/TRUE/FALSE CA     ""   
 8  9305    13 1/0/T/F/TRUE/FALSE UK     ""   
 9 14025    13 1/0/T/F/TRUE/FALSE AU     ""   
10 21681    13 1/0/T/F/TRUE/FALSE BD     ""   
# i 171 more rows
# i Use `print(n = ...)` to see more rows
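A minimal sketch of a readr-side workaround, assuming readr 2.x and a made-up toy file in place of the real index CSV (the `id` column and the file contents are hypothetical): declaring the column types up front means readr never has to guess a type for a mostly empty column like LegalDomicileCountry, so it cannot settle on logical.

```r
library(readr)

# Hypothetical toy data standing in for the index: a mostly empty
# LegalDomicileCountry column with a country code far down the file.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  id = 1:2000,
  LegalDomicileCountry = c(rep(NA, 1999), "CA")
)
write_csv(toy, csv_path, na = "")

# Read every column as character so nothing depends on type guessing.
d <- read_csv(
  csv_path,
  col_types = cols(.default = col_character()),
  na = ""
)

# Alternatively, override only the problematic column:
# col_types = cols(LegalDomicileCountry = col_character())

print(problems(d))                 # no rows: nothing failed to parse
print(d$LegalDomicileCountry[2000])
```

Reading everything as character trades convenience for safety: numeric fields then need an explicit conversion (e.g. `as.numeric()`) before any arithmetic.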

Edit: I updated to the newest version, which uses data.table, and I no longer get the warning, thanks!

lecy · 2024-04-03T16:05:19Z

OK, great. And yes, I updated the build_index() function so that all columns are loaded as strings (character vectors). Glad it worked!
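For comparison, a rough sketch of the data.table approach described above, assuming `fread()` with `colClasses = "character"` applied to every column. The helper `read_index_as_character()` and the two toy files are hypothetical and are not the package's actual `build_index()` code.

```r
library(data.table)

# Hypothetical helper (not the package's code): read one index file with
# every column as character, so fread never has to guess column types.
read_index_as_character <- function(path) {
  fread(path, colClasses = "character", na.strings = "")
}

# Toy files standing in for two yearly index CSVs.
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
fwrite(data.table(id = 1:3, LegalDomicileCountry = c(NA, NA, "CA")), paths[1])
fwrite(data.table(id = 4:5, LegalDomicileCountry = c("UK", NA)), paths[2])

# Stack the yearly tables; since all columns are character,
# rbindlist() does not have to reconcile differing column types.
index <- rbindlist(lapply(paths, read_index_as_character))
str(index)
```

Because every column arrives as character, `rbindlist()` never has to reconcile, say, an integer column in one year with a double or logical column in another, which also sidesteps the type-mixing concern raised earlier in the thread.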

wgmmaas closed this as completed Apr 3, 2024
