Read space delimited file where missing values are blank "". #3658
Comments
Is there any news on this / any workaround without resorting to the slow read.table? I am also trying to read a large space-separated file in R and this problem is so frustrating (especially since your SO post describes how they had a solution and then made it stop working). What makes it worse is that the file is gzipped, and even read.table seems to have trouble with the line containing the double space. |
I just found this, with the same problem. Interestingly, I saw it noted that two consecutive commas are correctly interpreted as implying a blank value between them, but two consecutive spaces are not. |
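For contrast, here is a minimal sketch of the comma case mentioned above. The sample data is the thread's example rewritten with commas (my own rewrite, not a file from the report); the two missing values on row r2 become empty fields between consecutive commas.

```r
library(data.table)

# Comma-delimited version of the thread's example (an illustrative rewrite).
# Row r2 has empty fields for c3 and c4.
csv_text <- "c1,c2,c3,c4,c5,c6
r1,0,1,2,3,4
r2,0,,,3,4
r3,0,1,2,3,4"

# fread treats the empty fields between consecutive commas as NA.
dt <- fread(text = csv_text)
print(dt)
```

With commas, all three rows are read and c3 and c4 on row r2 come back as NA, which is the behaviour people in this thread expect for consecutive spaces as well.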
This is still a problem in the latest version, 1.14.2 |
Having the same issue with v1.14.8 |
Confirming, this is still an issue with current R-devel and data.table-1.15.0.
text <- "c1 c2 c3 c4 c5 c6
r1 0 1 2 3 4
r2 0   3 4
r3 0 1 2 3 4"
read.table(text=text, strip.white = FALSE, sep = " ", na.strings = "")
data.table::fread(text=text, strip.white=FALSE)
Here are the results on my system:
> text <- "c1 c2 c3 c4 c5 c6
+ r1 0 1 2 3 4
+ r2 0   3 4
+ r3 0 1 2 3 4"
> read.table(text=text, strip.white = FALSE, sep = " ", na.strings = "")
  V1 V2   V3   V4 V5 V6
1 c1 c2   c3   c4 c5 c6
2 r1  0    1    2  3  4
3 r2  0 <NA> <NA>  3  4
4 r3  0    1    2  3  4
> data.table::fread(text=text, strip.white=FALSE)
       c1    c2    c3    c4    c5    c6
   <char> <int> <int> <int> <int> <int>
1:     r1     0     1     2     3     4
Warning message:
In data.table::fread(text = text, strip.white = FALSE) :
Stopped early on line 3. Expected 6 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<r2 0   3 4>>
> sessionInfo()
R Under development (unstable) (2024-01-23 r85822 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Phoenix
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.4.0 tools_4.4.0 data.table_1.15.0
> |
what about |
That doesn't correspond to the desired result here. Spaces should be delimiters, not NAs--the lack of a value between spaces should be interpreted as an NA. And in any case, trying it regardless just yields different errors:
> data.table::fread(text=text, strip.white=FALSE, na.strings = " ")
Error in data.table::fread(text = text, strip.white = FALSE, na.strings = " ") :
na.strings[1]==" " consists only of whitespace, ignoring. But strip.white=FALSE. Use strip.white=TRUE (default) together with na.strings="" to turn any number of spaces in string columns into <NA>
> data.table::fread(text=text, strip.white=TRUE, na.strings = " ")
c1 c2 c3 c4 c5 c6
1: r1 0 1 2 3 4
Warning messages:
1: In data.table::fread(text = text, strip.white = TRUE, na.strings = " ") :
na.strings[1]==" " consists only of whitespace, ignoring. Since strip.white=TRUE (default), use na.strings="" to specify that any number of spaces in a string column should be read as <NA>.
2: In data.table::fread(text = text, strip.white = TRUE, na.strings = " ") :
Stopped early on line 3. Expected 6 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<r2 0   3 4>> |
Oh I see, you probably meant
> packageVersion("data.table")
[1] ‘1.15.0’
> text <- "c1 c2 c3 c4 c5 c6
+ r1 0 1 2 3 4
+ r2 0   3 4
+ r3 0 1 2 3 4"
> data.table::fread(text=text, strip.white=FALSE, na.strings = "")
c1 c2 c3 c4 c5 c6
<char> <int> <int> <int> <int> <int>
1: r1 0 1 2 3 4
Warning message:
In data.table::fread(text = text, strip.white = FALSE, na.strings = "") :
Stopped early on line 3. Expected 6 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<r2 0   3 4>>
Pasteable:
text <- "c1 c2 c3 c4 c5 c6
r1 0 1 2 3 4
r2 0   3 4
r3 0 1 2 3 4"
data.table::fread(text=text, strip.white=FALSE, na.strings = "") |
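Since a workaround was asked for earlier in the thread, here is one possible sketch. It is not a fix in fread itself and not something the maintainers have proposed here. It assumes that every single space in the file is a delimiter and that no field contains a quoted space or a comma. The idea is to turn each space into a comma before parsing, so that consecutive delimiters become consecutive commas, which fread does read as empty fields.

```r
library(data.table)

# Reproducible input from this thread: row r2 has three spaces between 0 and 3.
text <- "c1 c2 c3 c4 c5 c6
r1 0 1 2 3 4
r2 0   3 4
r3 0 1 2 3 4"

# Split into lines, then replace every single space with a comma.
# Assumption: no field contains a space or a comma of its own.
lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
csv_lines <- gsub(" ", ",", lines, fixed = TRUE)

# fread accepts a character vector of lines through its text argument.
dt <- fread(text = csv_lines, sep = ",")
print(dt)
```

For a file on disk, including a gzipped one, the lines could be obtained with readLines(gzfile(path)) instead of strsplit, where path is a placeholder for the file name. That reads the whole file into memory as text first, so it gives up some of fread's speed on large inputs.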
This issue was introduced in data.table 1.11.0 specifically. |
Example input text file - fileTest.txt:
Adding a screenshot to show there are 3 spaces on row r2 between 0 and 3, i.e., the values for c3 and c4 are missing.
Using R version 3.4.0 and data.table 1.10.4 (Session info 1), below works as expected:
But fails with R version 3.5.2 and data.table_1.12.2. (Session info 2).
Other attempts, all failed:
Note
Session info 1
Session info 2