blank.lines.skip won't work when the top lines are empty #5611
Add a reprex, and use
file <- tempfile()
writeLines(c("", "", NA), file)
readLines(file)
#> [1] "" "" "NA"
data.table::fread(file, blank.lines.skip = FALSE)
#> V1
#> <lgcl>
#> 1: NA
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so; LAPACK version 3.8.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Asia/Shanghai
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.33 styler_1.10.1 fastmap_1.1.1 xfun_0.39
#> [5] magrittr_2.0.3 glue_1.6.2 R.utils_2.12.2 knitr_1.43
#> [9] htmltools_0.5.5 rmarkdown_2.23 lifecycle_1.0.3 cli_3.6.1
#> [13] R.methodsS3_1.8.2 vctrs_0.6.3 reprex_2.0.2 data.table_1.14.9
#> [17] withr_2.5.0 compiler_4.3.1 R.oo_1.25.0 R.cache_0.16.0
#> [21] purrr_1.0.1 tools_4.3.1 evaluate_0.21 yaml_2.3.7
#> [25] rlang_1.1.1 fs_1.6.3
Created on 2023-11-03 with reprex v2.0.2 |
We should probably update the documentation to state that blank lines at the beginning of a file are always skipped until the first non-empty row is encountered. The verbose output also mentions this (shortened here to highlight the important part).
library(data.table)
file2 <- tempfile()
writeLines(c("", "", "a"), file2)
fread(file2, blank.lines.skip = FALSE, verbose=TRUE)
#> [05] Skipping initial rows if needed
#> Positioned on line 3 starting: <<a>> |
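For anyone who needs to know how many rows were dropped at the top, here is a minimal base-R sketch that counts the leading empty lines before calling fread. The helper name `count_leading_blank` is hypothetical and not part of data.table, and treating whitespace-only lines as blank is an assumption on my side.

```r
# Hypothetical helper, not part of data.table: count the empty lines at the
# start of a file, i.e. the lines that come before the first non-empty row.
count_leading_blank <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Assumption: a line containing only whitespace counts as blank.
  nonblank <- which(nzchar(trimws(lines)))
  if (length(nonblank) == 0L) length(lines) else nonblank[1L] - 1L
}

file2 <- tempfile()
writeLines(c("", "", "a"), file2)
count_leading_blank(file2)
#> [1] 2
```

The result agrees with the verbose output above, where fread positions itself on line 3.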
The problem isn't the documentation; we want to be able to turn off the skipping of blank lines completely. |
I get that, but changing the behavior of blank.lines.skip = FALSE would probably break a lot of existing code, so I doubt we will do that. Possible changes I could imagine would be to introduce another argument, or to let blank.lines.skip accept other values to indicate that you do not want to skip blank lines at the beginning. |
Do we see a purpose for an additional argument to turn off the initial skipping of lines? Or is a documentation update enough to close this issue? It shouldn't be difficult to implement regardless. |
@ben-schwen I did some initial prototyping of a change and here's what I found: It seems that to make this behavior work in the most intuitive way, ( Currently, If we make it so ATP, I feel a documentation change is the way to go. If we want to cover the MRE from the issue, the change is almost there. WDYT? |
Tough question. Files with multiple columns and completely blank lines are clearly misspecified, so it's hard to guess what the intended behavior should be. Does it play nicely with completely blank lines, multiple columns and |
f = tempfile()
input = c("a,b", ",", ",")
writeLines(input, f)
fread(f, blank.lines.skip=FALSE, fill=TRUE)
# a b
# 1: NA NA
# 2: NA NA
Seems to work, I think fill is what we needed. I could initialize
Edit:
f = tempfile()
input = "\n\n\n\n1,3\n\n2,4\n\n"
writeLines(input, f)
fread(f, blank.lines.skip="none", fill=TRUE)
# V1
# 1: NA
# 2: NA
# ---
# 8: NA
# 9: NA |
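As a cross-check on the nine rows in the prototype output above, a small base-R sketch (not using data.table at all) shows where the count comes from: the input string holds eight newlines and writeLines appends one more, so the file has nine lines, two of which are non-empty.

```r
f <- tempfile()
writeLines("\n\n\n\n1,3\n\n2,4\n\n", f)

lines <- readLines(f)
length(lines)          # total number of lines in the file
#> [1] 9
which(nzchar(lines))   # positions of the non-empty lines
#> [1] 5 7
```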
Following up on this issue -- I figured that skipping the initial blank lines isn't quite so trivial because AFAIU |
Hmm, it seems If we simply want to count the number of newlines, this SO Q&A look pretty good to me (and adjustments to respect https://stackoverflow.com/a/23456450/3576984 As to desired |
f = tempfile()
input = c("a,b", ",", ",")
writeLines(input, f)
fread(f, blank.lines.skip=FALSE, fill=TRUE, header=FALSE)
V1 V2
<char> <char>
1: a b
2:
3:
input = "\n\n\n\n1,3\n\n2,4\n\n"
writeLines(input, f)
fread(f, blank.lines.skip="none", fill=TRUE, header=FALSE)
V1
<num>
1: NA
2: NA
3: NA
4: NA
5: 1.3
6: NA
7: 2.4
8: NA
9: NA
fread(f, blank.lines.skip='none')
V1 V2
<int> <int>
1: 2 4 |
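On simply counting newlines, as raised earlier in the thread: a rough base-R sketch of the idea is below. This is not the approach from the linked Stack Overflow answer and not how fread works internally; `count_newlines` is a hypothetical helper that reads the raw bytes and counts line-feed characters, assuming the file fits in memory.

```r
# Hypothetical helper: count "\n" bytes in a file.
# Assumption: the file is small enough to read into memory in one go.
count_newlines <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  sum(bytes == as.raw(0x0a))
}

file <- tempfile()
writeLines(c("", "", NA), file)   # same file as in the original report
count_newlines(file)
#> [1] 3
```

That matches the three rows the original report expects from this file.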
I want to preserve the total number of lines. The following code should return 3 rows, but it always returns 1 row.