Using read_lines on connection objects #1494

Closed
jtlandis opened this issue May 9, 2023 · 1 comment

@jtlandis

jtlandis commented May 9, 2023

I am mostly curious why `readr::read_lines()` behaves differently from `base::readLines()` when applied to a connection object.

```r
library(readr)
set.seed(123)
f <- tempfile()
write_lines(LETTERS, file = f)

con <- file(f)
```

I can only read `con` if it is opened in binary mode:

```r
open(con, "rb")
print(con)
#> A connection with
#> description "/tmp/RtmpMjFt6k/file407bc4bfab5f8"
#> class       "file"
#> mode        "rb"
#> text        "binary"
#> opened      "opened"
#> can read    "yes"
#> can write   "no"
on.exit(close(con))
read_lines(con, n_max = 7)
#> [1] "A" "B" "C" "D" "E" "F" "G"
read_lines(con, n_max = 1)
#> character(0)
```

The connection still shows as open, but now even `readLines()` returns nothing:

```r
readLines(con, n = 1)
#> character(0)
```

Doing the same thing with `readLines()` shows that the connection keeps its position between calls:

```r
con2 <- file(f)
open(con2, "rb")
on.exit(close(con2))
readLines(con2, n = 7)
#> [1] "A" "B" "C" "D" "E" "F" "G"
readLines(con2, n = 1)
#> [1] "H"
```

I don't know whether it is possible, but having `read_lines()` behave like `readLines()` on open connections would make reading data in chunks easier.
I know there is `read_lines_chunked()`, a chunked version of `read_lines()`, but it is currently much slower than a `while` loop calling `readLines()` on a connection object.
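For reference, the loop-over-a-connection pattern described above might look roughly like the sketch below. It uses only base R; the helper `read_in_chunks()` and its `callback` argument are hypothetical names chosen for illustration, and it uses a `repeat` loop, which is equivalent to the `while` loop mentioned. The chunk size of 7 simply mirrors the examples in this issue.

```r
# Hypothetical helper (not part of readr or base R): read a file in
# chunks of `chunk_size` lines with base::readLines(), which resumes
# from the connection's current position on each call.
read_in_chunks <- function(path, chunk_size, callback) {
  con <- file(path, "rb")
  on.exit(close(con))  # inside a function, this closes con on return
  results <- list()
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0) break  # end of file reached
    results[[length(results) + 1]] <- callback(chunk)
  }
  results
}

f <- tempfile()
writeLines(LETTERS, f)

# Size of each chunk the callback received
sizes <- unlist(read_in_chunks(f, chunk_size = 7, callback = length))
sizes
#> [1] 7 7 7 5
```

Because the connection is opened once and never reopened, each `readLines()` call picks up where the previous one stopped, which is exactly the behavior that `read_lines()` does not currently provide.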

@hadley
Member

hadley commented Jul 31, 2023

Here's a somewhat more minimal example illustrating the problem:

```r
library(readr)

f <- tempfile()
write_lines(LETTERS, file = f)

con <- file(f, "rb")
readLines(con, n = 1)
#> [1] "A"
readLines(con, n = 1)
#> [1] "B"

con <- file(f, "rb")
read_lines(con, n_max = 1)
#> [1] "A"
read_lines(con, n_max = 1)
#> character(0)
```

Created on 2023-07-31 with reprex v2.0.2

I suspect this is because R doesn't expose a public API for working with connections, so unfortunately there's nothing readr can do except slurp in the entire connection and then parse it.
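To make the suspected mechanism concrete, here is a toy model in base R. It is not readr's actual implementation, and `slurp_then_take()` is a hypothetical name: a reader that consumes everything left on the connection and then keeps only the first `n` lines leaves the connection at end-of-file, so a second call returns `character(0)`, matching the `read_lines()` output in the reprex.

```r
# Toy model (hypothetical, not readr's code): read everything that is
# left on the connection, then return only the first n lines.
slurp_then_take <- function(con, n) {
  all_lines <- readLines(con)  # consumes the connection to end-of-file
  head(all_lines, n)           # lines beyond n are discarded
}

f <- tempfile()
writeLines(LETTERS, f)
con <- file(f, "rb")
first <- slurp_then_take(con, 1)
second <- slurp_then_take(con, 1)
close(con)
first
#> [1] "A"
second
#> character(0)
```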
