| 1 | +Read and Write Data - Part 2 |
| 2 | +============================ |
| 3 | + |
| 4 | +Textual Formats |
| 5 | +--------------- |
| 6 | + |
| 7 | +* Dumping and dput-ting are useful because the resulting textual format is editable and, in the case of corruption, potentially recoverable |
| 8 | +* Unlike writing out a table or CSV file, dump and dput preserve the metadata (sacrificing some readability), so that another user doesn't have to specify it all over again |
| 9 | +* Textual formats can work much better with version control programs like Subversion and Git, which can only track changes meaningfully in text files |
| 10 | +* Textual formats can be longer-lived; if there is corruption somewhere in the file, it can be easier to fix the problem |
| 11 | +* Textual formats adhere to the "Unix philosophy" |
| 12 | +* Downside: the format is not very space efficient |
| 13 | + |
| 14 | + |
| 15 | +dput-ting R Objects |
| 16 | +------------------- |
| 17 | + |
| 18 | +Another way to pass data around is by deparsing the R object with dput and reading back using dget. |
| 19 | + |
| 20 | + > y <- data.frame(a = 1, b = "a") |
| 21 | + > dput(y) |
| 22 | + structure(list(a = 1, |
| 23 | + b = structure(1L, .Label = "a", |
| 24 | + class = "factor")), |
| 25 | + .Names = c("a", "b"), row.names = c(NA, -1L), |
| 26 | + class = "data.frame") |
| 27 | + > dput(y, file = "y.R") |
| 28 | + > new.y <- dget("y.R") |
| 29 | + > new.y |
| 30 | + a b |
| 31 | + 1 1 a |
| 32 | + |
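One way to check that the metadata survives the round trip is to compare the restored object with the original. The following is a minimal sketch: the temporary file path and the names `original` and `restored` are chosen for illustration, and `stringsAsFactors = TRUE` is set explicitly so that column `b` is a factor, as in the deparsed output shown above.

```r
# Build a small data frame; column b is stored as a factor.
original <- data.frame(a = 1, b = "a", stringsAsFactors = TRUE)

# Deparse the object to a temporary file (illustrative path) and read it back.
path <- tempfile(fileext = ".R")
dput(original, file = path)
restored <- dget(path)

# Compare the two objects and inspect the column classes after the round trip.
same <- isTRUE(all.equal(original, restored))
col_classes <- sapply(restored, class)

unlink(path)  # remove the temporary file
```

Here `col_classes` reports the class of each column, so the factor in `b` does not have to be specified again by whoever reads the file.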
| 33 | + |
| 34 | +Dumping R Objects |
| 35 | +----------------- |
| 36 | + |
| 37 | +Multiple objects can be deparsed using the dump function and read back in using source. |
| 38 | + |
| 39 | + > x <- "foo" |
| 40 | + > y <- data.frame(a = 1, b = "a") |
| 41 | + > dump(c("x", "y"), file = "data.R") |
| 42 | + > rm(x, y) |
| 43 | + > source("data.R") |
| 44 | + > y |
| 45 | + a b |
| 46 | + 1 1 a |
| 47 | + > x |
| 48 | + [1] "foo" |
| 49 | + |
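Because source evaluates the dumped file, it recreates the objects under their original names and can overwrite objects that already exist. A possible way to avoid that, sketched here under the assumption that a temporary file and a scratch environment called `restored_env` are acceptable, is to source the file into a separate environment.

```r
x <- "foo"
y <- data.frame(a = 1, b = "a")

# Deparse both objects into one file (illustrative temporary path).
path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = path)

# Evaluate the file in a fresh environment instead of the workspace,
# so that existing objects named x or y are left untouched.
restored_env <- new.env()
source(path, local = restored_env)

restored_names <- sort(ls(restored_env))
restored_x <- get("x", envir = restored_env)

unlink(path)  # remove the temporary file
```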
| 50 | + |
| 51 | +Interfaces to the Outside World |
| 52 | +------------------------------- |
| 53 | + |
| 54 | +Data are read in using connection interfaces. Connections can be made to files (most common) or to other more exotic things. |
| 55 | + |
| 56 | +* file, opens a connection to a file |
| 57 | +* gzfile, opens a connection to a file compressed with gzip |
| 58 | +* bzfile, opens a connection to a file compressed with bzip2 |
| 59 | +* url, opens a connection to a webpage |
| 60 | + |
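As a rough illustration of the connection functions listed above, the sketch below writes a few lines through a gzfile connection and reads them back. The file path and the sample text are made up for the example.

```r
path <- tempfile(fileext = ".gz")  # illustrative temporary file

# Open a gzip-compressed connection for writing and write three lines.
con <- gzfile(path, "w")
writeLines(c("alpha", "beta", "gamma"), con)
close(con)

# Reopen the same file for reading; decompression happens transparently.
con <- gzfile(path, "r")
lines_back <- readLines(con)
close(con)

unlink(path)  # remove the temporary file
```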
| 61 | + |
| 62 | +File Connections |
| 63 | +---------------- |
| 64 | + |
| 65 | + > str(file) |
| 66 | + function (description = "", open = "", blocking = TRUE, |
| 67 | + encoding = getOption("encoding")) |
| 68 | + |
| 69 | +* description is the name of the file |
| 70 | +* open is a code indicating the mode in which the file is opened |
| 71 | + * "r" read only |
| 72 | + * "w" writing (and initializing a new file) |
| 73 | + * "a" appending |
| 74 | + * "rb", "wb", "ab" reading, writing, or appending in binary mode (Windows) |
| 75 | + |
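The open codes above can be tried out on a scratch file. This is a small sketch, assuming a temporary file is acceptable; the file contents are arbitrary.

```r
path <- tempfile(fileext = ".txt")  # illustrative temporary file

# "w" creates the file (or truncates an existing one) and writes to it.
con <- file(path, "w")
writeLines("first", con)
close(con)

# "a" keeps the existing contents and appends after them.
con <- file(path, "a")
writeLines("second", con)
close(con)

# "r" opens the file read-only.
con <- file(path, "r")
contents <- readLines(con)
close(con)

unlink(path)  # remove the temporary file
```

After the three steps, `contents` holds both lines in the order they were written.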
| 76 | + |
| 77 | +Connections |
| 78 | +----------- |
| 79 | + |
| 80 | +In general, connections are powerful tools that let you navigate files or other external objects. In practice, we often don't need to deal with the connection interface directly. |
| 81 | + |
| 82 | + > con <- file("foo.txt", "r") |
| 83 | + > data <- read.csv(con) |
| 84 | + > close(con) |
| 85 | + |
| 86 | +This is the same as: |
| 87 | + |
| 88 | + > data <- read.csv("foo.txt") |
| 89 | + |
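The equivalence can be checked on a small file. The sketch below assumes a throwaway CSV with made-up columns `id` and `score`; it reads the file once through an explicit connection and once by file name, then compares the results.

```r
# Write a small CSV to a temporary file (illustrative data).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)),
          path, row.names = FALSE)

# Read through an explicitly opened connection.
con <- file(path, "r")
via_connection <- read.csv(con)
close(con)

# Read by file name; read.csv opens and closes the connection itself.
via_path <- read.csv(path)

same_result <- identical(via_connection, via_path)

unlink(path)  # remove the temporary file
```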
| 90 | + |
| 91 | +Reading Lines of a Text File |
| 92 | +---------------------------- |
| 93 | + |
| 94 | + > con <- gzfile("words.gz") |
| 95 | + > x <- readLines(con, 10) |
| 96 | + > x |
| 97 | + [1] "1080" "10-point" "10th" "11-point" |
| 98 | + [5] "12-point" "16-point" "18-point" "1st" |
| 99 | + [9] "2" "20-point" |
| 100 | + |
| 101 | +writeLines takes a character vector and writes each element one line at a time to a text file. |
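
A short sketch of the two functions together, using a temporary file and arbitrary sample text: writeLines puts each element of the vector on its own line, and the `n` argument of readLines limits how many lines are read back.

```r
path <- tempfile(fileext = ".txt")  # illustrative temporary file

# Each element of the character vector becomes one line in the file.
writeLines(c("one", "two", "three", "four"), path)

# Read only the first two lines, then the whole file.
first_two <- readLines(path, n = 2)
all_lines <- readLines(path)

unlink(path)  # remove the temporary file
```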
| 102 | + |
| 103 | +readLines can be useful for reading in lines of webpages. |
| 104 | + |
| 105 | + > ## This might take time |
| 106 | + > con <- url("http://www.jhsph.edu", "r") |
| 107 | + > x <- readLines(con) |
| 108 | + > head(x) |
| 109 | + [1] "<!DOCTYPE html>" |
| 110 | + [2] "<html lang=\"en\">" |
| 111 | + [3] "" |
| 112 | + [4] "<head>" |
| 113 | + [5] "<meta charset=\"utf-8\" />" |
| 114 | + [6] "<title>Johns Hopkins Bloomberg School of Public Health</title>" |