Skip to content

Getting data into R

ngageorange edited this page Apr 21, 2016 · 12 revisions

Getting data into R

Everything is a "data-frame"

Once you have imported any sheet into R, the result is a data-frame. This is another word for a sheet - like Excel's sheet; it has columns, each identified by a name, and rows for observations.

To access a dataframe's particular column, we use the $ notation. E.g:

df$some_column

Here, df is our dataframe and the column name is "some_column".

Easy!

Introducing the CSV

CSV, a Comma Seperated Values, is a file that will contain your data. This is easily exportable from Microsoft Excel, Apple Numbers, Open Office, Google Sheets...etc.

It's a very simple format. The top line are the column names, each seperated by a comma. The following lines are the observations in those columns, again, seperated by a comma.

It's strength is in it's simplicity. It only has data, no formuals, no tricks and is very well recognised amongst software packages as it is very easily supported. R has excellent support for CSV.

Find your File

You will need to know the absolute location of your file on your harddrive.

On Mac OSX: Right click on the file, choose "Get Info". The "Where" will be the absolution location of the folder encompassing your file. Add the file name at the end:

On Windows: Who knows!

Using in-built function

We can import Comma Separated Values (CSV) files into R very easily. These files can be generated by Microsoft Excel, Apple Numbers and Google Sheets usually through a File -> Export process.

Once a sheet has been exported, it can be imported into R:

 df <- read.csv(LOCATION_OF_FILE, header=TRUE, stringsAsFactor=FALSE)

This will read your file, then a dataframe object will be available to run queries on.

Workshop

Now is a 20 minute workshop on getting this working...

Using Google Sheets

Using Google Sheets is a slightly more advanced way of getting data into R but it is more powerful. It allows the creation of seamless data analytics. Using Google Forms, one can input the data into a pre-validated form that then gets automatically inserted into a sheet that then gets pulled in automatically into R next time you run your script.

We will need to install the googlesheets library to import data from Google Sheets.

install.packages("googlesheets")

Once the package has been installed, load it into your R script:

library(googlesheets)

Now list all the sheets available (local and ones shared to your account as shown https://docs.google.com/spreadsheets/) using

gs_ls()

On the first run, it will ask you to authorise the package 'googlespreadsheet' to view your account by copying a link from the console and pasting it into a browser. Once you have authorised it, you will receive a code from your browser which you can then paste into the console. A list of all your sheets will then be printed in the console.

To access the sheet:

sheet <- gs_title(YOUR_TITLE)

Where YOUR_TITLE is the name of the sheet as shown in the gs_ls command. gs_title will authorise the use of that sheet.

To get access to the worksheet:

df <- gs_read(ss = sheet, ws = NAME_OF_WORKSHEET)

data variable now contains a dataframe much like importing from CSV. You can access columns by:

df$column_name

Looking at Data

Using R-studio, click on Environment, then click on 'data' (as the variable name). This will open up a new window that displays the data if everything has gone smoothly on either methods.

Simple Testing

The see the top rows of your sheet:

head(df)

Similarly, the last rows:

tail(df)

To get a summary of each column (min, max, averages...etc), type:

summary(df)

Issues

  • Discuss what a CSV file

  • Show how to export CSV files from Excel in an OS independent way...

  • Screenshots of what each bit looks like, like googlesheets console/authentication

  • Google Forms screenshot/explaining more

  • Explain what a Path and how to get it (Mac version and a PC version) with screenshot

  • String as factors=False in the function (?using readr as it might also deal with line ending \r\n vs \n in CSV file)

  • Remind people that gs_ls() is a function (also, () = no arguments)

  • Getting CSV into R should be 10min presentation, 10 minute workshop.

  • Google spreadsheets as a 10min presentation, with 20 minute workshop

  • Introduce feedback forms here (meta!)

Clone this wiki locally