Merge pull request #35 from datasciencecampus/dev
merge dev into master ahead of opening up repo
tbalbone31 authored Dec 5, 2024
2 parents df0f1f4 + 32381f5 commit d3780d8
Showing 42 changed files with 8,233 additions and 308,268 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -6,7 +6,16 @@
.Rhistory
.Rproj.user
.RData
.Rprofile

# Python cache and setup files
.ipynb_checkpoints

# Quarto
/.quarto/
_site/
_environment.local

# Mac users
.DS_STORE
/_*.local
Empty file added .nojekyll
Empty file.
@@ -5,24 +5,10 @@ engine: knitr
execute:
echo: true
eval: false
format:
html:
highlight: null
theme:
light: flatly
dark: darkly
toc: true
toc-title: Contents
toc-location: right
toc-depth: 3
number-sections: true
link-external-newwindow: true
embed-resources: true
freeze: auto # re-render only when source changes
---

![](../Images/AF_DSC_banner.png){fig-alt="Data Science Campus and Analysis Function logos."}

> To switch between light and dark modes, use the toggle in the top right
> To switch between light and dark modes, use the toggle in the top left
# Learning Objectives

@@ -80,7 +66,7 @@ R Studio is broken down into four panels.

When you open R Studio for the first time, you see this:

![](../Images/studio2.PNG){fig-alt="R Studio interface with the Code Editor, Environment, Console and Files panes."}
![](Images/studio2.PNG){fig-alt="R Studio interface with the Code Editor, Environment, Console and Files panes."}

If you don't see the Code Editor pane, go to the tool bar and click **View -\> Panes -\> Show All Panes**.

@@ -92,7 +78,7 @@ Upon first opening R Studio, you have the most basic form of the tool that has s

Firstly, navigate to "Tools" and "Global Options", which is where this tweaking takes place.

![](../Images/global_options.png){fig-alt="Global options menu with general, code, appearance and more as options."}
![](Images/global_options.png){fig-alt="Global options menu with general, code, appearance and more as options."}

You see that R Studio can be heavily customised. You will only scratch the surface here.

@@ -131,7 +117,7 @@ The root folder of the R Project (which you choose when you create it) contains

To create an R Project, select **File --\> New Project**. You are then given a choice of where to store the .Rproj file, i.e. where the working directory will be.

![](../Images/new_r_project.png){fig-alt="A project can be created in a new directory, existing directory or from GitHub."}
![](Images/new_r_project.png){fig-alt="A project can be created in a new directory, existing directory or from GitHub."}

You can:

@@ -147,7 +133,7 @@ Create an R project in an **existing directory**, selecting the **course_content

In your own work, saving it one level higher in the root folder is a better approach. For this course, you must save it where you will save your scripts so the filepaths function correctly.

![](../Images/rproj_folder.png){fig-alt="The root folder showing the .Rproj file alongside the other folders."}
![](Images/rproj_folder.png){fig-alt="The root folder showing the .Rproj file alongside the other folders."}

After creating the R Project, it will open and set your working directory.

@@ -166,7 +152,7 @@ Thankfully, you have the project menu in the top right, which allows you to:
- Close projects
- See recently open projects and jump straight to them

![](../Images/project_menu.png){fig-alt="The top right menu that allows you to interact with projects."}
![](Images/project_menu.png){fig-alt="The top right menu that allows you to interact with projects."}

From here, assume you create and save your scripts in this project in order for filepaths in Chapter 3 onwards to function.

@@ -665,7 +651,7 @@ Notice that even without the digits argument, the round() function works. This i

You can investigate what a specific function does by navigating to the "Help" tab in the bottom right and searching for it by name.
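
As a minimal sketch of both points (the input value is made up for illustration), `round()` falls back to its default of `digits = 0` when the argument is omitted, and the same help page can be opened from the console instead of the Help tab:

```r
# round() uses its default, digits = 0, when the argument is not supplied
whole  <- round(3.14159)
two_dp <- round(3.14159, digits = 2)

print(whole)
print(two_dp)

# Show the arguments and their defaults without leaving the console
print(args(round))

# Open the help page (same content as searching in the Help tab);
# commented out so the sketch runs non-interactively
# ?round
# help("round")
```

`args()` is handy when you only need a reminder of the argument names and defaults rather than the full help page.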

![](../Images/round_docstring.png){fig-alt="The document string of the rounding functions in R."}
![](Images/round_docstring.png){fig-alt="The document string of the rounding functions in R."}

You see:

@@ -1061,7 +1047,7 @@ There is a wealth of resources to help you progress in your R journey. Some of th

You can access these by clicking on the **Help** tab in R Studio and then **RStudio Cheat Sheets**. They provide an excellent reference point for many common tasks.

![](../Images/data-transformation-cheat-sheet.png){fig-alt="The dplyr cheat sheet for data manipulation."}
![](Images/data-transformation-cheat-sheet.png){fig-alt="The dplyr cheat sheet for data manipulation."}

## R Documentation

@@ -5,24 +5,10 @@ engine: knitr
execute:
echo: true
eval: true
format:
html:
highlight: null
theme:
light: flatly
dark: darkly
toc: true
toc-title: Contents
toc-location: right
toc-depth: 3
number-sections: true
link-external-newwindow: true
embed-resources: true
freeze: auto # re-render only when source changes
---

![](../Images/AF_DSC_banner.png){fig-alt="Data Science Campus and Analysis Function logos."}

> To switch between light and dark modes, use the toggle in the top right
> To switch between light and dark modes, use the toggle in the top left
# Learning Objectives

@@ -129,7 +115,7 @@ There are more types of vectors, but for the purpose of our learning these are

Visually:

![](../Images/vector.png){fig-alt="Diagram of a vector as a column (collection) of the same red rectangle."}
![](Images/vector.png){fig-alt="Diagram of a vector as a column (collection) of the same red rectangle."}

## Creating Vectors

@@ -518,7 +504,7 @@ Unlike with vectors, where combining multiple vectors creates one vector, in lis
Lists enable you to gather a variety of objects with different contents and lengths under one name in an ordered way.

![](../Images/list.png){fig-alt="Visual of a list, with each element being a collection of boxes with different colours."}
![](Images/list.png){fig-alt="Visual of a list, with each element being a collection of boxes with different colours."}

To create a list we will use the **list()** function.
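
A short sketch of the contrast described above, using made-up values: `c()` flattens its inputs into one vector of a single type, while `list()` keeps each object as its own element.

```r
# Two vectors of different types and lengths (illustrative values)
ages        <- c(34, 27, 51)
first_names <- c("Ann", "Bob")

# c() flattens everything into one vector and coerces to a common type
combined <- c(ages, first_names)

# list() keeps each object as a separate element, type and length intact
kept <- list(ages = ages, first_names = first_names)

print(combined)
print(length(kept))
str(kept)
```

Naming the elements (`ages = ages`) is optional, but it lets you pull them back out with `kept$ages`.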

@@ -598,7 +584,7 @@ As a whole, DataFrames have the following features:
* Rows are observations (i.e. an entry for each variable forms an observation/row).
* They can hold variables of different types.

![](../Images/dataframe.png){fig-alt="Visual of a dataframe where each column has a name."}
![](Images/dataframe.png){fig-alt="Visual of a dataframe where each column has a name."}

To create one, we can use the **data.frame()** function on the vectors you would like to be your columns; they must be the same length.
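
For instance, a minimal sketch with three made-up vectors of equal length. The variable names are illustrative and not taken from the course data. The second call shows the error raised when the lengths do not fit together.

```r
# Three vectors of equal length, one per column (made-up values)
passenger <- c("Ann", "Bob", "Cy")
age       <- c(34, 27, 51)
survived  <- c(TRUE, FALSE, TRUE)

# Each vector becomes a column; each position becomes a row
passengers <- data.frame(passenger, age, survived)
print(passengers)
str(passengers)

# Vectors of lengths 3 and 2 cannot be lined up, so R raises an error;
# tryCatch() captures the message instead of stopping the script
result <- tryCatch(
  data.frame(passenger, age = c(34, 27)),
  error = function(e) conditionMessage(e)
)
print(result)
```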

@@ -5,24 +5,10 @@ engine: knitr
execute:
echo: true
eval: false
format:
html:
highlight: null
theme:
light: flatly
dark: darkly
toc: true
toc-title: Contents
toc-location: right
toc-depth: 3
number-sections: true
link-external-newwindow: true
embed-resources: true
freeze: auto # re-render only when source changes
---

![](../Images/AF_DSC_banner.png){fig-alt="Data Science Campus and Analysis Function logos."}

> To switch between light and dark modes, use the toggle in the top right
> To switch between light and dark modes, use the toggle in the top left
# Learning Objectives

@@ -74,7 +60,7 @@ help(seq)
```


![](../Images/seq_helpfile.png){fig-alt="Seq() function help file."}
![](Images/seq_helpfile.png){fig-alt="Seq() function help file."}


### Function help files
@@ -374,7 +360,7 @@ Below is a list of the core packages in tidyverse to provide some awareness into
* [lubridate](https://lubridate.tidyverse.org/) - For dealing with dates and times - included in tidyverse 2.0.0 onwards.


![](../Images/tidyverse.png){fig-alt="Tidyverse workflow of import, tidy, transform/model/visualise and communicate."}
![](Images/tidyverse.png){fig-alt="Tidyverse workflow of import, tidy, transform/model/visualise and communicate."}

The first of the core packages we will delve into is **readr**, which deals with reading in data, and by extension **tibbles**, the excellent update to dataframes that the tidyverse provides.

@@ -512,11 +498,11 @@ To do so, we need to know how to go back one folder level, or exit the current d

As such, the relative filepath we need to reach the dataset is

>"../data/titanic.csv"
>"Data/titanic.csv"
Visually, to understand the tree-like folder structure, we have something like the following going on:

![](../Images/folder_structure.jpg){fig-alt="Top level is introduction to R, folders are at level 2, items in the folders are level 3."}
![](Images/folder_structure.jpg){fig-alt="Top level is introduction to R, folders are at level 2, items in the folders are level 3."}
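
The two relative paths in this hunk can be sketched with base R's `file.path()`. This only builds the path strings; whether the file exists depends on your own working directory, so that check is printed rather than assumed.

```r
# ".." steps up one folder level before entering the data folder
up_one <- file.path("..", "data", "titanic.csv")

# No ".." is needed when the Data folder sits inside the working directory
same_level <- file.path("Data", "titanic.csv")

print(up_one)
print(same_level)

# Relative paths are resolved against the working directory
print(getwd())
print(file.exists(same_level))
```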

### **Loading in the data**{-}

@@ -527,7 +513,7 @@ We simply need to go into the **data** folder, then select the **titanic.csv** f
```{r}
# Read in titanic with read_csv()
titanic_data <- read_csv("../data/titanic.csv")
titanic_data <- read_csv("Data/titanic.csv")
```

@@ -600,7 +586,7 @@ We can easily correct this by adding the **na** parameter to the read_csv() funct
```{r}
# Specifying missing values as a vector to read_csv()
titanic_data <- read_csv("../data/titanic.csv",
titanic_data <- read_csv("Data/titanic.csv",
na = c("*", ".", "", "NULL"))
```
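
To see the effect of **na** without needing the course files, here is a small sketch on literal CSV text. The column names and values are made up, and passing text through `I()` assumes readr 2.0.0 or later.

```r
library(readr)

# Literal CSV text standing in for a file; I() marks it as data, not a path
raw_csv <- I("name,age,cabin\nAnn,34,C85\nBob,*,NULL\nCy,.,\n")

# Every string listed in `na` is read in as a missing value (NA)
demo <- read_csv(raw_csv,
                 na = c("*", ".", "", "NULL"),
                 show_col_types = FALSE)

print(demo)

# Count the missing values per column
print(colSums(is.na(demo)))
```

Because the placeholder symbols become `NA`, the `age` column can be parsed as numeric rather than character.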
@@ -645,7 +631,7 @@ Let's read in the police dataset.
```{r}
# Reading in excel data using the readxl package
police_data <- read_excel("../data/police_data.xlsx")
police_data <- read_excel("Data/police_data.xlsx")
View(police_data)
```
@@ -655,7 +641,7 @@ We see that this is the **first sheet** in our workbook which is just the "Notes
```{r}
# Observe sheet names in police data
excel_sheets("../data/police_data.xlsx")
excel_sheets("Data/police_data.xlsx")
```


@@ -677,12 +663,12 @@ You can use the name of the sheet or the number/index.
# Using the sheet parameter in 2 ways
police_data <- read_excel("../data/police_data.xlsx",
police_data <- read_excel("Data/police_data.xlsx",
sheet = 2)
# Alternatively
police_data <- read_excel("../data/police_data.xlsx",
police_data <- read_excel("Data/police_data.xlsx",
sheet = "Table P1")
police_data
@@ -699,7 +685,7 @@ This is better but still not ideal:
# Using the range parameter to avoid empty rows
police_data <- read_excel("../data/police_data.xlsx",
police_data <- read_excel("Data/police_data.xlsx",
sheet = 2,
range = "A5:AA48")
@@ -5,41 +5,20 @@ engine: knitr
execute:
echo: true
eval: false
format:
html:
highlight: null
theme:
light: flatly
dark: darkly
toc: true
toc-title: Contents
toc-location: right
toc-depth: 3
number-sections: true
link-external-newwindow: true
embed-resources: true
freeze: auto # re-render only when source changes
---

![](../Images/AF_DSC_banner.png){fig-alt="Data Science Campus and Analysis Function logos."}

> To switch between light and dark modes, use the toggle in the top right
> To switch between light and dark modes, use the toggle in the top left
# Learning Objectives

* Understand the importance of clean variable names.

* Be able to clean column names using the janitor package.

* Understand the use of the pipe operator.

* Be able to sort data with dplyr's **arrange** verb.

* Be able to select data with dplyr's **select** verb.

* Be able to filter data with dplyr's **filter** verb.

* Be able to transform data with dplyr's **mutate** verb.

* Be able to join datasets together.


@@ -79,7 +58,7 @@ library(janitor)
# Read in titanic.csv and set null values to be specific symbols
titanic_data <- read_csv("../data/titanic.csv",
titanic_data <- read_csv("Data/titanic.csv",
na = c("*", ".", "", "NULL"))
# Have a peek
@@ -693,7 +672,7 @@ This removes the need to type the **.data** argument each time.

* From R 4.1 onwards, a native pipe operator comes as standard with base R and takes the form **|>**. It works without any setup; to make the R Studio keyboard shortcut insert **|>**, tick the native pipe option under Tools --> Global Options.

![](../Images/native_pipe.png){fig-alt="The Code, Editing pane with the native pipe operator tick box."}
![](Images/native_pipe.png){fig-alt="The Code, Editing pane with the native pipe operator tick box."}

The shortcut for this operator is **CTRL + SHIFT + M** and is one you will use a lot from here on.
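
A minimal sketch of what the native pipe does, using made-up fare values and base R only (R 4.1 or later is assumed): the value on the left becomes the first argument of the function on the right.

```r
# Illustrative fares, not taken from the course dataset
fares <- c(7.25, 71.2833, 8.05, 53.1)

# Nested call: has to be read from the inside out
nested <- round(mean(fares), 1)

# Native pipe: the same steps, read left to right
piped <- fares |> mean() |> round(1)

print(piped)
print(identical(nested, piped))
```

Both forms compute the same result; the piped version simply lists the steps in the order they happen.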

@@ -1224,7 +1203,7 @@ Of course, if you negate an **and**/**or** relationship:

This comes from logical statements in mathematics, specifically, De Morgan's laws.

![](../Images/demorgans_law.png){fig-alt="Negating a combined statement negates each individual statement, as well as the logical operator combining them."}
![](Images/demorgans_law.png){fig-alt="Negating a combined statement negates each individual statement, as well as the logical operator combining them."}
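
De Morgan's laws can be checked directly on logical vectors. The sketch below uses a small made-up sample rather than the titanic data, and the variable names are illustrative.

```r
# Small made-up sample: port of embarkation and fare paid
embarked <- c("S", "C", "S", "Q")
fare     <- c(120, 80, 30, 150)

from_southampton <- embarked == "S"
high_fare        <- fare > 100

# Negating an OR: !(A | B) gives the same result as !A & !B
or_law <- identical(!(from_southampton | high_fare),
                    !from_southampton & !high_fare)

# Negating an AND: !(A & B) gives the same result as !A | !B
and_law <- identical(!(from_southampton & high_fare),
                     !from_southampton | !high_fare)

print(or_law)
print(and_law)
```

The same rewriting applies inside `filter()`: negating a combined condition flips each individual condition and swaps `&` with `|`.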

Let's take an example where we require those who embarked from Southampton or paid above £100 in fare.

@@ -1547,7 +1526,7 @@ A naming convention we must establish here is that of the tibbles themselves, na

Graphically:

![](../Images/sql-joins.png){fig-alt="Venn diagrams for each of the prior examples, with the included data shaded."}
![](Images/sql-joins.png){fig-alt="Venn diagrams for each of the prior examples, with the included data shaded."}

There are also Semi Joins and Anti Joins for filtering, which you can read about in Hadley Wickham's [R for Data Science Chapter 19](https://r4ds.hadley.nz/joins.html).
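
As a brief sketch of these two filtering joins, assuming dplyr is installed and using two made-up tibbles (`passengers` and `tickets` are illustrative names, not course objects):

```r
library(dplyr)

# Two small made-up tables sharing an `id` key
passengers <- tibble(id   = c(1, 2, 3, 4),
                     name = c("Ann", "Bob", "Cy", "Di"))
tickets    <- tibble(id   = c(2, 4, 5),
                     fare = c(80, 150, 30))

# semi_join(): keep rows of passengers that have a match in tickets
with_ticket <- semi_join(passengers, tickets, by = "id")

# anti_join(): keep rows of passengers that have no match in tickets
without_ticket <- anti_join(passengers, tickets, by = "id")

print(with_ticket)
print(without_ticket)
```

Unlike the joins pictured above, neither function adds columns from the second table; they only filter the rows of the first.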

20 changes: 3 additions & 17 deletions Course_content/chapter_5_summary_agg.qmd → CH5_summary_agg.qmd
@@ -5,24 +5,10 @@ engine: knitr
execute:
echo: true
eval: false
format:
html:
highlight: null
theme:
light: flatly
dark: darkly
toc: true
toc-title: Contents
toc-location: right
toc-depth: 3
number-sections: true
link-external-newwindow: true
embed-resources: true
freeze: auto # re-render only when source changes
---

![](../Images/AF_DSC_banner.png){fig-alt="Data Science Campus and Analysis Function logos."}

> To switch between light and dark modes, use the toggle in the top right
> To switch between light and dark modes, use the toggle in the top left
# Learning Objectives

@@ -48,7 +34,7 @@ library(janitor)
# Prepare the dataset
titanic_data <- read_csv("../data/titanic.csv",
titanic_data <- read_csv("Data/titanic.csv",
na = c("*", ".", "", "NULL"))
titanic_data <- clean_names(titanic_data)