Renamed RStudio Cloud to Posit Cloud and added datasets
tbalbone31 committed Dec 3, 2024
1 parent 401b677 commit 32381f5
Showing 12 changed files with 5,085 additions and 21 deletions.
16 changes: 8 additions & 8 deletions R_Studio_Cloud.qmd → Posit_Cloud.qmd
@@ -1,5 +1,5 @@
---
title: "R Studio Cloud"
title: "Posit Cloud"
author: "Government Analysis Function and ONS Data Science Campus"
engine: knitr
execute:
@@ -9,32 +9,32 @@ execute:

# Introduction

-If you are having issues installing R Studio or are having issues with installing package as a back up we will prompt you to use R Studio Cloud.
+If you are having issues installing R Studio or are having issues with installing package as a back up we will prompt you to use Posit Cloud.

-RStudio Cloud is a hosted version of RStudio in the cloud that makes it easy for professionals, hobbyists, trainers, teachers and students to do, share, teach and learn data science using R.
+Posit Cloud is a hosted version of RStudio in the cloud that makes it easy for professionals, hobbyists, trainers, teachers and students to do, share, teach and learn data science using R.

Create your analyses using RStudio directly from your browser - there is no software to install and nothing to configure on your computer.

# Set Up

You will be sent a link to a created project, firstly you will need to register for an account, click on the link below.

-[R Studio Cloud Sign up](https://login.rstudio.cloud/register)
+[Posit Cloud Sign up](https://login.rstudio.cloud/register)

Which will look like below, **Please sign in your personal email address**.

-![](Images/r_studio_cloud_signin.png){fig-alt="Image showing R Studio Cloud"}
+![](Images/r_studio_cloud_signin.png){fig-alt="Image showing Posit Cloud"}

Once signed up, it will take a while to load. You will see the message "Deploying Project" for a couple of minutes while it creates your Workspace.

-![](Images/rstudio_cloud.png){fig-alt="Image showing R Studio Cloud"}
+![](Images/rstudio_cloud.png){fig-alt="Image showing Posit Cloud"}

A project has been already created called **Intro to R**.

-When you access a project created by someone else, RStudio Cloud automatically creates a temporary copy of the original project for you. You can play with and make edits to it, but none of your changes will be reflected in the original.
+When you access a project created by someone else, Posit Cloud automatically creates a temporary copy of the original project for you. You can play with and make edits to it, but none of your changes will be reflected in the original.

Save a copy of the project for yourself by pressing the **Save a Permanent Copy button**. Which is in the top right corner.

Then you will end up with something that looks like the image below, feel free to amend some settings as outlined in **Chapter 1 Getting Started with R section 2.2**.

-![](Images/rstudio_cloud2.png){fig-alt="Image showing R Studio Cloud"}
+![](Images/rstudio_cloud2.png){fig-alt="Image showing Posit Cloud"}
14 changes: 14 additions & 0 deletions _freeze/CH1_getting_started/execute-results/html.json

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions _freeze/CH2_data_structures/execute-results/html.json

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions _freeze/CH3_import_export/execute-results/html.json

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions _freeze/CH4_tibbles_dplyr/execute-results/html.json

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions _freeze/CH5_summary_agg/execute-results/html.json

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions _freeze/CH6_case_study/execute-results/html.json
@@ -0,0 +1,14 @@
{
"hash": "88e82b34e9956af0ff61c20356828567",
"result": {
"markdown": "---\ntitle: \"Chapter 6 - Case Study\"\nauthor: \"Government Analysis Function and ONS Data Science Campus\"\nengine: knitr\nexecute:\n echo: true\n eval: false\n freeze: auto # re-render only when source changes\n---\n\n\n> To switch between light and dark modes, use the toggle in the top left\n\n# Introduction\n\nBy the end of this case study, you should have more confidence with manipulating data and using techniques from the first five chapters of Intro to R, as such, they are a **pre-requisite** for it.\n\nThese data sets and question are designed to be an initial springboard for you to continue with your data journey. \n\nAnswers are provided; but these may only show one or two ways of solving the issue. \n\n>**Your answers may differ slightly from ours, this is fine if the output is consistent, but consider whether you could achieve your answer with less or better written code.** \n\n\n## Structure:\n\nQuestions will be presented in tabs.\n\n* Tab 1 will contain the question \n* Tab 2 will contain the solution in R.\n\nPlease choose the tab with the language you wish to use.\nAn example is below.\n\n## Example \n::: {.panel-tabset}\n\n### **Question**{-}\n\nThis is an example question.\n\n### **Solution**{-} \n\n::: {.cell}\n\n```{.r .cell-code}\n# Solution cell\n\n\"Insert code here\"\n```\n:::\n\n:::\n\n\n# Question 1: Packages\n::: {.panel-tabset}\n\n## **Question**{-}\n\nLoad the following packages:\n\n* tidyverse\n* janitor\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# load packages\n\nlibrary(tidyverse)\nlibrary(janitor)\n```\n:::\n\n:::\n\n# Question 2: Data \n::: {.panel-tabset}\n\n## **Question**{-}\n\nRead in the two files from the **data** folder below, assigning them to the variables suggested:\n\nnetflix - nextflix_data.csv\nimdb_scores - imdb_scores.csv\n\nNote - The data is sourced from [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday) and directly from IMDB.\n\nSome data has been altered to suit the difficulty level of this course. 
This is a training dataset, and so shouldn't be relied upon for 100% accuracy.\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Read in imdb and netflix data\n\nnetflix <- readr::read_csv(\"Data/netflix_data.csv\")\n\nimdb_scores <- readr::read_csv(\"Data/imdb_scores.csv\")\n```\n:::\n\n:::\n\n# Question 3 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nClean up the column names of imdb_scores\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Use janitor to clean names of imdb data\n\nimdb_scores <- clean_names(imdb_scores)\n\nnames(imdb_scores)\n```\n:::\n\n:::\n\n# Question 4 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nWhat are the dimensions of the Netflix data?\n\nSee if you can output them in a sentence.\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Find the dimensions with dim()\n\ndim(netflix) # Rows and columns\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Output a sentence with the dimensions\n\ncat(\"There are\", nrow(netflix), \"rows and\", ncol(netflix), \"columns in the neflix dataset.\")\n```\n:::\n\n:::\n\n# Question 5 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nUse an inspection function to determine the datatypes of the columns in the Netflix data.\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Have a glimpse of netflix\n\nglimpse(netflix)\n```\n:::\n\n:::\n \n# Question 6 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nHow many missing values do we have in each dataset?\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Number of missings in the netflix data\n\n\ncolSums(is.na(netflix))\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n# Number of missings in imdb data\n\ncolSums(is.na(imdb_scores))\n```\n:::\n\n:::\n\n# Question 7 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nHow many times does each unique country occur in the dataset? \n\n## **Show Answer** {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Number of unique categories in primary_country\n\nnetflix |> \n count(primary_country)\n```\n:::\n\n:::\n\n# Question 8\n::: {.panel-tabset}\n\n## **Question**{-}\n\nCreate a new tibble \"netflix_movies\" by filtering the netflix tibble to contain only \"Movie\". 
\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a tibble with \"Movie\"s only\n\nnetflix_movies <- netflix |> \n filter(type == \"Movie\")\n\nglimpse(netflix_movies)\n```\n:::\n\n:::\n\n# Question 9 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nUsing your netflix_movies tibble, clean the duration column by:\n\n* Removing the suffix \"min\".\n* Converting the resulting column to an integer\n\nFollowing this, rename the column to \"duration_mins\".\n\n> Note, you can do this in one pipeline!\n\nEnsuring that you overwrite and reassign the dataset!\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Use mutate to clean the duration column\n\nnetflix_movies <- netflix_movies |> \n mutate(duration = as.integer(str_replace(duration, \n pattern = \"min\",\n replacement = \"\"))) |> \n rename(duration_mins = duration)\n\nglimpse(netflix_movies)\n```\n:::\n\n:::\n\n# Question 10\n::: {.panel-tabset}\n\n## **Question**{-}\n\nUsing your netflix_movies tibble, compute:\n\n* The mean and median duration of the movies\n* The mean and standard deviation of the cast numbers.\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Compute summary statistics of duration and cast number\n\nnetflix_movies |> \n summarise(mean_duration = mean(duration_mins, na.rm = TRUE),\n median_duration = median(duration_mins, na.rm = TRUE),\n mean_cast = mean(num_cast, na.rm = TRUE),\n std_cast = sd(num_cast, na.rm = TRUE))\n```\n:::\n\n:::\n\n# Question 11 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nUsing your netflix_movies tibble:\n\n* Select the title, duration, director and cast numbers\n* Sort in descending order of duration\n\nWhich movie was the longest, and who directed it?\n \n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Pipeline for longest movie\n\nnetflix_movies |> \n select(title, duration_mins, director, num_cast) |> \n arrange(desc(duration_mins)) |> \n glimpse()\n```\n:::\n\nThe longest movie on Netflix is Black Mirror: Bandersnatch, at 312 minutes, with no recorded director.\n\n:::\n\n# Question 12 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nUsing your netflix_movies tibble:\n\nGroup by primary_country and obtain the median cast number.\n \n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Group by country\n\nnetflix_movies |> \n group_by(primary_country) |> \n summarise(var_cast = median(num_cast, na.rm = TRUE))\n```\n:::\n\n:::\n\n# Question 13 \n::: {.panel-tabset}\n\n## **Question**{-}\n\nUsing your netflix_movies tibble:\n\nGroup by type and rating of the movie, producing the mean duration.\n \n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Group by type and rating\n\nnetflix_movies |> \n group_by(type, rating) |> \n summarise(mean_duration = mean(duration_mins, na.rm = TRUE))\n```\n:::\n\n:::\n\n# Question 14\n::: {.panel-tabset}\n\n## **Question**{-}\n\nLeft join the imdb_scores data to the **original** netflix data.\n\nCreate a new variable netflix_imdb to contain this.\n\n\n## **Show Answer**{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Join imdb and netflix\n\nnetflix_imdb <- netflix |> \n left_join(y = imdb_scores,\n by = \"title\")\n\nglimpse(netflix_imdb)\n```\n:::\n\n:::\n\n# Summary \n\nIn this case study you have had the opportunity to apply data analysis techniques with the tidyverse to some additional datasets. 
\n\nThis is not exhaustive; have a look at the data and experiment with other techniques you can use.\n\nThis data has been provided for you to experiment with; however there is nothing better than learning with data that is meaningful to you.\n\nFor additional datasets we recommend exploring:\n\n* [Kaggle](https://www.kaggle.com/)\n* [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday)\n* [Data.gov](https://data.gov.uk/)\n\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}


14 changes: 14 additions & 0 deletions _freeze/index/execute-results/html.json
@@ -0,0 +1,14 @@
{
"hash": "3422a9bcf98c3b27689f9e2ab455783e",
"result": {
"markdown": "---\ntitle: \"Course Information\"\nauthor: \"Government Analysis Function and ONS Data Science Campus\"\nengine: knitr\nexecute:\n echo: true\n eval: false\n freeze: auto # re-render only when source changes\nformat:\n html: \n highlight: null\n theme: \n light: flatly\n dark: darkly\n toc: true\n toc-title: Contents\n toc-location: right\n toc-depth: 3\n number-sections: true\n link-external-newwindow: true\n embed-resources: true\n \n---\n\n![](Images/AF_DSC_banner.png){fig-alt=\"Data Science Campus and Analysis Function logos.\"}\n\n> To switch between light and dark modes, use the toggle in the top left\n\n# Introduction\n\nThis course will cover basic concepts and give you the confidence to work independently in the R programming language. No prior coding or statistical knowledge is assumed, however you should be confident using basic computer software.\n\nThe course is split into chapters; each chapter will build upon the previous one. It will emphasise the application of skills, building confidence and resilience in programming.\n\nIt is essential that you have frequent opportunities to practice what you have learnt from the course.\n\n# Course Materials\n\nThe course materials come in several formats:\n\n* HTML pages such as the one you are reading now\n\n* Data [](datasets.qmd) we will use during the course. **It's highly recommended you create a project with a 'data' folder and download all the required datasets before starting the course**\n\nYou can also navigate to the course Github Repository and clone or fork the website structure for yourself. If you are new to programming and version control, we recommend you remain on the website to gain the best experience.\n\n\n# Software Requirements\n\n* R programming language \n* R studio (recommended but not essential)\n* Web browser (Internet connection not necessary)\n*\tExcel or other spreadsheet software for viewing csv and xlsx documents\n \n\n# Packages\n\nPackages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. The following will be used in this course:\n\n* tidyverse\n* readxl\n* janitor\n\n# Pre-Course Check-list:\n\n* Install R and RStudio on your laptop as per your department's guidance.\n\n* Check your department's guidelines for installing packages.\n\n* Save the data from the ZIP file to your hard drive in your working directory.\n\n\n# Course Overview\n\nThe course is divided into 6 chapters, over the 2 days we will cover,\n\n1. **Chapter One - Getting Started with R**\n\n * Be familiar with R Studio.\n \n * RStudio environment, layout, and customization.\n \n * Understand the Key Benefits of using R.\n \n * How to run code in R.\n \n * Know where to get help.\n \n * Discover R’s data types.\n \n * Be able to create Variables.\n\n\n \n<br> \n \n2. **Chapter Two - Data Structures**\n\n * Be familiar with data structures in R.\n \n * Understand how vectors operate.\n \n * Be familiar with lists.\n \n * Be familiar with data frames and tibbles.\n\n\n\n<br> \n \n \n3. **Chapter Three - Importing and Exporting Data**\n\n * Organise our work\n \n * Have an understanding of what packages are.\n \n * Be able to load and install a package.\n \n * Be able to check package versions and R version.\n \n * Be able to import data from multiple formats.\n \n * Be able to inspect loaded data and select elements within the data frame.\n \n * Be able to export data.\n \n * Be able to explore data.\n\n\n\n<br>\n\n4. 
**Chapter Four - Tibbles and Dplyr**\n\n\n* Understand the importance of clean variable names.\n\n* Be able to clean column names using the janitor package.\n\n* Understand the use of the pipe operator.\n\n* Be able to sort data with dplyr’s arrange verb.\n\n* Be able to select data with dplyr’s select verb.\n\n* Be able to filter data with dplyr’s filter verb.\n\n* Be able to transform data with dplyr’s mutate verb.\n\n* Be able to join datasets together.\n\n\n\n<br>\n \n5. **Chapter Five - Summary Statistics and Aggregation**\n\n * Describe numeric and categorical data\n\n * Aggregate and data\n \n\n6. **Chapter Six - Case Study**\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
18 changes: 10 additions & 8 deletions _quarto.yml
@@ -14,21 +14,23 @@ website:
- text: "Course Information"
href: index.qmd
- text: "R-Studio Cloud"
-href: R_Studio_Cloud.qmd
-- text: "Chapter 1 - Getting Started"
+href: Posit_Cloud.qmd
+- text: "1 - Getting Started"
href: CH1_getting_started.qmd
- text: "Chapter 2 - Data Structures"
- text: "2 - Data Structures"
href: CH2_data_structures.qmd
- text: "Chapter 3 - Import & Export"
- text: "3 - Import & Export"
href: CH3_import_export.qmd
- text: "Chapter 4 - Tibbles & Dplyr"
- text: "4 - Tibbles & Dplyr"
href: CH4_tibbles_dplyr.qmd
- text: "Chapter 5 - Summary Statistics & Aggregation"
- text: "5 - Summary Statistics & Aggregation"
href: CH5_summary_agg.qmd
- text: "Chapter 6 - Case Study"
- text: "6 - Case Study"
href: CH6_case_study.qmd
- text: "Chapter 7 - Control Flow, Loops & Functions"
- text: "7 - Control Flow, Loops & Functions"
href: CH7_control_flow_loops_and_functions.qmd
- text: "Datasets"
href: datasets.qmd

format:
html:
