This assignment introduces you to a new type of plot called a ridgeline plot, and also requires you to apply the skills you have learned so far in the course. Suppose we have a fictional dataset of enrolment at VinUni from the year 2105 to 2125. We will use this fictional enrolment data, plus some Vietnam population data, to construct a population pyramid plot.
Similar to the weekly AEs and prior homeworks, you should submit your work to Canvas in two formats: Rmd and PDF.
Answers for non-coding questions should be added in the file as plain text.
- Recreate the example plot to the best of your ability (minor differences like font size and figure size are accepted).
- Provide a short comment about what information you are able to interpret from this ridgeline plot.
Install the package ggridges (quick beginner's guide). Use ggridges to create a ridgeline plot of the distribution of percentage of enrolled students by college in the fictional VinUni dataset college_data_normalized.csv.
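As a rough starting point, a ridgeline plot with ggridges might be sketched as below. The data frame `toy` and its columns `college` and `pct` are made-up stand-ins, since the actual column names in college_data_normalized.csv are not given here; adapt them to the real file.

```r
# install.packages("ggridges")  # run once if the package is missing
library(ggplot2)
library(ggridges)

# Synthetic stand-in data (assumption): one row per college per year,
# with a hypothetical `pct` column for percentage of enrolled students.
set.seed(1)
toy <- data.frame(
  college = rep(c("CAS", "CBM", "CECS", "CHS"), each = 21),
  pct     = c(rnorm(21, 20, 3), rnorm(21, 30, 4),
              rnorm(21, 35, 5), rnorm(21, 15, 2))
)

# One density ridge per college, stacked along the y-axis.
p_ridge <- ggplot(toy, aes(x = pct, y = college, fill = college)) +
  geom_density_ridges(alpha = 0.8) +
  labs(x = "Percentage of enrolled students", y = "College")

# print(p_ridge) renders the plot in an R Markdown chunk
```

With the real data, replace `toy` with the result of `read.csv("college_data_normalized.csv")` and map the appropriate columns.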
Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted), and provide a short comment about what information you are able to interpret from this ridgeline plot:
You should do this by creating a function called theme_college_stats() to store your theme for the following exercises. In particular, the theme should have the following properties:
- Centered and bold plot title.
- Only y-axis major grid lines are visible.
- Axis titles and tick labels should be bold.
- Font size should be readable and does not need to be identical to the example plot provided above.
Additionally, the font family used in the above plot is Open Sans from Google Fonts.
The color palette used for color coding the colleges is: #78d6ff for CAS, #ffee54 for CBM, #992212 for CECS, and #117024 for CHS.
Finally, please provide a short comment about what information you are able to interpret from this ridgeline plot.
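The theme properties listed above could be collected in a function along these lines. This is only a sketch: it assumes that "only y-axis major grid lines" means the grid lines drawn at the y-axis breaks, it builds on `theme_minimal()` as an arbitrary base, and the name `college_colors` is a hypothetical helper for the stated palette.

```r
library(ggplot2)

# Sketch of a reusable theme; base_family defaults to "" so it runs
# even when Open Sans is not installed. Packages such as showtext
# (with sysfonts::font_add_google("Open Sans")) can make the font available.
theme_college_stats <- function(base_size = 12, base_family = "") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      plot.title         = element_text(face = "bold", hjust = 0.5),  # centered, bold
      axis.title         = element_text(face = "bold"),
      axis.text          = element_text(face = "bold"),
      panel.grid.major.x = element_blank(),  # keep only the y-axis major lines
      panel.grid.minor   = element_blank()
    )
}

# Palette from the assignment text, as a named vector (hypothetical name).
college_colors <- c(CAS = "#78d6ff", CBM = "#ffee54",
                    CECS = "#992212", CHS = "#117024")
```

A plot would then use it as `+ theme_college_stats() + scale_fill_manual(values = college_colors)`.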
- Filter the VinUni enrolment data to extract only data from the year 2118.
- Recreate the example plot to the best of your ability (minor differences like font size and figure size are accepted).
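The filtering step in the list above, followed by a basic bar chart, might look like the sketch below. The data frame `enrol` and its columns `year`, `college`, and `pct` are invented for illustration; the real file may use different names and values.

```r
library(ggplot2)

# Synthetic stand-in for the enrolment data (assumed column names).
enrol <- data.frame(
  year    = rep(c(2117, 2118), each = 4),
  college = rep(c("CAS", "CBM", "CECS", "CHS"), times = 2),
  pct     = c(22, 28, 35, 15, 20, 30, 34, 16)
)

# Keep only the rows for the year 2118.
enrol_2118 <- subset(enrol, year == 2118)

# One bar per college.
p_bar <- ggplot(enrol_2118, aes(x = college, y = pct, fill = college)) +
  geom_col() +
  labs(x = "College", y = "Percentage of enrolled students")
```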
Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted):
- Recreate the example plot to the best of your ability (minor differences like font size and figure size are accepted).
- Provide a short comment comparing how effectively the pie chart and the bar chart from Task 2 represent the same data.
Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted):
Hint: If you're unsure how to create this pie chart, check out coord_polar() documentation.
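To illustrate the hint, one common way to draw a pie chart in ggplot2 is a single stacked bar wrapped into polar coordinates. The sketch below uses a made-up data frame `one_year` with hypothetical columns `college` and `pct`.

```r
library(ggplot2)

# Synthetic stand-in for one year of enrolment shares (assumption).
one_year <- data.frame(
  college = c("CAS", "CBM", "CECS", "CHS"),
  pct     = c(20, 30, 34, 16)
)

# A single stacked bar (x = "") becomes a pie once the y-axis
# is mapped to the angle with coord_polar(theta = "y").
p_pie <- ggplot(one_year, aes(x = "", y = pct, fill = college)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  theme_void()
```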
- Create a series of 11 individual population pyramid plots for Vietnam from 1975 to 2025.
- Create a theme to style the plots as close as possible to the example plot provided below (you can use other color schemes).
- Provide a comment describing what information a population pyramid for a given year conveys.
- Provide a comment, based on the 11 plots, describing what you think is happening to the population in Vietnam. Optional: Provide your hypothesis/explanation for why that is happening.
Create a series of 11 individual population pyramid plots for Vietnam from 1975 to 2025, based on the data files Viet Nam-####.csv. In particular, the x-axis should represent the percentage of total population while the y-axis should be the age group. Below is an example plot of the data for 2020:
Please try to recreate the plot to the best of your ability. You can pick any color scheme you would like for your plot. In the example plot above, the colors #109466 and #112e80 are used for female and male respectively.
To avoid copying and pasting a lot of code, you can create a function called create_population_pyramid(file_name) to generate a population pyramid with the defined style and theme for a given data file.
Please describe what information a population pyramid for a given year conveys. Based on the 11 plots you've just generated, describe what you think is happening to the population in Vietnam. Optional: Provide your hypothesis/explanation for why that is happening.
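A minimal sketch of such a function is shown below. It assumes each CSV has columns named `Age`, `M`, and `F` holding the age group and the male and female counts; check the real headers before relying on this. The helper `prep_pyramid_data()` is a hypothetical name introduced here to keep the data reshaping separate from the plotting.

```r
library(ggplot2)

# Reshape to long format and convert counts to percentages of the
# total population. Assumes columns Age, M, F (not confirmed).
prep_pyramid_data <- function(df) {
  total <- sum(df$M) + sum(df$F)
  data.frame(
    age = factor(rep(df$Age, 2), levels = df$Age),  # keep file order
    sex = rep(c("Male", "Female"), each = nrow(df)),
    # Male shares are negated so they extend to the left.
    pct = c(-df$M, df$F) / total * 100
  )
}

create_population_pyramid <- function(file_name) {
  long <- prep_pyramid_data(read.csv(file_name))
  ggplot(long, aes(x = pct, y = age, fill = sex)) +
    geom_col() +
    # Show absolute values so the left side is not labelled negative.
    scale_x_continuous(labels = function(x) paste0(abs(x), "%")) +
    scale_fill_manual(values = c(Female = "#109466", Male = "#112e80")) +
    labs(x = "Percentage of total population", y = "Age group",
         fill = NULL, title = basename(file_name))
}

# Tiny made-up file to show the call; values are not real data.
demo <- data.frame(Age = c("0-4", "5-9", "10-14"),
                   M = c(50, 40, 30), F = c(45, 40, 35))
tmp <- tempfile(fileext = ".csv")
write.csv(demo, tmp, row.names = FALSE)
p_pyramid <- create_population_pyramid(tmp)
```

To produce all 11 plots, the function could be applied over the file names, for example with `lapply(list.files(pattern = "^Viet Nam-[0-9]{4}\\.csv$"), create_population_pyramid)`.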
- Create a line graph for Vietnamese total population count from 1975 to 2025.
Using all provided data files for Vietnamese population from 1975 to 2025, create a line graph for Vietnamese total population count from 1975 to 2025. As the data for 2025 is predicted data, plot it with a dashed line from 2020 to highlight that it is a prediction.
Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted):
You can pick any color scheme you would like for your plot. In the example plot above, the color #112e80 is used for the lines and points and #999999 is used for the dashed line.
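One way to get a solid line up to 2020 and a dashed segment from 2020 to 2025 is to draw two line layers from two subsets that share the 2020 point. The sketch below assumes the totals have already been summed per file into a data frame `pop`; the `total` values are placeholders, not real population figures.

```r
library(ggplot2)

# Placeholder totals (NOT real data); in practice, sum the counts
# in each "Viet Nam-####.csv" file to get one total per year.
pop <- data.frame(
  year  = seq(1975, 2025, by = 5),
  total = seq(50, 100, by = 5)
)

observed  <- subset(pop, year <= 2020)  # solid line
predicted <- subset(pop, year >= 2020)  # dashed segment starting at 2020

p_line <- ggplot(mapping = aes(x = year, y = total)) +
  geom_line(data = observed, color = "#112e80") +
  geom_line(data = predicted, color = "#999999", linetype = "dashed") +
  geom_point(data = pop, color = "#112e80") +
  labs(x = "Year", y = "Total population")
```

Including 2020 in both subsets keeps the dashed segment connected to the end of the solid line.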
Find an interesting data visualization online and provide a critique of it. It can be a great visualization or a terrible one, up to you!
In 3-4 sentences:
- Provide an introduction to the visualization and an image of it, along with proper credit/citation (see the side note at the end on inserting an image in R Markdown).
- Describe the purpose of the visualization and the question it is attempting to answer.
In bullet points: identify the strengths and weaknesses of the visualization.
In two to three paragraphs (80-150 words): give a critique, analysis, or constructive comments on the visualization. Apply your knowledge of effective visualization and what you have learned in the course to your analysis. Can you suggest some improvements to the visualization?
Side note: inserting an image in R Markdown should look something like this:
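One common option, assuming the image has been saved next to the Rmd file at the hypothetical path `images/my-visualization.png`, is to call `knitr::include_graphics()` inside an R chunk:

```r
# Hypothetical path; replace with the location of your saved image.
knitr::include_graphics("images/my-visualization.png")
```

The chunk options `fig.cap` and `out.width` can be used to add a caption (including the credit/citation) and to control the displayed size.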






