Skip to content

lopierra/diversity_data

Repository files navigation

title author date output
Playing with {ggplot2} extensions
Pierrette Lo
6/7/2020
html_document
keep_md
true

Recently, a friend asked me to make a simple data visualization for her. The dataset was tiny and, to be honest, not super interesting (a very simple survey of [not very much] diversity among her department's leadership and overall membership). But the nice thing about a simple dataset is that for once I could spend less time on data cleaning and more time playing with aesthetics.

Here are the libraries I used:

library(tidyverse)
library(readxl)
library(ggalt)
library(patchwork)
library(ggtext)

Data Wrangling

The data I was given was an Excel sheet that looked like this:

I started by doing a bit of data cleanup in Excel. If the dataset had been larger, I might have tried using {readxl} to clean it up in R, but in this case it took about 30 seconds to do it in Excel.

I separated each table onto a different tab, added a "personnel" header (for "Overall" vs "Leadership" categories), and corrected the typo in Race ("Indiana").

I next used the {readxl} package to import the Excel data...

# specify path of original data
path <- "blog_data.xlsx"

# read in all sheets as a named list
data <- path %>% 
  excel_sheets() %>% 
  set_names() %>% 
  map(read_xlsx, path = path)

# split list into separate dataframes
list2env(data, .GlobalEnv)

...And now I have three little dataframes.

Table: ethnicity

personnel Hispanic Not Hispanic Unsp
Dept Overall 0.054 0.946 1E-3
Dept Leadership 0.030 0.970 NA
gender
personnel Female Male Unsp
Dept Overall 0.346 0.654 0
Dept Leadership 0.364 0.636 0
race
personnel White African American American Indian Asian Native Hawaiian Multi-race Unsp
Dept Overall 0.72 0.066 6.0000000000000001E-3 0.152 2E-3 4.2000000000000003E-2 1.2E-2
Dept Leadership 0.85 0.030 NA 0.120 NA NA NA

Next up: tidying each dataframe (yes, I copied and pasted more than twice and therefore should have written some functions, but again I was in a hurry to get to the fun part).

Here's what I did for the ethnicity dataframe, which I repeated for gender and race.

ethnicity <- ethnicity %>%
  
  # convert all columns except personnel to numeric
  mutate_at(vars(-personnel), as.numeric) %>% 
  
  # make it tidy (i.e. long) format
  pivot_longer(-personnel, names_to = "ethnicity", values_to = "percent") %>% 
  
  # convert decimals to percentages; convert `ethnicity` and `personnel` to factors
  mutate(percent = percent * 100,
         ethnicity = as.factor(str_replace(ethnicity, "Unsp", "Unspecified")),
         personnel = as.factor(personnel)) %>% 
  
  # replace NAs with 0 (after confirming with my friend that this was the intent)
  replace_na(list(percent = 0))
ethnicity
personnel ethnicity percent
Dept Overall Hispanic 5.4
Dept Overall Not Hispanic 94.6
Dept Overall Unspecified 0.1
Dept Leadership Hispanic 3.0
Dept Leadership Not Hispanic 97.0
Dept Leadership Unspecified 0.0

Data Visualization

Now for the fun stuff! There's a whole universe of {ggplot2} extensions, many (but not all) of which are listed here.

I picked a few that I had been wanting to play with: {bbplot} for theme, {ggalt} for dumbbell plots, {patchwork} to arrange plots, and {ggtext} for HTML text styling.

I started by setting up a custom theme for my plots -- largely borrowed from the BBC's {bbplot} package.

The preset theme can be applied directly as a ggplot layer using bbplot::bbc_style(), but I made some tweaks and saved it as my_theme.

my_colors <- c("#FAAB18", "#1380A1")

my_theme <- theme_light() +
  theme(axis.ticks = element_blank(), 
        axis.line = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_line(color = "#cbcbcb"),
        panel.grid.major.x = element_blank(), 
        panel.background = element_blank(),
        panel.border = element_blank())

theme_set(my_theme)

Now I create each bar plot (for Gender, Ethnicity, and Race) separately.

  • Reorder gender by percent
  • Set y axis 0-100 so all plots have the same range
  • Use custom colors (from bbplot::bbc_style)
  • Use color in bars (in addition to fill) so 0 shows as a line
p1 <- ethnicity %>%
  mutate(ethnicity = fct_reorder(ethnicity, percent, na.rm = T)) %>% 
  ggplot(aes(x = ethnicity, y = percent, fill = personnel, color = personnel)) +
  geom_col(position = "dodge") +
  coord_flip(ylim = c(0, 100)) +
  ggtitle("Ethnicity") +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_manual(values = my_colors) +
  scale_color_manual(values = my_colors)

p2 <- gender %>%
  mutate(gender = fct_reorder(gender, percent)) %>% 
  ggplot(aes(x = gender, y = percent, fill = personnel, color = personnel)) +
  geom_col(position = "dodge") +
  coord_flip(ylim = c(0, 100)) +
  ggtitle("Gender") +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_manual(values = my_colors) +
  scale_color_manual(values = my_colors)

p3 <- race %>%
  mutate(race = fct_reorder(race, percent, na.rm = T)) %>% 
  ggplot(aes(x = race, y = percent, fill = personnel, color = personnel)) + 
  geom_col(position = "dodge") +
  coord_flip(ylim = c(0, 100)) +
  ggtitle("Race") +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_manual(values = my_colors) +
  scale_color_manual(values = my_colors)

Then I use {patchwork} to stitch them together, and {ggtext} to add color to the title in lieu of a legend.

  • "Collect" guides so legends from each plot are treated the same (ie. deleted)
  • Use {ggtext} element_textbox_simple or element_markdown to allow html in title
p3 + (p1 / p2) +
  plot_layout(guides = "collect") +
  plot_annotation(title = "<span style='font-size:18pt'>Diversity in Department <b style='color:#FAAB18;'>Leadership</b> vs <b style='color:#1380A1;'>Overall</b></span>",
                  subtitle = "Percentages of personnel in each category are shown",
                  theme = theme(plot.title = element_markdown(lineheight = 1.1))) &
  theme(legend.position = "none")

I also repeated the above, but with Race shown in a dumbbell plot:

p4 <- race %>%
  mutate(race = fct_reorder(race, percent, na.rm = T)) %>%
  pivot_wider(names_from = personnel, values_from = percent) %>% 
  ggplot() +
  geom_dumbbell(aes(x = `Dept Overall`, xend = `Dept Leadership`, y = race),
                size = 3, 
                colour = "#dddddd", 
                colour_x = "#1380A1", 
                colour_xend = "#FAAB18",
                show.legend = F) +
  coord_cartesian(xlim = c(0,100)) +
  ggtitle("Race") +
  xlab(NULL) +
  ylab(NULL)

And here's the patchwork:

p4 + (p1 / p2) +
  plot_layout(guides = "collect") +
  plot_annotation(title = "<span style='font-size:18pt'>Diversity in Dept <b style='color:#FAAB18;'>Leadership</b> vs <b style='color:#1380A1;'>Overall</b></span>",
                  subtitle = "Percentages of personnel in each category are shown",
                  theme = theme(plot.title = element_markdown(lineheight = 1.1))) &
  theme(legend.position = "none")

BONUS TIP!

Thanks to this helpful post, I discovered that you can use xaringan's Infinite Moon Reader to get live previews of RMarkdown documents (not just xaringan slides!).

After installing {xaringan}, you can either run xaringan:::inf_mr() or select "Infinite Moon Reader" from the RStudio Addins drop-down menu.

The preview will appear in the RStudio Viewer pane, and it will refresh every time you save changes to your Rmd. So much better than knitting every time you want to check your formatting!

About

Making a figure with ggplot2 extensions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published