Shiny Teaching Apps

Some applications for self-discovery of statistical concepts and rules-of-thumb. If you use these, let me know! Any suggestions for improvement can be raised in the Issues tab on GitHub.

To run these, you may need to install the dplyr, ggplot2, and patchwork packages. If you get a "... not found" message, you're probably missing a package.

You can copy and paste the code from here, or you can save ScriptToRunApps.R to your computer and run them from there.
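
For example, the packages can be installed in one go:

install.packages(c("dplyr", "ggplot2", "patchwork"))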


Quick Reference


Tools

pnorm

  • A simple app to calculate normal probabilities (an example calculation is shown below).
  • The R code used for the calculation is displayed in the title.
  • Bounds of -4 and 4 are treated as -Inf and Inf, respectively.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Tools/pnorm")

pvalues

  • Calculate p-values, given a z-statistic.
  • See the difference in p-values for different hypotheses.
  • Demonstrates why we double the p-value for two-sided tests, and why we use absolute values (see the sketch below).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Tools/pvalues")

distrshapes

  • See how the shape changes with different parameters.
  • Axes are "sticky" - they increase to fit new data, but don't decrease until you change the distribution or click "Reset axes".
  • The sampled data are also sticky - the seed only changes when the "New Data" button is pressed.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Tools/distrshapes")

distrshapes_disc

  • Discrete version of the app above.
  • Ghosts are blue and fade away.
    • Max 10 ghosts before it gets laggy.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Tools/distrshapes_disc")

SimplePower

  • A visualization for power in the simple null/alternate situation.
    • Others have made this before, but this one is mine.
  • The purple shaded area is the Type 1 error rate; the green shaded area is the power, i.e. 1 minus the Type 2 error rate (a quick calculation is sketched below).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Tools/SimplePower")

Self-Discovery Apps

Binormial

  • Demonstrates why we check both np and n(1-p) for the normal approximation to the binomial distribution.
  • Currently very barebones.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/Binormial")

PoisBinApprox

  • The Poisson (and normal) distributions are good approximations to the binomial distribution in different situations.
  • The Poisson distribution is useful when p is small, while the normal distribution is useful when both np and n(1-p) are larger than, say, 10 or so (this is a rule-of-thumb, not some magical value).
  • Includes sliders for sample size and probability of success.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/PoisBinApprox")

QQDistrFitting

  • Demonstration of the usefulness of QQ plots in assessing distributional assumptions.
  • The app shows the histogram (with estimated density overlaid) and the qq-plot (which does not need an estimate of the parameters).
  • The theoretical distribution can be changed to something other than Normal (currently just Gamma).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/QQDistrFitting")

ScatterCorr

  • Allows students to discover what different correlations look like.
  • Allows you to change the slope independently of the correlation, demonstrating that they're not the same thing.
  • Sliders (with animation) for the slope and correlation.
  • Doesn't generate new data until specified, so animations allow the student to watch the correlation change.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/ScatterCorr")

InfluentialPoint

  • The influence of a point depends on where it is in relation to the line as well as to the point (x bar, y bar).
  • Think of drawing axes at (x bar, y bar). This creates four quadrants, two of which contain the line. Points in the quadrants without the line have more influence.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/InfluentialPoint")

MultipleRegressionPenguins

  • Add/remove predictors and see the effect.
  • Flipper length is strongly correlated with the response; bill length and depth are not.
    • Adding bill measurements still changes the estimate for flipper length!
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/MultipleRegressionPenguins")

polyFit

  • Fitting a polynomial of the wrong degree leads to bias.
  • Bias can mean a lack of generalizability!
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/polyFit")

SerialCorrelation

  • Demonstrates the idea of serial correlation.
  • Lag 6 is a bit overkill, but allows for seasonal effects.
  • Also includes the runs test and the Durbin-Watson test statistic (sketched below).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/SerialCorrelation")

MeanLessMeansLeft

  • How the mean and median relate to the skew of a distribution.
  • Uses a Gamma distribution, so some parameter combos lead to a singularity at 0.
  • Please note that it took me a while to figure out how to (efficiently) generate a Gamma distribution with a pre-specified mean and median. I want credit for this.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/MeanLessMeansLeft")

DensHist

  • Exploration of the connection between binwidth and bandwidth.
  • A density can be thought of as the limit of a histogram as n approaches infinity and the binwidth approaches 0.
  • This tool lets students explore that while also seeing how the histogram changes with binwidth and the density plot changes with bandwidth.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/DensHist")

MeasureSpread

  • Explore the relationship between IQR and standard deviation.
  • For the normal distribution, the IQR and sd have a consistent relationship. In particular, the sd is a constant factor times the IQR, regardless of what the sd is!
  • For real data, almost any (IQR, sd) pair is possible.
    • I wrote a function to fix the IQR and perturb the data until I get the sd that I want. It doesn't always work perfectly.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/MeasureSpread")

PoissonCatQuant

  • A histogram is just a bar chart where some of the bars are merged.
  • For categorical data with a large number of categories, histograms are often preferred.
  • If there aren't many categories, a bar chart may be better.
  • This app makes use of the negative binomial distribution to show differing numbers of categories.
    • When the overdispersion is 0, this reduces to a Poisson distribution (see the sketch below).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/PoissonCatQuant")

ConditionalNormal2

  • Demonstrating the concept of conditional distributions using the bivariate normal.
  • Uses the rgl library to display the bivariate normal.
  • Unfortunately, the rotation resets each time and I don't know how to fix this.
  • ConditionalNormal also exists, which does not have an interactive 3D plot (uses static plots from plot3D).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/ConditionalNormal2")

indep

  • Showing that independence doesn't look like anything special.
  • When changing P(A) and P(B), P(A and B) is automatically set to a value that makes them independent.
  • After that, P(A and B) can be changed.
  • Things change when the user chooses disjoint sets or sets either P(A) or P(B) to 0.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/indep")

nLarge

  • Many sources say that the normal approximation works when n is "large", and then claim that 30 or 40 or 50 is large enough.
  • This app shows what happens in the most skewed distributions, and how the CLT still applies when the population is far from normal.
  • Note that the normal approximation is actually biased, and this bias decreases as n increases.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/nLarge")

Z_or_t

  • Another way to test what counts as a "large" n.
  • Should you use the normal distribution or the t distribution?
  • The formulae are shown in a legend indicating which sampling distribution is which.
  • The code used to generate the sampling distributions is shown below the plot.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/Z_or_t")

ci

  • The classic app to generate samples, find the CI, then keep a record of them.
  • Shows the coverage so that students can investigate how the coverage depends on n.
  • Changing n, mu, and sigma re-generates the data, but changing alpha does not. This allows users to see how the coverage changes with alpha.
  • Has buttons to add 1 at a time or jump up by 5, 25, or 100.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/ci")

samplingDist

  • A re-creation of a classic app with a few tweaks.
  • Generate samples, calculate the mean, record it, and show a histogram of all sampled values.
  • Includes means and sds of samples, sampling distribution, and population.
  • When a new sample is generated, the colours reflect the new value(s). This is most obvious when adding one sample at a time and when adding 100 at a time repeatedly.
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/samplingDist")

gettysburg

  • Calculating the average word length in the Gettysburg Address.
  • Students can see what gets sampled using SRS, stratified, or cluster sampling.
  • For stratified, shows the sample of words in each stratum (paragraph).
shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", 
    subdir = "Apps/gettysburg")

Animations

transform_norm

Animation of the transformation of a normal distribution to a lognormal.

library(dplyr)
library(gganimate) # loads ggplot2

x1 <- seq(-3,3,0.1)
y1 <- dnorm(x1)
x2 <- exp(x1)
y2 <- y1/exp(x1)

# Testing
#plot(x2, y2)
#points(x1, y1)
#curve(dlnorm(x), add = TRUE, col = 2)

mydf <- bind_rows(
    data.frame(x = x1, y = y1, trans = "norm", 
        col = case_when(x1 == -1 ~ 1,
            x1 == 0 ~ 2, 
            x1 == 1 ~ 3, 
            x1 == 2 ~ 4, TRUE ~ 0)),
    data.frame(x = x2, y = y2, trans = "lnorm", 
        col = case_when(x2 == exp(-1) ~ 1,
            x2 == exp(0) ~ 2, 
            x2 == exp(1) ~ 3, 
            x2 == exp(2) ~ 4, TRUE ~ 0))
)

ggplot(mydf, aes(x = x, y = y,
        colour = factor(col), size = col > 0)) + 
    theme_minimal() + 
    scale_colour_manual(values = c(1,2,4,6,7)) + 
    scale_x_continuous(breaks = c(exp(-1), 0, exp(0), 
            2, exp(2), seq(-3,25,1)[-5]), 
        labels = c("e^-1", "0", "e^0", "2", "e^2", 
            seq(-3,25,1)[-5])) +
    transition_states(states = trans, 
        transition_length = 1/2, state_length = 1/2) +
    stat_function(fun = dnorm,  
        colour = 4, n = 500, size = 1) +
    stat_function(fun = dlnorm, 
        colour = 2, n = 500, size = 1) +
    geom_point() +
    coord_cartesian(xlim = c(-3,7)) +
    theme(legend.position = "none", 
        title = element_text(size = 14)) +
    annotate(geom = "text", x = c(0, exp(-1)), 
        y = c(0.4,0.66), 
        label = c("y1 = dnorm(x)", "y2 = y1/exp(x)"), 
        hjust = c(1.1,-0.1), size = 6, colour = c(4,2)) +
    labs(y = "Density Function", 
        title = "Transformation to Lognormal",
        subtitle = paste0("The red curve is dlnorm(x1),",
            "the points are transformed",
            "\nas x2 = exp(x1); y2 = dnorm(x1)/exp(x1)."))

anim_save("Animations/transform_norm.gif")

BlockVariance

library(gganimate)
set.seed(2112) # for reproducibility
g1 <- rnorm(400, 0, 1.5)
g2 <- rnorm(400, 4, 1.5)

# Density estimates with same range/bandwidth
g1dens <- density(g1, from = min(g1, g2),
    to = max(g1,g2), n = 400)
g2dens <- density(g2, from = min(g1, g2), 
    to = max(g1,g2), n = 400, bw = g1dens$bw)
g3dens <- density(c(g1,g2), from = min(g1, g2), 
    to = max(g1,g2), n = 400, bw = g1dens$bw)

# as a dataframe
gnames <- c(paste0("Group 1: Var=", round(var(g1), 3)), 
    paste0("Group 2: Var=", round(var(g2), 3)), 
    paste0("Group 3 (Combined): Var=", 
        round(var(c(g1, g2)), 3)))
allg <- data.frame(x = rep(g1dens$x, 3), 
    y = c(g1dens$y, g2dens$y, g3dens$y),
    group = rep(gnames, each = length(g1dens$x)))
allg$frame <- 1

# centered
allg2 <- allg
allg2$x <- c(g1dens$x - mean(g1), g2dens$x - mean(g2), 
    g3dens$x - mean(c(g1,g2)))
allg2$frame <- 2
# Quick check: overlay the centred densities
ggplot(allg2, aes(x = x, y = y, colour = group)) + 
    geom_line()

# 0th frame - all densities together
allg0 <- data.frame(x = rep(g1dens$x, 3), 
    y = rep(g3dens$y, 3), 
    group = rep(gnames, each = length(g1dens$x)), 
    frame = 0)

all3 <- dplyr::bind_rows(allg, allg2, allg0)

ggplot(all3, aes(x = x, y = y, colour = group)) + 
    geom_line(size = 1.5) +
    scale_colour_manual(values = c(2, 4, 1)) + 
    transition_states(frame, wrap = FALSE) +
    theme_bw() +
    theme(legend.position = "bottom", 
        axis.title = element_text(size = 14), 
        title = element_text(size = 16),
        axis.text = element_text(size = 12),
        legend.text = element_text(size = 11)) + 
    labs(x = "x", y = "Density", colour = NULL,
        title = "Blocking reduces variance",
        subtitle = "Individual densities have smaller variance than combined.") 

anim_save("Animations/BlockVariance.gif")

Credit where credit is due

CLT: This app is a classic, and there's no way for me to top it. http://onlinestatbook.com/stat_sim/sampling_dist/

Importance of visualizations: You can't beat the datasauRus dozen from https://www.autodeskresearch.com/publications/samestats. It's an update of Anscombe's quartet with even more interesting features. It's also a great way to demonstrate some tidyverse/ggplot2 functions!

The following code chunks are both standalone scripts. The resulting plots are good for demonstration.

# Load some packages
library(datasauRus)
library(ggplot2)
theme_set(theme_bw()) # as always
library(dplyr)

# All of these plots have the same summary statistics,
    # including xbar, ybar, sd_x, sd_y, and correlation
data("datasaurus_dozen")
# remove a dataset for 3x4 plot
filter(datasaurus_dozen,
    dataset != "slant_up") %>% 
    ggplot(aes(x = x, y = y)) + 
        geom_point() + 
        facet_wrap(~ dataset, ncol = 3) +
        labs(title = "All have same summary statistics")

datasaurus_dozen %>% 
    group_by(dataset) %>% 
    summarise(m_x = mean(x), m_y = mean(y),
        s_x = sd(x), s_y = sd(y), r = cor(x,y)) %>% 
    knitr::kable(digits = 3)
dataset        m_x     m_y     s_x     s_y       r
away        54.266  47.835  16.770  26.940  -0.064
bullseye    54.269  47.831  16.769  26.936  -0.069
circle      54.267  47.838  16.760  26.930  -0.068
dino        54.263  47.832  16.765  26.935  -0.064
dots        54.260  47.840  16.768  26.930  -0.060
h_lines     54.261  47.830  16.766  26.940  -0.062
high_lines  54.269  47.835  16.767  26.940  -0.069
slant_down  54.268  47.836  16.767  26.936  -0.069
slant_up    54.266  47.831  16.769  26.939  -0.069
star        54.267  47.840  16.769  26.930  -0.063
v_lines     54.270  47.837  16.770  26.938  -0.069
wide_lines  54.267  47.832  16.770  26.938  -0.067
x_shape     54.260  47.840  16.770  26.930  -0.066

Boxplots hide shapes: From the same people who brought you the datasaurus dozen!

# I need a surprising number of packages for this
library(datasauRus)
library(ggplot2)
theme_set(theme_bw()) # as always
library(dplyr)
library(patchwork)
library(tidyr)

data("box_plots")
# to make my code more compact (faceting)
box_plots_long <- pivot_longer(data = box_plots, cols = 1:5,
    names_to = "dataset", values_to = "x")

boxes <- ggplot(box_plots_long, aes(x = x)) + 
    geom_boxplot() + 
    facet_wrap(~ dataset, ncol = 1)
histos <- ggplot(box_plots_long, aes(x = x)) + 
    geom_histogram(colour = 1, fill = "lightgrey", bins = 30) + 
    facet_wrap(~ dataset, ncol = 1)
    
# patchwork is a magical package
boxes + histos +
    plot_annotation(
        title = "Boxplots hide more complicated shapes"
    )

Spatial Stats Apps

The following apps are for my own exploration of spatial statistics. Simulating the data and exploring the parameters is my favourite way to comprehend the underlying concepts.

GausProcess_Matern

Gaussian processes (GPs) are central to spatial models with a Gaussian term, so this app helps show how the parameters affect the process.

shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", subdir = "SpatialFun/GausProcess_Matern")
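
A minimal sketch (not the app's code) of simulating a one-dimensional Gaussian process with a Matérn(3/2) covariance; the range parameter phi and variance sigma2 are illustrative names, not the app's inputs:

matern32 <- function(d, phi = 2, sigma2 = 1) {
    sigma2 * (1 + sqrt(3) * d / phi) * exp(-sqrt(3) * d / phi)
}
s <- seq(0, 10, length.out = 200)
K <- outer(s, s, function(a, b) matern32(abs(a - b)))
set.seed(1)
z <- as.vector(t(chol(K + diag(1e-8, nrow(K)))) %*% rnorm(length(s)))
plot(s, z, type = "l", main = "One draw from a Matern-3/2 Gaussian process")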

GausField_Matern

Like a Gaussian process, but a field instead. Still based on the Matérn covariance.

shiny::runGitHub(repo = "DBecker7/DB7_TeachingApps", subdir = "SpatialFun/GausField_Matern")

SpatialFun/Kfunction

Animation (using a for loop and Sys.sleep, rather than a saved image) of the calculation of the K-function. It can be found under SpatialFun/Kfunction.R.
