Sampling Distributions

IMS, Ch. 12

Prof Randi Garcia

2026-03-09

Distribution of random variables

Let’s make up some random variables

Roll the 🎲 10 times
\(n = 10\) is our sample size
Record in our spreadsheet:
1. \(X\): the mean number of pips of the 10 rolls
2. \(Y\): the median number of pips of the 10 rolls
3. \(Z\): the number of rolls out of 10 with an even number of pips
4. \(W\): the number of sixes out of 10 rolls
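
The recording step above can be sketched in R for a single student. This is a minimal simulation that assumes a fair six-sided die; the names `rolls` and `record` and the seed are illustrative choices, not part of the class spreadsheet.

```r
set.seed(215)  # arbitrary seed, only so the sketch is reproducible

# Roll a fair six-sided die n = 10 times (fairness is an assumption)
n <- 10
rolls <- sample(1:6, size = n, replace = TRUE)

# The four random variables recorded in the spreadsheet
record <- c(
  X = mean(rolls),           # mean number of pips
  Y = median(rolls),         # median number of pips
  Z = sum(rolls %% 2 == 0),  # number of rolls with an even number of pips
  W = sum(rolls == 6)        # number of sixes
)
record
```

Rerunning without the seed gives a different `record` each time, which is what makes \(X\), \(Y\), \(Z\), and \(W\) random variables.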

Read data from Google Sheet

library(tidyverse)
library(googlesheets4)
gs4_deauth()
dice <- read_sheet("https://docs.google.com/spreadsheets/d/1lmR7wBOfEmz9csNQVVhdJrdM7XnurPVuM1kQSKup4TI/") |>
  mutate(id = row_number())
dice

# A tibble: 32 × 6
       X     Y     Z     W color    id
   <dbl> <dbl> <dbl> <dbl> <chr> <int>
 1   5.8   6      10     9 black     1
 2   5.6   6      10     9 black     2
 3   5.6   6      10     9 black     3
 4   3.8   4.5     4     2 white     4
 5   3.8   5       4     1 white     5
 6   5.4   6       8     8 black     6
 7   6     6      10    10 black     7
 8   3.5   4.5     3     2 white     8
 9   5.3   6       7     7 black     9
10   5.5   6       7     7 black    10
# ℹ 22 more rows

Compute point estimates

dice |>
  group_by(color) |>
  summarize(
    x_bar = mean(X),
    y_bar = mean(Y),
    z_bar = mean(Z),
    w_bar = mean(W),
  )

# A tibble: 2 × 5
  color x_bar y_bar z_bar w_bar
  <chr> <dbl> <dbl> <dbl> <dbl>
1 black  5.58  6.04  8.85  8.38
2 white  3.5   3.58  3.83  1.5

\(\bar{x}\) is our best estimate of \(E[X] = \mu_{X}\)

Key idea

Every random variable has a probability distribution
Every random variable has an expected value (center of distribution)
Every random variable has a variance (spread around the center)
What are the distributions of \(X\)? \(Y\)? \(Z\)?
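
As a worked check, the centers and spreads of \(X\), \(Z\), and \(W\) can be derived by hand if we assume a fair six-sided die and independent rolls (assumptions, not something the data above guarantees). The symbol \(R\) for a single roll is introduced here only for illustration. \(Y\), the median, has no comparably simple formula and is easier to study by simulation.

```latex
\begin{align*}
E[R] &= \tfrac{1+2+3+4+5+6}{6} = 3.5, & \operatorname{Var}(R) &= \tfrac{91}{6} - 3.5^2 = \tfrac{35}{12} \\
E[X] &= E[R] = 3.5, & \operatorname{Var}(X) &= \tfrac{35/12}{10} \approx 0.29 \\
Z &\sim \text{Binomial}(10, \tfrac{1}{2}): & E[Z] &= 5, \quad \operatorname{Var}(Z) = 2.5 \\
W &\sim \text{Binomial}(10, \tfrac{1}{6}): & E[W] &= \tfrac{10}{6} \approx 1.67, \quad \operatorname{Var}(W) = \tfrac{50}{36} \approx 1.39
\end{align*}
```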

View distribution of r.v.’s

dice |>
  pivot_longer(-c(color, id), names_to = "r_v", values_to = "value") |>
  ggplot(aes(x = value, fill = color)) +
  geom_histogram(bins = 25) +
  facet_wrap(vars(r_v))

Your turn

What do you observe about the distribution of \(X\)?
- Center, shape, and spread?
Are the black and white 🎲 different?
- How do you know?
What do you suspect the distribution of \(X\) would look like if we had rolled 20 times?

Segue

The distribution of \(X\) is called a sampling distribution
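
A sampling distribution can be approximated by simulation when the population is known. The sketch below assumes a fair six-sided die and compares the mean of 10 rolls with the mean of 20 rolls; `roll_mean`, `reps`, `x10`, and `x20` are illustrative names.

```r
set.seed(215)  # arbitrary seed, only so the sketch is reproducible

# Mean of n rolls of a fair six-sided die (fairness is an assumption)
roll_mean <- function(n) mean(sample(1:6, size = n, replace = TRUE))

# Repeat the whole experiment many times to trace out each sampling distribution
reps <- 5000
x10 <- replicate(reps, roll_mean(10))
x20 <- replicate(reps, roll_mean(20))

# Compare the centers and spreads of the two simulated sampling distributions
c(mean_n10 = mean(x10), mean_n20 = mean(x20))
c(var_n10 = var(x10), var_n20 = var(x20))
```

Both simulated distributions should center near 3.5, and the one built from 20 rolls should be noticeably narrower.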

Sampling distributions

We cheated

IRL, we can’t collect 32 samples in parallel like we just did
We only get one
The single sample mean is still our best point estimate of the population mean
What can we say about our uncertainty around that point estimate?

Sampling distribution

The bootstrap

What if we resample from our sample?

The bootstrap

Developed by Brad Efron in 1979
- National Medal of Science (2005)
Resample from your sample with replacement!
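The bootstrap distribution approximates the shape and spread of the sampling distribution, but it is centered at the sample statistic rather than at the population parameter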
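
The resampling idea can be sketched in base R without any packages. The sample `x` below is simulated (30 means of 10 fair-die rolls), not the class data, and `boot_mean` and `boot_dist` are illustrative names.

```r
set.seed(215)  # arbitrary seed, only so the sketch is reproducible

# A simulated sample: 30 values, each the mean of 10 fair-die rolls
# (stand-in data for illustration, NOT the class spreadsheet)
x <- replicate(30, mean(sample(1:6, size = 10, replace = TRUE)))

# One bootstrap replicate: resample length(x) values WITH replacement,
# then recompute the statistic of interest (here, the mean)
boot_mean <- function(x) mean(sample(x, size = length(x), replace = TRUE))

# The bootstrap distribution: the statistic across many resamples
boot_dist <- replicate(1000, boot_mean(x))

mean(boot_dist)                       # center of the bootstrap distribution
quantile(boot_dist, c(0.025, 0.975))  # 95% percentile interval
```

Each resample has the same size as the original sample; sampling with replacement is what lets the resamples differ from one another.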

In practice

library(infer)
# Bootstrap the mean of X for the white dice: 1000 resamples, drawn with replacement
pips10_bstrap <- dice |>
  filter(color == "white") |>
  specify(response = X) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")
pips10_bstrap

Response: X (numeric)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1  3.67
 2         2  3.63
 3         3  3.6 
 4         4  3.55
 5         5  3.37
 6         6  3.42
 7         7  3.37
 8         8  3.68
 9         9  3.52
10        10  3.48
# ℹ 990 more rows

A confidence interval

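# 95% percentile interval (these are the get_ci() defaults, written out explicitly)
ci <- pips10_bstrap |>
  get_ci(level = 0.95, type = "percentile")
ci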

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1      3.2     3.72

The bootstrap distribution

pips10_bstrap |>
  visualize() +
  shade_ci(ci) +
  # long-dashed red line at 3.5, the expected value of a single fair die roll
  geom_vline(xintercept = 3.5, linetype = 5, color = "red")

Characteristics of bootstrap dist.

pips10_bstrap |>
  summarize(
    num_replicates = n(),
    mean = mean(stat),
    var = var(stat),
    pct025 = quantile(stat, 0.025),
    pct975 = quantile(stat, 0.975)
  )

# A tibble: 1 × 5
  num_replicates  mean    var pct025 pct975
           <int> <dbl>  <dbl>  <dbl>  <dbl>
1           1000  3.50 0.0188    3.2   3.72

The variance of the bootstrap distribution is an excellent estimate of the variance of the sampling distribution
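
One way to check this claim is to bootstrap a sample from a population whose sampling variance is known. The sketch below assumes a fair six-sided die and a fairly large sample of 200 single rolls (a different setup from the class data); `true_var`, `boot_means`, and `boot_var` are illustrative names.

```r
set.seed(215)  # arbitrary seed, only so the sketch is reproducible

# One sample of n single rolls from a fair die (fairness is an assumption)
n <- 200
rolls <- sample(1:6, size = n, replace = TRUE)

# Variance of the sampling distribution of the mean, known here only
# because we chose the population: Var(one roll) / n = (35 / 12) / n
true_var <- (35 / 12) / n

# Variance of the bootstrap distribution of the mean, built from the one sample
boot_means <- replicate(2000, mean(sample(rolls, size = n, replace = TRUE)))
boot_var <- var(boot_means)

c(true_var = true_var, boot_var = boot_var)
```

The two numbers should be close. With a very small sample the agreement is typically rougher, because the bootstrap can only reuse the values that were actually observed.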