Sampling Distributions

IMS, Ch. 12

Prof Randi Garcia

2026-03-09

Distribution of random variables

Let’s make up some random variables

  • Roll the 🎲 10 times
  • \(n = 10\) is our sample size
  • Record in our spreadsheet:
    1. \(X\): the mean number of pips of the 10 rolls
    2. \(Y\): the median number of pips of the 10 rolls
    3. \(Z\): the number of rolls out of 10 with an even number of pips
    4. \(W\): the number of sixes out of 10 rolls
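
The four quantities above can be sketched in code for a single student's 10 rolls. This is only an illustration: it assumes a fair six-sided die, and the names `rolls` and `one_row` and the seed are made up for the example, not taken from the class spreadsheet.

```r
# Sketch: simulate one row of the spreadsheet, assuming a fair six-sided die.
set.seed(1)  # arbitrary seed, only so the example is reproducible

n <- 10
rolls <- sample(1:6, size = n, replace = TRUE)

one_row <- c(
  X = mean(rolls),            # mean number of pips
  Y = median(rolls),          # median number of pips
  Z = sum(rolls %% 2 == 0),   # rolls showing an even number of pips
  W = sum(rolls == 6)         # rolls showing a six
)
one_row
```

Rerunning with a different seed gives a different row, which is exactly why \(X\), \(Y\), \(Z\), and \(W\) are random variables.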

Read data from Google Sheet

library(tidyverse)
library(googlesheets4)
gs4_deauth()
dice <- read_sheet("https://docs.google.com/spreadsheets/d/1lmR7wBOfEmz9csNQVVhdJrdM7XnurPVuM1kQSKup4TI/") |>
  mutate(id = row_number())
dice
# A tibble: 32 × 6
       X     Y     Z     W color    id
   <dbl> <dbl> <dbl> <dbl> <chr> <int>
 1   5.8   6      10     9 black     1
 2   5.6   6      10     9 black     2
 3   5.6   6      10     9 black     3
 4   3.8   4.5     4     2 white     4
 5   3.8   5       4     1 white     5
 6   5.4   6       8     8 black     6
 7   6     6      10    10 black     7
 8   3.5   4.5     3     2 white     8
 9   5.3   6       7     7 black     9
10   5.5   6       7     7 black    10
# ℹ 22 more rows

Compute point estimates

dice |>
  group_by(color) |>
  summarize(
    x_bar = mean(X),
    y_bar = mean(Y),
    z_bar = mean(Z),
    w_bar = mean(W),
  )
# A tibble: 2 × 5
  color x_bar y_bar z_bar w_bar
  <chr> <dbl> <dbl> <dbl> <dbl>
1 black  5.58  6.04  8.85  8.38
2 white  3.5   3.58  3.83  1.5

  • \(\bar{x}\) is our best estimate of \(E[X] = \mu_{X}\)

Key idea

  • Every random variable has a probability distribution

  • Every random variable has an expected value (center of distribution)

  • Every random variable has a variance (spread around the center)

  • What are the distributions of \(X\)? \(Y\)? \(Z\)?
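
As a reference point, here is what the expected values and variances would be if the die were fair and the rolls independent. That is an assumption, not something the data guarantee. The symbol \(R\) (a single roll) is introduced here only for the derivation.

```latex
\[
E[R] = \frac{1 + 2 + \cdots + 6}{6} = 3.5, \qquad
\operatorname{Var}(R) = \frac{91}{6} - 3.5^2 = \frac{35}{12}
\]
\[
E[X] = 3.5, \qquad
\operatorname{Var}(X) = \frac{35/12}{10} = \frac{7}{24} \approx 0.29
\]
\[
Z \sim \text{Binomial}(10, 1/2): \quad E[Z] = 5, \qquad \operatorname{Var}(Z) = 2.5
\]
\[
W \sim \text{Binomial}(10, 1/6): \quad E[W] = \frac{5}{3} \approx 1.67, \qquad
\operatorname{Var}(W) = \frac{25}{18} \approx 1.39
\]
```

The median \(Y\) has no equally tidy formula, but under the same assumption its distribution is symmetric about 3.5, so \(E[Y] = 3.5\) as well.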

View distribution of r.v.’s

dice |>
  pivot_longer(-c(color, id), names_to = "r_v", values_to = "value") |>
  ggplot(aes(x = value, fill = color)) +
  geom_histogram(bins = 25) +
  facet_wrap(vars(r_v))

Your turn

  • What do you observe about the distribution of \(X\)?
    • Center, shape, and spread?
  • Are the black and white 🎲 different?
    • How do you know?
  • What do you suspect the distribution of \(X\) would look like if we had rolled 20 times?

Segue

The distribution of \(X\) is called a sampling distribution
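
One way to see a sampling distribution is to simulate it. The sketch below assumes a fair die and uses a hypothetical helper, `sim_means()`, that is not part of the original slides. It compares samples of 10 rolls with samples of 20 rolls.

```r
# Sketch: approximate the sampling distribution of the mean of n fair-die rolls.
set.seed(2)  # arbitrary seed, only so the example is reproducible

# Hypothetical helper: simulate `reps` sample means, each from n rolls
sim_means <- function(n, reps = 10000) {
  replicate(reps, mean(sample(1:6, size = n, replace = TRUE)))
}

means10 <- sim_means(10)
means20 <- sim_means(20)

# Compare the simulated center and spread with the fair-die theory (35 / 12) / n
c(mean10 = mean(means10), var10 = var(means10), theory10 = 35 / 12 / 10)
c(mean20 = mean(means20), var20 = var(means20), theory20 = 35 / 12 / 20)
```

Under these assumptions both distributions are centered near 3.5, and doubling the sample size roughly halves the variance.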

Sampling distributions

We cheated

  • In real life, we usually can’t collect 32 samples in parallel like we just did

  • We only get one

  • The single sample mean is still our best point estimate of the population mean

  • What can we say about our uncertainty around that point estimate?

Sampling distribution

The bootstrap

What if we resample from our sample?

The bootstrap

  • Developed by Brad Efron in 1979

    • National Medal of Science (2005)
  • Resample from your sample with replacement!

  • The bootstrap distribution has a shape and spread similar to the sampling distribution, but it is centered at the sample statistic
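
The resampling step can be sketched by hand in base R. The data here are simulated stand-ins (not the class data), and the names `x` and `boot_means` are made up for the example.

```r
# Sketch: bootstrap the mean by hand in base R.
set.seed(3)  # arbitrary seed, only so the example is reproducible

# Simulated stand-in for one observed sample (NOT the class data)
x <- sample(1:6, size = 10, replace = TRUE)

# Each replicate: draw n values from x WITH replacement, then take the mean
boot_means <- replicate(1000, mean(sample(x, size = length(x), replace = TRUE)))

mean(x)                                 # the original point estimate
mean(boot_means)                        # center of the bootstrap distribution
quantile(boot_means, c(0.025, 0.975))   # 95% percentile interval
```

The key detail is `replace = TRUE`: without it, every resample would just be a reshuffling of `x` and every bootstrap mean would equal `mean(x)`.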

In practice

library(infer)
pips10_bstrap <- dice |>
  filter(color == "white") |>
  specify(response = X) |>
  generate(1000, type = "bootstrap") |>
  calculate(stat = "mean")
pips10_bstrap
Response: X (numeric)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1  3.67
 2         2  3.63
 3         3  3.6 
 4         4  3.55
 5         5  3.37
 6         6  3.42
 7         7  3.37
 8         8  3.68
 9         9  3.52
10        10  3.48
# ℹ 990 more rows

A confidence interval

ci <- pips10_bstrap |>
  get_ci(level = 0.95, type = "percentile")  # 95% percentile interval (infer's default)
ci
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1      3.2     3.72

The bootstrap distribution

pips10_bstrap |>
  visualize() +
  shade_ci(ci) + 
  geom_vline(xintercept = 3.5, linetype = 5, color = "red")

Characteristics of bootstrap dist.

pips10_bstrap |>
  summarize(
    num_replicates = n(),
    mean = mean(stat),
    var = var(stat),
    pct025 = quantile(stat, 0.025),
    pct975 = quantile(stat, 0.975)
  )
# A tibble: 1 × 5
  num_replicates  mean    var pct025 pct975
           <int> <dbl>  <dbl>  <dbl>  <dbl>
1           1000  3.50 0.0188    3.2   3.72

  • The variance of the bootstrap distribution is a good estimate of the variance of the sampling distribution, and the approximation improves as the sample size grows
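
To check that claim in a setting where the truth is known, the sketch below bootstraps the mean of a simulated sample of fair-die rolls. The data are simulated, not the class data, and the variable names are made up for the example. It compares the bootstrap variance with the usual estimate \(s^2 / n\) and with the true variance of the mean under the fair-die assumption.

```r
# Sketch: compare bootstrap variance of the mean with s^2 / n and with the truth.
set.seed(4)  # arbitrary seed, only so the example is reproducible

x <- sample(1:6, size = 30, replace = TRUE)   # simulated sample (assumes a fair die)
n <- length(x)

boot_means <- replicate(5000, mean(sample(x, size = n, replace = TRUE)))

var_boot   <- var(boot_means)   # Monte Carlo estimate of the bootstrap variance
var_plugin <- var(x) / n        # usual estimate s^2 / n
var_truth  <- (35 / 12) / n     # true variance of the mean for a fair die

c(bootstrap = var_boot, s2_over_n = var_plugin, truth = var_truth)
```

The first two numbers should nearly agree, since for the mean the bootstrap variance is essentially \(s^2 / n\) scaled by \((n - 1) / n\). Both are estimates of the third, and how close they land depends on the particular sample.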