library(mosaic)
library(dplyr)
library(ggplot2)

#install.packages("lme4")
library(lme4)

Load the Hox Data

We will use the hox dataset that has the example from the slides. Each pupil has a popularity rating, popular, and extroversion rating, extrav, their gender, sex, the class room their in, class, and their teachers’ years of experience, texp. We will be exploring popularity as a function of extroversion, sex, and teacher’s experience.

hox <- read.csv("/Users/randigarcia/Desktop/Data/hox.csv", header=TRUE)

glimpse(hox)
## Observations: 2,000
## Variables: 7
## $ pupil    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16...
## $ class    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ extrav   <int> 5, 7, 4, 3, 5, 4, 5, 4, 5, 5, 5, 5, 5, 5, 5, 6, 4, 4,...
## $ sex      <int> 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0,...
## $ texp     <int> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 2...
## $ popular  <dbl> 6.3, 4.9, 5.3, 4.7, 6.0, 4.7, 5.9, 4.2, 5.2, 3.9, 5.7...
## $ popteach <int> 6, 5, 6, 5, 6, 5, 5, 5, 5, 3, 5, 5, 5, 6, 5, 5, 2, 3,...

Exploratory Data Analysis

Do some preliminary data analysis to get familiar with the data set.

  • What are the descriptives for popular?
  • Make some visualizations. How about sex?
  • How many classrooms do we have (hint: the distinct() function might help)?
  • What are the gender percentages by classroom?
  • Which classrooms have the top 5 most popular teachers (variable ’popteach`)?
#Exploratory data analysis

Random Intercept, No Predictors

First let’s fit the most basic multilevel model, the random intercept model with no predictors.

mlm <- lmer(popular ~ 1 + (1 | class), data = hox)

summary(mlm)
## Linear mixed model fit by REML ['lmerMod']
## Formula: popular ~ 1 + (1 | class)
##    Data: hox
## 
## REML criterion at convergence: 6330.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5655 -0.6975  0.0020  0.6758  3.3175 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept) 0.7021   0.8379  
##  Residual             1.2218   1.1053  
## Number of obs: 2000, groups:  class, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  5.07786    0.08739    58.1

Note that model objects have handy properties:

names(summary(mlm))
##  [1] "methTitle"    "objClass"     "devcomp"      "isLmer"      
##  [5] "useScale"     "logLik"       "family"       "link"        
##  [9] "ngrps"        "coefficients" "sigma"        "vcov"        
## [13] "varcor"       "AICtab"       "call"         "residuals"   
## [17] "fitMsgs"      "optinfo"
#which do you think will confirm your number of groups from above?

Intraclass Correlation (ICC)

The intraclass correlation (ICC) can be computed as:

ran_eff <- as.data.frame(summary(mlm)$varcor) %>% 
  select(-var1, -var2)

icc <- ran_eff[1,2]/(ran_eff[1,2] + ran_eff[2,2])

icc
## [1] 0.3649386

Level 1 Predictor, Fixed

Does the pupil’s extroversion predict their popularity? Now, let’s add 1 level 1 predictor, the pupil’s extroversion score. We’ll ass the fixed component only, not the random piece.

mlm_pred1 <- lmer(popular ~ extrav + (1 | class), data = hox)

summary(mlm_pred1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: popular ~ extrav + (1 | class)
##    Data: hox
## 
## REML criterion at convergence: 5832.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0644 -0.7267  0.0165  0.7088  3.3587 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept) 0.8406   0.9168  
##  Residual             0.9304   0.9646  
## Number of obs: 2000, groups:  class, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  2.54214    0.14113   18.01
## extrav       0.48631    0.02015   24.13
## 
## Correlation of Fixed Effects:
##        (Intr)
## extrav -0.745

Level 1 predictor, Random

Spaghetti Plot

Spaghetti plot shows effects for different classrooms.

hox_small <- hox %>%
  filter(class >= 1 & class <= 6)

ggplot(hox_small, aes(extrav, popular, 
                            group = as.factor(class),
                            color = as.factor(class))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Try faceting by class instead.

#Alt spaghetti plot

For every 1 unit increase in extroversion we’d expect a 0.49 point increase in popularity. Does this size of this popularity effect differ across classrooms? Let’s ass the random component of the extroversion effect.

mlm_pred1_ran <- lmer(popular ~ extrav + (extrav | class), data = hox)

summary(mlm_pred1_ran)
## Linear mixed model fit by REML ['lmerMod']
## Formula: popular ~ extrav + (extrav | class)
##    Data: hox
## 
## REML criterion at convergence: 5779.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1961 -0.7291  0.0146  0.6816  3.2217 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  class    (Intercept) 2.99952  1.7319        
##           extrav      0.02599  0.1612   -0.97
##  Residual             0.89492  0.9460        
## Number of obs: 2000, groups:  class, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  2.46102    0.20315   12.11
## extrav       0.49286    0.02546   19.36
## 
## Correlation of Fixed Effects:
##        (Intr)
## extrav -0.917

The standard deviation of the extroversion effect is 0.16. There is a negative correlation between the classroom’s intercept and the effect of extroversion.

Level 2 predictor

Does the teacher’s experience have an effect on the pupil’s popularity? Let’s ass this level 2 predictor, it can only be fixed.

mlm_pred2 <- lmer(popular ~ texp + (1 | class), data = hox)

summary(mlm_pred2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: popular ~ texp + (1 | class)
##    Data: hox
## 
## REML criterion at convergence: 6313.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5326 -0.6963 -0.0005  0.6896  3.3455 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept) 0.5427   0.7367  
##  Residual             1.2218   1.1053  
## Number of obs: 2000, groups:  class, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  4.19668    0.18609  22.552
## texp         0.06164    0.01183   5.212
## 
## Correlation of Fixed Effects:
##      (Intr)
## texp -0.909

Yes, more experienced teachers seem to have more popular pupils. Next we we can have level 1 and level 2 predictors, the level 1 predictor is random.

mlm_pred12_ran <- lmer(popular ~ extrav + texp + (extrav | class), data = hox)

summary(mlm_pred12_ran)
## Linear mixed model fit by REML ['lmerMod']
## Formula: popular ~ extrav + texp + (extrav | class)
##    Data: hox
## 
## REML criterion at convergence: 5749.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0472 -0.7305  0.0110  0.6777  3.2032 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  class    (Intercept) 1.55946  1.2488        
##           extrav      0.03534  0.1880   -0.87
##  Residual             0.89059  0.9437        
## Number of obs: 2000, groups:  class, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.01683    0.23072   4.407
## extrav       0.50215    0.02763  18.173
## texp         0.09777    0.01026   9.530
## 
## Correlation of Fixed Effects:
##        (Intr) extrav
## extrav -0.719       
## texp   -0.702  0.103

Cross-Level Interaction

Is the effect of extroversion stronger in classrooms with more experienced teachers? Finally, we can estimate the cross-level interaction of extroversion and teacher’s experience.

mlm_inter <- lmer(popular ~ extrav*texp + (extrav | class), data = hox)

summary(mlm_inter)
## Linear mixed model fit by REML ['lmerMod']
## Formula: popular ~ extrav * texp + (extrav | class)
##    Data: hox
## 
## REML criterion at convergence: 5694.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2505 -0.7267  0.0162  0.6861  3.1895 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  class    (Intercept) 0.5289833 0.72731       
##           extrav      0.0003337 0.01827  -1.00
##  Residual             0.8898028 0.94329       
## Number of obs: 2000, groups:  class, 100
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) -1.210965   0.321040  -3.772
## extrav       0.891087   0.044954  19.822
## texp         0.251926   0.019746  12.758
## extrav:texp -0.027462   0.002872  -9.563
## 
## Correlation of Fixed Effects:
##             (Intr) extrav texp  
## extrav      -0.874              
## texp        -0.918  0.805       
## extrav:texp  0.773 -0.898 -0.859

See if you can make a graph to get a sense of this interaction. Hint: first create a variable that is the median split of texp.

#graph the interaction

Check the residuals.

#model diagnostics here.