SDS 290 Homework 8

The instructions for this homework give you more information about analyzing different designs than you will need for your exercises. The goal of these instructions and this assignment is to give you more exposure to analyzing BF[2], SP/RM designs, designs with compound within-block factors, and three-way interactions.

The Basic Factorial Two-way

Consider the piglet data we have been working with in class. The code below adds this data to your environment.

piglets <- data.frame(gain = c(1.3, 1.19, 1.08,
                               1.05, 1.0, 1.04,
                               1.26, 1.21, 1.19,
                               1.52, 1.56, 1.54),
                      antibiotic = c(rep("anti_0mg", 3), rep("anti_40mg", 3), 
                                     rep("anti_0mg", 3), rep("anti_40mg", 3)),
                      B12 = c(rep("B12_0mg", 6), rep("B12_5mg", 6)))

Explore the Data

To begin exploring the data, first we get the cell (combinations) counts. We will use the tally() function from the mosaic package, but there are many other ways to do this.

tally(~ B12|antibiotic, data = piglets)

As we would expect, there are three observations per cell. It s always great to perform these “sanity check” type analyses when you can.

Next we will want to check the cell averages and standard deviations. Note that for whichever variable you list after the |, you will get overall means by group.

favstats(gain ~ B12|antibiotic, data = piglets)

Looking at the last two rows we see that on average, ignoring vitatim B12, piglets gain 1.205 pounds when antibiotics are at zero mg and 1.285 pounds when antibiotics are at 40 mg. It seems that maybe piglets are gaining more weight with 40 mg then with 0 mg antibiotics, but we do not yet have a sense of chance like error to compare this difference to.

sd <- favstats(gain ~ B12|antibiotic, data = piglets)[c(1:4),8]

max(sd)/min(sd)

It looks like the largest SD is 5.5 times as large as the smallest, so we might consider transforming the data. Unfortunutely, I we could not find a transformation that fixed the problem. See the log transformation below.

piglets <- piglets %>%
  mutate(gain_log = log(gain))

sd <- favstats(gain_log ~ B12|antibiotic, data = piglets)[c(1:4),8]

max(sd)/min(sd)

Informal Analysis

Then we look at it visually, creating a parallel dot graph for piglets data.

ggplot(piglets, aes(x = B12, y = gain, shape = antibiotic, color = antibiotic)) +
  geom_point(size = 2)

Or a parallel boxplot.

ggplot(piglets, aes(x = B12, color = antibiotic, y = gain)) +
  geom_boxplot()

Interaction Graph

Finally, beacuse this is a bf[2] design, we will want to check for an interaction between factor 1 and factor 2. We can create an interaction graph starting with the same code above for the parallel dot graph. We will add a few more aesthetics to the ggplot() function including group = antibiotic and linetype = antibiotic. Once we have added these two aesthetics, we can add a new layer to the plot that will connect the means for each B12 group within each level of antibiotic. We do this with the geom_smooth(method = "lm", se = 0) code.

ggplot(piglets, aes(x = B12, 
                    y = gain, 
                    shape = antibiotic, 
                    color = antibiotic,
                    group = antibiotic,
                    linetype = antibiotic)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = 0)

Formal ANOVA

We can also conduct a formal two-way ANOVA for this bf[2] data using the code below. Note the antibiotic * B12 in the formula. This gives us F-ratios for the two main effects as well as the interaction.

mod2 <- lm(gain ~ antibiotic * B12, data = piglets)

anova(mod2)

Split Plot/Repeated Measures

To illustrate the analysis of the split plot/repeated measures design, we will use our familiar hamster data. Instead of entering in the data, we can load it from the URL below.

hamster <- read_csv("https://randilgarcia.github.io/sds290interterm21/KellysHamsters.csv")

We can use the glimpse() function from the dplyr package to get a first look at the data and the variables, as well as the variable types.

glimpse(hamster)

All of our factor variables are saved as int type variables, so first we need to identify factor variables as categorical. We will use the factor() function in combination with the mutate() function to accomplish this.

hamster <- hamster %>%
  mutate(day_length = factor(day_length, levels=c(0,1),labels=c("short","long")),
         organ = factor(organ, levels=c(0,1), labels=c("brain","heart")),
         interact = factor(interact, levels=c(1,2,3,4), labels=c("LH","SH","LB","SB")),
         id = factor(id)) 

glimpse(hamster)

Informal analyses

Parallel dot graph

We can then move on to the informal analyses stage.

ggplot(hamster, aes(x = day_length, y = conc, color = organ)) +
  geom_jitter(height = 0, width = 0.03, alpha = .7)

Check group means and standard deviations. First for organ. Note that here we are using the group_by() and summarize() function from dplyr instead of the favstats() function from mosaic. There no real reason to use one over the other, it’s just your preference.

hamster %>%
  group_by(organ) %>%
  summarise(mean = mean(conc),
            sd = sd(conc))

Then for day length.

hamster %>%
  group_by(day_length) %>%
  summarise(mean = mean(conc),
            sd = sd(conc))

We can also do this for interaction cell means.

hamster %>%
  group_by(interact) %>%
  summarise(mean = mean(conc),
            sd = sd(conc))
  
hamster %>%
  group_by(organ, day_length) %>%
  summarise(mean = mean(conc),
            sd = sd(conc))

Transformation

The variances are not equal, so we can try a transformation.

hamster <- hamster %>%
  mutate(lconc = log10(conc)*100) 

hamster %>%
  group_by(interact) %>%
  summarise(lmean = mean(lconc),
            lsd = sd(lconc))

Parallel dot graphs

Draw plots using transformed data. It looks better

ggplot(hamster, aes(x = day_length, y = lconc, color = organ)) +
  geom_jitter(height = 0, width = 0.03, alpha = .7)

Try a side-by-side boxplot too if you’d like!

Interaction Plots

Next, because the SP/RM design also has crossed factors, we should make an interaction graph.

ggplot(hamster, aes(x = organ, y = lconc, 
                    group = day_length, 
                    linetype = day_length, 
                    shape = day_length,
                    color = day_length)) +
  geom_jitter(height = 0, width = 0.03, alpha = .7) +
  geom_smooth(method = "lm", se = 0)

Try flipping the aesthetic mappings and having day length on the x-axis and organ as color.

Scatter plots

Whenever we have a within block factor we can check the additivity assuption (A) by making scatterplot(s) and looking for the x=y regression line. That is, the 45 degree, or slope = 1 line. We will need to first restructure data for scatter plots using spread() from tidyr.

hamster %>%
  select(-conc, -interact) %>%
  spread(organ, lconc) %>%
  ggplot(aes(x = heart, y = brain, shape = day_length)) +
  geom_point()

Formal Modeling with the ANOVA

Split-plot ANOVA. Things get a little weird with the SP/RM ANOVA because we need to incorporate the error term defined by the blocks nested within the between block factor. The aov() function wanted to know about your blocking factor by adding + Error(id) in the equation. Here, id is the hamster ID variable in your data set.

mod <- aov(lconc ~ day_length*organ + Error(id), data = hamster)

summary(mod)

Or we can use a linear model, but we’ll need the lmer() function from the lme4 package. The lmer() functon wants the blocking factor added as + (1|id).

library(lme4)

mod <- lmer(lconc ~ day_length*organ + (1|id), data = hamster)

anova(mod)

Note that lmer() doesn’t give p-values.

Compound within-block factors

Creating a Data Frame in R (review)

To create a data frame for data from a CB[2] design, we will want 4 variables: 1) the response variable observations, 2) the level for each observation for the first factor, 3) the level for each observation on the second factor, and 4) the block index.

The code below creates a data frame for the imagery and working memory example on page 289 of your textbook. First, notice that observations on the response variable, resp_time are specified inside of the c() function. The c stands for concatenate, and this concatenate function makes lists in R. Next, the block index is created, this variable is called subject because in this experiment, the blocks are subjects. Then, levels of the task variable are created, again with the c() function, but also making use of the rep() function. The rep() function will repeat the first argument how ever many times you specify in the second argument. Levels of the report variable are created in the same way. Finally, all four of these variables are contain in a data frame with the data.frame() function, and the data frame is saved in an object called imagery.

imagery <- data.frame(resp_time = c(11.60, 22.71, 20.96, 13.96, 14.60, 10.98, 21.08, 
                                    15.85, 15.68, 16.10, 11.87, 17.49, 24.40, 23.35, 
                                    11.24, 20.24, 15.52, 13.70, 28.15, 33.98, 13.06, 
                                    6.27, 7.77, 6.48, 6.01, 7.60, 18.77, 10.29, 9.18, 
                                    5.88, 6.91, 5.66, 6.68, 11.97, 7.50, 11.61, 10.90, 
                                    5.74, 9.32, 12.64, 16.05, 13.16, 15.87, 12.49, 
                                    14.69, 8.64, 17.24, 11.69, 17.23, 8.77, 8.44, 
                                    9.05, 18.45, 24.38, 14.49, 12.19, 10.50, 11.11, 
                                    13.85, 15.48, 11.51, 23.86, 9.51, 13.20, 12.31, 
                                    12.26, 12.68, 11.37, 18.28, 8.33, 10.60, 8.24, 
                                    8.53, 15.85, 10.91, 11.13, 10.90, 9.33, 10.01, 28.18),
                      subjects = rep(1:20, 4),
                      task = c(rep("visual", 40), rep("verbal", 40)),
                      report = c(rep("visual", 20), rep("verbal", 20),
                                 rep("visual", 20), rep("verbal", 20)))

Calculating means and SDs for cells

Next, we should get some descriptive statistics for the 4 combinations.

imagery %>%
  group_by(task, report) %>%
  summarize(mean = mean(resp_time),
            sd = sd(resp_time))

Checking our Fisher assumption of same standard deviation (S), we appear to be in good shape because the largest SD is not >3 times as large as the smallest SD.

6.12/3.37

ANOVA for Designs with Compound Within-Block Factors

If we can assume the additive error model, we use the following code that simply “controls” for the blocking.

mod <- lm(resp_time ~ subjects + task*report, data = imagery)

anova(mod)

For non-additive error models, we can get the appropriate error terms with the following code. Note that the aov() function is used instead of lm().

mod_nonadd <- aov(resp_time ~ task*report + Error(subjects/(task*report)), data = imagery)

summary(mod_nonadd)

Informal Analysis and interaction graph

We next want to look at the patterns in data. Because this is a CB[2] design, we will want to make an interaction graph.

#parallel dot graph
ggplot(imagery, aes(x = task, y = resp_time, color = report)) +
  geom_point()

When you have more than just a few points of data, a side-by-side boxplot is often a better choice than a dot graph.

#side-by-side boxplot
ggplot(imagery, aes(x = task, y = resp_time, color = report)) +
  geom_boxplot()

There appear to be a few outling points, and the boxplots indicate some skew for all groups—violating the Fisher assumption of normal errors (N). we may want to consider a transformation of our data.

#interaction graph
ggplot(imagery, aes(x = task, y = resp_time, color = report,
                    group = report)) +
  geom_point() +
  geom_smooth(method = "lm", se = 0)

The interaction group shows that there is an interaction between task mode and report mode. For verbal taks, there is no difference between reporting answers verbally versus visually on response time, but for visual taks, subjects were faster when reporting verbally than visually.

Three-way Interactions

Note that we are also able to create interaction graphs in R when we have summary data. That is, for the example in problem D1 (page 396), we only have the cell means. Below, these cell means are entered into a data frame called schizophrennia and we use ggplot() to create and interaction graph.

schizophrenia <- data.frame(resp_time = c(78, 56, 25, 25,
                                          44, 28, 25, 23),
                            diagnosis = c(rep("schizophrenic", 4), rep("normal", 4)),
                            slope = rep(c("steep", "flat"), 4),
                            instructions = c(rep("free", 2), rep("idiosyncratic", 2),
                                             rep("free", 2), rep("idiosyncratic", 2)))

For three-way interactions, we can easily see how the pattern of the two-way interaction migth differ across levels of the third factor by using the facet_wrap() function. Inside of facet_wrap(), you put ~variable, where the variable is that third factor of interest. Don’t forget the ~!

ggplot(schizophrenia, aes(x = diagnosis, y = resp_time, 
                          color = slope, group = slope)) +
  geom_smooth(method = "lm", se = 0) +
  facet_wrap(~instructions)

Homework 8 Problems

For homework 8, complete the following exercises. When you are done, please knit your homework to an HTML file and submit the HTML file on Moodle.

Refer to the hornworm example on page 208 (problems 1 and 7) of your textbook for exercises 1-2.

Create a data frame in your R Studio environment called hornworms with the data given in problem 7 on page 209 of your textbook. For help, review the discussion above on how the piglet data was entered. What are the units for this data frame? How many observations are there total?
Using the data you created in exercise 4, create an interaction graph. Refer to the interaction graph code for the piglet example above. based on your graph, is there a main effect of the first diet? Is there a main effect of the second diet? Is there an interaction of the diet given first and the diet given second?
This is problem C2 on page 387 of your textbook. Using the data in Table 9.9, enter all of the data in R, and compute averages for all basic factors of interest.
Create an interaction graph. Is there evidence of an interaction between the two basic factors?
Problem D2 on page 396 of your textbook. Enter the summary data into R just as shown above for the schizophrenia data. Answer the question asked in problem D2.

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was adapted by Randi Garcia from labs originally written by Mark Hansen, further adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.