The instructions for this homework give you more information about analyzing different designs than you will need for your exercises. The goal of these instructions and this assignment is to give you more exposure to analyzing BF[2], SP/RM designs, designs with compound within-block factors, and three-way interactions.
Consider the piglet data we have been working with in class. The code below adds this data to your environment.
<- data.frame(gain = c(1.3, 1.19, 1.08,
piglets 1.05, 1.0, 1.04,
1.26, 1.21, 1.19,
1.52, 1.56, 1.54),
antibiotic = c(rep("anti_0mg", 3), rep("anti_40mg", 3),
rep("anti_0mg", 3), rep("anti_40mg", 3)),
B12 = c(rep("B12_0mg", 6), rep("B12_5mg", 6)))
To begin exploring the data, first we get the cell (combinations) counts. We will use the tally()
function from the mosaic
package, but there are many other ways to do this.
tally(~ B12|antibiotic, data = piglets)
As we would expect, there are three observations per cell. It s always great to perform these “sanity check” type analyses when you can.
Next we will want to check the cell averages and standard deviations. Note that for whichever variable you list after the |
, you will get overall means by group.
favstats(gain ~ B12|antibiotic, data = piglets)
Looking at the last two rows we see that on average, ignoring vitatim B12, piglets gain 1.205 pounds when antibiotics are at zero mg and 1.285 pounds when antibiotics are at 40 mg. It seems that maybe piglets are gaining more weight with 40 mg then with 0 mg antibiotics, but we do not yet have a sense of chance like error to compare this difference to.
<- favstats(gain ~ B12|antibiotic, data = piglets)[c(1:4),8]
sd
max(sd)/min(sd)
It looks like the largest SD is 5.5 times as large as the smallest, so we might consider transforming the data. Unfortunutely, I we could not find a transformation that fixed the problem. See the log transformation below.
<- piglets %>%
piglets mutate(gain_log = log(gain))
<- favstats(gain_log ~ B12|antibiotic, data = piglets)[c(1:4),8]
sd
max(sd)/min(sd)
Then we look at it visually, creating a parallel dot graph for piglets data.
ggplot(piglets, aes(x = B12, y = gain, shape = antibiotic, color = antibiotic)) +
geom_point(size = 2)
Or a parallel boxplot.
ggplot(piglets, aes(x = B12, color = antibiotic, y = gain)) +
geom_boxplot()
Finally, beacuse this is a bf[2] design, we will want to check for an interaction between factor 1 and factor 2. We can create an interaction graph starting with the same code above for the parallel dot graph. We will add a few more aesthetics to the ggplot()
function including group = antibiotic
and linetype = antibiotic
. Once we have added these two aesthetics, we can add a new layer to the plot that will connect the means for each B12 group within each level of antibiotic. We do this with the geom_smooth(method = "lm", se = 0)
code.
ggplot(piglets, aes(x = B12,
y = gain,
shape = antibiotic,
color = antibiotic,
group = antibiotic,
linetype = antibiotic)) +
geom_point(size = 2) +
geom_smooth(method = "lm", se = 0)
We can also conduct a formal two-way ANOVA for this bf[2] data using the code below. Note the antibiotic * B12
in the formula. This gives us F-ratios for the two main effects as well as the interaction.
<- lm(gain ~ antibiotic * B12, data = piglets)
mod2
anova(mod2)
To illustrate the analysis of the split plot/repeated measures design, we will use our familiar hamster data. Instead of entering in the data, we can load it from the URL below.
<- read_csv("https://randilgarcia.github.io/sds290interterm21/KellysHamsters.csv") hamster
We can use the glimpse()
function from the dplyr
package to get a first look at the data and the variables, as well as the variable types.
glimpse(hamster)
All of our factor variables are saved as int
type variables, so first we need to identify factor variables as categorical. We will use the factor()
function in combination with the mutate()
function to accomplish this.
<- hamster %>%
hamster mutate(day_length = factor(day_length, levels=c(0,1),labels=c("short","long")),
organ = factor(organ, levels=c(0,1), labels=c("brain","heart")),
interact = factor(interact, levels=c(1,2,3,4), labels=c("LH","SH","LB","SB")),
id = factor(id))
glimpse(hamster)
We can then move on to the informal analyses stage.
ggplot(hamster, aes(x = day_length, y = conc, color = organ)) +
geom_jitter(height = 0, width = 0.03, alpha = .7)
Check group means and standard deviations. First for organ. Note that here we are using the group_by()
and summarize()
function from dplyr
instead of the favstats()
function from mosaic
. There no real reason to use one over the other, it’s just your preference.
%>%
hamster group_by(organ) %>%
summarise(mean = mean(conc),
sd = sd(conc))
Then for day length.
%>%
hamster group_by(day_length) %>%
summarise(mean = mean(conc),
sd = sd(conc))
We can also do this for interaction cell means.
%>%
hamster group_by(interact) %>%
summarise(mean = mean(conc),
sd = sd(conc))
%>%
hamster group_by(organ, day_length) %>%
summarise(mean = mean(conc),
sd = sd(conc))
The variances are not equal, so we can try a transformation.
<- hamster %>%
hamster mutate(lconc = log10(conc)*100)
%>%
hamster group_by(interact) %>%
summarise(lmean = mean(lconc),
lsd = sd(lconc))
Draw plots using transformed data. It looks better
ggplot(hamster, aes(x = day_length, y = lconc, color = organ)) +
geom_jitter(height = 0, width = 0.03, alpha = .7)
Try a side-by-side boxplot too if you’d like!
Next, because the SP/RM design also has crossed factors, we should make an interaction graph.
ggplot(hamster, aes(x = organ, y = lconc,
group = day_length,
linetype = day_length,
shape = day_length,
color = day_length)) +
geom_jitter(height = 0, width = 0.03, alpha = .7) +
geom_smooth(method = "lm", se = 0)
Try flipping the aesthetic mappings and having day length on the x-axis and organ as color.
Whenever we have a within block factor we can check the additivity assuption (A) by making scatterplot(s) and looking for the x=y
regression line. That is, the 45 degree
, or slope = 1
line. We will need to first restructure data for scatter plots using spread()
from tidyr
.
%>%
hamster select(-conc, -interact) %>%
spread(organ, lconc) %>%
ggplot(aes(x = heart, y = brain, shape = day_length)) +
geom_point()
Split-plot ANOVA. Things get a little weird with the SP/RM ANOVA because we need to incorporate the error term defined by the blocks nested within the between block factor. The aov()
function wanted to know about your blocking factor by adding + Error(id)
in the equation. Here, id
is the hamster ID variable in your data set.
<- aov(lconc ~ day_length*organ + Error(id), data = hamster)
mod
summary(mod)
Or we can use a linear model, but we’ll need the lmer()
function from the lme4
package. The lmer()
functon wants the blocking factor added as + (1|id)
.
library(lme4)
<- lmer(lconc ~ day_length*organ + (1|id), data = hamster)
mod
anova(mod)
Note that lmer()
doesn’t give p-values.
To create a data frame for data from a CB[2] design, we will want 4 variables: 1) the response variable observations, 2) the level for each observation for the first factor, 3) the level for each observation on the second factor, and 4) the block index.
The code below creates a data frame for the imagery and working memory example on page 289 of your textbook. First, notice that observations on the response variable, resp_time
are specified inside of the c()
function. The c
stands for concatenate, and this concatenate function makes lists in R. Next, the block index is created, this variable is called subject
because in this experiment, the blocks are subjects. Then, levels of the task
variable are created, again with the c()
function, but also making use of the rep()
function. The rep()
function will repeat the first argument how ever many times you specify in the second argument. Levels of the report
variable are created in the same way. Finally, all four of these variables are contain in a data frame with the data.frame()
function, and the data frame is saved in an object called imagery
.
<- data.frame(resp_time = c(11.60, 22.71, 20.96, 13.96, 14.60, 10.98, 21.08,
imagery 15.85, 15.68, 16.10, 11.87, 17.49, 24.40, 23.35,
11.24, 20.24, 15.52, 13.70, 28.15, 33.98, 13.06,
6.27, 7.77, 6.48, 6.01, 7.60, 18.77, 10.29, 9.18,
5.88, 6.91, 5.66, 6.68, 11.97, 7.50, 11.61, 10.90,
5.74, 9.32, 12.64, 16.05, 13.16, 15.87, 12.49,
14.69, 8.64, 17.24, 11.69, 17.23, 8.77, 8.44,
9.05, 18.45, 24.38, 14.49, 12.19, 10.50, 11.11,
13.85, 15.48, 11.51, 23.86, 9.51, 13.20, 12.31,
12.26, 12.68, 11.37, 18.28, 8.33, 10.60, 8.24,
8.53, 15.85, 10.91, 11.13, 10.90, 9.33, 10.01, 28.18),
subjects = rep(1:20, 4),
task = c(rep("visual", 40), rep("verbal", 40)),
report = c(rep("visual", 20), rep("verbal", 20),
rep("visual", 20), rep("verbal", 20)))
Next, we should get some descriptive statistics for the 4 combinations.
%>%
imagery group_by(task, report) %>%
summarize(mean = mean(resp_time),
sd = sd(resp_time))
Checking our Fisher assumption of same standard deviation (S), we appear to be in good shape because the largest SD is not >3 times as large as the smallest SD.
6.12/3.37
If we can assume the additive error model, we use the following code that simply “controls” for the blocking.
<- lm(resp_time ~ subjects + task*report, data = imagery)
mod
anova(mod)
For non-additive error models, we can get the appropriate error terms with the following code. Note that the aov()
function is used instead of lm()
.
<- aov(resp_time ~ task*report + Error(subjects/(task*report)), data = imagery)
mod_nonadd
summary(mod_nonadd)
We next want to look at the patterns in data. Because this is a CB[2] design, we will want to make an interaction graph.
#parallel dot graph
ggplot(imagery, aes(x = task, y = resp_time, color = report)) +
geom_point()
When you have more than just a few points of data, a side-by-side boxplot is often a better choice than a dot graph.
#side-by-side boxplot
ggplot(imagery, aes(x = task, y = resp_time, color = report)) +
geom_boxplot()
There appear to be a few outling points, and the boxplots indicate some skew for all groups—violating the Fisher assumption of normal errors (N). we may want to consider a transformation of our data.
#interaction graph
ggplot(imagery, aes(x = task, y = resp_time, color = report,
group = report)) +
geom_point() +
geom_smooth(method = "lm", se = 0)
The interaction group shows that there is an interaction between task mode and report mode. For verbal taks, there is no difference between reporting answers verbally versus visually on response time, but for visual taks, subjects were faster when reporting verbally than visually.
Note that we are also able to create interaction graphs in R when we have summary data. That is, for the example in problem D1 (page 396), we only have the cell means. Below, these cell means are entered into a data frame called schizophrennia
and we use ggplot()
to create and interaction graph.
<- data.frame(resp_time = c(78, 56, 25, 25,
schizophrenia 44, 28, 25, 23),
diagnosis = c(rep("schizophrenic", 4), rep("normal", 4)),
slope = rep(c("steep", "flat"), 4),
instructions = c(rep("free", 2), rep("idiosyncratic", 2),
rep("free", 2), rep("idiosyncratic", 2)))
For three-way interactions, we can easily see how the pattern of the two-way interaction migth differ across levels of the third factor by using the facet_wrap()
function. Inside of facet_wrap()
, you put ~variable
, where the variable is that third factor of interest. Don’t forget the ~
!
ggplot(schizophrenia, aes(x = diagnosis, y = resp_time,
color = slope, group = slope)) +
geom_smooth(method = "lm", se = 0) +
facet_wrap(~instructions)
For homework 8, complete the following exercises. When you are done, please knit your homework to an HTML file and submit the HTML file on Moodle.
Refer to the hornworm example on page 208 (problems 1 and 7) of your textbook for exercises 1-2.
Create a data frame in your R Studio environment called hornworms
with the data given in problem 7 on page 209 of your textbook. For help, review the discussion above on how the piglet
data was entered. What are the units for this data frame? How many observations are there total?
Using the data you created in exercise 4, create an interaction graph. Refer to the interaction graph code for the piglet
example above. based on your graph, is there a main effect of the first diet? Is there a main effect of the second diet? Is there an interaction of the diet given first and the diet given second?
This is problem C2 on page 387 of your textbook. Using the data in Table 9.9, enter all of the data in R, and compute averages for all basic factors of interest.
Create an interaction graph. Is there evidence of an interaction between the two basic factors?
Problem D2 on page 396 of your textbook. Enter the summary data into R just as shown above for the schizophrenia data. Answer the question asked in problem D2.
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was adapted by Randi Garcia from labs originally written by Mark Hansen, further adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.