acitelli <- read.csv("/Users/randigarcia/Desktop/Data/acitelli.csv", header=TRUE)

We first want to create a gender variable that is a character; this will make the output look nicer. We’ll make use of the ifelse() function inside of mutate().
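As a small illustration of what ifelse() does on its own, here is a sketch on a made-up vector of codes. The vector `codes` and its values are hypothetical, not taken from acitelli.csv; it only assumes the coding used in this tutorial, where -1 marks women and 1 marks men.

```r
# Hypothetical gender codes, assuming -1 = women and 1 = men
codes <- c(-1, 1, 1, -1)

# ifelse(test, yes, no) checks each element and picks "Women" or "Men"
labels <- ifelse(codes == -1, "Women", "Men")
labels
```

The same call is what gets wrapped in mutate() for the real data.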
library(dplyr)
acitelli <- acitelli %>%
  mutate(Gender = ifelse(gender == -1, "Women", "Men"))
# Keep only the men (gender == 1) and flag husbands married longer than the median
menOnly <- acitelli %>%
  filter(gender == 1) %>%
  mutate(wise_hus = Yearsmar > median(Yearsmar)) %>%
  select(-gender)

There are quite a few ways to make figures in R; we’ll use the popular package ggplot2. You can find a cheat sheet for ggplot2 here. Be sure to install it first if you never have, or if you need to update it.
#install.packages("ggplot2")
library(ggplot2)

First, let’s make a histogram for satisfaction. The easiest way to make a figure with ggplot2 is with the qplot() function. This stands for quick plot. Notice in the code below that we did not specify anything about a histogram. qplot() guesses which type of plot we want based on the variable’s type (e.g., integer, numeric, double, factor, character).
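To check which type R has assigned to each variable before plotting, class() can be applied to every column. The data frame `toy` below is made up for illustration; with the real data you would pass acitelli instead.

```r
# Made-up data frame (hypothetical values, not the acitelli data)
toy <- data.frame(
  satisfaction = c(3.5, 4.0, 2.5),
  Gender = c("Women", "Men", "Men"),
  stringsAsFactors = FALSE
)

# Report the class of each column as a named character vector
col_types <- sapply(toy, class)
col_types
```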
qplot(satisfaction, data = acitelli)

There are too many bins (it defaults to 30 bins); we can ask for a specific number by adding the bins = argument. Try playing around with the bin number below to find the optimal plot. I put 10 in there as a placeholder.
qplot(satisfaction, data = acitelli, bins = 10)

We might want to see if the distributions are different for men and women. We can do this by mapping Gender to the fill aesthetic. Note that we could use color = instead if we want a hollow histogram.
qplot(x = satisfaction, fill = Gender, data = acitelli, bins = 10)

An alternative to the histogram is the density plot. It displays a smoothed distribution, and the area under the curve always equals 1, so it’s good for comparing two groups with different n’s.
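That the area stays close to 1 regardless of sample size can be checked numerically. This is a rough sketch using base R's density() and a trapezoid-rule approximation on simulated values; the vectors `small` and `large` and the helper `area_under()` are made up for illustration.

```r
set.seed(1)
small <- rnorm(30)    # hypothetical group with few observations
large <- rnorm(300)   # hypothetical group with many observations

# Approximate the area under a density curve with the trapezoid rule
area_under <- function(x) {
  d <- density(x)
  n <- length(d$y)
  sum(diff(d$x) * (d$y[-n] + d$y[-1]) / 2)
}

area_small <- area_under(small)
area_large <- area_under(large)
c(area_small, area_large)
```

Both areas should come out approximately equal to 1, even though one group is ten times larger than the other.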
qplot(x = satisfaction, color = Gender, data = acitelli, geom = "density")

We might also want a scatter plot. Again, qplot() guesses what we want, but it’s a good idea to specify which variable goes on the x-axis and which goes on the y-axis.
qplot(x = tension, y = satisfaction, data = acitelli)

We can even add a third variable, mapping it to color. To get the behavior we want, the gender variable has to be categorical (a character variable such as Gender is fine).
qplot(x = tension, y = satisfaction, color = Gender, data = acitelli)

We can ask for side-by-side boxplots when our x variable is categorical. In this case qplot() does NOT know what to do, so we tell it we want boxplots with geom = "boxplot".
qplot(y = satisfaction, x = wise_hus, data = menOnly, geom = "boxplot")

ggplot()

For more complex figures we will need to move away from the qplot() function in favor of the heavy-duty ggplot() function. To get a sense of how ggplot() builds plots, first we will just run the empty function.
ggplot()

Next, we can add the data and start mapping variables to aesthetics.
ggplot(acitelli, aes(x = satisfaction))

After we have specified aesthetic mappings, we can then add geoms. Notice that we make use of the + symbol with ggplots. The + needs to be at the end of each line, not at the start of the next one. We add a histogram with the geom_histogram() function.
ggplot(acitelli, aes(x = satisfaction)) +
  geom_histogram()

Note that in the histogram above, the y-axis shows the counts of observations in each bin. There were some calculations involved in getting these counts. Counts are the default statistic when you ask for a histogram. We can change the number of bins by adding bins = inside of the geom_histogram() function.
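To illustrate the kind of counting that sits behind a histogram, and how it depends on the bins chosen, here is a sketch with base R's hist() and plot = FALSE. The `scores` vector and the bin edges are made up, and ggplot2's own binning rules differ in detail, so treat this as an analogy rather than a reproduction of geom_histogram().

```r
# Made-up satisfaction scores (hypothetical, not the acitelli data)
scores <- c(1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 3.5, 4.0, 4.0)

# Bin edges chosen by hand; by default each bin includes its right edge
h <- hist(scores, breaks = c(1, 2, 3, 4), plot = FALSE)

# Number of observations that fall in each bin
bin_counts <- h$counts
bin_counts
```

Changing the breaks changes the counts, which is why the number of bins affects the shape of the plot.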
ggplot(acitelli, aes(x = satisfaction)) +
  geom_histogram(bins = 12)

Just as before, if we want both groups in one histogram, we can map Gender to the fill aesthetic (by default the bars for the two groups are stacked).
ggplot(acitelli, aes(x = satisfaction, fill = Gender)) +
  geom_histogram(bins = 12)

Alternatively, we can ask for separate facets for each level of the Gender variable with the facet_wrap() function. Notice that there is a ~ before Gender inside of this function.
ggplot(acitelli, aes(x = satisfaction)) +
  geom_histogram(bins = 12) +
  facet_wrap(~Gender)

Next we’ll make that scatter plot again. We’ll map tension to the x-axis and satisfaction to the y-axis. Then we’ll add geom_point().
ggplot(acitelli, aes(x = tension, y = satisfaction)) +
  geom_point()

Why does it appear as though there is far less data than there really is? Check out the plot when we use geom_jitter(). What do you think geom_jitter() does?
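One way to investigate is to compare the number of rows with the number of distinct (x, y) positions, since points that share a position are drawn on top of one another. The data frame `toy` below is made up; with the real data you could run the same comparison on acitelli[, c("tension", "satisfaction")].

```r
# Made-up scores measured on a coarse scale (hypothetical data)
toy <- data.frame(
  tension = c(1, 1, 2, 2, 2, 3, 3, 3),
  satisfaction = c(4, 4, 3, 3, 3, 2, 2, 4)
)

n_rows <- nrow(toy)            # observations in the data
n_spots <- nrow(unique(toy))   # distinct (tension, satisfaction) positions
c(rows = n_rows, distinct_positions = n_spots)
```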
ggplot(acitelli, aes(x = tension, y = satisfaction)) +
  geom_jitter()

We can add more than one geom. To the jittered scatter plot we can add a least squares regression line with geom_smooth(). Inside of geom_smooth() we need to specify method = "lm"; the lm stands for linear model. We can also turn off the standard error band with se = 0.
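With method = "lm", the line comes from a linear model fit, so fitting the model directly with lm() shows the intercept and slope being drawn. The values below are made up and lie exactly on a line so the result is easy to check by eye; they are not from the acitelli data.

```r
# Made-up data lying exactly on a line (hypothetical values)
tension <- c(1, 2, 3, 4, 5)
satisfaction <- 4 - 0.5 * tension

# Fit the least squares line: satisfaction = intercept + slope * tension
fit <- lm(satisfaction ~ tension)
coefs <- coef(fit)
coefs
```

With real, noisy data the coefficients will not be round numbers, but they describe the same line that appears on the plot.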
ggplot(acitelli, aes(x = tension, y = satisfaction)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = 0)

Again, we can map Gender to the color aesthetic.
ggplot(acitelli, aes(x = tension, y = satisfaction, color = Gender)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = 0)

Or use facet_wrap().
ggplot(acitelli, aes(x = tension, y = satisfaction)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = 0) +
  facet_wrap(~Gender)

We can create a plot object with the <- symbol. Then to print the plot we’d need to run a line with the name of our plot.
myplot <- ggplot(acitelli, aes(x = tension, y = satisfaction, color = Gender, linetype = Gender)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = 0)
myplot

Then, we can add to that plot object. We can add x labels, y labels, change the colors, and change the theme. There is much more that you can do with ggplot2!
myplot +
  xlab("Tension") +
  ylab("Satisfaction") +
  scale_color_manual(values = c("gold", "dodgerblue")) +
  theme_classic()
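
Because acitelli.csv is read from a local path, here is a self-contained sketch that builds the same kind of plot on simulated data, so the steps can be tried without the file. The data frame `toy`, its values, and the name `demo_plot` are all made up; only the column names mirror the ones used above.

```r
library(ggplot2)

# Made-up stand-in for acitelli (hypothetical values, same column names)
set.seed(42)
toy <- data.frame(
  tension = sample(1:4, 40, replace = TRUE),
  Gender = rep(c("Women", "Men"), each = 20)
)
toy$satisfaction <- 4 - 0.4 * toy$tension + rnorm(40, sd = 0.3)

# Jittered points plus one least squares line per gender, no error band
demo_plot <- ggplot(toy, aes(x = tension, y = satisfaction, color = Gender)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE) +
  xlab("Tension") +
  ylab("Satisfaction") +
  scale_color_manual(values = c("gold", "dodgerblue")) +
  theme_classic()
demo_plot
```

Swapping `toy` for acitelli should give the figure from the last step, assuming the data were loaded and the Gender variable was created as described earlier.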