You might want to get descriptive statistics or frequencies for specific variables. There are base R functions for this, but I like to use the mosaic package. You can download a cheat sheet for mosaic by clicking here.
First we need to install the mosaic package using the install.packages() function. The package name goes inside the parentheses in double quotes: "mosaic". We only do this once, in the console, so you wouldn’t want to save it in your .Rmd file. If you do, be sure to comment it out with a # like I have here.
#install.packages("mosaic")
Once a package is installed, any time we start a new R session and want to use functions from that package, we need to load it with the library() function.
library(mosaic)
acitelli <- read.csv("acitelli.csv")
mosaic
The function favstats() gives descriptive statistics for a numerical variable, and the function tally() gives frequencies for a categorical variable (or a numerical variable, if you want it). Functions in mosaic use the formula syntax, written as y ~ x, or ~x for a single variable. The ~ key can be found just below your esc key. The first argument is the formula, and the second argument is the data frame, e.g., data = acitelli.
favstats(~satisfaction, data = acitelli)
## min Q1 median Q3 max mean sd n missing
## 1.166667 3.333333 3.833333 4 4 3.60473 0.4964205 296 0
tally(~gender, data = acitelli)
## gender
## -1 1
## 148 148
#tally() can also give you percentages with the format argument
tally(~gender, data = acitelli, format = "percent")
## gender
## -1 1
## 50 50
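For comparison, here is a minimal sketch of how the base R functions mentioned at the start could produce similar summaries. The toy data frame and its values are hypothetical (they are not the acitelli data), so the numbers will not match the output above.

```r
# Toy data frame with made-up values (NOT the acitelli data)
toy <- data.frame(
  gender       = c(-1, 1, -1, 1, -1, 1),
  satisfaction = c(3.5, 4.0, 2.5, 3.0, 4.0, 3.5)
)

# Descriptive statistics for a numerical variable
summary(toy$satisfaction)        # min, quartiles, median, mean, max
sd(toy$satisfaction)             # standard deviation
sum(is.na(toy$satisfaction))     # number of missing values

# Frequencies for a categorical variable
counts <- table(toy$gender)
counts

# Percentages, similar in spirit to tally(..., format = "percent")
percents <- 100 * prop.table(counts)
percents
```

Base R needs several calls to cover what favstats() reports in one line, which is part of why the mosaic version is convenient.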
You can look up more information about a function by typing ? before its name.
?favstats
Descriptives split by gender.
favstats(satisfaction ~ gender, data = acitelli)
## gender min Q1 median Q3 max mean sd n missing
## 1 -1 1.500000 3.333333 3.833333 4 4 3.591216 0.5300260 148 0
## 2 1 1.166667 3.500000 3.833333 4 4 3.618243 0.4617875 148 0
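The same y ~ x idea also works in some base R functions. As a rough sketch, aggregate() accepts a formula and a data argument, and tapply() is a non-formula alternative. The toy data frame below is hypothetical and stands in for acitelli.

```r
# Toy data frame with made-up values (NOT the acitelli data)
toy <- data.frame(
  gender       = c(-1, -1, -1, 1, 1, 1),
  satisfaction = c(2, 3, 4, 3, 3, 3)
)

# Mean of satisfaction within each gender group, using a formula
means_by_gender <- aggregate(satisfaction ~ gender, data = toy, FUN = mean)
means_by_gender

# Standard deviation within each group, without a formula
sds_by_gender <- tapply(toy$satisfaction, toy$gender, sd)
sds_by_gender
```

Each call returns one statistic per group, whereas favstats() returns the whole set of descriptives at once.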
What are the standard deviations of perceived tension split by gender?
What is the minimum of the self_pos variable?
The mosaic package also has a function for getting the correlation coefficient, called cor(). Using the same format (i.e., formula, then data), how do you think you would get the correlation of satisfaction and tension?
cor(satisfaction~tension, data = acitelli)
## [1] -0.5971907
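To see what the formula version is doing, it may help to compare it with the base R cor() from the stats package, which takes the two variables as separate vectors. This is only a sketch: the toy data frame is hypothetical, and stats::cor is called explicitly in case mosaic's cor() is also loaded.

```r
# Toy data frame with made-up values (NOT the acitelli data);
# tension rises exactly as satisfaction falls
toy <- data.frame(
  satisfaction = c(4.0, 3.5, 3.0, 2.5, 2.0),
  tension      = c(1.0, 1.5, 2.0, 2.5, 3.0)
)

# Pass the two columns as vectors
r_base <- stats::cor(toy$satisfaction, toy$tension)
r_base

# with() avoids repeating the data frame name
r_with <- with(toy, stats::cor(satisfaction, tension))
r_with

# If a variable had missing values, use = "complete.obs" would drop those rows
r_complete <- stats::cor(toy$satisfaction, toy$tension, use = "complete.obs")
```

Because the toy values lie on a perfectly straight descending line, the correlation here is -1. Real data like acitelli will give something less extreme.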
For bivariate correlation matrices, I use the corr.test() function from the psych package.
library(psych)
corr.test(acitelli)
## Call:corr.test(x = acitelli)
## Correlation matrix
## cuplid Yearsmar gender self_pos other_pos satisfaction
## cuplid 1.00 -0.13 0.00 0.03 -0.13 -0.16
## Yearsmar -0.13 1.00 0.00 0.07 0.13 -0.01
## gender 0.00 0.00 1.00 -0.25 0.04 0.03
## self_pos 0.03 0.07 -0.25 1.00 0.24 0.18
## other_pos -0.13 0.13 0.04 0.24 1.00 0.47
## satisfaction -0.16 -0.01 0.03 0.18 0.47 1.00
## tension 0.17 -0.11 -0.13 -0.10 -0.37 -0.60
## simhob -0.02 -0.09 -0.17 0.06 0.18 0.29
## tension simhob
## cuplid 0.17 -0.02
## Yearsmar -0.11 -0.09
## gender -0.13 -0.17
## self_pos -0.10 0.06
## other_pos -0.37 0.18
## satisfaction -0.60 0.29
## tension 1.00 -0.12
## simhob -0.12 1.00
## Sample Size
## [1] 296
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## cuplid Yearsmar gender self_pos other_pos satisfaction
## cuplid 0.00 0.40 1.00 1.00 0.40 0.08
## Yearsmar 0.03 0.00 1.00 1.00 0.39 1.00
## gender 1.00 1.00 0.00 0.00 1.00 1.00
## self_pos 0.56 0.21 0.00 0.00 0.00 0.04
## other_pos 0.02 0.02 0.55 0.00 0.00 0.00
## satisfaction 0.00 0.92 0.64 0.00 0.00 0.00
## tension 0.00 0.06 0.02 0.07 0.00 0.00
## simhob 0.79 0.11 0.00 0.30 0.00 0.00
## tension simhob
## cuplid 0.07 1.00
## Yearsmar 0.67 1.00
## gender 0.40 0.06
## self_pos 0.82 1.00
## other_pos 0.00 0.04
## satisfaction 0.00 0.00
## tension 0.00 0.48
## simhob 0.04 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
We can also subset the acitelli dataset with brackets to get the correlation matrix split by gender, but this looks like garbage!
#correlations for men
corr.test(acitelli[which(acitelli$gender==1),c(-3)])$r
## cuplid Yearsmar self_pos other_pos satisfaction
## cuplid 1.00000000 -0.12821170 0.064763248 -0.1120233 -0.15068767
## Yearsmar -0.12821170 1.00000000 0.157866700 0.1697970 -0.01488433
## self_pos 0.06476325 0.15786670 1.000000000 0.1886611 0.08142680
## other_pos -0.11202325 0.16979696 0.188661064 1.0000000 0.50496072
## satisfaction -0.15068767 -0.01488433 0.081426804 0.5049607 1.00000000
## tension 0.15613894 -0.16045792 -0.009658491 -0.3418260 -0.56745581
## simhob -0.04381344 -0.10357773 0.005403892 0.1595127 0.26747767
## tension simhob
## cuplid 0.156138943 -0.043813443
## Yearsmar -0.160457922 -0.103577727
## self_pos -0.009658491 0.005403892
## other_pos -0.341825957 0.159512724
## satisfaction -0.567455810 0.267477666
## tension 1.000000000 -0.065196943
## simhob -0.065196943 1.000000000
#correlations for women
corr.test(acitelli[which(acitelli$gender==-1),c(-3)])$r
## cuplid Yearsmar self_pos other_pos
## cuplid 1.000000000 -0.128211698 0.007610632 -0.14778490
## Yearsmar -0.128211698 1.000000000 -0.004238569 0.09844142
## self_pos 0.007610632 -0.004238569 1.000000000 0.32661940
## other_pos -0.147784898 0.098441418 0.326619400 1.00000000
## satisfaction -0.177837708 0.001917215 0.291161501 0.43982125
## tension 0.184257376 -0.067582842 -0.259670893 -0.38786934
## simhob 0.016215251 -0.085762710 0.030159790 0.21967179
## satisfaction tension simhob
## cuplid -0.177837708 0.18425738 0.01621525
## Yearsmar 0.001917215 -0.06758284 -0.08576271
## self_pos 0.291161501 -0.25967089 0.03015979
## other_pos 0.439821249 -0.38786934 0.21967179
## satisfaction 1.000000000 -0.62477474 0.33026923
## tension -0.624774745 1.00000000 -0.23792175
## simhob 0.330269230 -0.23792175 1.00000000
Don’t worry, the tidyverse to the rescue.