a_thing <- 4
another_thing <- 1
another_Thing <- 7
both_things <- a_thing + another_thing
Then we created a tiny data set.
a_data_thing <- data.frame(x = 2, y = 8)
a_data_thing$x
## [1] 2
How would we print the variable y? Create a new chunk below and type your answer in it.
Write notes for yourself in the white space. Maybe explain to your future self what dollar signs do.
ERASE THIS AND TYPE SOME NOTES HERE
Enough playing around, let’s load some data!
acitelli <- read.csv("acitelli.csv")
Notice that there is now another data object in the top right “environment” pane. If you click on the name of the dataset, you can actually look at it. Importantly, you cannot change any data in this view; this is by design. We want this behavior, but it’s hard to get used to!
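Because the data view is read-only, any change to the data has to be made in code, which leaves a record of what was done. Here is a minimal sketch using a made-up data frame called toy (a hypothetical name, not part of the acitelli data):

```r
# Hypothetical toy data frame, not part of the acitelli data
toy <- data.frame(x = 2, y = 8)

# Changes are made in code, so every edit is written down in the script
toy$y <- toy$y + 1

# Print the updated variable
toy$y
```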
Next, you want to look at your data. Hint: you can run a single line of code within a chunk with the keyboard shortcut Ctrl + Enter.
head(acitelli)
## cuplid Yearsmar gender self_pos other_pos satisfaction tension simhob
## 1 3 8.202667 -1 4.8 4.6 4.000000 1.5 0
## 2 3 8.202667 1 3.8 4.0 3.666667 2.5 1
## 3 10 10.452667 -1 4.6 3.8 3.166667 4.0 0
## 4 10 10.452667 1 4.2 4.0 3.666667 2.0 0
## 5 11 -8.297333 -1 5.0 4.4 3.833333 2.5 0
## 6 11 -8.297333 1 4.2 4.8 3.833333 2.5 0
str(acitelli)
## 'data.frame': 296 obs. of 8 variables:
## $ cuplid : int 3 3 10 10 11 11 17 17 21 21 ...
## $ Yearsmar : num 8.2 8.2 10.5 10.5 -8.3 ...
## $ gender : int -1 1 -1 1 -1 1 -1 1 -1 1 ...
## $ self_pos : num 4.8 3.8 4.6 4.2 5 4.2 4 4 4.2 4.4 ...
## $ other_pos : num 4.6 4 3.8 4 4.4 4.8 3.6 4.4 3.8 4.8 ...
## $ satisfaction: num 4 3.67 3.17 3.67 3.83 ...
## $ tension : num 1.5 2.5 4 2 2.5 2.5 3 2 3.5 2.5 ...
## $ simhob : int 0 1 0 0 0 0 -1 0 0 0 ...
names(acitelli)
## [1] "cuplid" "Yearsmar" "gender" "self_pos"
## [5] "other_pos" "satisfaction" "tension" "simhob"
You probably also want descriptive statistics.
summary(acitelli)
## cuplid Yearsmar gender self_pos
## Min. : 3.0 Min. :-11.214000 Min. :-1 Min. :2.600
## 1st Qu.:165.2 1st Qu.: -7.089000 1st Qu.:-1 1st Qu.:4.000
## Median :313.5 Median : -1.089000 Median : 0 Median :4.200
## Mean :282.6 Mean : -0.000036 Mean : 0 Mean :4.186
## 3rd Qu.:401.2 3rd Qu.: 6.077667 3rd Qu.: 1 3rd Qu.:4.400
## Max. :485.0 Max. : 15.036000 Max. : 1 Max. :5.000
## other_pos satisfaction tension simhob
## Min. :2.600 Min. :1.167 Min. :1.000 Min. :-1.0000
## 1st Qu.:4.000 1st Qu.:3.333 1st Qu.:2.000 1st Qu.: 0.0000
## Median :4.200 Median :3.833 Median :2.500 Median : 0.0000
## Mean :4.264 Mean :3.605 Mean :2.431 Mean : 0.0777
## 3rd Qu.:4.600 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.: 0.2500
## Max. :5.000 Max. :4.000 Max. :4.000 Max. : 1.0000
The summary() function is smart: it gives a five-number summary plus the mean for numerical variables, and counts for categorical variables, called factors in R. We can use the as.factor() function to temporarily treat gender as a factor instead of an integer. This will come in handy later.
summary(acitelli$gender)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1 -1 0 0 1 1
#the summary function gives counts for factor type variables
summary(as.factor(acitelli$gender))
## -1 1
## 148 148
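To see why the change is only temporary, here is a small sketch with a made-up vector g (a hypothetical stand-in for a coded variable like gender). The as.factor() function returns a new factor and leaves the original alone, so the result has to be assigned to a name if you want to keep it.

```r
# Hypothetical toy vector standing in for a coded variable like gender
g <- c(-1L, 1L, -1L, 1L)

# as.factor() returns a new factor, so summary() reports counts
summary(as.factor(g))

# g itself has not changed; it is still an integer vector
class(g)

# To keep the factor version, assign the result to a name
g_factor <- as.factor(g)
class(g_factor)
levels(g_factor)
```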
We can also select pieces of a data frame using square brackets. The first number is the row, the second is the column.
acitelli[2, 6]
## [1] 3.666667
#You try it! Find a number you want to pull from the dataset.
#acitelli[ ?, ?]
If you are working with a single variable instead, you can select one of its elements by position.
acitelli$satisfaction[2]
## [1] 3.666667
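The two notations reach the same cell. The sketch below uses a made-up data frame called toy (hypothetical, not the acitelli data) so you can compare them side by side. It also shows that a column name in quotes can stand in for the column number.

```r
# Hypothetical toy data frame, not the acitelli data
toy <- data.frame(x = c(2, 4, 6), y = c(8, 10, 12))

# Row 3, column 2, selected by position
by_position <- toy[3, 2]

# The same cell: take the variable y, then its 3rd element
by_name <- toy$y[3]

# A quoted column name also works in place of the column number
by_label <- toy[3, "y"]

# Check that the notations agree
identical(by_position, by_name)
identical(by_position, by_label)
```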
In the chunk below, pick out the gender of the person in the 50th case.
#try it by referring to the row and column of the data frame.
#try it by referring to the variable, using the dollar sign notation.