SOC830 Lab 3: Playing with Variables

The third lab session covers the following:

  • How to make variables
  • How to assign variable labels
  • How to assign value labels
  • How to save datasets

We continue working on the dataset we created in the last lab. Open RStudio, and you will see the R file you made in the last lab. If you do not see your R file, go to File >> Open File… in the top menu and open it from your working directory. Then run all the code you wrote in the previous lab. You will see mydata in the Environment tab.

Now you are ready to learn new code.

Playing with variables

R is a calculator.

Just think of R as a calculator. It can evaluate basic arithmetic expressions. Examples are:

250 + 125
## [1] 375
250 - 125
## [1] 125
250 * 125
## [1] 31250
250/125
## [1] 2

R can evaluate more complicated expressions as well. The example below computes \({e}^{5}+\sqrt{\frac{\ln{253}+\pi}{653-258}}\).

exp(5)+sqrt((log(253) + pi)/(653-258))
## [1] 148.5614
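
Like a hand calculator, R follows the usual order of operations, which matters once expressions get longer. The lines below are a small sketch with made-up numbers (not taken from the lab data) showing how parentheses change a result, along with a few further operators.

```r
# Multiplication is evaluated before addition
2 + 3 * 4      # 14
# Parentheses force the addition to happen first
(2 + 3) * 4    # 20
# ^ raises a number to a power
2^3            # 8
# %/% is integer division and %% gives the remainder
7 %/% 2        # 3
7 %% 2         # 1
```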

If you want to learn more about basic mathematical operations in R, go to Basic Operations in R Tutorial.

How to make variables

In R, variables can take several forms. In this lab, we will cover only the three that are essential for the course: 1) numbers, 2) characters, and 3) data frames. If you want to study more about variables in R, go to Variable Types in R Tutorial. A variable could be a single number. For instance, we can assign 5 to variable a.

a <- 5

The “<-” symbol tells R to assign the value on its right to the variable on its left (in this case, a). To see the value assigned to a variable, type the variable name and run it.

a
## [1] 5

R can assign the outcomes of mathematical operations to variables. For example,

b <- a * a + 3
b
## [1] 28

We have now assigned numbers to two variables. Can you guess what the outcome will be if we add a and b? Let’s check the result.

a + b
## [1] 33

R can also assign text to variables. In R, this form of data is called character; any value written within a pair of single or double quotes is treated as a character. For example,

c <- "soc 830"
c
## [1] "soc 830"

Note that you are not limited to assigning a single value. You can also make a sequence of values (called a vector in R) and assign it to a variable. For example, we are going to input age and gender for five respondents.

age <- c(36, 19, 30, 55, 42)
age
## [1] 36 19 30 55 42
gender <- c("male", "female", "female", "male", "female")
gender
## [1] "male"   "female" "female" "male"   "female"

age is a vector of numbers, and gender is a vector of characters. If you want to confirm this, run the following code:

class(age)
## [1] "numeric"
class(gender)
## [1] "character"
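
A vector holds values of one type only. As a quick sketch (the mixed vector at the end is a made-up example, not part of the lab data), the code below shows that arithmetic is applied to every element of a numeric vector, and that mixing a number with text turns everything into characters.

```r
# Same age vector as above
age <- c(36, 19, 30, 55, 42)

length(age)   # number of values: 5
age[2]        # second element: 19
age + 1       # adds 1 to every element: 37 20 31 56 43
mean(age)     # average age: 36.4

# Mixing a number with text converts the number to character
class(c(36, "male"))   # "character"
```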

Now, we will make a data frame that combines the age and gender vectors we just created.

# data.frame() keeps age numeric; cbind() would first convert it to character
data <- data.frame(age, gender)

Let’s check the data frame we just made and its type.

data
##   age gender
## 1  36   male
## 2  19 female
## 3  30 female
## 4  55   male
## 5  42 female
class(data)
## [1] "data.frame"
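
One thing to watch when building a data frame from vectors of different types: cbind() first produces a matrix, and a matrix can hold only one type, so numbers combined with text stop being numeric. The sketch below compares that route with calling data.frame() on the vectors directly. It creates no new variables, and the $ operator it uses to pick a column is explained later in this lab.

```r
# Same vectors as above
age <- c(36, 19, 30, 55, 42)
gender <- c("male", "female", "female", "male", "female")

# Via cbind(): age is no longer numeric
is.numeric(as.data.frame(cbind(age, gender))$age)   # FALSE

# Via data.frame(): age keeps its numeric type
is.numeric(data.frame(age, gender)$age)             # TRUE
```

A numeric age column is what you need for calculations such as mean().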

How to remove variables we made so far

So far, we have made seven variables: mydata, a, b, c, age, gender, and data. However, we will keep only mydata, which will be used for the rest of the lab. To remove a single variable, use ‘rm(variable name)’. For example,

rm(a)

If you want to remove multiple variables, use ‘rm(variable name 1, variable name 2, variable name 3, ...)’. For example,

rm(b, c, age, gender, data)

Check the Environment tab. You will see that the variables you specified have been removed.
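
The same check can be done from the console instead of the Environment tab. This is a minimal sketch using a hypothetical throwaway variable named scratch, so it does not touch mydata.

```r
# 'scratch' is a hypothetical throwaway variable used only for this check
scratch <- 1
exists("scratch")   # TRUE: the variable is in the environment

rm(scratch)
exists("scratch")   # FALSE: it has been removed

# ls() lists the names of everything that is left
ls()
```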

Assigning labels and value labels to the AuSSA subsample dataset

The rest of the lab works on the AuSSA subsample dataset we created in the previous lab. We will assign variable labels and value labels to its variables.

Loading packages

For labelling data, we need two packages that I recommended installing in lab 2: sjlabelled and sjmisc. To load them in R, run the following code:

library(sjlabelled)
library(sjmisc) 

Every time you start a new R session and want to use a package, you need to run ‘library(package name)’ first. Otherwise, you will see an error message that says “could not find function”.
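
If library() itself fails, the package is probably not installed yet. The following sketch checks this with requireNamespace() from base R; the variable names pkgs and installed are hypothetical and chosen only for this illustration.

```r
# Packages this lab relies on
pkgs <- c("sjlabelled", "sjmisc")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of packages that still need install.packages()
pkgs[!installed]
```

Any package listed by the last line can be installed with ‘install.packages("package name")’, as in lab 2.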

How to access variables in a data frame

A data frame consists of many variables. For instance, mydata consists of five variables: id, sex, age, polorient, and class. We learned how to see a data frame (if you are not sure, review How to make variables). But we do not yet know how to see a specific variable in a data frame. To access it, use ‘data frame name$variable name’. For example, the following code shows four of the variables in mydata.

mydata$sex
##  [1] 1 2 2 2 2 1 1 1 2 2 1 1 2 2 1 1 2 2 2 1 2 2 1 1 2 2 2 2 1 2
mydata$age  
##  [1] 66 72 59 20 68 76 61 90 64 39 57 47 56 51 34 18 18 30 65 35 44 40 57
## [24] 40 59 82 44 30 77 60
mydata$polorient
##  [1] 4 4 2 2 4 4 2 4 2 2 4 2 2 2 2 3 2 2 4 4 4 4 2 2 2 4 5 2 2 4
mydata$class
##  [1] 4 5 4 3 5 4 5 4 3 5 4 1 4 4 2 4 2 5 4 4 6 4 5 3 4 4 2 4 2 3
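
Once a variable is pulled out with $, it behaves like any other vector. The sketch below uses a hypothetical data frame called demo that holds the first five respondents from the output above (the id values 1 to 5 are an assumption), so that it runs on its own.

```r
# 'demo' is a hypothetical stand-in for mydata: the first five respondents,
# with values copied from the output above (id values 1 to 5 are assumed)
demo <- data.frame(id = 1:5,
                   sex = c(1, 2, 2, 2, 2),
                   age = c(66, 72, 59, 20, 68))

names(demo)       # variable names: "id" "sex" "age"
demo$age          # 66 72 59 20 68
demo[["age"]]     # same column, alternative syntax
demo$age[4]       # fourth respondent's age: 20
mean(demo$age)    # 57
```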

How to add variable labels

It is recommended to add short descriptions to variables, which I call variable labels. We often choose short variable names such as polorient, so it is easy to forget what a variable is about after a while. Variable labels help you recall it. First, check the variable label of id. ‘get_label(data name$variable name)’ will show a variable label.

get_label(mydata$id)
## NULL

As expected, no label has been assigned yet, so it returns NULL. Let’s assign a variable label to id.

mydata$id <- set_label(mydata$id, label = "Identification Number")
get_label(mydata$id)
## [1] "Identification Number"

‘data name$variable name <- set_label(data name$variable name, label = "variable label")’ assigns a variable label to the specified variable. After assigning it, the ‘get_label’ function shows the newly assigned variable label.

Let’s assign variable labels to the others as well.

mydata$sex <- set_label(mydata$sex, label = "Gender")
mydata$age <- set_label(mydata$age, label = "Age")
mydata$polorient <- set_label(mydata$polorient, label = "Political Orientation")
mydata$class <- set_label(mydata$class, label = "Social Class")
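
To review labels after assigning them, get_label() can also be given a whole data frame. The sketch below assumes sjlabelled is installed and uses a hypothetical data frame called demo instead of mydata; the last line relies on sjlabelled storing the variable label in the "label" attribute.

```r
library(sjlabelled)

# 'demo' is a hypothetical stand-in for mydata
demo <- data.frame(sex = c(1, 2, 2), age = c(66, 72, 59))
demo$sex <- set_label(demo$sex, label = "Gender")
demo$age <- set_label(demo$age, label = "Age")

# One variable at a time
get_label(demo$sex)   # "Gender"

# All variables of the data frame in one call
get_label(demo)

# The label is stored as the "label" attribute of the variable
attr(demo$sex, "label")
```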

How to add value labels

When we created the dataset in lab 2, we entered numbers instead of text. However, except for age, the numbers themselves carry no meaning. Therefore, it is recommended to assign a (category) label to each value (number). Category information for each variable can be found in Lab 2: How to enter data manually. For example, in sex, “male” will be assigned to 1 and “female” to 2. We will use the ‘set_labels’ function for this purpose. The R code is ‘data name$variable name <- set_labels(data name$variable name, labels = c("category 1" = value 1, "category 2" = value 2, ...))’.

mydata$sex <- set_labels(mydata$sex, labels = c("male" = 1, "female" = 2))
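
Value labels are attached to the numbers; they do not replace them. The sketch below illustrates this on a hypothetical vector called sex_demo (the sex values of the first five respondents). It assumes sjlabelled is installed and uses its get_labels() and as_label() functions.

```r
library(sjlabelled)

# 'sex_demo' is a hypothetical stand-in for mydata$sex
sex_demo <- c(1, 2, 2, 2, 2)
sex_demo <- set_labels(sex_demo, labels = c("male" = 1, "female" = 2))

# The stored values are still numbers
sex_demo

# get_labels() lists the category labels
get_labels(sex_demo)

# as_label() swaps each number for its label, which is handy for tables
table(as_label(sex_demo))
```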

Then, let’s assign value labels to polorient and class as well.

mydata$polorient <- set_labels(mydata$polorient, 
                               labels = c("Far left" = 1,
                                          "Left" = 2,
                                          "Center" = 3,
                                          "Right" = 4,
                                          "Far right" = 5))
mydata$class <- set_labels(mydata$class, labels = c("Lower class" = 1,
                                                   "Working class" = 2,
                                                   "Lower middle class" = 3,
                                                   "Middle class" = 4,
                                                   "Upper middle class" = 5,
                                                   "Upper class" = 6))

Now is the time to check whether you have followed all the steps correctly. We will make frequency tables of sex, polorient and class using the ‘frq’ function (you will learn more about this function in lab 5). If everything went well, you will see the variable and value labels of the three variables.

frq(mydata$sex)
## 
## # Gender (x) <integer> 
## # total N=30  valid N=30  mean=1.60  sd=0.50
##  
##  val  label frq raw.prc valid.prc cum.prc
##    1   male  12      40        40      40
##    2 female  18      60        60     100
##   NA     NA   0       0        NA      NA
frq(mydata$polorient)
## 
## # Political Orientation (x) <integer> 
## # total N=30  valid N=30  mean=2.93  sd=1.05
##  
##  val     label frq raw.prc valid.prc cum.prc
##    1  Far left   0    0.00      0.00    0.00
##    2      Left  16   53.33     53.33   53.33
##    3    Center   1    3.33      3.33   56.67
##    4     Right  12   40.00     40.00   96.67
##    5 Far right   1    3.33      3.33  100.00
##   NA        NA   0    0.00        NA      NA
frq(mydata$class)
## 
## # Social Class (x) <integer> 
## # total N=30  valid N=30  mean=3.77  sd=1.14
##  
##  val              label frq raw.prc valid.prc cum.prc
##    1        Lower class   1    3.33      3.33    3.33
##    2      Working class   4   13.33     13.33   16.67
##    3 Lower middle class   4   13.33     13.33   30.00
##    4       Middle class  14   46.67     46.67   76.67
##    5 Upper middle class   6   20.00     20.00   96.67
##    6        Upper class   1    3.33      3.33  100.00
##   NA                 NA   0    0.00        NA      NA

Saving data into RDS format

Now we have a complete dataset, so we will save it for future use. There are multiple ways to save datasets in R. However, I recommend saving them in RDS format because this format preserves the data structure (including variable and value labels) and reduces the size of files considerably. The R code for this job is ‘saveRDS(data name, file = "file-name.rds")’. Note that the file name should end with “.rds”.

saveRDS(mydata, file = "mydata.rds")

After running this code, look at your working directory. You will see “mydata.rds” there. Also, do not forget to save this R file. Clicking the disk icon in the text editor saves it.
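
If you want to confirm that a saved file holds what you expect, a quick round trip with base R's readRDS() does it. The sketch below uses a hypothetical small data frame and a temporary file from tempfile(), so it does not touch mydata.rds. The label is set through the "label" attribute directly to keep the example free of packages.

```r
# 'x' is a hypothetical small data frame with one labelled variable
x <- data.frame(sex = c(1, 2, 2))
attr(x$sex, "label") <- "Gender"

# Save to a temporary file rather than the working directory
path <- tempfile(fileext = ".rds")
saveRDS(x, file = path)
file.exists(path)         # TRUE

# Read it back and compare with the original
y <- readRDS(path)
identical(x, y)           # TRUE
attr(y$sex, "label")      # "Gender": the label survived the round trip
```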

The next lab will introduce how to import this file into R. So, please keep this file.

The R code you have written so far looks like this:

################################################################################
# Title: Lab 2 & 3
# Date: 18/03/2019
################################################################################

# Import CSV files
mydata <- read.csv("table-1-30-respondents.csv")
mydata

# Elementary equations
250 + 125
250 - 125
250 * 125
250/125

# A complicated equation
exp(5)+sqrt((log(253) + pi)/(653-258))

# Make variables
a <- 5
a
b <- a * a + 3
b
a + b
c <- "soc 830"
c

# Create vectors
age <- c(36, 19, 30, 55, 42)
age
gender <- c("male", "female", "female", "male", "female")
gender

class(age)
class(gender)

# Create data frames
# data.frame() keeps age numeric; cbind() would first convert it to character
data <- data.frame(age, gender)
data
class(data)

# How to remove variables
rm(a)
rm(b, c, age, gender, data)

# Load package
library(sjlabelled)
library(sjmisc) 

# How to access variables in data frame
mydata$sex
mydata$age  
mydata$polorient
mydata$class

# How to add variable labels
get_label(mydata$id)
mydata$id <- set_label(mydata$id, label = "Identification Number")
get_label(mydata$id)

mydata$sex <- set_label(mydata$sex, label = "Gender")
mydata$age <- set_label(mydata$age, label = "Age")
mydata$polorient <- set_label(mydata$polorient, label = "Political Orientation")
mydata$class <- set_label(mydata$class, label = "Social Class")

# How to add value labels 
mydata$sex <- set_labels(mydata$sex, labels = c("male" = 1, "female" = 2))
mydata$polorient <- set_labels(mydata$polorient, 
                               labels = c("Far left" = 1,
                                          "Left" = 2,
                                          "Center" = 3,
                                          "Right" = 4,
                                          "Far right" = 5))
mydata$class <- set_labels(mydata$class, labels = c("Lower class" = 1,
                                                   "Working class" = 2,
                                                   "Lower middle class" = 3,
                                                   "Middle class" = 4,
                                                   "Upper middle class" = 5,
                                                   "Upper class" = 6))

# Check whether the steps so far have worked.
# The following code shows frequency tables along with variable labels and value labels.
frq(mydata$sex)
frq(mydata$polorient)
frq(mydata$class)

# Saving data into RDS format
saveRDS(mydata, file = "mydata.rds")

# Do not forget to save this R file.
Last updated on 18 March, 2019 by Dr Hang Young Lee (hangyoung.lee@mq.edu.au)