The second lab session covers the following:
- How to install packages in RStudio
- How to enter data manually
- How to save R files
How to install packages in RStudio
R packages are collections of R functions, sample datasets, and compiled code developed by the R developer community. Base R (which you installed in Lab 1) provides only essential functions. For more complicated analyses, it is easier and more efficient to use predefined R functions that are widely used by researchers, and installing packages is an easy way to access them. Currently, more than 10,000 R packages are available for free. We will use seven of them throughout the course. They are:
Let’s start installing these packages. First, we will install the gmodels package (see Figure 1).
- Open RStudio.
- Click on the Packages tab in the bottom right pane (in RStudio's default layout) and then click on Install. This will open a new window.
- Type the names of the packages you want to install (in this case, gmodels) in the Packages field. You can install multiple packages at once, but the package names must be separated by a space or comma (e.g., “gmodels, gplots, sjlabelled”). Also, make sure the “Install dependencies” box is ticked, which lets R install any other packages required by the package you chose.
- Click on OK. RStudio will start installing packages.
Note: It is recommended to keep installed R packages up to date. An easy way to update them is to click on Update in the Packages tab.
Alternatively, you can install packages using R code. In the R Console, type the following code:
install.packages("gmodels", dependencies = TRUE)
Then, hit Enter (on Windows) or Return (on macOS). R will start installing the gmodels package. Package names must be enclosed in quotation marks (double quotation marks are used here). Otherwise, R cannot recognise the package name and will show an error message.
Note: Installed packages can also be updated using R code. For example, if you want to update the gmodels package, execute the following code in your R Console:
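The same idea extends to several packages at once. The following is a minimal sketch, assuming you only want to install packages that are not already present. The helper name missing_packages is hypothetical (it is not part of R or of this lab), while requireNamespace() and install.packages() are base R functions. The install call is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the names in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Three of the packages named in this lab
wanted <- c("gmodels", "gplots", "sjlabelled")
to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install whatever is missing (requires internet access):
# install.packages(to_install, dependencies = TRUE)
```

Checking first avoids reinstalling packages you already have, which can save time on slow connections.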
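The update code itself is not reproduced in this section. As a hedged sketch, one way to update a single package with base R is update.packages() with its oldPkgs argument; this is an assumption about what fits here, not necessarily the code the lab intended. The helper installed_version is hypothetical and only reports what is currently installed, so it runs without an internet connection.

```r
# Hypothetical helper: installed version of a package as text,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

# "stats" ships with R, so this always returns a version string
print(installed_version("stats"))

# Version of gmodels, or NA if it has not been installed yet
print(installed_version("gmodels"))

# Uncomment to update gmodels only (requires internet access):
# update.packages(oldPkgs = "gmodels", ask = FALSE)
```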
How to enter data manually
Researchers rarely have to construct datasets manually. They usually work with datasets that have already been constructed, often by survey companies on their behalf. Nonetheless, we will create a very simple dataset because doing so helps you understand basic data structures.
Now, we will manually enter a subsample of 30 respondents from the AuSSA (Australian Survey of Social Attitudes) dataset using Table 1. It includes four variables: gender, age, political orientation and social class.
The questionnaire items used for this dataset are:
(999) Don’t know; No answer; Refused
2. How old are you?
(________) years old
(999) Don’t know; No answer; Refused
3. In politics, people often talk about left or right. Where would you put yourself among the following?
(1) Far left
(5) Far right
(999) Don’t know; No answer; Refused
4. Most people see themselves as belonging to a particular class. Please tell me which social class you would say you belong to?
(1) Lower class
(2) Working class
(3) Lower middle class
(4) Middle class
(5) Upper middle class
(6) Upper class
(999) Don’t know; No answer; Refused
| Gender | Age | Political Orientation | Social Class |
| Female | 72 | Right | Upper middle class |
| Female | 20 | Left | Lower middle class |
| Female | 68 | Right | Upper middle class |
| Male | 61 | Left | Upper middle class |
| Female | 64 | Left | Lower middle class |
| Female | 39 | Left | Upper middle class |
| Female | 30 | Left | Upper middle class |
| Male | 57 | Left | Upper middle class |
| Male | 40 | Left | Lower middle class |
| Female | 44 | Far right | Working class |
| Female | 60 | Right | Lower middle class |
Step 1: Creating a CSV file using Excel
It is possible to input information directly into R. However, I do not recommend this approach because it is not an efficient way to create datasets. Instead, we will use Excel (or any spreadsheet program) to enter the data and then import the resulting dataset into R.
Open Excel and look at Table 1. When you enter gender information, you may be tempted to type “Male” or “Female”. However, typing text is not an efficient way to enter data. Instead, I recommend entering the number that corresponds to each gender category. Look at question 1: males are coded 1 and females are coded 2. For the same reason, we will use numbers instead of text for all other questions.
In addition, we will create a new variable that holds an identification number for each respondent. The identification number for the first respondent is 1, for the second it is 2, and so on up to 30 for the 30th. We also need to keep variable names simple. Most importantly, a variable name should contain no spaces. For example, I assign variable names in the following way:
- id: identification number
- sex: gender
- age: age
- polorient: political orientation
- class: social class
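To see why numeric codes lose no information, here is a minimal sketch of how the codes for social class can be turned back into labels once the data are in R. The example values are made up for illustration, and the choice to treat 999 as missing (NA) is an assumption; the labels themselves come from question 4 above.

```r
# Numeric codes for social class, as they would be typed into the spreadsheet.
# Illustrative values only; 999 stands for "Don't know; No answer; Refused".
class_codes <- c(4, 5, 3, 999)

# Assumption: treat the 999 code as a missing value
class_codes[class_codes == 999] <- NA

# Labels taken from question 4, in code order 1 to 6
class_labels <- c("Lower class", "Working class", "Lower middle class",
                  "Middle class", "Upper middle class", "Upper class")

# Attach the labels to the codes
class_factor <- factor(class_codes, levels = 1:6, labels = class_labels)
print(class_factor)
```

Typing a single digit per cell is faster and less error-prone than typing labels, and the labels can be attached later in one step.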
Your final data frame will look like Table 2.
Enter Table 2 in Excel. Variable names should be entered in the first row (see Figure 2).
After you have finished entering the data, save the file in CSV (Comma delimited) format in your WORKING DIRECTORY (see Figure 3).
Note: If you are not sure what a working directory is, see “Setting your default working directory” in Lab 1.
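If you are curious what a CSV file looks like inside, the following sketch builds the first three respondents in R and writes them out. It is only an illustration of the file layout, not a replacement for the Excel step. The object names first_rows and csv_path are hypothetical, and the file is written to a temporary folder so that your working directory is left untouched.

```r
# First three respondents, using the variable names chosen above
first_rows <- data.frame(
  id        = 1:3,
  sex       = c(1, 2, 2),
  age       = c(66, 72, 59),
  polorient = c(4, 4, 2),
  class     = c(4, 5, 4)
)

# Write to a temporary location (hypothetical file name)
csv_path <- file.path(tempdir(), "first-rows.csv")
write.csv(first_rows, csv_path, row.names = FALSE)

# Show the raw text of the file: one header line, then one line per respondent
print(readLines(csv_path))
```

Each line of the file is one row of the spreadsheet, with commas marking the column boundaries.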
Step 2: Importing CSV Files
Open RStudio. You will see the “Untitled1” tab in the “Source” window. This is where we will write R code. First, write the following code (see Figure 4).
# Import CSV files
mydata <- read.csv("table-1-30-respondents.csv")
The first line starts with a hashtag. Any line beginning with a hashtag is a comment, which researchers often use to explain what the code does. When you write code you are not familiar with, it is always a good idea to add a comment. Otherwise, you may forget what the code means when you come back to it in the future. The second line is actual R code, which imports the CSV file into R.
- mydata is the name I assign to the data. You can choose another name.
- <- is the assignment operator. Here it has the same meaning as an equal sign (=).
- read.csv("file name") is the code for importing CSV files. You need to specify your file name, in quotation marks, between the parentheses.
Overall, this code does two things: 1) it imports the CSV file from your working directory, and 2) it names the imported data mydata.
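The two pieces of this code can be tried without any file on disk. In this minimal sketch, the text argument lets read.csv() parse a short string instead of a file; the object names demo_data and demo_data_eq are hypothetical and the values are the first two respondents.

```r
# `<-` stores the result of read.csv() under the name on its left.
# The `text` argument makes read.csv() read from a string rather than a file.
demo_data <- read.csv(text = "id,sex,age\n1,1,66\n2,2,72")
print(demo_data)

# The same assignment written with `=` produces an identical object
demo_data_eq = read.csv(text = "id,sex,age\n1,1,66\n2,2,72")
print(identical(demo_data, demo_data_eq))
```

Both forms work for assignment at the top level of a script; this lab uses <- throughout.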
Next, we need to execute this code. Move the cursor to the line you want to execute. Then, press the Ctrl and Enter keys (on macOS, the Command and Return keys) simultaneously. You will see your code sent to the “Console” window. After executing the line, RStudio automatically advances the cursor to the next line, which lets you step through a sequence of lines one at a time.
Note: If the import fails, check the message in your R console. You will often see “No such file or directory” in the message. This means that R cannot find your CSV file. Check whether the CSV file is in your working directory and whether the file name is spelled correctly (note that R distinguishes between uppercase and lowercase letters).
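These checks can also be done from the Console. The sketch below assumes the file name used in this lab; the helper csv_is_available is hypothetical, while getwd(), list.files() and file.exists() are base R functions.

```r
# Hypothetical helper: TRUE if `file_name` exists in `directory`.
csv_is_available <- function(file_name, directory = getwd()) {
  file.exists(file.path(directory, file_name))
}

# Where R is currently looking for files
print(getwd())

# CSV files that R can see in the working directory
print(list.files(pattern = "\\.csv$", ignore.case = TRUE))

# Check the file used in this lab
print(csv_is_available("table-1-30-respondents.csv"))
```

If the last line prints FALSE, compare the file name you typed with the names listed by list.files().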
Step 3: Check Imported Datasets
Let’s check whether the dataset was imported correctly.
mydata
##    id sex age polorient class
## 1   1   1  66         4     4
## 2   2   2  72         4     5
## 3   3   2  59         2     4
## 4   4   2  20         2     3
## 5   5   2  68         4     5
## 6   6   1  76         4     4
## 7   7   1  61         2     5
## 8   8   1  90         4     4
## 9   9   2  64         2     3
## 10 10   2  39         2     5
## 11 11   1  57         4     4
## 12 12   1  47         2     1
## 13 13   2  56         2     4
## 14 14   2  51         2     4
## 15 15   1  34         2     2
## 16 16   1  18         3     4
## 17 17   2  18         2     2
## 18 18   2  30         2     5
## 19 19   2  65         4     4
## 20 20   1  35         4     4
## 21 21   2  44         4     6
## 22 22   2  40         4     4
## 23 23   1  57         2     5
## 24 24   1  40         2     3
## 25 25   2  59         2     4
## 26 26   2  82         4     4
## 27 27   2  44         5     2
## 28 28   2  30         2     4
## 29 29   1  77         2     2
## 30 30   2  60         4     3
If you write the name of the data and execute it, R prints the whole dataset.
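Printing everything works for 30 rows but becomes unwieldy for larger datasets. As a hedged sketch, a few base R functions give a quicker summary. The object demo_data is a hypothetical stand-in containing the first three respondents so that the example runs on its own; with your imported data you would use mydata instead.

```r
# Stand-in for the imported data: the first three respondents
demo_data <- read.csv(
  text = "id,sex,age,polorient,class\n1,1,66,4,4\n2,2,72,4,5\n3,2,59,2,4"
)

# Number of rows and columns
print(dim(demo_data))

# Variable names, which should match the first row of the spreadsheet
print(names(demo_data))

# First two rows only
print(head(demo_data, 2))

# Type of each variable (all should be numeric or integer here)
str(demo_data)
```

For the full dataset, dim(mydata) should report 30 rows and 5 columns.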
Step 4: Saving Your R Code
Let’s save the R code we have written so far so that we can open it and work on it again next time. Click on the disk icon in the top menu of the “Source” window (see Figure 5).
In the window that pops up, type “myRcode-1.R” in the “File name” field. Note that the file name should end with “.R”, which indicates that the file is an R code file. Then, click on “Save”. This will save your R file in your working directory, and the “Untitled1” tab will change to “myRcode-1.R”.
Close RStudio (do not save the workspace image when asked) and open it again. If you followed all the instructions in Lab 1, the file “myRcode-1.R” will be loaded automatically. If not, review “Automatically loading your previous R codes” in Lab 1.
In the next lab, we will keep working on the dataset and R file we have made so far. Thus, please keep all the files.