This ninth lab shows how to produce a cross-tabulation and how to conduct a Chi-square test of independence.
We will use three packages for this lab. Load them using the following code:
library(sjlabelled)
library(sjmisc)
library(sjPlot)
This lab uses the 2012 AuSSA dataset. You can download the data file from the course website (iLearn). Put the file in your working directory, then run the following code:
aus2012 <- readRDS("aussa2012.rds")
The dataset is loaded as aus2012.
A Simple Example
A frequency table is a typical way to describe a single categorical variable. When you want to describe two categorical variables simultaneously, and especially the relationship between them, you need a special type of table called a cross-tabulation (or crosstab for short). In a crosstab, the categories of one variable determine the rows of the table, and the categories of the other variable determine the columns. Each cell of the table contains the number of cases in which a particular combination of categories occurred.
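As a minimal sketch of this idea, the following base R code builds a small crosstab from two made-up categorical vectors. The vectors `gender` and `opinion` are hypothetical toy data, not variables from the AuSSA dataset; `table()` and `prop.table()` are base R functions.

```r
# Hypothetical toy data: six respondents, two categorical variables
gender  <- c("Male", "Female", "Female", "Male", "Female", "Male")
opinion <- c("Agree", "Agree", "Disagree", "Disagree", "Agree", "Disagree")

# Rows come from the first argument, columns from the second
toy_tab <- table(opinion, gender)
print(toy_tab)

# Column percentages: each column adds up to 100
toy_pct <- round(100 * prop.table(toy_tab, margin = 2), 1)
print(toy_pct)
```

Each cell of `toy_tab` counts the respondents with one particular combination of `opinion` and `gender`, which is exactly what the cells of a crosstab contain.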
Suppose that we are investigating whether there is an association between gender (sex) and attitudes toward single parenthood (singlpar). singlpar measures the extent to which respondents agree or disagree with the statement that one parent can raise children as well as two parents together. We assume that gender may influence attitudes. Therefore, we treat gender as the independent variable and attitudes toward single parenthood as the dependent variable.
To generate a crosstab and to conduct a Chi-square test, we use ‘sjt.xtab()’ from the ‘sjPlot’ package. Use ‘sjt.xtab(data name$name of dependent, data name$name of independent, show.col.prc = TRUE)’. ‘show.col.prc = TRUE’ adds column percentages to the crosstab. Thus, the following code creates the crosstab of gender (sex) and attitudes toward single parenthood (singlpar).
sjt.xtab(aus2012$singlpar, aus2012$sex, show.col.prc = TRUE)
| Q5a Single parent can raise child as well | Sex of Respondent: Male | Sex of Respondent: Female | Total |
|---|---|---|---|
| Strongly agree | 35 (5.1 %) | 114 (13.4 %) | 149 (9.7 %) |
| Agree | 173 (25.4 %) | 375 (44 %) | 548 (35.7 %) |
| Neither agree nor disagree | 81 (11.9 %) | 130 (15.2 %) | 211 (13.8 %) |
| Disagree | 303 (44.5 %) | 208 (24.4 %) | 511 (33.3 %) |
| Strongly disagree | 89 (13.1 %) | 26 (3 %) | 115 (7.5 %) |
| Total | 681 (100 %) | 853 (100 %) | 1534 (100 %) |

χ² = 162.659 · df = 4 · Cramer’s V = 0.326 · p = 0.000
The output shows the crosstab and its associated Chi-square statistics. The categories of the independent variable (sex) form the columns, and the categories of the dependent variable (singlpar) form the rows. Because the crosstab shows column percentages, you can easily compare attitudes between men and women. For instance, women (44%) are more likely than men (25.4%) to agree with the statement. The Chi-square statistic and p-value are displayed below the table. The Chi-square statistic is 162.659 with 4 degrees of freedom, and the p-value is reported as 0.000, that is, p < .001. Since the p-value is less than .05, gender is significantly associated with attitudes toward single parenthood at alpha = .05.
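The statistics under the table can be checked by hand from the cell counts. The sketch below re-enters the observed counts from the crosstab above and runs base R's `chisq.test()`; the object names (`sex_tab`, `cramers_v`) are illustrative, and Cramer's V is computed with the usual formula sqrt(chi-square / (N × (min(rows, columns) − 1))), which is assumed here to be what the output reports.

```r
# Observed counts copied from the crosstab above (filled column by column)
sex_tab <- matrix(
  c(35, 173, 81, 303, 89,      # Male
    114, 375, 130, 208, 26),   # Female
  ncol = 2,
  dimnames = list(
    c("Strongly agree", "Agree", "Neither agree nor disagree",
      "Disagree", "Strongly disagree"),
    c("Male", "Female")
  )
)

# Pearson Chi-square test of independence
res  <- chisq.test(sex_tab)
chi2 <- unname(res$statistic)
df   <- unname(res$parameter)

# Cramer's V: strength of association, from 0 (none) to 1 (perfect)
n <- sum(sex_tab)
cramers_v <- sqrt(chi2 / (n * (min(dim(sex_tab)) - 1)))

print(round(c(chi2 = chi2, df = df, V = cramers_v), 3))
```

The degrees of freedom follow from the table's shape: (5 rows − 1) × (2 columns − 1) = 4.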
More Complex Examples
Examining a bivariate association with a crosstab becomes daunting when a categorical variable has too many categories or when a variable is continuous. In such cases, you need to recode the variable so that it has a small number of categories (normally fewer than five). The reduced categories should still be theoretically meaningful. In this lab, we examine how three independent variables (education, age, and class) are associated with attitudes toward single parenthood. If you look at these independent variables, you will notice that they have many categories. Education (degree) is a categorical variable with seven categories, but we do not need that many to examine the association. Age (age) and class (topbot) are treated as continuous variables, so they must be categorised before creating crosstabs.
Recoding Variables
First, let’s recode age into a variable with three categories: “40 or less = 1”, “41 to 60 = 2” and “61 or more = 3”. The following code performs this task.
aus2012 <- rec(aus2012, age, rec = "min:40=1; 41:60=2; 61:max=3", append = TRUE)
aus2012$age_r <- set_label(aus2012$age_r, label = "Age Category")
aus2012$age_r <- set_labels(aus2012$age_r,
                            labels = c("40 or less" = 1, "41 to 60" = 2, "61 or more" = 3))
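For comparison, the same three age bands can be produced with base R's `cut()`. This is only a sketch of the recoding rule on a hypothetical vector of whole-number ages, not a replacement for the `rec()` call above: `cut()` returns a factor rather than a labelled numeric variable.

```r
# Hypothetical ages, chosen to sit on both sides of each cut point
age <- c(18, 40, 41, 60, 61, 85, NA)

# With the default right = TRUE the intervals are
# (-Inf, 40], (40, 60] and (60, Inf)
age_r <- cut(age,
             breaks = c(-Inf, 40, 60, Inf),
             labels = c("40 or less", "41 to 60", "61 or more"))

# Cross-tabulate old against new values to confirm the recoding
print(table(age, age_r, useNA = "ifany"))
```

Checking the values right at the boundaries (40, 41, 60, 61) is a quick way to confirm that each age lands in the intended category.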
Second, let’s make a new education variable that simplifies the categories of degree. “Did not complete High School to Year 10 (1)”, “Completed High School to Year 10 (2)” and “Completed High School to Year 12 (3)” are collapsed into “High School or less (1)”. “Trade qualification or apprenticeship (4)” and “Certificate or Diploma (5)” are collapsed into “Vocational Education & Training (2)”. “Bachelor Degree (6)” and “Postgraduate Degree or Postgraduate Diploma (7)” are collapsed into “University or more (3)”. The following code performs this task.
aus2012 <- rec(aus2012, degree, rec = "1:3=1; 4:5=2; 6:7=3", append = TRUE)
aus2012$degree_r <- set_label(aus2012$degree_r, label = "Education")
aus2012$degree_r <- set_labels(aus2012$degree_r,
                               labels = c("High school or less" = 1,
                                          "Vocational Education & Training" = 2,
                                          "University or more" = 3))
Lastly, topbot, a 10-point social position scale, is recoded into a class variable with three categories: lower, middle, and upper class. Values from 1 to 5 are collapsed into “lower class (1)”, 6 to 8 into “middle class (2)”, and 9 to 10 into “upper class (3)”. The following code performs this task.
aus2012 <- rec(aus2012, topbot, rec = "1:5=1; 6:8=2; 9:10=3", append = TRUE)
aus2012$topbot_r <- set_label(aus2012$topbot_r, label = "class")
aus2012$topbot_r <- set_labels(aus2012$topbot_r,
                               labels = c("lower" = 1, "middle" = 2, "upper" = 3))
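Because degree and topbot take a small set of integer codes, collapsing them amounts to looking up a new code for each old code. The base R sketch below illustrates that idea for the topbot rule using a hypothetical vector of scores; `lookup` and the toy `topbot` vector are illustrative names and data, and the result is a factor rather than the labelled variable that `rec()` creates.

```r
# Hypothetical scores covering every value of the 10-point scale, plus a missing value
topbot <- c(1:10, NA)

# Position i of the lookup vector holds the new code for old code i:
# 1 to 5 -> 1 (lower), 6 to 8 -> 2 (middle), 9 to 10 -> 3 (upper)
lookup <- c(rep(1, 5), rep(2, 3), rep(3, 2))

topbot_r <- factor(lookup[topbot],
                   levels = 1:3,
                   labels = c("lower", "middle", "upper"))

# Each old code should fall into exactly one new category
print(table(topbot, topbot_r, useNA = "ifany"))
```

Cross-tabulating the original variable against the recoded one is a useful habit after any recode, since a mistyped range shows up immediately as a value in the wrong column.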
Crosstab and Chi-square Test
Now we are ready to examine the bivariate associations. The following code generates a crosstab of attitudes toward single parenthood (singlpar) and age (age_r).
sjt.xtab(aus2012$singlpar, aus2012$age_r, show.col.prc = TRUE)
| Q5a Single parent can raise child as well | Age Category: 40 or less | Age Category: 41 to 60 | Age Category: 61 or more | Total |
|---|---|---|---|---|
| Strongly agree | 64 (17.5 %) | 61 (9.9 %) | 23 (4.3 %) | 148 (9.7 %) |
| Agree | 152 (41.6 %) | 205 (33.4 %) | 187 (34.6 %) | 544 (35.8 %) |
| Neither agree nor disagree | 53 (14.5 %) | 83 (13.5 %) | 73 (13.5 %) | 209 (13.8 %) |
| Disagree | 81 (22.2 %) | 213 (34.7 %) | 211 (39 %) | 505 (33.2 %) |
| Strongly disagree | 15 (4.1 %) | 52 (8.5 %) | 47 (8.7 %) | 114 (7.5 %) |
| Total | 365 (100 %) | 614 (100 %) | 541 (100 %) | 1520 (100 %) |

χ² = 71.039 · df = 8 · Cramer’s V = 0.153 · p = 0.000
The crosstab shows that younger people are more likely than older people to be in favour of single parenthood. The Chi-square statistic is 71.04, and the p-value is reported as 0.000, which is much less than .05. Thus, we reject the null hypothesis of independence and conclude that age and attitudes toward single parenthood are associated at alpha = .05.
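The Chi-square statistic summarises how far the observed counts are from the counts we would expect if the two variables were independent. As a rough illustration, the sketch below re-enters the observed counts from the age crosstab and computes the expected counts as row total × column total / grand total. The object names (`age_tab`, `expected`) are illustrative.

```r
# Observed counts copied from the age crosstab (filled column by column)
age_tab <- matrix(
  c(64, 152, 53, 81, 15,      # 40 or less
    61, 205, 83, 213, 52,     # 41 to 60
    23, 187, 73, 211, 47),    # 61 or more
  ncol = 3,
  dimnames = list(
    c("Strongly agree", "Agree", "Neither agree nor disagree",
      "Disagree", "Strongly disagree"),
    c("40 or less", "41 to 60", "61 or more")
  )
)

# Expected count under independence = row total * column total / grand total
expected <- outer(rowSums(age_tab), colSums(age_tab)) / sum(age_tab)
print(round(expected, 1))

# Positive differences: more respondents than independence would predict
print(round(age_tab - expected, 1))
```

Large differences between observed and expected counts, such as the surplus of "Strongly agree" answers among respondents aged 40 or less, are what drive the Chi-square statistic up.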
The following code generates a crosstab of attitudes toward single parenthood (singlpar) and education (degree_r).
sjt.xtab(aus2012$singlpar, aus2012$degree_r, show.col.prc = TRUE)
| Q5a Single parent can raise child as well | Education: High school or less | Education: Vocational Education & Training | Education: University or more | Total |
|---|---|---|---|---|
| Strongly agree | 35 (7.9 %) | 58 (10.7 %) | 56 (11.4 %) | 149 (10.1 %) |
| Agree | 164 (36.9 %) | 179 (32.9 %) | 172 (35 %) | 515 (34.8 %) |
| Neither agree nor disagree | 63 (14.2 %) | 84 (15.4 %) | 60 (12.2 %) | 207 (14 %) |
| Disagree | 149 (33.6 %) | 184 (33.8 %) | 164 (33.3 %) | 497 (33.6 %) |
| Strongly disagree | 33 (7.4 %) | 39 (7.2 %) | 40 (8.1 %) | 112 (7.6 %) |
| Total | 444 (100 %) | 544 (100 %) | 492 (100 %) | 1480 (100 %) |

χ² = 6.602 · df = 8 · Cramer’s V = 0.047 · p = 0.580
The crosstab does not show a clear pattern of association between the two variables. The Chi-square statistic is 6.602, and the p-value is 0.580, which is greater than .05. Thus, we fail to reject the null hypothesis of independence: there is no statistically significant association between education and attitudes toward single parenthood at alpha = .05.
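The p-value is the upper-tail probability of the Chi-square distribution at the observed statistic. The short sketch below recovers it from the statistic and degrees of freedom reported under the education crosstab, using base R's `pchisq()` and `qchisq()`; the variable names are illustrative.

```r
chi2 <- 6.602   # statistic reported under the education crosstab
df   <- 8       # (5 rows - 1) * (3 columns - 1)

# p-value: probability of a statistic at least this large under independence
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

# Critical value: the statistic must exceed this to be significant at alpha = .05
critical <- qchisq(0.95, df = df)

print(round(c(p_value = p_value, critical = critical), 3))
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two ways of making the same decision, so they always agree.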
The following code generates a crosstab of attitudes toward single parenthood (singlpar) and class (topbot_r).
sjt.xtab(aus2012$singlpar, aus2012$topbot_r, show.col.prc = TRUE)
| Q5a Single parent can raise child as well | class: lower | class: middle | class: upper | Total |
|---|---|---|---|---|
| Strongly agree | 45 (11.1 %) | 79 (8.7 %) | 13 (13.3 %) | 137 (9.7 %) |
| Agree | 151 (37.3 %) | 329 (36.4 %) | 28 (28.6 %) | 508 (36.1 %) |
| Neither agree nor disagree | 53 (13.1 %) | 123 (13.6 %) | 14 (14.3 %) | 190 (13.5 %) |
| Disagree | 121 (29.9 %) | 319 (35.3 %) | 31 (31.6 %) | 471 (33.5 %) |
| Strongly disagree | 35 (8.6 %) | 54 (6 %) | 12 (12.2 %) | 101 (7.2 %) |
| Total | 405 (100 %) | 904 (100 %) | 98 (100 %) | 1407 (100 %) |

χ² = 13.879 · df = 8 · Cramer’s V = 0.070 · p = 0.085
Again, the crosstab does not show a clear pattern of association between the two variables. The Chi-square statistic is 13.879, and the p-value is 0.085, which is greater than .05. Thus, we fail to reject the null hypothesis of independence: there is no statistically significant association between class and attitudes toward single parenthood at alpha = .05.
The R code you have written so far looks like this:
################################################################################
# Lab 9: Crosstab and Chi-square Test
# 20/05/2019
# SOC830, SOCI702, SOCX830
################################################################################
# Load packages
library(sjlabelled)
library(sjmisc)
library(sjPlot)
# Import the 2012 AuSSA dataset
aus2012 <- readRDS("aussa2012.rds")
# A Simple Example
sjt.xtab(aus2012$singlpar, aus2012$sex, show.col.prc = TRUE)
# More Complex Examples
# Recode independent variables
## Age
aus2012 <- rec(aus2012, age, rec = "min:40=1; 41:60=2; 61:max=3", append = TRUE)
aus2012$age_r <- set_label(aus2012$age_r, label = "Age Category")
aus2012$age_r <- set_labels(aus2012$age_r,
                            labels = c("40 or less" = 1, "41 to 60" = 2, "61 or more" = 3))
## Education
aus2012 <- rec(aus2012, degree, rec = "1:3=1; 4:5=2; 6:7=3", append = TRUE)
aus2012$degree_r <- set_label(aus2012$degree_r, label = "Education")
aus2012$degree_r <- set_labels(aus2012$degree_r,
                               labels = c("High school or less" = 1,
                                          "Vocational Education & Training" = 2,
                                          "University or more" = 3))
## Social Class
aus2012 <- rec(aus2012, topbot, rec = "1:5=1; 6:8=2; 9:10=3", append = TRUE)
aus2012$topbot_r <- set_label(aus2012$topbot_r, label = "class")
aus2012$topbot_r <- set_labels(aus2012$topbot_r,
                               labels = c("lower" = 1, "middle" = 2, "upper" = 3))
# Crosstab & Chi-square test
sjt.xtab(aus2012$singlpar, aus2012$age_r, show.col.prc = TRUE)
sjt.xtab(aus2012$singlpar, aus2012$degree_r, show.col.prc = TRUE)
sjt.xtab(aus2012$singlpar, aus2012$topbot_r, show.col.prc = TRUE)