SOCI832: Week 10: Overview: Logistic Regression + Path Analysis


This week we are going to learn about path analysis.

Before we get into the details of path analysis we are going to learn about logistic regression and odds ratios.

After linear regression, logit models are the most fundamental regression models in the social and natural sciences, and it is crucial to have a rudimentary understanding of this method.
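To make the link between a logit model and an odds ratio concrete, here is a minimal sketch in base R. The data are simulated purely for illustration: the variable names `threat` and `distress` and the effect size are assumptions, and this is not the migrant worker dataset or the model from the paper.

```r
# Simulated illustration only: hypothetical variables, not the real data
set.seed(832)
n <- 500
threat   <- rbinom(n, 1, 0.4)                 # hypothetical binary predictor
log_odds <- -1 + 0.9 * threat                 # assumed true model on the log-odds scale
distress <- rbinom(n, 1, plogis(log_odds))    # hypothetical binary outcome

# Fit a logistic regression (logit model)
fit <- glm(distress ~ threat, family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(fit))
odds_ratios

# With one binary predictor, the same odds ratio can be computed by hand
# from a 2x2 table: (odds of distress if threatened) / (odds if not)
tab <- table(threat, distress)
odds_threat <- tab["1", "1"] / tab["1", "0"]
odds_none   <- tab["0", "1"] / tab["0", "0"]
or_by_hand  <- odds_threat / odds_none
or_by_hand
```

The hand calculation and the exponentiated `threat` coefficient should agree, which is a useful way to remember that an odds ratio is simply a ratio of two odds.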

After that we will introduce path analysis - the idea that we can estimate and represent the multiple causal pathways that link causes, effects, and mediating and moderating variables.
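As a rough sketch of what a causal pathway looks like in practice, the example below estimates a simple mediation model (x affects y directly, and also indirectly through m) using two ordinary regressions in base R. The variables and coefficients are hypothetical and simulated; dedicated path analysis software would normally estimate all paths at once, so treat this only as an illustration of the idea.

```r
# Simulated illustration only: x, m and y are hypothetical variables
set.seed(832)
n <- 500
x <- rnorm(n)                        # cause
m <- 0.5 * x + rnorm(n)              # mediator, partly caused by x
y <- 0.3 * x + 0.4 * m + rnorm(n)    # outcome, caused by x and m

# One regression per endogenous variable
path_a  <- lm(m ~ x)        # path a: x -> m
path_bc <- lm(y ~ x + m)    # path b: m -> y, and direct path c': x -> y

a       <- coef(path_a)[["x"]]
b       <- coef(path_bc)[["m"]]
c_prime <- coef(path_bc)[["x"]]

indirect <- a * b                    # effect of x on y that runs through m
total    <- coef(lm(y ~ x))[["x"]]   # total effect of x on y

c(direct = c_prime, indirect = indirect, total = total)
```

For linear models fitted by least squares on the same data, the total effect equals the direct effect plus the indirect effect, so the three numbers printed above should add up accordingly.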

The structure of today’s class is as follows:

  1. Introduction to today’s dataset
  2. Revision of logistic regression
  3. More on odds ratios
  4. An introduction to path analysis

1. Introduction to today’s dataset

1.1 Motivation

This week’s example dataset is one very close to my heart. It is a dataset I have simulated to be almost identical to the dataset used in this paper:

Harrigan, N. M., Koh, C. Y., & Amirrudin, A. (2017). Threat of deportation as proximal social determinant of mental health amongst migrant workers. Journal of immigrant and minority health, 19(3), 511-522.


The dataset contains information on 582 migrant workers in Singapore. It was collected in 2013 by university students similar to yourselves.

These workers face many difficult conditions, as they lack many basic labour and civil rights - there is no minimum wage, protest by foreigners is illegal, and there is no protection from sacking and deportation for any reason.

The workers we studied were from the construction and marine (shipyard) industries, and almost all the workers were from Bangladesh or India.

The study was motivated by a desire to understand the problems faced by one particular group of these workers: those with injury and salary claims.

Every night around 300-500 construction and shipyard workers from South Asia who have injury or salary claims are fed by a Non-Government Organisation (NGO) called Transient Workers Count Too (TWC2). The vast majority of these workers have run away from their employers, and are caught in a limbo where they are making claims for compensation, but are also languishing without employment, income, or even proper housing.

An undergraduate student (Ms Koh) and I wanted to use validated psychological measures to study the primary causes of distress for these workers - with the idea that this would give insight into the social structures that were causing the most significant problems.

1.2 The data

The dataset we will be using this week is a simulated dataset - meaning that while it looks like the real dataset, it has actually been simulated in R.

The advantage of using a simulated dataset is that it protects the confidentiality of participants. The migrant workers who answered surveys were promised confidentiality, and could face significant harm if they were ever identified. About half the dataset is made up of workers who had workplace injuries or had made salary claims against their employers. A majority of these workers claimed that their employers had undertaken some form of illegal or unethical action against them - such as threatening deportation, physically abusing the worker, or neglecting to provide medical treatment.
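To give a feel for what simulating a dataset can mean, here is a minimal base R sketch of one common approach: estimate the means and covariances of the original variables, then draw brand new rows with the same overall structure. How this week's dataset was actually simulated is not described here, so this is an assumption for illustration only; the `real` data frame below is itself made up, and `age` and `distress` are hypothetical variables.

```r
# Illustration only: 'real' stands in for a confidential dataset
set.seed(832)
n <- 582
age      <- rnorm(n, mean = 30, sd = 6)
distress <- 10 + 0.2 * age + rnorm(n, sd = 4)
real <- data.frame(age, distress)

# Summarise the structure we want the simulated data to preserve
mu    <- colMeans(real)   # variable means
sigma <- cov(real)        # variances and covariances

# Draw new rows from a multivariate normal with that mean and covariance:
# independent standard normals, multiplied by the Cholesky factor of sigma
z <- matrix(rnorm(n * ncol(real)), nrow = n)
simulated <- as.data.frame(sweep(z %*% chol(sigma), 2, mu, "+"))
names(simulated) <- names(real)

# No simulated row corresponds to a real person, but the
# relationships between variables are approximately the same
round(cor(real), 2)
round(cor(simulated), 2)
```

Because no row in `simulated` belongs to an actual respondent, analyses can be shared and re-run without exposing any individual's answers.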

I have provided a very simple codebook for the dataset here:


You can load the dataset into R by simply running the following line:

aw <- get(load(url("")))

And remember you can find the full article here:


1.3 Code to run to set up your computer.

# Update Packages
update.packages(ask = FALSE, repos='', dependencies = TRUE)
# Install Packages
if(!require(dplyr)) {install.packages("dplyr", repos='', dependencies=TRUE)}
if(!require(sjlabelled)) {install.packages("sjlabelled", repos='', dependencies=TRUE)}
if(!require(sjmisc)) {install.packages("sjmisc", repos='', dependencies=TRUE)}
if(!require(sjstats)) {install.packages("sjstats", repos='', dependencies=TRUE)}
if(!require(sjPlot)) {install.packages("sjPlot", repos='', dependencies=TRUE)}
if (!require(oddsratio)) {install.packages("oddsratio", repos='', dependencies=TRUE)}
if (!require(rcompanion)) {install.packages("rcompanion", repos='', dependencies=TRUE)}
if (!require(xtable)) {install.packages("xtable", repos='', dependencies=TRUE)}

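# Load packages into memory
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(sjstats)
library(sjPlot)
library(oddsratio)
library(rcompanion)
library(xtable)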

# Print 3 significant digits and discourage scientific notation
options(digits=3, scipen=8)

# Stop View from overloading memory with large datasets
RStudioView <- View
View <- function(x) {
  # Show only the first 500 rows of a data frame; pass other objects through
  if (is.data.frame(x)) { RStudioView(head(x, 500)) } else { RStudioView(x) }
}

# Datasets
# Example 5: Migrant Workers in Singapore 
aw <- get(load(url("")))

# Open Codebook

Last updated on 20 October, 2019 by Dr Nicholas Harrigan