# 0. Code to run to set up your computer.

```
# Update Packages
update.packages(ask = FALSE, repos='https://cran.csiro.au/', dependencies = TRUE)
```

```
# Install Packages
if(!require(dplyr)) {install.packages("dplyr", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjlabelled)) {install.packages("sjlabelled", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjmisc)) {install.packages("sjmisc", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjstats)) {install.packages("sjstats", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjPlot)) {install.packages("sjPlot", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(lm.beta)) {install.packages("lm.beta", repos='https://cran.csiro.au/', dependencies=TRUE)}
# Load packages into memory
base::library(dplyr)
base::library(sjlabelled)
base::library(sjmisc)
base::library(sjstats)
base::library(sjPlot)
base::library(lm.beta)
# Turn off scientific notation
options(digits=3, scipen=8)
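# Stop View from overloading memory with large datasets
RStudioView <- View
View <- function(x) {
  # Show only the first 500 rows of a data frame; head() avoids NA rows when there are fewer than 500
  if (inherits(x, "data.frame")) { RStudioView(utils::head(x, 500)) } else { RStudioView(x) }
}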
# Datasets
# Example 1: Crime Dataset
lga <- readRDS(url("https://methods101.com/data/nsw-lga-crime-clean.RDS"))
# Example 2: AuSSA Dataset
aus2012 <- readRDS(url("https://mqsociology.github.io/learn-r/soci832/aussa2012.RDS"))
# Example 3: Australian Electoral Survey
aes_full <- readRDS(gzcon(url("https://mqsociology.github.io/learn-r/soci832/aes_full.rds")))
# Example 4: AES 2013, reduced
elect_2013 <- read.csv(url("https://methods101.com/data/elect_2013.csv"))
```

# 1. Other types of regression models

There are an almost infinite number of regression models available for data analysis.

As a researcher, it is impossible to know all of them.

What I want to do in this lesson is introduce you to:

- The main ways that models vary, such as their dependent variable, their assumptions about the distribution of the dependent variable, and the method of estimation.
- The basic commands for running the most common models you are likely to come across.

# 2. How regression models vary.

I think we can identify, even if it is an oversimplification, four main ways in which regression models systematically differ from each other:

- The measurement of the dependent variable: is it continuous/interval, binary, ordinal, a set of choices, or something else?
- The (assumed) statistical distribution of the dependent variable: is it normally distributed, a count, or best represented by a logistic (or probit) distribution?
- Dependencies between the cases (units of analysis): are these repeated measurements on the same cases (such as in time-series)? Are the cases nested within larger organisational units (e.g. classes, schools, states, nations)?
- The method of estimation: there are lots of different ways of calculating the best model - some involve direct calculation, while others involve simulations and maximising/minimising certain ‘fit’ statistics.
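
To make the distinctions listed above concrete, here is a minimal sketch using simulated data. Everything in it is an assumption made for illustration: the data frame `exam` and the variables `hours`, `mark` and `pass` are hypothetical and are not part of the lesson's datasets. The same predictor is used with a continuous DV (fitted by least squares) and with a binary DV (fitted by maximum likelihood, with either a logit or a probit link).

```r
# Simulated data: all names and numbers here are hypothetical
set.seed(832)
n     <- 500
hours <- runif(n, min = 0, max = 20)          # IV: hours of study
mark  <- 40 + 2 * hours + rnorm(n, sd = 10)   # continuous DV: exam mark
pass  <- as.integer(mark >= 50)               # binary DV: pass (1) / fail (0)
exam  <- data.frame(hours, mark, pass)

# Continuous DV: linear regression, calculated directly by least squares
fit_ols <- stats::lm(mark ~ hours, data = exam)

# Binary DV: binomial distribution, estimated iteratively by maximum likelihood
fit_logit  <- stats::glm(pass ~ hours, data = exam,
                         family = binomial(link = "logit"))
fit_probit <- stats::glm(pass ~ hours, data = exam,
                         family = binomial(link = "probit"))

# Compare the estimated coefficients of the three models
coef(fit_ols)
coef(fit_logit)
coef(fit_probit)

# Number of iterations the logit model needed to converge
fit_logit$iter
```

The coefficients of the three models are on different scales (marks, log-odds, and z-scores respectively), so they should not be compared directly, even though the predictor is the same.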

In the table below we list a number of the most important regression models, and their characteristics.

| Model name | Dep. var. | When to use? | Command in R |
|---|---|---|---|
| Linear regression (ordinary least squares, OLS) | Cont. or interval | DV is continuous or interval. e.g. Mark out of 100 in exam. | `stats::lm(...)` |
| Logistic regression (logit) | Binary | DV is binary. e.g. Pass (1)/Fail (0). Alternative to probit; follow the convention of your discipline. | `stats::glm(..., family = binomial)` |
| Probit regression (probit) | Binary | DV is binary. e.g. Pass (1)/Fail (0). Alternative to logit; follow the convention of your discipline. | `stats::glm(..., family = binomial(link = "probit"))` |
| Conditional logit | Choices | DV is three or more (unordered) nominal choices. e.g. Brand of phone; Favourite colour. IVs = characteristics of choices. | `survival::clogit(...)` |
| Multinomial logit | Choices | DV is three or more (unordered) nominal choices. e.g. Brand of phone; Favourite colour. IVs = characteristics of individuals. | `mlogit::mlogit(...)` |
| Ordinal logistic regression (ordered logit) | Ordinal | DV is an ordinal variable (few options). e.g. Agree/Neutral/Disagree that ‘Trump is a good President’. | `MASS::polr(...)` or `ordinal::clm(...)` |
| Poisson regression | Count | DV is a count variable (assumes variance = mean). e.g. Number of students who fail in each class. | `stats::glm(..., family = "poisson")` |
| Negative binomial regression | Count | DV is a count variable (does not assume variance = mean). e.g. Number of students who fail in each class. | `MASS::glm.nb(...)` |
| Zero inflated negative binomial regression | Count | DV is a count variable (large number of zero cases). e.g. Number of students who fail in each class. | `pscl::zeroinfl(..., dist = "negbin")` |
| Multilevel models | Any | Cases are clustered into groups, which means they are not independent. e.g. students in classes, classes in schools, schools in states. | `lme4::lmer(...)` (or `lme4::glmer(...)` for non-continuous DVs) |
| Tobit regression | Cont. but censored | DV is censored, i.e. you cannot observe the DV above or below a certain value. e.g. ‘ATAR less than 30’; ‘survival longer than 5 years’. | `VGAM::vglm(..., tobit(Upper = ...))` or `AER::tobit(...)` |
| Survival analysis (Cox regression) | Time | DV is time until event. e.g. Years survival from diagnosis; Years studying PhD until graduation. | `survival::coxph(...)` |
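
As an illustration of how the count models in the table relate to each other, here is a rough sketch that fits a Poisson and a negative binomial model to the same simulated data. The data frame `classes` and the variables `class_size` and `fails` are hypothetical, and the data are deliberately simulated with variance greater than the mean. The sketch assumes the `MASS` package is installed (it normally ships with R).

```r
# Simulated data: all names and numbers here are hypothetical
set.seed(2019)
n          <- 400
class_size <- sample(20:40, n, replace = TRUE)    # IV: students per class
mu         <- exp(-1 + 0.05 * class_size)         # expected number of fails
fails      <- rnbinom(n, mu = mu, size = 1.5)     # overdispersed count DV
classes    <- data.frame(class_size, fails)

# Poisson regression: assumes variance = mean
fit_pois <- stats::glm(fails ~ class_size, data = classes, family = poisson)

# Rough overdispersion check: Pearson chi-square divided by residual df.
# Values well above 1 suggest the variance is larger than the mean.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) /
  df.residual(fit_pois)
dispersion

# Negative binomial regression: estimates an extra dispersion parameter
fit_nb <- MASS::glm.nb(fails ~ class_size, data = classes)

# Compare the two models; the lower AIC indicates the better-fitting model
AIC(fit_pois, fit_nb)
```

If the dispersion statistic is close to 1, the simpler Poisson model is usually adequate; if it is well above 1, the negative binomial model is generally the safer choice.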

# 3. References

## Conditional logistic regression

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/clogit.html

## Multinomial logistic regression

- https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
- Hoffman, S.D. & Duncan, G.J. (1988) ‘Multinomial and conditional logit discrete-choice models in demography.’ Demography 25: 415. DOI: 10.2307/2061541
- Estimation of multinomial logit models in R: The mlogit Packages

## Poisson Regression

- https://www.dataquest.io/blog/tutorial-poisson-regression-in-r/
- https://stats.idre.ucla.edu/r/dae/poisson-regression/
- https://www.theanalysisfactor.com/generalized-linear-models-in-r-part-6-poisson-regression-count-variables/
- https://www.theanalysisfactor.com/glm-r-overdispersion-count-regression/

## Ordered Logit

- https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/
- https://www.rdocumentation.org/packages/MASS/versions/7.3-51.4/topics/polr
- https://www.r-bloggers.com/how-to-perform-ordinal-logistic-regression-in-r/
- https://cran.r-project.org/web/packages/ordinal/ordinal.pdf

## Zero inflated negative binomial

- https://stats.idre.ucla.edu/r/dae/zinb/
- https://cran.r-project.org/web/packages/pscl/pscl.pdf

## Multilevel models

https://www.rdocumentation.org/packages/lme4/versions/1.1-21/topics/lmer

## Tobit models

- https://stats.idre.ucla.edu/r/dae/tobit-models/
- https://www.rdocumentation.org/packages/AER/versions/1.2-7/topics/tobit
- https://cran.r-project.org/web/packages/VGAM/VGAM.pdf
- https://www.rdocumentation.org/packages/VGAM/versions/1.1-1/topics/vglm

*21 October, 2019* by *Dr Nicholas Harrigan* (nicholas.harrigan@mq.edu.au)