# Concepts

1. Foundational concepts

Unit of analysis (case, statistical unit)

Variable

Value of variable

Dataset

Independent variable

Dependent variable

Control variable

Conceptualisation

Operationalisation

Hypothesis

Conceptual hypothesis

Operational hypothesis

2. Levels of measurement

Categorical/Nominal variable

Ordinal variable

Interval variable

Binary variable

Dummy coding

3. Scales and Indexes

Likert scale

4. Qualities of good measures

Reliability

Validity

# 1. Foundational Concepts

## Unit of analysis (case, statistical unit)

Definition: The individual entities or instances which are being analysed.

Example/s: The respondents to a survey, or the participants in an experiment.

What is this concept useful for? It gives us a word to describe the objects we are studying. E.g. “The unit of analysis for our survey was individual student respondents.” or “The unit of analysis for a cross-national study was individual countries.” or “In our comparison of school results in high school exams, our unit of analysis was individual high schools, with between 400 and 1,200 students in each school.”

## Variable

Definition: A characteristic of a unit of analysis (case) that varies across the units of analysis in your dataset.

Example/s: The age, gender, number of hours of sleep, or party preference of a survey respondent.

What is this concept useful for? In social science we are almost always concerned with how changes in one characteristic cause changes in another. ‘Variable’ gives us a word to describe the characteristics of whatever we are studying. E.g. (variables in bold): “We studied whether **hard work** correlated with **academic success**.” or “We wanted to know how **gender** impacted on **time on social media**.” or “We wanted to know whether **economic downturns** predicted the **likelihood of a political revolution**.”

## Value of variable

Definition: The measured level of a variable for a case.

Example/s: In a survey of the age of students in a class:

• unit of analysis: student
• variable: age
• value of variable (for one student): 22 years of age

What is this concept useful for? It gives us a phrase to refer to the contents of a variable for a particular individual (or group of individuals).

For example:

• “Researcher A: What is the value of the fear variable for participant number 13?”
• “Researcher B: She scored 5 out of 25 on the fear variable.”

## Dataset

Definition: The systematically organised information that is analysed for a study.

Example/s: An Excel spreadsheet with all the answers from 2,000 respondents to a survey about voting behaviour.

A set of 150 transcripts of interviews with people trapped on coronavirus-affected cruise ships between February and April 2020.

What is this concept useful for? It gives us a word to refer to the raw information we are analysing, no matter what type of study we are doing.

Tips: In quantitative research, datasets are normally organised as a single table, where the rows are cases (e.g. participants in a survey) and the columns are variables (e.g. age, gender, years of education).

In qualitative research, datasets are often a set of interview transcripts, or artifacts, like historical documents.
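The table layout described for quantitative research can be sketched in Python as a list of rows, one per case, where each row maps variable names to that case's values. The respondents, variable names and the `value_of` helper are hypothetical; a real study would usually load the table from a file such as a spreadsheet export.

```python
# Hypothetical survey dataset: each row is a case (a respondent),
# each key is a variable, and each entry is a value of that variable.
dataset = [
    {"id": 1, "age": 22, "gender": "Female", "hours_sleep": 7},
    {"id": 2, "age": 35, "gender": "Male", "hours_sleep": 6},
    {"id": 3, "age": 19, "gender": "Other", "hours_sleep": 8},
]

# The variables are the column names shared by every row.
variables = list(dataset[0].keys())

def value_of(dataset, case_id, variable):
    """Return the value of `variable` for the case with the given id."""
    for row in dataset:
        if row["id"] == case_id:
            return row[variable]
    raise KeyError(f"No case with id {case_id}")

print(len(dataset), "cases;", variables)
print(value_of(dataset, 2, "age"))  # 35: the value of 'age' for case 2
```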

## Independent variable

Definition: A potential cause; the predictor of the outcome. In a cause-effect statement, such as a hypothesis, the independent variable represents the potential cause.

Example/s: If we are studying whether ‘isolation’ has an impact on ‘depression’, then ‘isolation’ is our independent variable.

If we hypothesise that “Students who get less sleep will tend to get poorer grades at school,” then sleep is our independent variable.

What is this concept useful for? It gives us a word to refer to our potential cause/s.

Tips: You can remember the independent variable as the variable which is NOT affected or caused by the other variable. The independent variable is NOT dependent on the other variable.

There are other names for independent variables you may see in the literature:

• predictor variable
• explanatory variable
• exogenous variable

## Dependent variable

Definition: The ‘effect’ or outcome. In a cause-effect statement, such as a hypothesis, the dependent variable represents the effect.

Example/s: If we are studying whether ‘isolation’ has an impact on ‘depression’, then ‘depression’ is our dependent variable.

If we hypothesise that “Students who get less sleep will tend to get poorer grades at school,” then grades at school is our dependent variable.

What is this concept useful for? It gives us a word to refer to our outcome.

Tips: You can remember the dependent variable as the variable which IS affected or caused by the other variable. The dependent variable IS dependent on the other variable.

There are other names for dependent variables you may see in the literature:

• response variable
• outcome variable
• explained variable

## Control variable

Definition: A variable you are NOT interested in studying, but that could impact on your outcome (dependent variable), and therefore needs to be measured and accounted for when doing analysis.

Example/s: In laboratory experiments, we try to hold control variables constant. For example, we can control for gender by making sure the same proportion of women is in our experimental and control groups.

In observational studies (such as surveys), we measure control variables, and then use statistical analysis to remove the effect of these variables. For example, in our survey on sleep and grades we might ask people their gender. We can then use techniques like analysing our data as two separate groups (men and women), to see if sleep affects grades in both groups. We can also use techniques like multiple regression (we will learn about this later).

## Conceptualisation

Definition: The process of clearly defining - in abstract terms - a variable or hypothesis.

Example/s: We might define the concept of ‘education level’ as “(1) the amount of (2) formal instruction a person has (3) successfully completed.”

We might define the concept of ‘sexism’ as “(1) the belief that (2) women are inferior to men.”

What is this concept useful for? This word helps us to recognise we always need to specify our variable at TWO levels: (1) defining in the realm of ideas and abstract words (conceptualisation); and (2) defining in the realm of measurement (operationalisation).

Tips: Notice we are basically creating a very precise, dictionary-like or legal-type definition of a variable (or hypothesis).

Clear conceptualisation of your variables is crucially important to make sure you are clearly and logically thinking through your research question and analysis. Don’t skip this step.

You can generally find conceptualisation of variables in the introduction or literature review sections of a paper, so search for how other people conceptualised their variables when reading their papers. Feel free to ‘steal’ others’ definitions (but cite them!).

## Operationalisation

Definition: The process of clearly defining a measurement for a variable or hypothesis.

Example/s: We might operationalise ‘education level’ as:

1. ‘years of formal education’ or
2. “highest qualification received” (with the options: none, primary, high school, diploma, degree, graduate degree).

We might operationalise ‘sexism’ as:

1. Answering ‘agree or strongly agree’ to the statement “To what extent do you agree or disagree with the following statement: Women are naturally, on average, less intelligent than men.” or
2. Score on a ten-item ‘sexism’ scale which asks respondents to indicate their extent of agreement with 10 statements such as “Women are manipulative.” and “Investment in a woman’s education is often wasted.”
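Operationalisation amounts to writing explicit rules that turn a respondent's answers into a value. A minimal Python sketch of the second ‘education level’ option and the first ‘sexism’ option follows; the function names are hypothetical, while the ordered qualification list and the rule that ‘agree’ or ‘strongly agree’ counts as sexist come from the examples above.

```python
# Option 2 for 'education level': qualifications in order, lowest first.
QUALIFICATIONS = ["none", "primary", "high school", "diploma", "degree", "graduate degree"]

def education_level(highest_qualification):
    """Return the rank (0-5) of a respondent's highest qualification."""
    return QUALIFICATIONS.index(highest_qualification.lower())

# Option 1 for 'sexism': agreeing with a single statement counts as sexist.
def sexist_single_item(response):
    """Return 1 if the response is 'agree' or 'strongly agree', else 0."""
    return 1 if response.lower() in {"agree", "strongly agree"} else 0

print(education_level("Diploma"))            # 3
print(sexist_single_item("Strongly agree"))  # 1
print(sexist_single_item("Neutral"))         # 0
```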

## Hypothesis

Definition: A proposed explanation for a phenomenon. Generally taking the form of a testable statement about the expected relationship between two or more variables.

Scientific hypotheses:

• Testable: must be testable
• Prediction: make a prediction
• Two or more variables: generally has two variables (X and Y)
• The more, the more: generally takes the form of “The more X, the more Y.” or “The more X, the less Y.”
• Test explanations (theories): generally test between the predictions of different explanations or theories.

Example/s: That people with greater levels of education tend to show lower levels of sexism.

That people with greater levels of education tend to not support government spending on poverty reduction.

What is this concept useful for? This is a very precise way to focus your entire study, and be very clear about what you expect your data to show. It brings together your theory and literature review with your dataset in one testable statement. It is the heart of your entire study: the most important link in the chain of reasoning and analysis in your paper.

### Conceptual hypothesis

Definition: A hypothesis expressed in abstract terms.

Example/s: That people with greater levels of education tend to show lower levels of sexism.

That people with greater levels of education tend to not support government spending on poverty reduction.

### Operational hypothesis

Definition: A hypothesis expressed in terms of how variables are measured.

Example/s: That the greater the number of years of education of a person, the lower will be their score on our 10-question sexism scale.

That people with higher formal qualifications will tend to score lower on our question “Should the government keep Newstart at the current level (double the normal payment) after the end of the coronavirus epidemic?”

# 2. Levels of measurement

## Nominal variable

Definition: A variable whose values cannot be ordered or ranked. The values of a variable are simply different categories.

Example/s:

• Variable: Colour of car
• Values: Red (0), Green (1), Blue (2), Black (3), White (4).
• Variable: Nationality
• Values: [list of all the nations of the world]
• Variable: Gender
• Values: Male (0), Female (1), Other (2).
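With a nominal variable the numeric codes are only labels, so counting categories is meaningful but arithmetic on the codes is not. A small Python sketch with invented car colours, reusing the colour codes listed above:

```python
from collections import Counter

# Colour codes from the example above; the list of cars is hypothetical.
COLOUR_LABELS = {0: "Red", 1: "Green", 2: "Blue", 3: "Black", 4: "White"}
car_colours = [0, 3, 3, 4, 1, 3, 0, 2]

# Counting how often each category occurs is meaningful for a nominal variable.
counts = Counter(COLOUR_LABELS[code] for code in car_colours)
mode_label, mode_count = counts.most_common(1)[0]
print(counts)
print("Most common colour:", mode_label)  # Black

# Averaging the codes would still produce a number, but it would have
# no meaning, because the codes are only labels for categories.
```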

## Ordinal variable

Definition: A variable whose values are ordered or ranked, but the distance between the values is not known.

Example/s:

• Variable: How much do you agree with this statement: “Australia should reintroduce the death penalty.”
• Values: Strongly agree (4), Agree (3), Neutral (2), Disagree (1), Strongly disagree (0).
• Variable: In the last 30 days, how often do you feel proud of your achievements in your job?
• Values: All the time (4), Most of the time (3), Some of the time (2), A little of the time (1), None of the time (0).
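For an ordinal variable the order of the values is meaningful, so it makes sense to report the middle (median) response, but not to treat the gap between ‘Agree’ and ‘Strongly agree’ as a known amount. A Python sketch with seven invented responses to the death penalty item, using its codes from above:

```python
from statistics import median_low

# Agreement codes from the example above (higher = more agreement).
LABELS = {4: "Strongly agree", 3: "Agree", 2: "Neutral", 1: "Disagree", 0: "Strongly disagree"}

# Hypothetical responses from seven people.
responses = [4, 3, 3, 2, 1, 3, 0]

# Order is meaningful, so a middle value (the median) can be reported.
# median_low always returns one of the actual categories, never an in-between value.
middle = median_low(responses)
print("Median response:", LABELS[middle])  # Agree
```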

## Interval variable

Definition: A variable whose values are ordered or ranked, AND the distance between the values is known.

Example/s:

• Variable: What is your age?
• Values: 18 to 101 years of age.
• Variable: Annual household income
• Values: Income in Australian Dollars
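Because the distance between values is known for an interval variable, averages and differences are meaningful. A Python sketch with invented ages:

```python
from statistics import mean

# Hypothetical ages (in years) of five survey respondents.
ages = [18, 25, 40, 52, 65]

# One year is the same distance anywhere on the scale,
# so averages and differences can be interpreted directly.
average_age = mean(ages)
age_gap = max(ages) - min(ages)
print("Mean age:", average_age)                   # 40
print("Age gap, oldest vs youngest:", age_gap)    # 47
```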

## Binary variable

Definition: A variable which only takes two values: 0 and 1.

Example/s:

• Pregnant (1), Not-pregnant (0)
• Cured (1), Not cured (0)
• Survived (1), Deceased (0)
• Pass (1), Fail (0)
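A practical benefit of coding a binary variable as 0 and 1 is that its mean equals the proportion of cases coded 1. A Python sketch with invented exam results:

```python
from statistics import mean

# Hypothetical exam results coded as a binary variable: Pass (1), Fail (0).
results = [1, 1, 0, 1, 0, 1, 1, 1]

# The mean of a 0/1 variable is the proportion of cases coded 1.
pass_rate = mean(results)
print(f"Pass rate: {pass_rate:.0%}")  # 75%
```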

### Dummy coding

Definition: The transformation of a nominal, ordinal, or interval variable into one or more binary variables.

Example/s:

• Before Dummy Coding:
• Variable: Colours.
• Values: Red (0), Green (1), Blue (2), Black (3), White (4).
• After Dummy Coding:
• Variable 1: Red. Values: Red (1), Not red (0)
• Variable 2: Green. Values: Green (1), Not green (0)
• Variable 3: Blue. Values: Blue (1), Not blue (0)
• Variable 4: Black. Values: Black (1), Not black (0)
• Variable 5: White. Values: White (1), Not white (0)
• Before Dummy Coding:
• Variable: Age. Values: Age in years (18 - 101 years)
• After Dummy Coding:
• Variable: Young. Values: Young (<35 years) (1), Old (35+ years) (0)

Why do we do this?

• Simplify: To analyse our data, we often need to simplify it so we can see the patterns (e.g. compare old and young).
• Make nominal variables make sense: The numbers in a nominal variable don’t make sense as numbers, and often it is safer and easier to analyse by creating a separate variable for each value.
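Both dummy-coding examples above can be expressed as short Python functions. The function names are hypothetical; the five colours and the 35-year cutoff come from the examples.

```python
# Categories of the nominal 'colour' variable from the example above.
COLOURS = ["Red", "Green", "Blue", "Black", "White"]

def dummy_code_colour(colour):
    """Turn one nominal value into five binary variables (1 = this colour)."""
    return {c: int(colour == c) for c in COLOURS}

def dummy_code_young(age, cutoff=35):
    """Return 1 for 'Young' (under the cutoff age), 0 for 'Old'."""
    return int(age < cutoff)

print(dummy_code_colour("Blue"))
# {'Red': 0, 'Green': 0, 'Blue': 1, 'Black': 0, 'White': 0}
print(dummy_code_young(22), dummy_code_young(35))  # 1 0
```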

# 3. Scales and Indexes

Definition: Scales and Indexes are measures that combine multiple indicators (items) into one measure.

Purpose: By using multiple measures, scales and indexes are generally more accurate than just a single question.

Often scales and indexes have been rigorously tested to make sure they are valid (they measure what they say they measure - they are sound), and reliable (when you repeatedly measure the same case, you get the same answer).

Example/s:

• ‘democracy index’ (hypothetical) = combination of measures of free and fair elections + openness of media + civil and political liberties.

• ‘consumer price index’ = the index used to calculate inflation in a country, based on a weighted basket of goods and services.

• ‘Kessler 6’ scale = a six-item scale that screens for emotional distress by asking how frequently a person has had various symptoms of anxiety and depression in the last 30 days.
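The hypothetical democracy index above can be sketched as a weighted sum of its components. The 0 to 10 scoring of each component and the equal weights are assumptions made for illustration; real indexes such as the consumer price index use their own published weights.

```python
# Hypothetical 'democracy index': each component is assumed to be scored
# from 0 to 10, and the three components are weighted equally by default.
def democracy_index(elections, media_openness, liberties, weights=(1/3, 1/3, 1/3)):
    """Combine several indicators of democracy into a single weighted score."""
    components = (elections, media_openness, liberties)
    return sum(w * score for w, score in zip(weights, components))

print(round(democracy_index(9, 6, 8), 2))  # 7.67
```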

## Likert scale (pronounced ‘Lick-ERT’)

Definition: The most common type of multi-item scale in the social sciences. Measures the intensity of an attitude with a series of items, normally 6 or more questions, usually answered on a scale from “Strongly agree” to “Strongly disagree”.

Example/s:

• How much do you agree with the following statement: “Abortion should be legal in NSW.”
• Options:
• Strongly agree
• Agree
• Neutral
• Disagree
• Strongly disagree
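A Likert scale is often scored by converting each answer to a number and adding up the items. The 0 to 4 coding below follows the ordinal variable example earlier in this document; the six answers and the `likert_score` helper are hypothetical.

```python
# Assumed scoring rule: each answer becomes a number from 0 to 4,
# and the item scores are added up to give one attitude score.
SCORES = {
    "strongly agree": 4, "agree": 3, "neutral": 2,
    "disagree": 1, "strongly disagree": 0,
}

def likert_score(answers):
    """Sum the item scores for one respondent's answers to a Likert scale."""
    return sum(SCORES[answer.lower()] for answer in answers)

# One respondent's answers to a hypothetical six-item scale.
answers = ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree", "Agree"]
print(likert_score(answers))  # 16
```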

# 4. Qualities of good measures

## Reliability

Definition: The consistency of a measure when used to repeatedly measure the same object or characteristic.

Example/s: You give 100 people the same sexist attitude scale on two occasions a month apart, and each person gets almost the same score in both tests.

You also test a second sexism scale, but find that most people give very different scores the second time, and so decide not to use that scale.

Why is this concept useful? We want measures that give reliable and consistent answers, and that don’t have too much random variation.

## Validity

Definition: The soundness or truthfulness of a measure. The extent to which a measure of a variable accurately captures the concept it says it is measuring.

How do we establish validity?

• opinions of experts (peer review)
• compare to a reliable source (e.g. more elaborate, detailed, or already validated measure)

Example/s: The Kessler 6 scale has been validated by comparing the results of this two minute screening test with a one hour interview with a psychiatrist.
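A comparison like this can be sketched as an agreement rate: the share of people whose screening result (at or above a cutoff, or below it) matches the interviewer's judgement. The scores, judgements and cutoff are hypothetical, and this is a simplified illustration rather than a reproduction of the Kessler 6 validation; validation studies typically report more detailed statistics.

```python
# Hypothetical validation data: each pair is (screening score, interviewer's
# judgement), where True means the psychiatrist found significant distress.
cases = [(3, False), (15, True), (7, False), (18, True), (11, False), (2, False)]

CUTOFF = 10  # hypothetical cutoff; real screening tools publish their own

def agreement_rate(cases, cutoff):
    """Share of cases where the screening result matches the interview."""
    matches = sum((score >= cutoff) == diagnosed for score, diagnosed in cases)
    return matches / len(cases)

print(f"Agreement with interview: {agreement_rate(cases, CUTOFF):.0%}")  # 83%
```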

Tips:

• The best way to ensure you have a valid measure is to copy from a published academic study.
• Published studies are said to be valid because
Last updated on 03 June, 2020 by Dr Nicholas Harrigan (nicholas.harrigan@mq.edu.au)