SSCI202 Workshop 5: Exploring NSW Crime Data

This workshop introduces the Data for Crime Rates of Local Government Areas in New South Wales. The data include information on the crime rates, population, economy and residents of local government areas (LGAs) in New South Wales, compiled from various sources. For detailed descriptions of the variables, see “Codebook of Crime Rates of NSW LGAs” on the course website (iLearn). Download the Data File for Crime Rates of NSW LGAs from the Dataset section of the course website, upload it to your Google Drive as you did for the 2012 AuSSA, and open it in SPSS. You are then ready to start Workshop 5.

Moving (or Relocating) variables

When you have many variables in your dataset, it is often hard to locate a specific variable of interest when you browse the data. In that case, you can move a variable to a new location in the Variable View. Suppose that we want to move region2 next to name.

  1. Click the row number of a variable that you intend to move (in this example, it is region2). This will highlight the row of the selected variable.
<Figure 1>

  2. Drag and drop the variable to the new location (in our case, just below name).
<Figure 2>

  3. You will see region2 just below name.
<Figure 3>

  4. Go to Data View. You will see that region2 is located nicely next to name.
<Figure 4>
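
If it helps to see the same idea outside SPSS, moving a variable amounts to reordering a list of column names. The following is only a rough Python sketch, not part of the workshop: the column list is hypothetical (only name, region2 and robbery appear in this handout), and move_after is a helper name made up for this illustration.

```python
# Hypothetical column order; "lga_code" and "population" are invented names.
columns = ["lga_code", "name", "population", "robbery", "region2"]


def move_after(names, to_move, anchor):
    """Return a new list with `to_move` placed directly after `anchor`."""
    remaining = [n for n in names if n != to_move]
    position = remaining.index(anchor) + 1
    return remaining[:position] + [to_move] + remaining[position:]


# Place region2 immediately after name, as in the steps above.
reordered = move_after(columns, "region2", "name")
print(reordered)
```

As in SPSS, only the order of the variables changes. The data stored in each variable are untouched.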

Sorting cases

Sorting cases may be necessary when you want to have a close look at the data. Sorting cases will re-arrange the rows (cases) in the dataset by any variable or combination of variables you wish. By default, the cases will be sorted in ascending order (smallest to largest, or alphabetical). But sometimes it may be helpful to sort cases in descending order (largest to smallest, or reverse alphabetical).

Sorting cases by a single variable

Suppose that we want to know which local government area (LGA) in NSW has the highest robbery rate (robbery). Sorting all the NSW LGAs in descending order by robbery makes this job more manageable.

  1. Go to Data > Sort Cases.
<Figure 5>

  2. In the Sort Cases box, select the variable by which cases will be sorted (here, robbery) and move it to the Sort by: section. Then choose Descending in the Sort Order section and click OK at the bottom.
<Figure 6>

  3. In Data View, you will see that the cases (LGAs) are sorted by robbery rate. The LGA with the highest robbery rate is Moree Plains Shire, and the Municipality of Woollahra has the lowest (excluding LGAs with missing values on robbery).
<Figure 7>
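
The logic of a descending sort can also be sketched in a few lines of Python. This is an illustration only: the rows below are made up and are not values from the NSW dataset, and the sketch drops cases with a missing robbery rate before sorting so that the highest and lowest rates are read from valid cases only.

```python
# Made-up rows for illustration; these are NOT values from the NSW dataset.
lgas = [
    {"name": "LGA A", "robbery": 12.5},
    {"name": "LGA B", "robbery": None},  # missing value
    {"name": "LGA C", "robbery": 48.0},
    {"name": "LGA D", "robbery": 3.1},
]

# Keep only cases with a recorded rate, then sort largest to smallest.
with_rate = [row for row in lgas if row["robbery"] is not None]
by_robbery_desc = sorted(with_rate, key=lambda row: row["robbery"], reverse=True)

highest = by_robbery_desc[0]["name"]   # first case after a descending sort
lowest = by_robbery_desc[-1]["name"]   # last case with a valid value
print(highest, lowest)
```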

Sorting cases by multiple variables

We often need to sort cases by more than one variable. Suppose that we would like to find the LGA with the highest robbery rate in each of Greater Metropolitan Sydney, Murray and Hunter. To find them, we need to sort all the LGAs by the combination of region2 and robbery.

  1. Go to Data > Sort Cases. In the Sort Cases box, move region2 first and then robbery to the Sort by: section. When you sort by more than one variable, the order of the variables matters. In this example, the cases will be sorted first by region2 and then, within each region, by robbery. You can reorder the variables in the Sort by: section by clicking and dragging the variable you want to reposition.

  2. You can set either ascending or descending order for each variable. Click on a variable in the Sort by: section and choose either Ascending or Descending. In this example, we choose Ascending for region2 and Descending for robbery. Then, click OK.
<Figure 8>

  3. In Data View, you will see all the cases sorted by region2 and robbery. LGAs in the Greater Metropolitan Sydney (coded as 1) appear first, those in the Sydney Surrounds (coded as 2) second, and so forth. Within each area, the LGA with the highest robbery rate appears first and the LGA with the lowest robbery rate appears last. You can therefore easily find which LGA has the highest and the lowest robbery rate in each area.
<Figure 9>
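
To see why the order of the sort variables matters, here is a minimal Python sketch of the same two-level sort. The rows are made up for illustration and are not taken from the NSW dataset. The sketch sorts by region2 in ascending order and, within each region, by robbery in descending order (achieved here by negating the numeric rate).

```python
# Made-up rows for illustration; these are NOT values from the NSW dataset.
lgas = [
    {"name": "LGA A", "region2": 2, "robbery": 12.5},
    {"name": "LGA B", "region2": 1, "robbery": 7.0},
    {"name": "LGA C", "region2": 1, "robbery": 48.0},
    {"name": "LGA D", "region2": 2, "robbery": 30.2},
]

# region2 ascending first; within a region, robbery descending.
# Negating the rate reverses the order for that key only (numeric values only).
ordered = sorted(lgas, key=lambda row: (row["region2"], -row["robbery"]))

# The first case met in each region is the one with the highest robbery rate.
top_by_region = {}
for row in ordered:
    top_by_region.setdefault(row["region2"], row["name"])

print([row["name"] for row in ordered])
print(top_by_region)
```

Swapping the two keys would sort the whole file by robbery first, so the regions would no longer be grouped together.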

Creating z-scores for variables

This section shows how to create a new variable whose values are the standardised scores (z-scores) of a variable of your choice. Suppose that we want to create the z-scores of medage (median age of residents).

  1. To create a variable of z-scores, go to Analyze > Descriptive Statistics > Descriptives.
<Figure 10>

  2. Select the variable for which you want to create z-scores and move it to the Variable(s) box. Then, tick the box labelled Save standardized values as variables. Click OK.
<Figure 11>

  3. You will see a newly created variable, Zmedage, in the last row of Variable View. Move this new variable to just below medage for easier comparison. Your Data View will then look like Figure 12.
<Figure 12>
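
The calculation behind the saved variable can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of SPSS internals: the medage values are made up, and the sketch assumes the z-score is (value - mean) / standard deviation, using the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

# Made-up median ages for illustration; NOT values from the NSW dataset.
medage = [30.0, 35.0, 40.0, 45.0, 50.0]

m = mean(medage)   # arithmetic mean of the raw scores
s = stdev(medage)  # sample standard deviation (n - 1 in the denominator)

# z-score: how many standard deviations each value lies from the mean.
zmedage = [(x - m) / s for x in medage]

for raw, z in zip(medage, zmedage):
    print(f"{raw:5.1f} -> {z:6.3f}")
```

Whatever the raw scores are, the resulting z-scores have a mean of 0 and a standard deviation of 1. The shape of the distribution does not change.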

Workshop Activity 5: Exploring NSW Crime Data

Using the Data for Crime Rates in Local Government Areas in New South Wales, answer the following questions.

  1. What is the unit of analysis in this dataset?

  2. Which LGA shows the highest and which the lowest non-domestic violence rate? Use astnondomviol to answer this question.

  3. Find the LGA with the highest sexual offence rate (sexoff) in each sub-region of New South Wales (region2). Note that NSW has 13 sub-regions.

  4. This question asks you to compare the raw scores of average family size (avgfamsize) with its z-scores. This means that you need to create a new variable (Zavgfamsize) which is the z-score of avgfamsize. Answer the following questions.

    A. Check whether the distribution of avgfamsize is normal or skewed and justify your answer.

    B. Have SPSS generate a new variable containing the z-scores of avgfamsize. Report the z-score of avgfamsize for City of Sydney, City of Blacktown, Inner West Council, City of Ryde and Mosman Council. Note that all these LGAs are in the Greater Metropolitan Sydney.

    C. Make histograms for avgfamsize and its z-score variable (Zavgfamsize). Do the two histograms look the same? Explain why or why not. (Tip: If you are not sure how to make a histogram, see Producing a histogram.)

    D. Using the standard normal table, what is the percentage of LGAs in which the average family size is equal to or less than 3.2? To answer this question, first find the z-score equivalent to an average family size of 3.20, and then find the area below this z-score using the standard normal table.

    E. Produce the frequency table of avgfamsize. Use the frequency table to find the percentage of NSW LGAs in which the average family size is equal to or less than 3.2. (Tip: the cumulative percentage is the most relevant information for this question.)

    F. Does the percentage you found from the standard normal table (Q4-D) correspond to the percentage from the frequency table (Q4-E)? If the difference is equal to or smaller than 2 percentage points, you can say that they correspond to each other. If the difference is greater than 2 percentage points, you can’t say that they correspond.
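
The comparison in Q4-D to Q4-F can be sketched in Python as below. This is a hedged illustration only: the avgfamsize values are made up (they are not the NSW data and will not give the workshop answers), and NormalDist().cdf stands in for looking up the area below a z-score in the standard normal table.

```python
from statistics import NormalDist, mean, stdev

# Made-up average family sizes for illustration; NOT values from the NSW dataset.
avgfamsize = [2.8, 2.9, 3.0, 3.0, 3.1, 3.1, 3.2, 3.3, 3.4, 3.6]
cutoff = 3.2

# Q4-D: convert the cutoff to a z-score, then take the area below it
# under the standard normal curve (what the printed table provides).
z = (cutoff - mean(avgfamsize)) / stdev(avgfamsize)
pct_normal = NormalDist().cdf(z) * 100

# Q4-E: cumulative percentage of cases at or below the cutoff,
# as read from a frequency table.
pct_observed = 100 * sum(x <= cutoff for x in avgfamsize) / len(avgfamsize)

# Q4-F: the two correspond if they differ by at most 2 percentage points.
difference = abs(pct_normal - pct_observed)
corresponds = difference <= 2

print(f"z = {z:.2f}, normal table: {pct_normal:.1f}%, frequency table: {pct_observed:.1f}%")
print("Correspond" if corresponds else "Do not correspond")
```

The closer the distribution of the variable is to normal, the closer the two percentages tend to be.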

Note: External students should post their answers to these four questions on iLearn. This activity will contribute to your workshop participation marks.

Last updated on 31 August, 2019 by Dr Hang Young Lee