Sometimes when you’re playing with electorate level demographic data, you get the occasional WTF! moment – some relationship between a demographic variable and party vote share pops up that is so big it makes you take a deep breath and a second and third look. Unusually large, strong relationships are rare with this sort of aggregated data and on the few occasions they do occur, it’s usually some sort of ecological fallacy writ large that is responsible for it.
But pottering around with the census data this morning, something jumped out and slapped me – a relationship between the ALP two party preferred swing at the last election and a certain variable was so large it literally screwed the constant up in the regression equations.
The variable comes from the answers to this question in the Census:

But specifically, the geographical distribution of the people that marked the “Retailing (incl. Take-aways)” box. You can read more about the technical ins and outs of the question at the ABS.
I pulled out of the census the percentage of the total population of each electorate that marked “retail” as an answer and compared it to the ALP swing by electorate at the last election. The results are incredible.
Remembering back to last year, the retail sector was at the pointiest of pointy ends of the Workchoices debate – it was where the data suggested that the biggest losers of the Workchoices policy were employed. State government research, union-backed research, academic research and eventually the government’s own numbers all told the same story and received wide media coverage. To give a few examples:
NEW STUDY SHOWS WORKCHOICES CUTS PAY IN RETAIL AND HOSPITALITY INDUSTRIES.
RETAIL AND HOSPITALITY: KEY CASUALTIES OF WORKCHOICES
WorkChoices Hurting Retail Workers
I’m sure you can all remember the general gist. Apparently the retail workers did when they got into the ballot booths.
First up, the actual data itself. The numbers might at first glance appear small as we are only using the percentage of the total electorate that marked retail on the answer to that question above, so all of those people not in the labour force water down the headline numbers, and many of the folks actually working in a retail area of a mostly non-retail business would have answered something different. But don’t let the small proportions fool you.
If we use a histogram to describe the data across all 150 electorates we get:

As we can see, no electorate passes the 1% of the total population mark answering this question as “Retail”, so what we are effectively looking at here is a pretty tight definition of retail employment.
If we run this as a scatter plot against the actual ALP TPP vote achieved at the last election, by electorate, we can see that retail employment wasn’t particularly skewed to the size, or the stock of the Labor vote.
There is no statistically significant relationship there. But what is interesting about this data is the way it partially relates to education. If we run two scatterplots of retail against the proportion of each electorate that has a year 10 education or less and also run a linear fit through one, and a locally weighted polynomial regression fit through the other we get:
Over the entire spectrum of values, there is no linear relationship between the percentage of each electorate that works primarily in retail and the proportion of the electorate with a year 10 education or less. But if we look beyond the average, retail employment increases as the proportion of folks with a year 10 education or less increases, up until about the 35% point, after which further increases walk hand in hand with a reduction in the proportion of the electorate that works primarily in retail.
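For anyone wanting to see how a straight line can miss a relationship like that, here is a minimal sketch in Python. It assumes nothing about the actual census figures: the numbers are synthetic, built to rise and then fall around a 35% peak, and the function names (linear_fit, local_linear) and the frac bandwidth are illustrative choices rather than what produced the charts above. The local fit is a simple tricube-weighted local linear regression, a stripped-down cousin of a locally weighted polynomial fit.

```python
# Synthetic illustration only: these are NOT the census figures.

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def local_linear(xs, ys, x0, frac=0.5):
    """Locally weighted linear fit evaluated at x0, using tricube weights."""
    k = max(3, int(frac * len(xs)))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my + b * (x0 - mx)


# Hypothetical electorates: % with year 10 or less, and % answering "Retail",
# constructed as an inverted U that peaks at 35%.
year10 = list(range(20, 51))
retail = [0.8 - 0.002 * (x - 35) ** 2 for x in year10]

_, slope = linear_fit(year10, retail)
low, mid, high = (local_linear(year10, retail, x0) for x0 in (25, 35, 45))

print("straight-line slope:", slope)
print("local fit at 25%, 35%, 45%:", low, mid, high)
```

On data shaped like this the straight line comes out essentially flat, because the rise and the fall cancel, while the local fit is higher in the middle than at either end. That is the same pattern of "nothing on average, something underneath" described above.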
The other thing worth noting goes back to how we saw the growth in the population aged 65+ over the period 2006/7 correlating with an increase in the ALP swing by electorate. But we may have found out a partial explanation for this anomaly since the 65+ age cohort growth correlates strongly with this retail variable:
But now for the actual point of all this. This is how the correlation between retail and the ALP TPP Swing looks across all 150 electorates:
As a linear regression, it is huge for this type of data – and the nature of the relationship screws up our constant somewhat (which is not a bad thing – it’s just an unusual thing for this sort of data).

A 1% increase in the proportion of the electorate that answered “Retail” to that Census question walked hand in hand with, on average, a 12% increase in the ALP TPP swing. But since we’re dealing with such small proportions on the retail side here it’s probably best explained as a 0.1% increase in the proportion of the electorate that answered “retail” to that Census question walked hand in hand with, on average, a 1.2% increase in the swing to the ALP. This retail variable alone explains statistically 15% of the variation in the ALP swing by electorate.
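To make the unit juggling concrete, here is a small Python sketch of a one-variable least squares fit. The figures in it are made up for illustration (they are not the census or election numbers), and the names are hypothetical. It shows that re-expressing retail in steps of 0.1 percentage points divides the slope by ten while leaving the constant and the R-squared untouched, which is all the restatement above is doing.

```python
# Made-up illustrative values, NOT the real electorate data.
retail_pct = [0.40, 0.50, 0.55, 0.60, 0.70, 0.80]  # % of population answering "Retail"
alp_swing = [2.0, 4.5, 3.0, 6.5, 6.0, 9.0]         # ALP TPP swing, percentage points


def ols(xs, ys):
    """Simple least squares for y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Fit with retail measured in whole percentage points.
intercept, slope, r2 = ols(retail_pct, alp_swing)

# Same data, but retail measured in tenths of a percentage point.
retail_tenths = [x * 10 for x in retail_pct]
intercept_t, slope_t, r2_t = ols(retail_tenths, alp_swing)

print("swing per 1 point of retail:  ", slope)
print("swing per 0.1 point of retail:", slope_t)
print("R-squared (both versions):    ", r2, r2_t)
```

Strictly speaking these are percentage point changes on both sides, so the choice between "per 1%" and "per 0.1%" is only about which step size matches the range the data actually covers.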
But let’s look at how it played out in just the metro seats.

In the metro seats the relationship was larger and stronger and statistically explained even more of the variation of the swing. This time a 0.1% increase in the “Retail” answer walked hand in hand with, on average, a 1.6% increase in the ALP Swing by electorate to explain 27% of the variation of the ALP swing.
If we now throw the growth in the 0-4yr age cohort over the 2006/7 period into the mix in the metro seats (which we found previously had a strong and significant relationship with the ALP swing, although in the opposite direction to the retail variable), we get:

This suggests that some group of things that correlate with rugrat growth in metro seats combined with retail employment numbers in metro seats explains 38% of the variation in the ALP swing in those metro seats.
For every 1% increase in the rugrat population over 2006/7 (holding retail constant) there was on average approximately a one half of one percent reduction in the ALP Swing, and for every 0.1% increase in that retail answer (holding rugrat growth constant) there was on average an approximate 1.4% increase in the ALP swing for metro seats.
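The "holding the other constant" reading comes straight out of a two-variable least squares fit. A rough Python sketch follows, under loud assumptions: the data are synthetic and generated without noise from coefficients that echo the rounded figures above (minus 0.5 per 1% of rugrat growth, plus 14 per 1% of retail, which is 1.4 per 0.1%), with a made-up intercept, so the fit simply recovers what was put in. The helper names are hypothetical, and the solver is plain normal equations with Gaussian elimination rather than whatever package generated the output above.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Synthetic metro "electorates" (assumed values, NOT the real data).
rugrat_growth = [0.0, 1.0, 2.0, 3.0, 1.5, 0.5]  # % growth in the 0-4yr cohort
retail_pct = [0.5, 0.4, 0.7, 0.6, 0.8, 0.3]     # % answering "Retail"

# Swing built from assumed coefficients; the intercept of 1.0 is invented.
swing = [1.0 - 0.5 * g + 14.0 * r for g, r in zip(rugrat_growth, retail_pct)]

b0, b_rugrat, b_retail = fit_two_predictors(rugrat_growth, retail_pct, swing)

print("swing per 1% rugrat growth, retail held constant:", b_rugrat)
print("swing per 0.1% retail, rugrat growth held constant:", b_retail / 10)
```

Each coefficient is the change in swing for a one-unit move in its own variable with the other variable pinned, which is why the retail figure here can differ from the one in the retail-only regression.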
Ordinarily this would be hard to believe – but the power of Workchoices might have been a little bit more powerful than even the union movement is suggesting when it came to getting the ALP elected. It might be worth some deeper research.







7 Comments
Possum: If you’re getting excited by those lines through those dots, and believing they mean something, you really do need to get out more, or get some counselling. Take a break over Christmas at least, this obsession of yours is just not healthy.
Freaky! Thanks to recent redundancy and a bit of time on my hands, I have been keeping myself busy by running brute force correlations against the 7000+ census variables and 2007 first preference votes by electorate for ALP, LIB, Green, FFP and Nats. My laptop could fry eggs at times!
Firstly, I am interested to see that you are running electorate level census data. Wouldn’t the much smaller census districts (CD) give you more richness (effectively using the electorate swing as an averaged result for each CD)?
The results I have been seeing are fairly predictable (although nevertheless enlightening) for each party with one notable exception. Except for the Liberals, each of the parties are turning up 100+/200+ strongly correlated variables. The findings are pretty much as you would expect: Nats are rural blue collar, Greens are young, wealthy and well educated, ALP heartland is in the non-Anglo ethnic communities and transport workers, FFP are religious and dropped out of high school (actually, dropped out in Year 11, which may be an indicator of aspirationalism).
The most interesting is the Liberal vote which turns up only about 30 significant variables and the correlations in these are generally much weaker. This is telling me that there is no real demographic that the Libs appealed to at the last election. The strongest Liberal demographic are immigrants from the UK which may explain why they had Blair and we had Howard (ha ha).
Also, did you take into account changes in electorate boundaries (and at least one new electorate) between 2004 and 2007, as this may affect swing values? This is something that you can account for more accurately at CD level (something I am about to look at).
I’ve always felt that in order to properly model elections you need to model the spread of ideas via social networks. It’s no surprise to me that a relatively small number of people has a relatively large effect.
Poss, El Nino, fascinating to see (and only vaguely apprehend) the depth to which youse people dig down into this info. Interesting to see that work choices was poison, and then some, and that the libs have issues with forming a core demo, these are smarty pants insights that’ll get a run at my xmas parties (with acknowledgement of course!). Cheers.
Maybe you should send this to Julia Gillard and suggest that the ALP has a huge mandate to go even further with dismantling WorkChoices and making it highly unlikely such a system will ever again see the light of day. The current downturn and impending layoffs shows that no matter how good you think your AWA was in good times, the rights you signed away bite you in the bum bigtime in a downturn.
CCD level data would be great to use, but the big problem is linking it to booth level data. I’ve been playing around a little using a couple of seats I know well to develop a way to cluster the CCD data to booths (using some off the shelf spatial data aggregation algorithms in GIS), but nothing has really worked to my satisfaction yet. The smaller the booths the greater the difficulty.
If we run CCD level data against the swing by seat, a lot of really good info gets washed out and some info will aggregate in misleading ways – but there’ll also be some richer things floating around. The problem is trying to sort the wheat from the chaff!
I’m using 2007 election boundaries.
If you could keep me up to date with any interesting things you find in the data it would be much appreciated – better still, start a blog El Nino!