Politics, elections and piffle plinking

Nerdy Sunday – You’re So Smooth Edition.

Today we’ll have a bit of a squiz at the various types of smoothing algorithms regularly used to produce “lines of best fit” for polling data.

The two most common smoothing algorithms you see around the place are the simple 5 point Moving Average and the Exponential Smoothing family of equations. The 5 point Moving Average (which doesn’t have to be 5, it can be any odd number that tickles your fancy) is best used to describe historical data because of the way it averages both past and future data for any given observation. If we had a long polling series that was monthly, then our 5 point MA for March would equal the average of January, February, March, April and May. For our April observation it would equal the average of February, March, April, May and June. Because, for every observation, it uses information from the past, the present and the future, while it is handy for historical data it isn’t very useful when it comes to giving us information about the polling data of today.
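
For anyone playing along at home, here is a minimal Python sketch of that centred moving average. The poll numbers are made up purely for illustration, and the function name is just a label chosen here. Note how the first two and last two months get no value at all, which is exactly why this one is no good for telling us about today.

```python
def centered_moving_average(values, window=5):
    """Centred moving average; `window` must be odd.

    Returns a list the same length as `values`, with None wherever the
    window would run off either end of the series.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        # Average the point itself plus `half` points either side.
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    return smoothed


# Hypothetical monthly primary vote figures, January to July.
polls = [39.0, 41.0, 40.0, 42.0, 38.0, 43.0, 41.0]
ma5 = centered_moving_average(polls)

# March (index 2) is the average of January through May.
print(ma5)
```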

The other common smoothing algorithm is the Exponential Smoothing family. There are many types of exponential smoothing, from Single and Double through to Holt Winters methods, to name but a few. Probably the most common for polling data is Double Exponential Smoothing, which is just a more sophisticated version of the Single variety.

If we had a polling series where y_t is the value of, say, the ALP Primary vote at time t and we wanted to create a Single Exponentially smoothed version of this series called ŷ_t, then we could do this by computing the equation:

$$\hat{y}_t = \alpha\, y_t + (1 - \alpha)\, \hat{y}_{t-1}$$

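Where the value of alpha was between 0 and 1, and where the smaller the value we gave alpha, the smoother our series became. The reason it’s called exponential smoothing is that after a quick rewrite of that equation (substituting it into itself over and over), it turns into:

$$\hat{y}_t = \alpha \sum_{s=0}^{t-1} (1 - \alpha)^{s}\, y_{t-s} + (1 - \alpha)^{t}\, \hat{y}_0$$

Where our smoothed series is a weighted average of the current and past values of actual polling data (plus a starting value ŷ_0 whose influence fades as t grows), and where the weights decline exponentially with time.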
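
To see that the recursive version and the weighted average version really are the same thing, here is a small Python sketch. It assumes the usual recursion (new smoothed value = alpha times the latest poll plus (1 - alpha) times the previous smoothed value), seeds the series with the first observation (a choice made here for illustration; other seeds are common), and uses made-up poll numbers.

```python
def single_exponential_smoothing(y, alpha):
    """Recursive form: s_t = alpha * y_t + (1 - alpha) * s_{t-1}.

    Assumption: the recursion is seeded with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [y[0]]
    for value in y[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def weighted_average_form(y, alpha):
    """Last smoothed value written as an explicit weighted sum.

    The weight on the poll s steps back is alpha * (1 - alpha) ** s,
    and the seed value keeps a leftover weight of (1 - alpha) ** t.
    """
    t = len(y) - 1
    total = sum(alpha * (1 - alpha) ** s * y[t - s] for s in range(t))
    return total + (1 - alpha) ** t * y[0]


# Hypothetical poll readings.
polls = [39.0, 41.0, 40.0, 42.0, 38.0, 43.0, 41.0]
smoothed = single_exponential_smoothing(polls, alpha=0.3)

# Both routes should land on the same number (up to rounding).
print(smoothed[-1], weighted_average_form(polls, alpha=0.3))
```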

If we call our smoothed series above S_t, then Double Exponential Smoothing (which we’ll call D_t) can be described as a function of our first equation such that:

$$D_t = \alpha\, S_t + (1 - \alpha)\, D_{t-1}$$
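
In code, double smoothing is just the single smoother run twice. The sketch below assumes the recursion D_t = alpha * S_t + (1 - alpha) * D_{t-1}, seeds each pass with its first value, and feeds in an invented series that sits on 40 and then jumps to 47, so the numbers are illustrative only.

```python
def exponential_pass(series, alpha):
    """One pass of s_t = alpha * x_t + (1 - alpha) * s_{t-1}, seeded with x_0."""
    out = [series[0]]
    for value in series[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def double_exponential_smoothing(y, alpha):
    """Return (S, D): the single-smoothed series and the smoothed-again series."""
    single = exponential_pass(y, alpha)        # S_t, smoothing the raw polls
    double = exponential_pass(single, alpha)   # D_t, smoothing S_t in turn
    return single, double


# Hypothetical series: ten polls at 40, then a sudden jump to 47.
polls = [40.0] * 10 + [47.0] * 5
single, double = double_exponential_smoothing(polls, alpha=0.3)

# Five polls after the jump, D is still trailing S, which is still trailing the data.
print(polls[-1], round(single[-1], 2), round(double[-1], 2))
```

Because D_t is built from an already smoothed series, any sudden move in the polls has to work its way through two rounds of averaging before it shows up.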

But enough of this mumbo jumbo – if we chart the ALP Primary Vote between the 2004 and 2007 Elections against both the 5 Point Moving Average and the Double Exponentially Smoothed series we get:

As you can see, the 5 point MA is good – but hopeless for anything but historical analysis, while the Double Exponential Smoothing suffers from serious lag, always being behind where the actual polling data is at any given time. When Rudd took over the ALP leadership, the Double Smoothing – being heavily reliant upon a large number of previous results – couldn’t keep up with ALP Newspoll movement. Double Smoothing is so laggy it’s a little pointless to use for polling IMHO, but plenty still use it for some reason.

Next up is a thing called a Henderson moving average which you might remember from such classic hits as Bryan’s Graphs, or the Census Data. This is a quirky old school moving average based on a set of weights originally designed by Robert Henderson back in 1916 – but while it’s quirky, it’s a ripsnorter of a smoothing algorithm.
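
For the curious, the symmetric Henderson weights have a closed form, sketched below in Python. The function names are labels chosen here, and the sketch only handles the middle of a series; the ends need special asymmetric weights, which are left out.

```python
def henderson_weights(terms=13):
    """Symmetric Henderson weights for an odd number of terms."""
    if terms < 3 or terms % 2 == 0:
        raise ValueError("terms must be an odd number of at least 3")
    m = terms // 2
    n = m + 2
    denom = (8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
             * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    weights = []
    for j in range(-m, m + 1):
        numer = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
                 * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
        weights.append(numer / denom)
    return weights


def henderson_smooth(values, terms=13):
    """Apply the symmetric weights; points too close to either end get None."""
    w = henderson_weights(terms)
    m = terms // 2
    out = [None] * len(values)
    for i in range(m, len(values) - m):
        out[i] = sum(wk * v for wk, v in zip(w, values[i - m:i + m + 1]))
    return out


w13 = henderson_weights(13)
# Hypothetical steadily rising series; a straight line passes through unchanged.
trend = [40.0 + 0.5 * i for i in range(20)]
smoothed = henderson_smooth(trend)
print([round(x, 5) for x in w13])
```

The outermost weights come out slightly negative, which is part of what lets the filter follow curves rather than flattening them.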

If we chart a 13 point Henderson Moving Average against both a low sensitivity and high sensitivity Loess regression (of the type we use here for Pollytrack) with the same ALP vote series used above, the general goodness of the HMA stands out.

The high sensitivity Loess and the HMA track each other nearly identically. The reason I use Loess rather than HMA is because its sensitivity can be tuned according to the requirements of the analysis we might be undertaking, and because it picks up actual polling changes more quickly than the HMA does (but at the expense of occasionally picking up a little more polling noise than the HMA). To give an example, if we use our high sensitivity Loess regression, our HMA and a Double Exponential Smoothing for the ALP Primary vote as above, but only up until January of 2007 – where Rudd had taken over and the polling had jumped dramatically for a few polls – the nature of the three algorithms can be clearly seen:

The Exponential Smoothing was lagging because of the way it weights data, the HMA too was lagging because of the way it weights data but the High Sensitivity Loess Regression had picked up the movement by January – even though for the period from the 2004 Election through to Rudd the HMA and Loess had tracked each other in a nearly identical fashion. Loess showed barely any noise effects compared to the HMA.

That’s why I use Local regression smoothing rather than HMA – from my perspective the benefits it provides for real time polling analysis over the HMA outweigh the small costs in occasionally picking up a little bit more polling noise than HMA.
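
To give a feel for what a local regression does, here is a bare-bones Python sketch: for each point it fits a weighted straight line through the nearest neighbours, using tricube weights. This is a simplified stand-in and not the Pollytrack code itself; the `span` parameter (the fraction of points used in each local fit), the function names and the data are all assumptions made for illustration.

```python
def tricube(u):
    """Tricube weight: 1 at the target point, falling to 0 at distance 1."""
    u = min(abs(u), 1.0)
    return (1 - u ** 3) ** 3


def local_linear_fit(x, y, w, x0):
    """Weighted least-squares straight line, evaluated at x0."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
    return my + slope * (x0 - mx)


def loess(x, y, span=0.5):
    """Local linear smoother; a smaller span means higher sensitivity."""
    n = len(x)
    k = min(n, max(3, int(round(span * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1]  # distance to k-th nearest
        if h == 0:
            h = 1.0  # guard for repeated x values
        w = [tricube((xi - x0) / h) for xi in x]
        fitted.append(local_linear_fit(x, y, w, x0))
    return fitted


# Hypothetical series: fifteen polls at 40, then a jump to 47.
days = list(range(20))
step = [40.0] * 15 + [47.0] * 5
high = loess(days, step, span=0.3)  # high sensitivity
low = loess(days, step, span=0.8)   # low sensitivity
print(round(high[-1], 2), round(low[-1], 2))
```

With the small span, the last fit only looks at polls taken after the jump, which is why the high sensitivity version catches a real shift so quickly.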

Finally, there is another type of highly sophisticated smoothing algorithm that we’ll be seeing a bit more of, especially during election campaigns – the local polynomial kernel regression.

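This stuff is quite a piece of work, invented in the 1990s. If we let N = the number of observations in our series, h = our smoothing parameter (also called bandwidth), and X_i and Y_i = the position and value of the i-th observation in our polling series, then our local polynomial kernel regression creates a smoothed value at each point x by calculating Beta parameters that minimise the weighted sum of the squared residuals:

$$\sum_{i=1}^{N} \left( Y_i - \beta_0 - \beta_1 (x - X_i) - \dots - \beta_k (x - X_i)^k \right)^2 K\!\left( \frac{x - X_i}{h} \right)$$

(k being the degree of the polynomial, with the smoothed value at x given by the fitted Beta_0).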

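Where K is a Cosinus Kernel function that integrates to 1, of the form:

$$K(u) = \frac{\pi}{4} \cos\!\left( \frac{\pi}{2} u \right) \ \text{for } |u| \le 1, \qquad K(u) = 0 \ \text{otherwise}$$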

Our beta parameters change for each value of our polling data x making local polynomial kernel regressions not only highly adaptive, but generally 14 shades of spiffy to boot!
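
Here is a cut-down Python sketch of the idea. It assumes the cosine kernel K(u) = (pi/4) * cos(pi * u / 2) on |u| <= 1, and it only fits a local straight line (polynomial degree 1) rather than a general polynomial, to keep the arithmetic short. The names and data are invented for illustration.

```python
import math


def cosinus_kernel(u):
    """Cosine kernel: (pi / 4) * cos(pi * u / 2) inside [-1, 1], else 0."""
    if abs(u) > 1:
        return 0.0
    return math.pi / 4 * math.cos(math.pi * u / 2)


def kernel_regression(x, y, h):
    """Local linear kernel regression with bandwidth h, evaluated at each x."""
    fitted = []
    for x0 in x:
        w = [cosinus_kernel((x0 - xi) / h) for xi in x]
        sw = sum(w)  # always positive: the point at x0 has weight pi / 4
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
        # A fresh line is fitted for every x0, so the betas change each time.
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical daily polls drifting upwards with a bit of wobble.
days = list(range(20))
polls = [40.0 + 0.3 * d + (0.8 if d % 2 else -0.8) for d in days]
smooth = kernel_regression(days, polls, h=4.0)
print([round(v, 2) for v in smooth])
```

A smaller bandwidth h means each local fit sees fewer polls, so the curve adapts faster but lets more noise through.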

To demonstrate how adaptive this beast of an equation is, if we run it on the 04-07 ALP Primary Vote series and compare it to our excellent high sensitivity Loess regression we get:

During periods of low polling density such as we get outside of election campaigns, the kernel regression can get a little noisy (by maybe half a percentage point or so) – but in high density polling periods where we get polls daily, nothing else comes close to its capability to pick up real movements in the polls through time, cutting through the varying house biases of the different pollsters that may confuse other smoothing algorithms. We’ll be seeing much more of this little beastie in the future.

4 Comments

  1. Greensborough Growler
    Posted October 5, 2008 at 7:03 pm

    Possum,

    Could be the holy grail of psephology.

    Tells us what we think tomorrow.

  2. fmark
    Posted October 5, 2008 at 9:05 pm

    Nerdy Sunday keeps getting nerdier – I’m loving it! Keep up the good work :)

  3. Posted October 13, 2008 at 10:46 pm

    Possum, the critical question is: why smooth?

    For me, the answer is that political opinion polling series are typically very noisy. For any individual poll, a one or two point movement on the previous poll is more likely to be noise than a reflection of a genuine change in public opinion. Indeed, the volatility in political opinion polling data suggests that the actual error is much larger than the statistically predicted margin of error. Smoothing helps me look beyond the noise.

    I prefer Henderson over LOESS because of the way in which it manages the end-of-series problem. As you note, Henderson and LOESS in mid-flight are effectively the same thing. But the LOESS function weights the last data point in a series much more heavily than Henderson does. If the purpose of smoothing is to deal with noise, the Henderson end-point weighting makes more sense than LOESS. In this context it is worth noting that a number of independent studies show that LOESS typically has a larger end-point correction factor than Henderson (as data points move from being at the end of the series to somewhere in the middle).

    Just looking at the curve, I suspect that while the Cosinus Kernel function is a useful smoothing function (when smoothing is seeking to address other issues) it is not going to significantly reduce noise.

    Finally, you use the term “lag” in quite different ways, to refer to different phenomena.

  4. Posted October 14, 2008 at 9:38 am

    The two reasons I use smoothing are firstly to cut through the noise like you do with HMA, and secondly (and this is the reason I’ve chosen local regressions) to attempt to identify real shifts in public political opinion in as close to real time as is possible. The end of series problem can be an issue with any type of local regression or nearest neighbour type analysis – although it mostly depends on not only the underlying nature of the data itself, but the bandwidth and polynomial degree of the regressions or the structure of the kernel. That’s why I run both a high and low sensitivity series.

    Since I’m particularly interested in the last, say, 5 points of data at any given time – the local regressions give me pretty much a HMA style smoothing at the back end of the series (as we can see from the charts, it’s virtually identical) but are more courageous in their adaptation to the last few points. If the regression gets it wrong or gets too bullish one way or another, it all washes out as more data comes in. It did pick up the one week nature of the Turnbull bounce which my HMA didn’t, and which is increasingly appearing to be the reality as more polling data comes in – so while it isn’t always right at the end points, it often is, which is what I’m really looking for with it.

    The Cosinus kernel appears to need really dense data sets to unleash itself properly – I’ve been having some fun with it on US polling and it’s been performing really well. Next Fed election our polling density should increase so it will be good to have a play with.

    Stats, IT and economics all have different nuances over the meaning of lag. Oh the joys of the English language.
