# Nerdy Sunday - When Trends Go Bad

Jan 17, 2010

We often use the phrase “*the trend is your friend*” when analysing noisy data, primarily because it’s a pretty good rule of thumb for the type of polling, economic and demographic data we usually deal with round these parts. Yet sometimes, with certain types of data that exhibit autocorrelated random noise, the “trend”, particularly any local trend witnessed across a relatively short time period, can be extremely deceptive.

To demonstrate, first up we’ll create a really simple time series of data that will become the “*reality we are trying to find*” for the rest of the post. It will be the “real trend” we will try to find after we swamp it with random noise.

This trend is pretty simple – at observation zero it has a value of zero, at observation 1 it has a value of 0.05, at observation 2 it has a value of 0.1 – each observation the value of this series increases by 0.05. If we look at the first 20 observations of this series, it’s a standard straight line:
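The construction above can be sketched in a couple of lines of Python (the original post doesn't show its code, so the names here are my own):

```python
# The "real trend": observation n has the value 0.05 * n.
# The first 20 observations form a straight line: 0, 0.05, 0.1, ...
real_trend = [0.05 * n for n in range(20)]
```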

Next up, we need to create some random noise to overlay onto this trend – but we need to make that random noise similar to the sorts of wandering behaviour that regularly infects real-world data series. This requires a two-stage process, the first part of which is simply to generate some random numbers. For every observation, we will generate a random number between -1 and 1 inclusive, to 1 decimal place – so the numbers will be drawn from the set (-1, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1).
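Drawing from that 21-value set is straightforward in Python – a minimal sketch, with my own variable names and an arbitrary seed fixed only so the demo is reproducible:

```python
import random

random.seed(42)  # arbitrary; fixed only so the demo is reproducible

# The 21 possible values: -1.0, -0.9, ..., 0.9, 1.0
shocks_set = [round(k / 10, 1) for k in range(-10, 11)]

# One random draw per observation, for 20 observations.
random_noise = [random.choice(shocks_set) for _ in range(20)]
```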

Our first 20 observations of our random number series look like this:

Next, we’ll **autocorrelate** this series of random numbers *(EDIT: autocorrelation is where the value of a series at any given point is correlated with, in this instance, its most recent value)*. To start, our zero observation will have a value of zero and the first observation of our autocorrelated series will have the same value as the first observation of our random number series – which in this case is -1.

That’s just required to start the series off. Now, the value of observation 2 will be the 2nd observation of the random number series (0.8) PLUS the previous observation of our autocorrelated series (-1), giving a value of -0.2. The 3rd observation of our autocorrelated series will be the 3rd observation of our random number series (0.9) plus the 2nd observation of our autocorrelated series (-0.2), and so on. In general, the value of the Nth observation of the autocorrelated series is the Nth value of the random number series plus the (N-1)th value of the autocorrelated series.
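In code, this running-sum construction (a random walk) looks like the following sketch; the worked values are the ones from the paragraph above:

```python
def autocorrelate(shocks):
    """Running sum: each observation = new random shock + previous observation."""
    walk, prev = [], 0.0  # prev starts at the zero-th observation's value, 0
    for shock in shocks:
        prev += shock
        walk.append(prev)
    return walk

# The article's worked example: shocks -1, 0.8, 0.9 give the
# walk -1, -0.2, 0.7 (up to floating-point rounding).
example = autocorrelate([-1.0, 0.8, 0.9])
```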

This is what the first 20 values of our now-autocorrelated random noise look like.

It’s worth noting at this stage that the noise involved is much larger than the observation-by-observation increase in our “real trend” data. Our real trend data increases by 0.05 every period, but the noise can change by as much as ±1 each period – up to 20 times the size of the real trend’s per-period increase, and in either direction.

What we do next to get a noisy version of our real trend data is simply to add the autocorrelated random noise series to our real trend series. If we do this for 300 observations, we can compare how our Actual Trend (the real data we are actually trying to find) compares to the Trend + Random Noise (the type of series that we would see in real life)…
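Putting the pieces together – again a sketch with my own names and seed, not the post’s actual random draws – and fitting the simple regression line:

```python
import random

random.seed(7)  # arbitrary; fixed for reproducibility
N = 300
shocks_set = [round(k / 10, 1) for k in range(-10, 11)]

# The underlying reality we are trying to find.
actual_trend = [0.05 * n for n in range(N)]

# Autocorrelated noise: a running sum of random shocks.
walk, prev = [], 0.0
for _ in range(N):
    prev += random.choice(shocks_set)
    walk.append(prev)

# The series an analyst would actually see.
noisy = [t + w for t, w in zip(actual_trend, walk)]

def ols_slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    xbar, ybar = (n - 1) / 2, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den

fitted_slope = ols_slope(noisy)  # compare with the true slope, 0.05
```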

What I’ve also added here is a simple linear regression line through that Trend + Random Noise series. What is interesting at this stage is how differently the noisy series behaves from the Actual Trend series that is its underlying basis.

If you were an economist, a statistician or some other data scientist and you were given a series of data to analyse, it would often look like the red series above. The problem to be solved is what the real, underlying trend might actually be. We know what the real trend is – it’s the blue line marked “Actual Trend” – but trying to figure that out without knowing it in advance can be a complicated process, especially when the series has a lot of autocorrelated noise in it.

Let’s now look at the first 800 and 1500 observations:

As the number of observations increases, the noisy series slowly starts to converge on to the Actual Trend. If we now move on to the first 5000 observations, it really starts to become apparent:

Over an infinite number of observations, the regression line through the “Trend + Random Noise” series would become identical to the Actual Trend that is our underlying data we would be attempting to find.

What is worth pointing out at this stage is that even over large numbers of observations, series with autocorrelated random noise (or series influenced by more than one thing) can produce local trends that are not representative of the real Actual Trend underlying the data – trends can be deceptive even after hundreds or thousands of observations, depending on the nature of the time series we are looking at.

Let’s now take a look at a small subsection of the series and run the numbers again, but where the regression line for Trend+Random Noise is just taken over the sample that is on the chart – it’s 260 observations long.
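To see how often a short window of such a series can point the “wrong” way, one can fit the regression over many separate 260-observation windows. This is a sketch under the same assumptions as above; the exact counts depend on the random seed:

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    xbar, ybar = (n - 1) / 2, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den

random.seed(0)  # arbitrary; results vary with the seed
shocks_set = [round(k / 10, 1) for k in range(-10, 11)]
series, level = [], 0.0
for n in range(5000):
    level += random.choice(shocks_set)   # autocorrelated noise...
    series.append(0.05 * n + level)      # ...on top of the real trend

# Slope of every non-overlapping 260-observation window.
window = 260
local_slopes = [ols_slope(series[i:i + window])
                for i in range(0, len(series) - window, window)]
negative_windows = sum(1 for s in local_slopes if s < 0)
# Despite the true slope being +0.05 in every window, some local
# regressions will typically come out negative.
```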

The local regression would have us believe that there has been a decline in the value of the series over this period – and a statistically significant one at that – even though we know that the Actual Trend has continued to increase at 0.05 units per time period. If this occurred in the real world with some real, important data series, we’d have 3rd-rate columnists in the press banging on about “Teh Decline!!!!!!”, accusing the professionally trained people attempting to point out the “Actual Trend” of being dishonest conspirators.

This brings me back to the problem of people talking about things like “temperature decline since 2001” and the type of arguments used by some **in comments in our post about it**.

Folks, it’s an exercise in whistling out your arse – you are essentially arguing about random variation so large that it swamps any underlying trend.

Hopefully, from this, you can see why and how that happens, and why and how it is not only futile, but meaningless.

What is important is the larger trends over larger time spans and any structural change that might be measurable in a series, not twaddle about the behaviour of random noise over a short time frame.


**sean:** As clean and clear an explanation as you can get. Like calculus, the non-intuitive nature of statistics trips up the armchair expert. Very useful, thanks!

**Rocket Rocket:** Yes, I love the way various people have to be increasingly selective in their time periods and “smoothing” periods to get the answers they want!

As time goes by, they look increasingly silly. Eventually they tend to move on to Phase II – “Yes, the climate is warming, but it is not caused by human activity” (“And that’s what I’ve always said” – lots of a***-covering when their “good data goes bad”)

**Steve:** Thanks Possum, I am a lurker on this site and find your posts extremely valuable.

I do not know much about stats, though I am trying to learn. My understanding is that linear regression not only gives a trend but also a measure of the possible error in the trend estimate.

Is this true and if so what is the error value for the temperature trend from 2001 that Bolt and others bang on about?

**Just Me:** “Thanks Possum, I am a lurker on this site and find your posts extremely valuable.”

Me too. Lovely clear explanations of things statistical that I do not pretend to understand well.

**Chris Fryer:** My projection is that the deniers will completely ignore this post. 🙂

**Captain Col:** So Possum, what’s the ‘correct’ way to observe climate change data if a decade is too short? If the long term trend changes for some reason, how long will it take to notice that we should have a second trend line. Presumably, we are allowed to have several trend lines to depict interim variations to the single long term one.

**Firstdog:** You are the wind beneath my wings

**Possum Comitatus:** Thankyou Dog. I am glad, however, that I’m not the wind beneath your angry underpants 😛

**thewetmale:** Very nice post Possum. Very nice indeed.

The only problem is that you STILL haven’t proved global warming to be real! 😛

Perhaps we need to call Sophie Black and get her to shut your blog down until you actually answer these important and pressing questions.

**Sam Clifford:** Yes, that’s all well and good but why haven’t you used a fourth order polynomial?

**Possum Comitatus:** Because I was too busy planning to use a 6th degree polynomial!

http://scienceblogs.com/deltoid/2009/01/the_australians_war_on_science_32.php

😀 😀

**Sam Clifford:** Sixth order? That’s just too powerful a statistical tool for warmists to disagree with. The IPCC may as well hang up its boots when papers start publishing sixth order polynomials. Just as well it wasn’t a 7th or Al Gore would’ve been banished back to the fiery bowels of Tennessee.

**Possum Comitatus:** And not just any paper – The Australian. Comes with extra gravitas you know!

**Steve:** Captain Col,

Tamino has addressed the question of how much time is required before a clear trend in atmospheric climate data can be seen.

The answer is about 15 years as Tamino shows in the post at the link below:

http://tamino.wordpress.com/2009/12/15/how-long/#more-2124

**cud chewer:** Steve @14, thanks, that’s exactly the post I’ve been looking for for a long time.

**cud chewer:** Possum, it would be fun to go further and show how many crimes can be committed with polynomial fits 🙂

**Sam Clifford:** That is a brilliant post. Of course, people will misread the plot of starting time versus estimated warming rate as saying that the temperature has been constant since 1975 and that since 1993 the world has been cooling and that the sharp peak at the end is measurement error.

**Steve:** Here is a piece by James Hansen and others discussing a range of issues including the topic of this post.

http://www.columbia.edu/~jeh1/mailings/2010/20100115_Temperature2009.pdf

Note that according to GISS 2009 was the second warmest year on record (after 2005) but its temperature was so close to 1998, 2002, 2003, 2006, and 2007 that all of them (and 2009) are considered equal second hottest.

**cud chewer:** For me it makes more sense, when dealing with someone who is honestly confused about the issue, to start with the proposition that the Earth is gaining heat energy, and then to ask the question “where does that energy go?”.

Only a minority goes into heating the air. A lot goes into heating the oceans. Some goes into melting ice. Some goes into biological or chemical processes. It’s pretty easy to explain then that the amount of energy that goes into the air can vary from year to year but the net gain of the whole planet is pretty stable and predictable. Over time, though, the air has to get warmer.

It would be nice if the scientific community were to popularise the concept “radiative balance”, or at least put it into more palatable phrases like “global heating” which emphasises the cause and not (one of) the effects.

Regarding the saturation of the effects of CO2, this one explains things clearly (although it’s quite a read). To put it simply, the atmosphere is a lot more complex than a test tube.

http://www.realclimate.org/index.php/archives/2007/06/a-saturated-gassy-argument/

**Dan:** Thanks Poss

That last thread made me so angry

We love you man (you too firstdog)

🙂