### forecasting the result (54-46), by pooling and trending the polls

## Monday November 19, 2007

So it's the last week of the campaign, perhaps high time to roll out some pooling of the polls, as I did in 2004 (here and here).

The bottom line: yes, there is a trend back to the government, or at least there was early in the campaign, but it's too little, too late. Labor is headed for about 54% 2PP nationally. So barring something spectacular this last week (and we will get a couple more national polls before the weekend), or the Coalition pulling off the greatest marginal-seat “run-of-the-table” in Australian electoral history, Labor will win, and win comfortably.


Here is a thumbnail of one of several graphs appearing below the fold, along with much more detail.


A 95% bound around that forecast goes down as low as the high 52s, and as high as the low 56s. Recall that the 1983 result was 53.2% 2PP for Labor, and 1972 was in the high 52s. There has been a trend back to the government over the formal campaign, and indeed since a Labor peak in September. Labor has been tracking above 50% 2PP since late March 2005, just after the first of six interest rate rises since the last election, surging into the high 50s after Rudd assumed the leadership. Again, keep in mind that Labor’s best 2PP result in the post-WW2 era is the 53.2% recorded in 1983 (Hawke vs Fraser), and that Labor got just over 47% 2PP in 2004.


We’re looking at almost a 7% swing, in 2PP terms, which would make it one of the biggest “swing” elections in Australian political history as well (Labor got a 7.1pp swing in 1969, but failed to win office; Labor suffered a 7.4pp swing against it in 1975; Howard won office with 5.1pp swing in 1996).

Ok, details: I’ve augmented the simple statistical model I used in the earlier research to include a trend term, which lets us make forecasts (although we’re only five days out now, so it's not a particularly bold forecast, and we’ll get some more national polling between now and then, but anyway…). I have the results of every poll published by the Big Four (Galaxy, Morgan, Newspoll and Nielsen) since the 2004 election. I toss out the Morgan face-to-face polls, since they are far too pro-Labor. I keep the Morgan phone polls. I also keep the Nielsen on-line polls.
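As a rough illustration of the idea (not the actual model, which is estimated quite differently), pooling the polls with inverse-variance weights and fitting a level plus a linear trend can be sketched as follows; the function name and the weighted-least-squares shortcut are mine, not the post's:

```python
import numpy as np

def pool_polls_with_trend(days, estimates, sample_sizes):
    """Illustrative sketch: pool 2PP poll estimates (percentage points)
    with inverse-variance weights and fit level + linear trend by
    weighted least squares."""
    days = np.asarray(days, dtype=float)
    est = np.asarray(estimates, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    p = est / 100.0
    var = p * (1.0 - p) / n * 100.0 ** 2   # sampling variance in pp^2
    w = 1.0 / var                          # precision weights
    X = np.column_stack([np.ones_like(days), days])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * est))
    return beta  # (level at day 0, trend in pp per day)
```

Larger polls get more weight, and the second coefficient is the daily drift in the pooled 2PP track.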


In earlier work I’ve attempted to correct for house effects, and I’ll do so again this cycle, but post-election. For now, I assume no house effects in the two-party preferred estimates, which may not be crazy: (a) my analysis of the 2004 data showed the average of Galaxy, Newspoll and Nielsen did very well on 1st preferences; (b) the industry seems to have settled on assigning 2nd preferences of minor party voters and independents using previous elections as a guide (e.g., about 80% of Green preferences go to Labor, etc.). So the assumption (hope?) is that a simple pooling of the non-face-to-face work done by the Big Four is an unbiased reflection of the true 2PP breakdown. And by the way, this is the implicit assumption in all the other poll-averaging work out there (Reuters, Mumble’s poll-mix for Crikey, Possum).
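Under that no-house-effects assumption, pooling reduces to an inverse-variance weighted average of the published 2PP estimates. A minimal sketch (the function name is mine):

```python
import numpy as np

def pooled_2pp(estimates, sample_sizes):
    """Inverse-variance weighted average of same-period 2PP estimates
    (percentage points), assuming no house effects."""
    est = np.asarray(estimates, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    p = est / 100.0
    var = p * (1.0 - p) / n     # sampling variance of each poll
    w = 1.0 / var               # precision weights
    return float(np.sum(w * est) / np.sum(w))
```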

Mumble’s work differs in that he uses the pooling to get ensemble estimates of first preferences, and then he allocates 2nd preferences (i.e., he uses the published polls only for their first-preference shares). This is the better way to do it, and I’ve almost got that going as well; perhaps another couple of hours of work.
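The allocation step can be sketched like this; the party names, flow fractions and function name are illustrative, with the ~80% Green-to-Labor flow taken from the example above:

```python
def two_party_preferred(first_prefs, flows_to_labor):
    """Allocate 2nd preferences from pooled first-preference shares.
    first_prefs: dict party -> first-preference share (pp).
    flows_to_labor: dict party -> fraction of that party's vote that
    flows to Labor on preferences (from previous elections)."""
    return sum(share * flows_to_labor.get(party, 0.0)
               for party, share in first_prefs.items())
```

For instance, with first preferences ALP 45, Coalition 42, Greens 8, Other 5, and flows of 1.0, 0.0, 0.8 and 0.5 respectively, the implied Labor 2PP is 45 + 6.4 + 2.5 = 53.9 (figures illustrative only).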

I also exploit the fact that on October 9, 2004 (the date of the last election), we know/knew the true state of vote shares, given by the actual election result itself. That is, we’re tracking a moving, hidden target with noisy sensors (samples of the electorate), except that once every 3 years or so the target reveals itself.
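That anchored-tracking idea can be sketched as a local-level filter that starts at the known 2004 result with essentially zero uncertainty; the parameter values and function name here are illustrative, not those of the model actually estimated:

```python
import numpy as np

def filter_track(obs, obs_sd, anchor, state_sd=0.2):
    """Local-level Kalman-filter sketch: track a slowly moving hidden
    2PP share, anchored at the known election result.  obs is a list of
    daily poll readings (nan when no poll that day)."""
    mean, var = float(anchor), 1e-6   # anchored start: result known exactly
    track = []
    for y in obs:
        var += state_sd ** 2          # predict: the target drifts a little
        if not np.isnan(y):           # update: blend prediction with poll
            k = var / (var + obs_sd ** 2)
            mean += k * (y - mean)
            var *= (1.0 - k)
        track.append(mean)
    return track
```

On days with no poll the track simply carries forward (with growing uncertainty); each poll pulls it part-way toward the new reading.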


So here we go, pictures. Solid lines are daily tracks. Dotted lines are 95% confidence intervals. Confetti dots show individual poll estimates. My thanks to Andrea Abel (a graduate of the University of Sydney, now enrolled in the Stanford PhD program) and Peter “Mumble” Brent for some Nielsen and Galaxy data.

Labor’s two-party preferred vote share, 2004-2007:

Zoomed-in on the period since Sept 1, 2007:

Estimate of the trend: daily rate of change in Labor 2PP, percentage points per day:

Trend estimate: zoomed in on the period since September 1, 2007. We see evidence of a trend away from Labor from September 1 onwards, reaching its peak in the first week of the campaign, at which point Labor was shedding about 0.05pp of 2PP vote share per day, or about a percentage point every 3 weeks. It looks like that trend is continuing, but the evidence for it is weaker later in the campaign. And in any event, 1 percentage point every 3 weeks, or 2 percentage points over the course of the campaign, isn’t enough to bring the election back for the government.
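The arithmetic behind those round numbers (assuming a six-week campaign for the "2 percentage points" figure):

```python
# Illustrative arithmetic for the trend magnitudes quoted above.
daily_loss = 0.05                    # pp of 2PP shed per day at the peak
per_three_weeks = daily_loss * 21    # about a point every 3 weeks
over_campaign = daily_loss * 42      # about 2 pp over a six-week campaign
```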


Bottom line: Barring something amazing in this last week, or a marginal-seats miracle, Labor will win, and win comfortably.

[…] Jackman (who wrote a paper after the 2004 election called “pooling the polls”) has crunched the data for 2007, pooling all the Galaxy, Nielsen, Morgan and Newspoll polls into a single…. This is handy if you think that some pollsters have “house biases”, causing them to […]

[…] On a more serious note, that Scintillating Psephy from Stanford, Simon Jackman, has ‘pooled the polls’ and predicted the parliament… so to speak. I’m not going to tell you what he came up with, you’ll have to go and take a look for yourself […]

Fascinating stuff. Like many others, I can’t wait to observe the Transit of Australian Politics, which comes only once every three to four years. Best viewed from Tahiti, or Stanford, or somewhere like that. A great way to test measurements of latitude / attitude.

For the record, I’m viewing this one from up close, or at least much closer than Stanford. I’m in Sydney as I type this…

If you “adjust” for house effects (defined as the long-term deviation from the mutual mean), then the projected TPP goes up by about 0.2 percentage points. With “adjustment”, you could add Morgan back in and reduce the error margins.
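The commenter's definition of a house effect (each pollster's long-term deviation from the mutual mean) can be sketched as:

```python
import numpy as np

def house_effects(pollsters, estimates):
    """House effect per pollster: its long-run mean 2PP reading minus
    the mutual (grand) mean across all readings.  Illustrative only."""
    pollsters = np.asarray(pollsters)
    est = np.asarray(estimates, dtype=float)
    grand = est.mean()
    return {h: float(est[pollsters == h].mean() - grand)
            for h in np.unique(pollsters)}
```

Subtracting each house's effect from its readings before pooling would, on this definition, recentre every pollster on the common mean.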

Simon, it's nice to see someone averaging the polls. However, I have some doubts about the method that you have used. The local linear trend model is based on the assumption that the growth rate of the variable of interest (in this case the ALP two-party preferred vote share) is a random walk! There are two points to make about this.

The first point is that it implies that the ALP two party preferred vote share is integrated of order two. If you simulate this model then as the time horizon increases all of your simulated values will lie outside of the 0-100 per cent region.
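The simulation the commenter describes can be sketched as follows; the horizon and the innovation standard deviation are illustrative:

```python
import numpy as np

# Local linear trend model: the growth rate is a random walk, so the
# level is integrated of order two.  At long horizons nearly every
# simulated path escapes the 0-100 per cent region.
rng = np.random.default_rng(0)
n_paths, horizon, growth_sd = 1000, 5000, 0.05
level = np.full(n_paths, 54.0)       # start at the forecast 2PP
growth = np.zeros(n_paths)
for _ in range(horizon):
    growth += rng.normal(0.0, growth_sd, n_paths)  # random-walk growth
    level += growth                                # integrate again
escaped = float(np.mean((level < 0) | (level > 100)))
```

Over the short horizon of a single campaign the same model stays well-behaved; the pathology only bites as the horizon grows.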

The second point is that the way these models fit the data is by choosing a small value for the standard deviation of the innovation to the growth rate, relative to the standard deviation of the measurement error. This is what makes the curve smooth. Essentially what you are assuming is that the true, but unobserved, voting pattern changes only slowly. I am not a political scientist, but this seems to conflict with the notion that a lot can happen in the last week of a campaign. My belief is that alternative methods of averaging the data might give considerably larger confidence intervals about your prediction.

Despite these quibbles I think that averaging of the polls is a useful thing to do.

I began to be convinced by the polls analysis last week (your analysis and Geoff Lambert’s ‘History shows the polls don’t lie (II)’ are what finally convinced me). Howard’s past seems to have finally caught up with him.