Now that all of the counting is finally done for the 2008 elections, it is possible to compare how different election forecasters fared. The three I have long been most interested in comparing are:
My method, which takes the simple mean of all non-campaign-funded telephone polls that were conducted entirely within the final eight days of a campaign. My rationale for this method is described here: No Special Sauce Needed For Electoral Projections. This is an intentionally rudimentary "election forecasting for dummies" method that anyone can reproduce.
Pollster.com, which uses all polls ever conducted in a state, and creates a regression line based on those polls. This is the ultimate "don't cherry pick polls and don't argue with polls" method. It was developed by a professional pollster and a political scientist, and is explained here.
Fivethirtyeight.com, whose complicated methodology is essentially the opposite of Pollster.com's: adjust every poll based on demographics, previous house effects, and previous error rate.
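For anyone who wants to reproduce the first method, here is a minimal Python sketch of it: keep only non-campaign-funded telephone polls that were in the field entirely within the final eight days, then take the simple mean of their margins. The `Poll` fields and the `simple_average` function are hypothetical names chosen for illustration, and the example polls are invented, not real 2008 data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean


@dataclass
class Poll:
    start: date            # first day in the field
    end: date              # last day in the field
    margin: float          # e.g. Democrat minus Republican, in points
    telephone: bool
    campaign_funded: bool


def simple_average(polls, election_day, window_days=8):
    """Mean margin of the qualifying polls, or None if none qualify."""
    first_day = election_day - timedelta(days=window_days)
    qualifying = [
        p.margin
        for p in polls
        if p.telephone
        and not p.campaign_funded
        and first_day <= p.start      # started inside the window
        and p.end < election_day      # finished before election day
    ]
    return mean(qualifying) if qualifying else None


# Invented polls for a single hypothetical campaign.
example_polls = [
    Poll(date(2008, 10, 28), date(2008, 10, 30), 6.0, True, False),
    Poll(date(2008, 11, 1), date(2008, 11, 3), 4.0, True, False),
    Poll(date(2008, 10, 25), date(2008, 10, 28), 12.0, True, False),  # started too early
    Poll(date(2008, 10, 29), date(2008, 10, 31), 9.0, True, True),    # campaign funded
]

print(simple_average(example_polls, date(2008, 11, 4)))  # prints 5.0
```

With a November 4th election, the eight-day window runs from October 27th through November 3rd, so only the first two polls count. When no poll qualifies the function returns `None`, since the method has nothing to say about an unpolled campaign.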
How did these three distinct prediction methods fare against each other? Results in the extended entry.
To compare the three methodologies, I looked at the 65 Presidential and Senatorial campaigns that ended on November 4th, 2008, for which at least one non-campaign-funded telephone poll was in the field entirely from October 27th through November 3rd. This was done to create an apples to apples to apples comparison where, for all 65 campaigns, there are either publicly available predictions or publicly reproducible predictions (Pollster.com and Bowers don't work when there are no polls, and 538 didn't forecast House or Governor campaigns). The final predicted margin was used for all campaigns to maintain the apples to apples to apples comparison, since not every website predicted the final percentage for each candidate in every campaign. The mean and median errors were calculated for each method, and results were also sorted based on how many polls were available for each campaign. That last bit was done to try to answer the age-old question, "can polls be combined to create more accurate forecasts?"
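The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: error is taken to be the absolute difference between the predicted and actual margin, the `error_by_poll_count` function and its tuple layout are hypothetical, and the four campaigns are invented numbers rather than the real 65.

```python
from statistics import mean, median


def error_by_poll_count(campaigns, thresholds=range(1, 8)):
    """Summarise forecast error for campaigns with at least k polls.

    campaigns: list of (n_polls, predicted_margin, actual_margin).
    Returns rows of (k, n_cases, mean_error, median_error).
    """
    rows = []
    for k in thresholds:
        # Assumption: error is the absolute miss on the margin, in points.
        errors = [abs(pred - actual) for n, pred, actual in campaigns if n >= k]
        if errors:
            rows.append((k, len(errors), mean(errors), median(errors)))
    return rows


# Invented campaigns: (number of polls, predicted margin, actual margin).
example_campaigns = [
    (1, 10.0, 2.0),
    (2, 5.0, 8.0),
    (4, -3.0, -1.0),
    (7, 1.0, 2.0),
]

for k, n_cases, mean_err, median_err in error_by_poll_count(example_campaigns):
    print(f"{k} or more: cases={n_cases}, mean={mean_err:.2f}, median={median_err:.2f}")
```

Note that the "k or more" groups are nested, so every campaign in the "4 or more" group is also counted in the "1 or more" group.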
Using the median prediction error, all methods were very accurate. Once four or more polls are available, all three methods predict the final margin to within plus or minus 1.7 points most of the time.
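Mean Error (percentage points)

| # of Polls | Bowers | 538.com | Pollster | # of Cases |
|---|---|---|---|---|
| 1 or more | 3.99 | 3.28 | 3.34 | 65 |
| 2 or more | 3.03 | 2.72 | 2.77 | 44 |
| 3 or more | 2.97 | 2.41 | 2.37 | 31 |
| 4 or more | 2.00 | 1.69 | 1.92 | 24 |
| 5 or more | 1.55 | 1.37 | 1.65 | 17 |
| 6 or more | 1.44 | 1.21 | 1.69 | 14 |
| 7 or more | 1.32 | 1.09 | 1.40 | 8 |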
Overall error is noticeably higher using the mean, mainly due to occasional extreme outliers, such as a bad Research 2000 poll of the Wyoming Senate races (check the data to see just how bad). Once again, error falls as the number of polls rises. Also, once again, when four or more polls enter the equation, error drops to plus or minus 2 points, or even less.
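To see why a single bad miss moves the mean so much more than the median, here is a tiny Python illustration. The error values are invented for the example and are not taken from the 2008 data.

```python
from statistics import mean, median

# Invented absolute errors (in points) for five hypothetical campaigns;
# the last one stands in for a campaign with one badly off poll.
errors = [1.0, 1.5, 2.0, 2.5, 20.0]

print(mean(errors), median(errors))            # prints 5.4 2.0
print(mean(errors[:-1]), median(errors[:-1]))  # prints 1.75 1.75
```

Dropping the one outlier cuts the mean by more than two-thirds, while the median barely moves.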
Here is what these numbers tell me:
Combining polls works: There is a minority belief that polls cannot be combined to produce more accurate results, due to different methodologies and sample sizes. This is clearly wrong. All three methods combine polls, and all three generally become more accurate as more polls are introduced to the equation (the one small exception is Pollster's mean error, which ticks up from 1.65 to 1.69 between the "5 or more" and "6 or more" lines). So, don't ever let anyone say "combining polls doesn't work." Clearly, it does.
Four is the magic number? Looks to me like having four or more polls on a general election campaign dramatically increases the forecasting accuracy for that campaign. If there are four or more polls in the final week, you can use pretty much any method and hit the target.
538 and Pollster.com even, I'm further back: Pollster was essentially equal to 538 both when all campaigns are included (the "1 or more" line) and when the outlier-prone single-poll campaigns are excluded (the "2 or more" line). Kind of funny that not adjusting any of the polls, and adjusting all of the polls, results in the same rate of error. To no one's surprise, my method was much better among more highly polled campaigns, but still about 10% behind the other two once poll averaging (2 polls or more) comes into play. I make no pretense about my method needing polls in order to work.
Anti-conventional wisdom: 538 had the edge among higher-polled campaigns, which means Pollster.com was superior among lower-polled campaigns. This goes against conventional wisdom. Many thought Silver's demographic regression gave him an edge among less-polled campaigns, but that Pollster's method only worked well in heavily polled environments. Turns out the opposite was true, and I'm not sure why. Maybe Silver's demographic regressions don't work, but his poll weighting does. Or something.
Still very close: While I was a little behind, the difference between the methods is minimal. I'm a little disappointed, but clearly anyone can come very close to both 538 and Pollster.com in terms of prediction accuracy with virtually no effort. Just add up the polls and average them. It is about 90% as good as the best methods around, and anyone can do it.
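The "about 10% behind" and "about 90% as good" figures can be checked directly against the "2 or more" line of the mean error table. A quick sketch in Python, where the variable names are mine and the three numbers come straight from that line:

```python
# Mean errors from the "2 or more" polls line of the mean error table.
bowers = 3.03
fivethirtyeight = 2.72
pollster = 2.77

# How much larger my mean error is, relative to each of the other two.
gap_vs_538 = (bowers - fivethirtyeight) / fivethirtyeight
gap_vs_pollster = (bowers - pollster) / pollster

# The best method's error as a share of mine.
share_of_best = fivethirtyeight / bowers

print(f"{gap_vs_538:.1%} behind 538")            # prints 11.4% behind 538
print(f"{gap_vs_pollster:.1%} behind Pollster")  # prints 9.4% behind Pollster
print(f"{share_of_best:.0%}")                    # prints 90%
```

So "about 10%" sits between the two gaps, and "about 90% as good" is the ratio of the smallest mean error to mine.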
Even with all this in mind, the worst thing a forecaster can do is to rest on his or her laurels, and not look through the numbers to produce a more accurate methodology. These numbers might point the way to an even better forecasting method than these three, and I am eager to try to find it in advance of 2010 and 2012.
Update: I just realized that these numbers actually mean that 538 and Pollster.com are equal, not that 538 was better ("1 or more" means all campaigns, not just those with one poll). Article updated to reflect this. Also, the best way to judge my method against the other two is the "2 or more" polls line, since that is the moment when polls are "averaged." Given this, I was about 10% behind. Not bad for never taking a statistics course, and for only spending between one-third and one-half of my election writing on forecasts.