Bowers Vs. 538 Vs. Pollster.com (Updated)

by: Chris Bowers

Mon Jan 05, 2009 at 19:56

Now that all of the counting is finally done for the 2008 elections, it is possible to compare how different election forecasters fared. The three I have long been most interested in comparing are:
  1. My method, which takes the simple mean of all non-campaign funded, telephone polls that were conducted entirely within the final eight days of a campaign. My rationale for this method is described here: No Special Sauce Needed For Electoral Projections. This is an intentionally rudimentary "election forecasting for dummies" method that anyone can reproduce.

  2. Pollster.com, which uses all polls ever conducted in a state and creates a regression line based on those polls. This is the ultimate "don't cherry-pick polls and don't argue with polls" method. It was developed by a professional pollster and a political scientist, and is explained here.

  3. 538 (FiveThirtyEight), whose complicated methodology is essentially the opposite of Pollster.com's: adjust every poll based on demographics, previous house effects, and previous error rates.
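For illustration, method (1) really is "for dummies": a simple arithmetic mean of the qualifying final-week margins. This sketch is not the author's actual code, and the poll margins are made up:

```python
# Sketch of the simple-mean method: average the margin from every qualifying
# poll (non-campaign-funded, telephone, in the field entirely within the
# final eight days). The margins below are hypothetical.
final_week_margins = [6.0, 4.5, 7.0, 5.5]

def predict_margin(margins):
    """Predicted final margin = simple mean of qualifying poll margins."""
    return sum(margins) / len(margins)

print(predict_margin(final_week_margins))  # prints 5.75
```

Anyone with the poll list and a calculator can reproduce this, which is the whole point of the method.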

How did these three distinct prediction methods fare against each other? Results in the extended entry.

To compare the three methodologies, I looked at the 65 Presidential and Senatorial campaigns that ended on November 4th, 2008, for which at least one non-campaign funded telephone poll was in the field entirely from October 27th through November 3rd. This was done to create an apples-to-apples-to-apples comparison where, for all 65 campaigns, there are either publicly available predictions or publicly reproducible predictions (Pollster.com and Bowers don't work when there are no polls, and 538 didn't forecast House or Governor campaigns). The final predicted margin was used for all campaigns to maintain the comparison, since not every website predicted the final percentage for each candidate in every campaign. The mean and median errors were calculated for each method, and results were also sorted based on how many polls were available for each campaign. That last bit was done to try to answer the age-old question: can polls be combined to create more accurate forecasts?
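The bookkeeping described above can be sketched in a few lines. The campaign tuples below are hypothetical stand-ins (the real data is in the linked PDF): error is the absolute difference between predicted and actual margin, and campaigns are bucketed by how many qualifying polls they had:

```python
from statistics import mean, median

# Hypothetical campaigns: (number of qualifying polls, predicted margin, actual margin)
campaigns = [
    (1, 5.0, 8.0),
    (2, 3.0, 1.5),
    (4, 10.0, 9.0),
    (5, -2.0, -1.0),
]

def error_by_poll_count(campaigns, threshold):
    """Mean and median absolute error over campaigns with >= threshold polls."""
    errors = [abs(predicted - actual)
              for n_polls, predicted, actual in campaigns
              if n_polls >= threshold]
    return mean(errors), median(errors), len(errors)

for k in (1, 2, 4):
    m, md, n = error_by_poll_count(campaigns, k)
    print(f"{k} or more polls: mean={m:.2f} median={md:.2f} cases={n}")
```

Running one such pass per forecaster produces the two tables below.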

The data for this comparison can be found here:

Prediction error rates: Bowers vs. 538 vs. Pollster.com (PDF)

Here were the results:

Median Error
# of Polls   Bowers   538    Pollster   # of Cases
1 or more     2.55    2.23     2.23         65
2 or more     2.26    2.17     2.09         44
3 or more     2.43    1.61     2.05         31
4 or more     1.57    1.34     1.68         24
5 or more     1.37    1.15     1.43         17
6 or more     1.26    1.12     1.50         14
7 or more     0.98    1.12     1.17          8

Using the median prediction error, all methods were very accurate. Once four or more polls are available, every method predicts the final margin to within plus or minus 1.7 points most of the time.

Mean Error
# of Polls   Bowers   538    Pollster   # of Cases
1 or more     3.99    3.28     3.34         65
2 or more     3.03    2.72     2.77         44
3 or more     2.97    2.41     2.37         31
4 or more     2.00    1.69     1.92         24
5 or more     1.55    1.37     1.65         17
6 or more     1.44    1.21     1.69         14
7 or more     1.32    1.09     1.40          8

Overall error is noticeably higher using the mean, mainly due to occasional extreme outliers, such as a bad Research 2000 poll of the Wyoming Senate races (check the data to see just how bad). Once again, error decreases in direct correlation with an increase in the number of polls. Also, once again, when four or more polls enter the equation, error drops to plus or minus 2%, or even less.
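The gap between the two tables is the textbook mean-vs-median effect: a single wild miss drags the mean far more than the median. A quick sketch with hypothetical error values:

```python
from statistics import mean, median

errors = [1.0, 1.5, 2.0, 2.5]          # typical prediction errors
errors_with_outlier = errors + [20.0]  # one Wyoming-style blowup

print(mean(errors), median(errors))                            # prints 1.75 1.75
print(mean(errors_with_outlier), median(errors_with_outlier))  # prints 5.4 2.0
```

One 20-point miss in five races quadruples the mean error while barely moving the median, which is why the mean table looks so much worse.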

Here are what these numbers tell me:

  • Combining polls works: There is a minority belief that polls cannot be combined to produce more accurate results, due to different methodologies and sample sizes. This is clearly wrong. All three methods combine polls, and all three methods become more accurate every time an additional poll is introduced to the equation.  So, don't ever let anyone say "combining polls doesn't work." Clearly, it does.

  • Four is the magic number? It looks to me like having four or more polls on a general election campaign dramatically increases the forecasting accuracy for that campaign. If there are four or more polls in the final week, you can use pretty much any method and hit the target.

  • 538 and Pollster.com even, I'm further back: Pollster was equal to 538 when all campaigns are included (the "1 or more" line) and with all campaigns except the outliers (the "2 or more" line). It's kind of funny that not adjusting any of the polls and adjusting all of the polls result in the same rate of error. To no one's surprise, my method was much better among more highly polled campaigns, but it was still about 10% behind the other two once poll averaging (2 polls or more) comes into play. I make no pretense that my method works without polls.

  • Anti-conventional wisdom: 538 had the edge among higher-polled campaigns, which means Pollster.com was superior among lower-polled campaigns. This goes against conventional wisdom. Many thought Silver's demographic regression gave him an edge among less-polled campaigns, but that Pollster's method only worked well in heavily polled environments. Turns out the opposite was true, and I'm not sure why. Maybe Silver's demographic regressions don't work, but his poll weighting does. Or something.

  • Still very close: While I was a little behind, the difference between the methods is minimal. I'm a little disappointed, but clearly anyone can come very close to both 538 and Pollster.com in terms of prediction accuracy with virtually no effort. Just add up the polls and average them. It is about 90% as good as the best methods around, and anyone can do it.

Even with all this in mind, the worst thing a forecaster can do is rest on his or her laurels and not look through the numbers for a more accurate methodology. These numbers might point the way to an even better forecasting method than these three, and I am eager to try and find it in advance of 2010 and 2012.

Update: I just realized that these numbers actually mean that 538 and Pollster.com are equal, not that 538 was better ("1 or more" means all campaigns, not just those with one poll). Article updated to reflect this. Also, the best way to judge my method against the other two is the "2 or more" polls line, since that is the moment when polls are "averaged." Given this, I was about 10% behind. Not bad for never taking a statistics course, and for only spending between one-third and one-half of my election writing on forecasts.


very nice (4.00 / 3)
This has been very interesting.  

New Jersey politics at Blue Jersey.

With all due respect to Nate Silver... (4.00 / 3)
I think Bowers has demonstrated that Silver's approach, while very sound, wasn't exactly the unprecedented wizardry that the Silverites have made it out to be. It was just common-sense math. Open Left (OpenLeft?) is still the font of wisdom as far as I'm concerned.

it appears that Silver's method (4.00 / 3)
basically adds the benefit of an additional poll. I'd say that's impressive, but of course your mileage may vary.

New Jersey politics at Blue Jersey.

Silver's use of regressional analysis elsewhere (4.00 / 3)
I think it strengthens the idea that Silver was doing good work earlier with regression analysis, despite the skepticism of some who doubted his methodology, when he claimed, for example, that the Dean 2004 coalition had migrated more to Obama than to any other candidate in 2008, and that Obama would be a stronger general election candidate because he could win without nailing Florida and Ohio, while Hillary Clinton really needed those states.

His work also points to the importance of changing demographics, something that has been stressed repeatedly on this site.

Things You Don't Talk About in Polite Company: Religion, Politics, the Occasional Intersection of Both

It's funny that some people think you can't use more polls (4.00 / 6)
to get a more accurate answer. Doing this is a very common statistical technique (used in climate and weather forecasting all the time) and there's good theory behind it for why it works. And actually, the fact that different polls have different methodologies and samples is exactly why adding them together improves accuracy. Each time you do that, it's like you're adding another independent observation to the sample, so your overall mean should then be closer to the true (or theoretically expected, the expectation in statistical lingo) result. Otherwise if every poll had the same type of sampling methodology and weighting schemes, you'd run the risk of introducing more bias into the result. This way, the biases of individual polls are more likely to sort of cancel each other out.

Good job Bowers. Personally, I like yours and Pollster's methods the best because of the transparency and because of Pollster's presentation.

Sizzle, Not Steak (4.00 / 3)
Basically, what Nate sells is the sizzle, much more than the steak.  This comparison looks at the steak, and finds only minimal difference.

But precisely because it's so simple and transparent, there's not so much to be mesmerized by.

Sort of leaves you more time to talk about other stuff.  Such as, I don't know, issues maybe?

"You know what they say -- those of us who fail history... doomed to repeat it in summer school." -- Buffy The Vampire Slayer, Season 6, Episode 3

I think there is a little more to it than that (4.00 / 5)
I think what Chris' analysis shows is that Silver's method works a little better in low-polling environments. Since it takes all the information in to predict each race, it can borrow information from more recent polls of other races (or other states in the Presidential). Now, in the last week of a presidential election, that's pretty meaningless, since there are tons and tons of polls. But in some of the Senate races, and in the presidential race before the last few weeks, we were in more of a low-poll environment.

One of the things I really liked about the Silver model is that it could learn from the polls that came out each day and interpret what it might mean for the race as a whole.

Now it may not have been that much better than the other two, so it might be moot. But I think that most of the time we care about polls, we are in a fairly low poll environment, so we want the best model possible. That said I really enjoyed Chris's posts on his method.

Chris, I hope you aren't in too much pain. There is a noticeable increase in spelling mistakes in your posts since you broke your arm. Not quite Yglesias levels, but still significant. :)

Both of my arms are broken and so are both of the fingers I use to type, that is now my official reason for posting so ba0ly. (0.00 / 0)


The government has a defect: it's potentially democratic. Corporations have no defect: they're pure tyrannies. -Chomsky

that's a bit glib (4.00 / 5)
Chris' method was very good, but Nate's had some definite advantages. For one thing, 538 was different from the other sites in that it tried to predict the outcome of the election, even months in advance, whereas everyone else could do no more than describe the state of the horse race at a given moment. Much as I liked Chris' occam-shaved method, it was less useful in like June and July, when outlying polls would throw the map off considerably. Nate's demographic regressions corrected for those outliers, and quite effectively so, for the most part.

My Seat of the Pants Did Pretty Well, Too (0.00 / 0)
And I didn't even try.

I hear what you're saying, and I don't mean to diss Nate's site. It was a fun place to visit. But most other folks were intentionally refraining from prediction. Just by standing back a bit and not refraining, I think you could tell that (a) this was going to be a Democratic year, (b) there were opportunities in the South and West, (c) McCain was going to have hard sledding in the Midwest. In particular, I thought McCain was cooked, because he wouldn't win Ohio, but I also thought he'd lose North Carolina if Obama really made an effort there. (I was a bit afraid he'd slack off and just settle for Virginia.) I also thought Florida was winnable.

Bottom line, while I could see Nate doing all kinds of tweaks to make his methods even more exact, I think he'll run up against a wall, in that you just can't tell if people will run a shitty or a spectacular campaign,  and that alone can easily overwhelm anything you can predict by any means I've ever heard of.

Macaca, anyone? Poor President Allen!

"You know what they say -- those of us who fail history... doomed to repeat it in summer school." -- Buffy The Vampire Slayer, Season 6, Episode 3

To be fair Paul I thought many of the same thoughts but I also used the information from Nates to bolster my confidence. (0.00 / 0)


The government has a defect: it's potentially democratic. Corporations have no defect: they're pure tyrannies. -Chomsky

Small improvements are hard (4.00 / 8)
Imagine a runner shaving even 5% off their best time.

Nate would freely admit that while his fantasy baseball system is the best, it's only the best by a few percentage points. Here he's doing as much as 10-20% better, which isn't Earth-shattering, but it's very impressive considering the known and potential sources of error. It's not an accident, that's for sure.

Conduct your own interview of Sarah Palin!

More polls is better for predicting results... (4.00 / 1)
...ok, I'm not shocked by that at all...but how about the effect of increased polling on the actual outcomes.

What about the increased possibility that, among the flood of legitimate polls, more devious calls (like push polls) are able to get through and be taken as legitimate? CW might tell some that this is wrong, that more polling means people are less likely to respond, but there is a counter-effect: friends and neighbors become jealous of those who were polled, and some are just impressed by the legitimate-sounding names of the pollsters, which in some cases sound more legitimate than the real-deal pollsters...

In general as a campaign manager/strategist, I find that the massive levels of polling being conducted in competitive races is playing an ever increasing role in influencing outcomes.  From early polling being conducted before a challenger has a chance to make any progress and then being hammered with that number for months as a handicap to his/her fund raising when it matters most to the simple effects of increasing name recognition and associations (valid or not) through repetitive polling/printing of polls.

Obviously as a strategist, I'm going to use such data when it is to my advantage, but the reality is that polling numbers early are given far too much credence by the media, pundits and the voting populace, thus making them more accurate.  It becomes one more obstacle for a challenger to overcome, when they are already fighting a massive battle versus incumbent advantage.

What's my point?  I'm not sure I have one beyond that I'm concerned that the excessive polling is not good and there really is no mechanism to prevent it.

All good, Chris (4.00 / 2)
Fine work all around. I gotta say I deeply miss the daily poll floods and the excitement and all.  That's probably because I suspected we had this thing in the bag since mid-summer.  I wasn't having much fun in 2004 and 2000.  Every once in a while I check the electoral map results just to bask in the crush Obama put on those bastards.  Massive margins over 2004 in so many states, red, blue and purple.  If he succeeds, he's gonna get 58 percent in 2012 ...

Prediction (4.00 / 2)
Even though Bowers and Pollster weren't doing "prediction," I think it would be great to see how each of the methods performed earlier in the race. Like a graph of the difference between each averaging method and the final result, with one data point per week for a couple months before the election. I think that would show even more strength for Nate's method, but who knows.

I was just thinking the same thing (0.00 / 0)
The numbers are so close because this year, the polling agencies have been very good. This wasn't the case in 2004. My head still hurts from the Daily Show where Stewart asked Zogby: "So, who's gonna win the election?" Zogby (with great self-assurance): "John Kerry." This was with less than a week to go, if I remember right.

538 more accurate with most polls (0.00 / 0)
Would 538's increased accuracy relative to Pollster and Chris be produced by his use of house effects?

I can't see any reason why regression would work with few polls but not with more, so as far as I can see that's the only explanation. I'm willing to hear better theories, however, as I'm not exactly a polling savant.

Forgotten Countries - a foreign policy-focused blog

Very interesting. (0.00 / 0)
And impressive that you presented a detailed analysis even though it shows your own method coming in third. Good job.

I must say I'm pleasantly surprised that 538 actually did the best. I had been afraid, with the final Presidential margin pulling away from 538's prediction by something like a full percentage point after all the ballots were finally counted, that its performance would be worse. It's not ahead by much -- it's tied on median error with Pollster and only a little bit better on mean error -- but hey, every little bit counts, and it turned out he did find a way to improve accuracy further than Pollster's already innovative methods.

I'd be interested in seeing analogous charts with specific numbers of polls for each row rather than "X or more" -- so how well each method did for races with 1 poll, with 2 polls, with 3 polls, and so on (with a single "X or more" at the end, obviously, because otherwise it'll be a lot longer than useful). (I assume Bowers and Pollster should report the same result in cases with only one poll, right?)

By my reading, 538's method is worst in races with 2-3 polls (it's worse on mean and median error, respectively, compared to Pollster in those two cases), better in races with only 1 poll (where it recovers somewhat), and best in races with lots of polls. I can't really tell where it stands for races with a number of polls between 2-3 and "lots" (nor the value of "lots"), though maybe I could calculate it with a bunch of math.

Hopefully Nate himself will perform a dissection of his method at some point to show which parts of it were helpful and which of them weren't.


Open Left Campaigns



Advanced Search

Powered by: SoapBlox