The latest fashion in electoral projections is to use data other than polls. For example, CNN unveiled their electoral map today, and declared it was based on a variety of factors, of which polling is only one:
The map is based on analysis of several factors, including polling, voting trends, ad spending, candidate visits, and guidance from the campaigns and political strategists.
Um, yeah. Good luck with that. Note to self: ignore CNN's electoral projections, if this is their bizarre methodology.
More interestingly and compellingly, Fivethirtyeight.com has also gained quite a bit of notoriety in recent weeks. Once again, no surprise, DIY election analysis trumps the stuff found on national news outlets. According to 538's FAQ, its projections also include a variety of other factors instead of just polling averages:
There are several ways that the FiveThirtyEight methodology differs from other poll compilations. Firstly, we assign each poll a weighting based on that pollster's historical track record, the poll's sample size, and the recentness of the poll. More reliable polls are weighted more heavily in our averages. Secondly, we include a regression estimate based on the demographics in each state among our 'polls', which helps to account for outlier polls and to stabilize the results.
Now, I am a bit of a skeptic about any methodology that does not simply average polls. My basic reason for this is that, back in 2004, in my electoral forecasts at MyDD, I included 2000 results and "the incumbent rule" as counterweights to polling averages. While a simple poll averaging methodology would have resulted in forecasting the correct national popular vote and the winner of every state except Wisconsin, my methodology resulted in incorrectly forecasting Florida, Iowa and New Mexico, along with the national popular vote. I freely admit that my experience in this regard has biased me against any electoral forecasting methodology that includes data other than polls. Now, however, I have the data to back up this gut feeling of mine.
In the FAQ for fivethirtyeight.com, Poblano writes the following:
Well, I still think you're making a mistake by using 'old' polls. It is your right to think that, but I'd challenge you to present a case based on the evidence. When I attempted to mimic the Real Clear Politics method -- including only the most recent poll from among pollsters who conducted surveys within 10 days of the election -- I found that the average error in my state-by-state projections would have increased by about half a point (from 2.4 points of error to 2.9) over 2000-2006.
This afternoon, I compiled a case based on the evidence, and developed a methodology based purely on poll averaging that produced an average error of 2.0, which is 0.4 lower than Poblano's (and a zillion times superior to CNN's insane methodology). Here are the parameters of the polling data that I included in my study:
- Only use polls that were conducted entirely during the final week of the 2004 and 2006 elections. For 2004, this means October 25th, 2004 through November 1st, 2004. For 2006, this means October 31st, 2006 through November 6th, 2006.
- The nine Senate races in 2006 that were decided by 10% or less.
- The eight Governors' races in 2006 that were decided by 10% or less (does not include Idaho, since no polls were conducted for that campaign entirely during the final week of the election).
- The nineteen states that were decided by 10.0% or less in the 2004 Presidential election. This does not include Delaware or Hawaii, since no polls were conducted in those states entirely during the final week of the election.
- The six Senate campaigns in 2004 that were decided by 10% or less. This does not include South Carolina, where there were no final week polls, and it also does not include Louisiana, where Vitter won by 1% or 22%, depending on the way one counts.
For all 42 of these statewide campaigns, I subtracted the simple mean of the polls conducted entirely during the final week from the final result of the election. The absolute values of these 42 differences were then added up and divided by 42. The result was an average error of 2.0. Election results were taken from CNN.com and Dave Leip's Election Atlas. Polls were taken from Real Clear Politics and Pollster.com. If one polling organization conducted multiple polls during the final week of the election, then only the final poll from that organization was included in the average.
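For anyone who wants to repeat the calculation, here is a rough sketch of the procedure described above, written in Python. It is only an illustration: the function names, the dictionary layout, and the sample numbers are all made up for the example and are not the data from my study. It assumes each poll is recorded with a pollster name, field dates, and a margin, and that a poll counts only if it was conducted entirely inside the final-week window.

```python
from datetime import date
from statistics import mean


def final_week_average(polls, window_start, window_end):
    """Simple mean of poll margins: final-week polls only, one per pollster."""
    latest = {}
    for p in polls:
        # Skip polls not conducted entirely inside the final-week window.
        if p["start"] < window_start or p["end"] > window_end:
            continue
        # If a pollster released several polls, keep only its last one.
        prev = latest.get(p["pollster"])
        if prev is None or p["end"] > prev["end"]:
            latest[p["pollster"]] = p
    if not latest:
        return None  # no qualifying polls, so the race is left out
    return mean(p["margin"] for p in latest.values())


def mean_absolute_error(races):
    """Average of |final result - poll average| across all races."""
    errors = [abs(r["result"] - r["poll_average"]) for r in races]
    return sum(errors) / len(errors)


# Hypothetical placeholder polls for one race (NOT real polling data).
polls = [
    {"pollster": "A", "start": date(2004, 10, 26), "end": date(2004, 10, 28), "margin": 2.0},
    {"pollster": "A", "start": date(2004, 10, 30), "end": date(2004, 11, 1), "margin": 4.0},
    {"pollster": "B", "start": date(2004, 10, 27), "end": date(2004, 10, 29), "margin": 1.0},
    # Began before the window opened, so it is excluded.
    {"pollster": "C", "start": date(2004, 10, 22), "end": date(2004, 10, 26), "margin": 9.0},
]
avg = final_week_average(polls, date(2004, 10, 25), date(2004, 11, 1))

# Two hypothetical races: the one above plus a second with a given average.
races = [
    {"result": 3.5, "poll_average": avg},
    {"result": -1.0, "poll_average": 1.0},
]
print(avg, mean_absolute_error(races))
```

In this toy example, pollster A's earlier poll and pollster C's poll are both dropped, so the race average is the mean of just two polls. The real study runs the same two steps over all 42 campaigns.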
The data that I used for this quick study can be viewed here.
So, while my previous experience makes me biased toward only using the simple mean of recent polls in order to forecast statewide general elections, I believe the evidence also supports me. There are also good, deductive reasons for believing that only using polls conducted in the final week of the campaign is the most accurate method for predicting election results:
- Polls are the only scientific measurement of public opinion in an election before the election takes place. Introducing any other element renders the study deductive and based on assumptions, rather than inductive and scientific.
- Polls measure a snapshot of public opinion only during the time period when they were taken. As such, including polls more than one week old in a predictive analysis simply does not make sense. The goal is to predict the final result, not what the result would have been if the election had been held a week or two earlier.
- Before 2004, there simply were not many polls taken during the final week in statewide general elections. In most cases, only one or two polls were available for statewide campaigns in 2000 and 2002. As such, the lack of data would increase the potential for error in the final polling averages, resulting in a higher margin of error for any study that includes pre-2004 campaigns.
- I did not look at elections where the final margin was greater than 10.0%. For my purposes, polls only need to be highly accurate when they are looking at close elections where the final result could go either way. As such, polls for campaigns that were decided by double-digits are not useful for forecasting the winner of elections.
- Early voting does not impact these results, since polls conducted in the final week of an election always have subsets of people who already voted.
- It is not necessary to correct for demographic imbalance, since polls are also measuring the projected demographics of the electorate. If there are enough recent polls, collectively they will produce a very accurate projection of the make-up of the electorate.
- Favoring certain polling firms based on the accuracy of their past results is not always wise, given that the future performance of a polling organization can differ from its previous performance. For example, in 2000, Rasmussen was the least accurate organization for the national popular vote, while Zogby was the most accurate. My, how times change.
And so, for all of these reasons, and because of the data backing it up, I will stick with taking the simple mean of the most recent polls in a state for my electoral forecasts. While it isn't as sexy as adding a lot of special sauce, à la CNN and 538, I believe, and the evidence shows, that it is the most accurate methodology. Sometimes, the simplest solution really is the most accurate one. Until someone can present me evidence indicating otherwise, this certainly appears to be one of those cases.
My current Presidential and Senate forecasts are based on pure poll averaging methodologies. As the election moves forward and more state-by-state poll data becomes available, I believe these will be the most accurate projections around. Right now, in many cases there isn't enough state-level polling data for them to be highly reliable, but that will change the closer we move to the election.