As students of data science, we found the aftermath of the 2016 election fascinating, particularly the discussion in major news outlets about the polls. To begin with, we thought this Forbes article was interesting in that it admitted the shortcomings of polling; however, it did not do a good job of identifying what the polling errors in the 2016 election actually were.
While a vast number of polls, sponsored by a plethora of organizations, were conducted before election day, they were only measuring two things. The first was national sentiment, captured by national polls. The second was individual state sentiment, captured by state polls. This is where the hyperbole thrown around by the media about the polls being "wrong" needs clarification.
The national polls were roughly as on-target as they have been in every other recent election. Clinton was predicted to win the popular vote by 3-4 percentage points, and she ended up winning it by just over 2 points, an error of less than two percentage points. In contrast, the 2012 popular-vote prediction was off by around 2.7 points. Since national polls are almost always off somewhat on the popular vote, a systematic error should be assumed and treated as such in any analysis. Unfortunately, this margin of error was hardly reported by the media.
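As a quick back-of-the-envelope check of those figures (taking 3.5 as the midpoint of the 3-4 point prediction, which is our assumption):

```python
# Approximate popular-vote margins, in percentage points (figures from above)
predicted_2016, actual_2016 = 3.5, 2.1   # midpoint of the 3-4 point prediction vs result
miss_2016 = abs(predicted_2016 - actual_2016)
miss_2012 = 2.7                          # the 2012 miss cited above

print(f"2016 miss: {miss_2016:.1f} points")  # about 1.4, smaller than 2012's 2.7
```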
This systematic error arises from a combination of the three issues mentioned above, and it is more pronounced in polls of state sentiment. Regarding bias and error, suppose, for example, that the polls underestimated Clinton's performance among Hispanic voters but overestimated it among white voters without college degrees. These errors would nearly cancel each other out in a national poll. However, they could make or break a state poll. If the polls consistently overestimated Clinton's performance in states with many white voters without college degrees, where she held only a tenuous lead to begin with, as in the Midwest, then suddenly the picture painted by the media looks very different. This overestimation was probably due to a sampling issue.
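To make the cancellation concrete, here is a minimal Python sketch with invented group shares, margins, and errors; it illustrates the mechanism only and is not an estimate from real polling data.

```python
# Suppose (hypothetically) polls run +4 points too Clinton-friendly among
# white voters without college degrees and -4 points among Hispanic voters.
# All shares and margins below are made up for illustration.

def polled_margin(true_margin, group_shares, group_errors):
    """Clinton-minus-Trump margin a poll would report, in points."""
    return true_margin + sum(s * e for s, e in zip(group_shares, group_errors))

errors = [+4.0, -4.0]  # [white non-college, Hispanic] polling error per group

# Nationally the two groups are (hypothetically) equally represented: errors cancel.
national = polled_margin(2.0, [0.30, 0.30], errors)   # reports 2.0, true 2.0

# In a Midwest-like state the first group dominates: the error no longer cancels.
midwest = polled_margin(1.0, [0.55, 0.05], errors)    # reports 3.0, true 1.0

print(f"national poll: Clinton +{national:.1f} (true +2.0)")
print(f"state poll:    Clinton +{midwest:.1f} (true +1.0)")
```

The same subgroup errors that wash out in a balanced national sample turn a one-point state race into an apparent comfortable lead.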
It can be argued that white voters without college degrees were undersampled since, like Hispanic and African American voters, they respond less often to telephone and internet surveys. The Washington Post also noted that these voters were more likely to say they supported Trump in automated polls than in live interviews, where they tended to say they supported Clinton. The biggest issue, however, is that this discrepancy was not accounted for by weighting their responses more heavily. We believe sampling results should be weighted in proportions that accurately reflect the target population. If such weights are not applied to overcome coverage errors in a sample, the results will be skewed and gleaning accurate information will be nigh impossible.
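As a rough sketch of the weighting we have in mind, here is a toy post-stratification example in Python. Every share and support figure below is made up; the point is only the mechanics of the weights.

```python
# If a group makes up 44% of the electorate but only 35% of respondents,
# each of its respondents gets weight 0.44 / 0.35, roughly 1.26.

population_share = {"white_no_degree": 0.44, "other": 0.56}  # assumed electorate
sample_share     = {"white_no_degree": 0.35, "other": 0.65}  # assumed respondents

weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical Trump support observed within each group
support = {"white_no_degree": 0.62, "other": 0.38}

unweighted = sum(sample_share[g] * support[g] for g in support)
weighted   = sum(sample_share[g] * weights[g] * support[g] for g in support)

print(f"unweighted estimate: {unweighted:.1%}")  # 46.4%, undercounts the group
print(f"weighted estimate:   {weighted:.1%}")    # 48.6%, matches the electorate mix
```

The weighted estimate is simply the population-share-weighted average, which is what the unweighted sample mean fails to be whenever a group is under-covered.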
Finally, the double-digit percentage of undecided voters leading up to the election was a huge issue that the news media did not take into account. Up to 13% of people polled were undecided two weeks before the election! Most of them broke for Trump, and largely in swing states. The media should not have proclaimed such strong leads for Clinton when Trump was only about one margin of error behind, the undecided share of the electorate was large, and the polls were volatile.
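For context, the textbook 95% margin of error for one candidate's share in a poll of n respondents, assuming simple random sampling, is z * sqrt(p(1 - p) / n). A quick sketch shows why a lead of a few points should not have been treated as decisive:

```python
import math

def margin_of_error(p=0.5, n=1000, z=1.96):
    """95% margin of error for one candidate's share in a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n=1000: +/-{margin_of_error():.1%} per candidate")  # about 3.1 points
# The uncertainty on the *lead* (Clinton minus Trump) is roughly double this,
# so a 3-point lead with 13% of voters undecided is well within the noise.
```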
It is interesting that this happened again so soon after everyone was up in arms about the Brexit polls. Those polls had a similar sampling error, and the Leave campaign was only about one margin of error behind Remain. However, most people did not want to believe that leaving was possible, and so they selectively read the data in support of Remain. Something similar may have happened here with the public's, and the media's, perception of Donald Trump's odds in the election.
With European and Congressional elections coming up in the near future, and with polls ever more prominent in modern media culture, we believe that understanding their methodologies remains a valuable investment.
Until next time,
The SaberSmart Team