Wisdom of the crowd is more than the ill-fated television show of the same name. We have all heard the old adage that many minds are better than one, and this can be seen abundantly in nature. Across countless species, nature shows us that social creatures, when working together as unified systems, can outperform the vast majority of their individual members at solving problems and making decisions. Bees, fish, and ants are all examples. It should come as less of a surprise, then, that humans working in tandem can efficiently converge on answers to decision problems, and even make accurate predictions. This theory of collective intelligence has been studied and analyzed for the past century to pin down how, when, and under what circumstances it predicts accurately. Research studies have concluded that wisdom of the crowd works best when an intellectually diverse cohort of experts answers a predefined question and the task centers on optimizing around medians or brainstorming. It works worst when the crowd thinks alike, includes many non-experts, faces a wide spectrum of possible answers, and feeds its conclusions into future decisions where error can compound.

According to IBM, when analysts speak of the wisdom of crowds today, they talk in particular of human swarming, an approach that uses real-time feedback loops from groups of users to arrive at accurate insights. Indeed, swarming has sometimes out-predicted large groups of experts who rely on non-swarm methodology. Take, for example, the work of researchers working with Unanimous AI, who asked groups of people to take part in various intellectual tasks. Among other things, participant groups attempted to predict the winners of the NFL playoffs, the Super Bowl, the Golden Globes, the Oscars, the NBA finals, and the Stanley Cup. In every case, the swarm outperformed not only individuals within the swarm, but also the experts within the swarm.

I looked further into Unanimous AI and their research publications as sources for this post. One white paper that caught my attention was titled "Amplifying Prediction Accuracy using Swarm A.I." I highly recommend reading the whole thing; the methodology is quite fascinating. In short, Unanimous AI has built a system where pseudo-random people around the globe, having opted in, make predictions, such as who will win a soccer match between Arsenal and Watford. The study concluded that while individuals picked correctly only 55% of the time over the 50 games examined, the human swarm as a collective got 72% right. For comparison, the BBC's supercomputer picked only 64% correctly. They are actively looking for volunteers, so if this sounds interesting, why not participate in a swarm?

The swarm provides predictions for a specific game slate, say the baseball games on Friday, August 24. Before joining the swarm, you fill out a survey where you enumerate your own personal picks and express your confidence by deciding on a virtual wager: the more confident you are in a pick, the more virtual money you wager. Then the human swarming starts. Everyone uses a magnet, with a pull dictated by the inverse square law, to drag a puck toward one point on a hexagon, each point corresponding to a choice. The more people who pull toward a specific option, the quicker the swarm converges on that answer. The difference in run expectancy can be read as confidence: by picking Cleveland by 2+, the swarm is more confident in a win by the Indians than with a pick of Cleveland by 1.
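To make those mechanics concrete, here is a toy Python simulation of a puck being pulled around a hexagon of answer options. The hexagon layout, the made-up participant choices, and the exact force scaling are my own simplifications for illustration, not Unanimous AI's actual swarm physics.

```python
import numpy as np

# Toy sketch (NOT Unanimous AI's real implementation) of a swarm converging on
# one of six answers arranged on a hexagon. Each participant pulls the puck
# toward their chosen vertex, with the pull strength falling off with the
# square of the distance, as described above.

angles = np.linspace(0, 2 * np.pi, 6, endpoint=False)
vertices = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # six answer options

# Hypothetical participants: the vertex index each person pulls toward
choices = np.array([0, 0, 0, 0, 2, 2, 5])  # most of this toy swarm favors option 0

puck = np.zeros(2)   # puck starts in the middle of the hexagon
step = 0.01          # integration step size
eps = 1e-3           # keeps the force finite right at a vertex

for _ in range(10_000):
    force = np.zeros(2)
    for choice in choices:
        offset = vertices[choice] - puck
        dist = np.linalg.norm(offset)
        force += (offset / dist) / (dist ** 2 + eps)  # unit direction * inverse-square pull
    move = step * force / len(choices)
    if np.linalg.norm(move) > 0.02:                   # cap the step so the puck cannot overshoot
        move *= 0.02 / np.linalg.norm(move)
    puck += move
    dists = np.linalg.norm(vertices - puck, axis=1)
    if dists.min() < 0.05:                            # the swarm has reached an answer
        print("Swarm converged on option", dists.argmin())
        break
```

With most of this toy swarm pulling toward option 0, the puck drifts to that vertex almost immediately; flip a few of the choices and you can watch a more divided swarm take longer to settle.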
Here is a full gif of the swarm making a pick between Atlanta and Miami:

Unfortunately, the swarm did not do great for the games on 8/24, getting only 5 of its 14 picks correct. It did get its pick of the night (NYY over BAL) right, though! It should also be remembered that one game slate is a small sample size. In my opinion, this methodology will definitely impact how decisions are made in the future, and I am excited to see how this technique expands and influences the data science community!

Personally, I find the application of wisdom of the crowd to sports predictions quite fascinating, and I have implemented something similar in my own research. I developed a new way to calculate MLB playoff odds using methods from Empirical Bayes, which require a prior distribution defined by a mean and variance. For the mean of my priors, I took the average of various expected win totals from casino over/unders, Fangraphs, and Baseball Prospectus. However, I think it is time that I explained why I took that aggregated average, and how it can be improved.

First, I gathered as many predictions for the 2017 MLB season as I could. Luckily, on the internet, everything lasts forever. I found five casino over/unders, from Oddshark, Bovada, CRIS, Westgate, and Atlantis. I naturally gathered Fangraphs' and PECOTA's preseason projections. Bleacher Report had their experts make two sets of predictions, one right after the Super Bowl with hardly any information available (BleacherReport_Wins1) and one closer to Spring Training (BleacherReport_Wins). USA Today also provided expert picks. For the sake of a baseline, I also dug up the 2016 final season standings, as well as each team's 2016 Pythagorean record from Baseball-Reference.

I calculated the average and median of all twelve predictions and then computed both the mean absolute error (MAE) and root mean squared error (RMSE) against each team's actual 2017 wins (a condensed code sketch of this calculation appears below). Brief statistics note: the RMSE penalizes larger errors more than the MAE, because the errors are squared, which means the RMSE is never lower than the MAE. If you want to read more about these two error metrics, here you go.

Back to the results, which are fairly obvious: the 2016 wins and the early Bleacher Report predictions do the worst by both error metrics. Fangraphs and PECOTA lead on both, while the casino predictions also keep the RMSE low. Perhaps less obvious is that the median performs better than the average. This is because the average is sensitive to outliers, while the median is more robust. These results encapsulate a drawback of wisdom of the crowd: if the crowd is small, not diverse, or not knowledgeable about the subject, its response can be heavily skewed. For example, if you ask everyone in Alabama who they think will win the National Championship this year, you will probably get a biased result. The 2016 wins and the three "expert" predictions from Bleacher Report and USA Today skew our wisdom-of-the-crowd result because our crowd is so small and those win predictions are not very good compared to the professional picks from the casinos, Fangraphs, and PECOTA, who all have money and/or their business riding on the accuracy of their predictions.
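Here is that condensed sketch in Python. The three teams and the projection numbers below are placeholders rather than the real 2017 figures; the full table with all twelve sources and all 30 teams is in the data on our Github.

```python
import numpy as np
import pandas as pd

# Condensed sketch of the error calculation described above. The numbers are
# placeholders for three teams only; the real table has all twelve prediction
# sources and all 30 teams.
preds = pd.DataFrame({
    "Fangraphs":            [96, 93, 97],
    "PECOTA":               [94, 99, 98],
    "Bovada":               [93.5, 94.5, 96.5],
    "Westgate":             [92.5, 95.5, 97.5],
    "BleacherReport_Wins1": [88, 85, 92],
    "Wins2016":             [94, 84, 91],
}, index=["CLE", "HOU", "LAD"])

actual = pd.Series([102, 101, 104], index=preds.index, name="Wins2017")

# The two crowd estimates: the average and the more outlier-robust median
preds["Average"] = preds.mean(axis=1)
preds["Median"] = preds.drop(columns="Average").median(axis=1)

# Error metrics for every prediction source, including the two crowd estimates
def mae(col):
    return (col - actual).abs().mean()

def rmse(col):
    return np.sqrt(((col - actual) ** 2).mean())

errors = pd.DataFrame({"MAE": preds.apply(mae), "RMSE": preds.apply(rmse)})
print(errors.sort_values("MAE"))
```

Sorting by MAE (or RMSE) is all it takes to rank the individual sources against the two crowd estimates.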
To emphasize my last point, compare the two charts below: one plots Fangraphs' predictions against the way-too-early Bleacher Report picks, and the other plots Fangraphs' against PECOTA's. The closer a point is to the dashed line, the more accurate the prediction. While Fangraphs and PECOTA almost balance each other out, with one's predictions more accurate for one team and the other's for the next, the early Bleacher Report projections are just way out in left field.

By eliminating the poor predictions from our small crowd, and using only an expert panel, we can improve our median and average. Here are just the five casino predictions, Fangraphs, and PECOTA, with an updated median and average. Our median now has the third-lowest MAE, an error better than PECOTA's! The median and average also perform well on the RMSE metric, coming in fourth and fifth. When the number of participants is small, using a knowledgeable crowd is imperative.
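Continuing the code sketch from above, restricting to the expert sources and recomputing is only a few lines, and the expert median can then seed a prior for the playoff-odds work. The method-of-moments Beta prior and the choice of prior variance shown here are illustrative assumptions on my part, not necessarily the exact construction used in my playoff-odds model.

```python
# Continuing the sketch above: keep only the "expert" sources and recompute the
# crowd estimates. With the real data this subset is the five casino over/unders
# plus Fangraphs and PECOTA; with the placeholder table it is these four columns.
experts = ["Bovada", "Westgate", "Fangraphs", "PECOTA"]
expert_preds = preds[experts].copy()
expert_preds["Average"] = expert_preds[experts].mean(axis=1)
expert_preds["Median"] = expert_preds[experts].median(axis=1)

expert_errors = pd.DataFrame({"MAE": expert_preds.apply(mae),
                              "RMSE": expert_preds.apply(rmse)})
print(expert_errors.sort_values("MAE"))

# Rough preview (not necessarily my exact playoff-odds method) of how the expert
# median can seed an Empirical Bayes prior: convert the crowd's mean and variance
# into a Beta prior on a team's per-game win probability via the method of
# moments, then update it with an in-season record.
prior_wins = expert_preds.loc["CLE", "Median"]      # crowd median for Cleveland
prior_var = expert_preds.loc["CLE", experts].var()  # spread of the expert picks (one possible variance choice)

mu = prior_wins / 162            # prior mean win probability
var = prior_var / 162 ** 2       # prior variance on that probability

common = mu * (1 - mu) / var - 1
alpha0, beta0 = mu * common, (1 - mu) * common      # Beta(alpha0, beta0) prior

wins, losses = 30, 20                               # hypothetical record so far
posterior_mean = (alpha0 + wins) / (alpha0 + beta0 + wins + losses)
print(f"Updated true-talent win probability: {posterior_mean:.3f}")
```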
Now you may ask: why not just use Fangraphs' predictions for my prior mean, since they perform the best on the combination of MAE and RMSE? Or why not drop Oddshark and Atlantis, which perform the worst of all the casinos' over/unders (I don't endorse gambling, but…)?

My response is that without the results of the 2017 season, I would have had no idea how accurate their predictions were. Additionally, what if Fangraphs was just lucky in 2017 and Oddshark and Atlantis were unlucky? I can't assume that Fangraphs will have the best predictions every year, or that those two casinos will always be worse than the other three. To reduce the noise consistently, then, it is best to use the median of a small, knowledgeable crowd.

If I had the resources, I expect that a sample from thousands of participants, creating a large and diverse crowd, would actually outperform all of the predictions above. However, that is, for now, outside my scope. Interestingly, SBNation polls their large audience on the MLB over/unders before each season starts. In 2016, there were 2,885 votes in their poll, and more than 53 percent picked a winner. In 2017, however, the SBNation crowd did not do as well, with only 51% picking a winner. If we expect to get only 50% of picks right on average, though, these results are still better than a coin flip.

What do you think about the phenomenon of the Wisdom of the Crowd? Have you seen it in action, perhaps in a Twitter poll or another social media outlet? I would love to continue the conversation in the comments! As always, our code and data can be found on our Github if you want to reproduce the results yourself. And you may have noticed this already, but I am aware that I used the average instead of the median in my past posts on this subject, but hey (hey! what a wonderful kind of day!), it looks like I learned something new too! That only means my future predictions will be more accurate, right?

The SaberSmart Team

P.S. If you enjoyed this article, and need something off Amazon anyway, why not support this site by clicking through the banner at the bottom of the page? As a member of the Amazon Affiliates program, we may receive a commission on any purchases. All revenue goes towards the continued hosting of this site.