Every Friday, the political and sports analytics website FiveThirtyEight offers up problems related to the things we hold dear around here (math, logic, and probability) in its popular Riddler column. When we get the chance, we enjoy checking out these puzzlers and their solutions, which are often the exact opposite of what you initially think!
This week, the Riddler offered up a problem relating to the MLB postseason and winning percentages, so we knew we had to take a crack at it.
Have you heard of machine learning but haven’t found a way to implement any algorithms? Do you use R for all of your machine learning models and are wondering how to scale and deploy your models to production quickly and efficiently? Do you solely use R, or caret, for your machine learning models and want to diversify your skillset?
No judgement if you do, but let me introduce you to what is, in my opinion, a superior way to craft and deploy machine learning models: Python and scikit-learn.
A common dream of most baseball aficionados is to visit every ballpark, stepping inside with their own two feet, feeling the history, and seeing every team live. I’m no different, and I can already cross almost two-thirds of the current ballparks off my list (I’m coming for you soon, East Coast). After seeing this article on an itinerary that would allow travelers to see every single national park in the 48 contiguous states on a road trip without wasting any time, I wondered how this methodology worked and how it could be applied to a ballpark journey. As a side note, Randy Olson has also organized the ultimate US road trip and the best cross-Canada journey. I highly recommend him as a follow on Twitter as well!
For those of you who have followed my blog over the past year and a half, one statistical technique that you may have noticed I commonly use is Monte Carlo simulation. While I usually skim over the basics of Monte Carlo simulation to get to the meat of my analysis, I want to take the time in this post to delve into the method a little more deeply and show, by example, the immense power of the Monte Carlo method.
Monte Carlo simulation is a type of probability simulation used by companies to understand the impact of risk and uncertainty in financial, project management, cost, and other forecasting models. It is also a major strategy in decision analytics. One of the drawbacks with trying to predict the future is that you can't know with certainty what the actual value will be...
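To make the idea a little more concrete, here is a minimal sketch of a Monte Carlo cost forecast in Python. Everything in it is an assumption made up for illustration: the two-task project, the `simulate_total_cost` helper, and the triangular cost ranges are hypothetical and do not come from any of the posts above.

```python
import random
import statistics


def simulate_total_cost(n_trials=10_000, seed=42):
    """Simulate total cost for a hypothetical two-task project.

    Each task's cost is uncertain, so we draw it at random on every
    trial and add the draws together. The ranges below are made up.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode)
        design = rng.triangular(8, 15, 10)
        build = rng.triangular(20, 40, 25)
        totals.append(design + build)
    return totals


totals = simulate_total_cost()
mean_cost = statistics.mean(totals)
# quantiles(n=10) returns nine cut points; the last is the 90th percentile
p90 = statistics.quantiles(totals, n=10)[-1]
print(f"mean: {mean_cost:.1f}, 90th percentile: {p90:.1f}")
```

Rather than a single point estimate, the simulation gives a whole distribution of outcomes, so you can ask questions like "what cost will we stay under 90% of the time?"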
Last week, we talked about the differences between building models for statistical inference and building models for prediction. Predictive modeling has not always been part of the statistics community. One person largely responsible for bridging the gap between the computer science community and the statistics community is Leo Breiman, known colloquially as the Father of CART and Random Forests. In his 2001 article, "Statistical Modeling: The Two Cultures", he articulated his views on the difference between the statistics community of his time and the machine learning community. Here are our thoughts on this groundbreaking paper, and the topic in general.