For those of you that have followed my blog in the past year and a half, one statistical technique that you may have noticed I commonly use is Monte Carlo simulation. While I usually skim over the basics of Monte Carlo simulation to get to the meat of my analysis, I want to take the time in this post to delve into this method a little more deeply, and show by example, the immense power of the Monte Carlo method.
Monte Carlo simulation is a type of probability simulation used by companies to understand the impact of risk and uncertainty in financial, project management, cost, and other forecasting models. It is also a major strategy in decision analytics. One of the drawbacks with trying to predict the future is that you can't know with certainty what the actual value will be...
How does it work?
In a Monte Carlo simulation, a random value is selected from a distribution, calculated from historical knowledge or expectations. The prediction model is calculated based on this random value. The result of the model is recorded, and the process is repeated.
A typical Monte Carlo simulation calculates the model hundreds or thousands of times, each time using different randomly-selected values. When the simulation is complete, we have a large number of results from the model, each based on random input values. These results are used to describe the likelihood, or probability, of reaching various results in the model and can be summarized by a confidence interval, or how often certain results appeared.
For example, when I predicted the various results of the World Series and the Super Bowl, I used historical data from the regular season to create a distribution of runs/points scored and allowed, as a proxy for offense and defense. I then used these randomly-selected values as the input values into a model to predict the winner, a simple model averaging the values and then comparing them against each other. I did this a hundred thousand times to determine a confidence interval for the likely probability of the overall winner.
I found it oddly serendipitous that SMBC posted this comic the day after I had finished my code and outline for this post. Essentially, by inscribing a circle entirely within a square, and throwing enough darts at it, or in this case shooting arrows, one can actually estimate the value of pi!
You remember the famous mathematical constant pi, perhaps from school as a way to find the circumference or the area of a circle. It's hard to pinpoint who, exactly, first became conscious of the constant ratio between the circumference of a circle and its diameter, though human civilizations seem to have been aware of it as early as 2550 BC. In most modern cases, the value of pi is often rounded to 3.14 for simplicity.
However, pi is in fact what mathematicians call an irrational number, meaning that it is a number that cannot be written as a simple fraction, no not even 22/7. In other words, pi has an infinite, non-repeating number of decimal digits. In addition to irrational numbers, mathematicians have given other quaint names to specific numbers in number theory. For example, evil numbers are non-negative integers that have an even number of 1s in its binary expansion, like 3, 5, 6, or 9. What is the opposite of an evil number?
Nope, numbers that are not evil are called odious numbers.
The fact that pi is infinite has not stopped mathematicians or computer scientists from trying to calculate it to as many digits as possible. For instance, one popular way to show off computing power is to calculate pi to as many digits as possible, creating a string that starts with 3.14 and continues to the nth digit. The more digits one wants, the more computations it takes.
However, it turns out all one needs is a square, a circle, and some random numbers to get our own estimate of the infinite number.
If we inscribe a circle perfectly within a square, we have a basic dartboard and backboard. The problem is easiest when the circle is centered on (0,0), and has a radius of 1. The circle also has an area of pi. This layout also means the square has a side length of 2, and an area of 4. Therefore, the ratio of the area of the circle to the area of the square is pi : 4, or pi/4.
Here is what I mean by that arrangement:
If we repeatedly throw darts at the board and they are to randomly land within the bounds of the square, some will land on the square backboard and some will land on the circular dartboard. If these throws are truly random, then the number of darts that land on the dartboard, divided by the number that we threw in total, will be in the ratio of the areas, pi/4. Therefore, if we multiply that number by 4, we have our estimate of pi!
I broke this down in R, a statistical programming language.
First, how do we know if a dart landed inside the circle?
If the distance of the point from the origin, (0,0), is greater than 1, the radius of the circle, then the dart must have landed outside of the circle.
Next, I wrote a function, estimatepi, that takes a single input N, generates N pairs of uniform random numbers (darts) and uses insidecircle to produce an estimate of pi based on how many darts landed inside the circle as described above. It also includes an optional graph, to visualize the location of the dart strikes.
So how does it do? Well let’s throw some darts!
Or for those who want a little slower game of darts:
As you can see, the more darts we throw, the more area on our board we cover, meaning a better estimate right?
Well, I estimated π for N = 100 to 50,000 in increments of 50 and recorded the
estimate, its standard error and the upper and lower bounds of the 95% CI. Below is a partial table, showing estimates from 1,000 to 10,000, by 500. The change does not seem that great! In fact, in this small subset, out best N is 6,500 which is low if we expect our estimate to get better as N tends to infinity.
Many simulations are needed to see the minuscule average improvement. This full data output can be found on my GitHub. However, here is a graph depicting my sample estimate of pi as N increases, versus the actual value of pi. As the number of simulations increase, so too does the precision of our estimate! However, it looks like the precision reaches an upper limit, and does not improve regardless of how many more darts we throw.
Finally, I found the best value of N which resulted in the closest estimate to pi. This N turned out to be 24,450, again smaller than the maximum number of darts we threw.
I then estimated pi 500 more times with this same value of N. Since the darts are random for each iteration, each estimate differed from the last. With the help of a statistical phenomenon called the Central Limit Theorem, I created a dataset of potential estimates of pi, modeled by the Normal distribution. Here is the histogram of all 500 estimates of pi with this N, with its Normal Approximation overlaid.
Some interesting observations result. The sample standard deviation of all 500 point estimates was essentially the same as the standard error from the first estimate of pi with this value of N. Also, almost 95% exactly of the 500 estimates fell within the 95% CI initially calculated, 3.12 to 3.16. Math works!
I hope you found this application of Monte Carlo simulation interesting! With just a little geometry, and a whole lot of computing power, we used a myriad of darts to estimate pi. If only Archimedes had access to R and a computer back in Ancient Greece…
All of my data, visualizations, and code used and produced can be found on my GitHub. What problems can use Monte Carlo simulations to help solve? Have you seen this technique before? Let me know below in the comments! Until next time!
The SaberSmart Team.
There are many other articles written out there on this topic. Here are a few I found helpful in writing this post:
R-Bloggers, Analytics-Link, Emory CS, RiskAMP
And finally, here is an amazing accomplishment where a team of engineers created a dartboard where you get a bullseye every time you throw a dart thanks to some engineering and motion tracking. Not sure if it was run on Raspberry Pis, but go watch it anyway. It will blow your mind.
Automatic Bullseye, MOVING DARTBOARD