The following is part of an ongoing collection of Jupyter notebooks, the goal being to build a library of notebooks as an introduction to mathematics and technology. These were all created by Gavin Waters. If you use these notebooks, be nice and throw up a credit somewhere on your page.

Rolling a fair D-3

Imagine you have a dice with 3 sides. You know the dice is fair, but you would like to figure out the probability space of the outcomes. One way to do it is just to assign each side 33% and call it a day.

Another way to do this is by rolling the dice a number of times and finding the cumulative probabilities for each side.
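A minimal sketch of what that rolling experiment might look like; the number of rolls (20) is just an assumption for illustration.

```python
import numpy as np

# Roll a fair three-sided dice 20 times and track the running (cumulative)
# proportion of each face after every roll.
rng = np.random.default_rng()
rolls = rng.integers(1, 4, size=20)          # faces 1, 2, 3
steps = np.arange(1, len(rolls) + 1)

for face in (1, 2, 3):
    running = np.cumsum(rolls == face) / steps
    print(f"face {face}: running proportion -> {np.round(running, 2)}")
```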

As you can see from above, the probabilities are definitely not at 33%.

In fact, the probabilities are...

Another way to look at this is by keeping track of the rolling probabilities.

This would get very cumbersome; looking at a graph seems a better option.

Notice that for each roll of the dice, the probabilities always sum to one. The probabilities should eventually settle at 0.33, or at least they should, given that our random generator is uniform. If we allow the simulator to run for 10,000 rolls of the dice, we can see that the probabilities converge.
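A sketch of that longer run, plotting the running probability of each face over 10,000 rolls against the 0.33 target.

```python
import numpy as np
import matplotlib.pyplot as plt

# 10,000 rolls of a fair three-sided dice; plot the running probability of
# each face as the number of rolls grows.
rng = np.random.default_rng()
rolls = rng.integers(1, 4, size=10_000)
steps = np.arange(1, len(rolls) + 1)

for face in (1, 2, 3):
    plt.plot(steps, np.cumsum(rolls == face) / steps, label=f"P({face})")

plt.axhline(1/3, color="gray", linestyle="--")   # the 0.33 target
plt.xlabel("number of rolls")
plt.ylabel("running probability")
plt.legend()
plt.show()
```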

As you can see, given enough time, you can get a fairly accurate probability. Here is the problem: what if this were a trial of some kind? 10,000 trials would be nice but expensive, and this is a very discrete answer with only 3 outcomes. What if our choices were on a continuous spectrum?

Enter Markov Chain Monte Carlo (MCMC) methods

I am just going to introduce you to the methodology here; the theory is for another notebook.

The question I have is, "Can we generate a continuous probability space given some discrete data points we have collected?"

Markov Chain Monte Carlo methods give us a process in which we set loose random sample walks and determine probability spaces from them. First we will consider the fair dice example, then we shall use the finite data collection example.

First let us assume that there is an equal 33% chance of obtaining any number. Let $P(x_1)$ be the probability of rolling a $1$. The main idea of MCMC is to randomly choose another point in the probability space and then weight the decision on whether to move there or not. When you have the weight, randomly choose whether you move, based upon that weight, and then repeat.

So for our example, we first choose a starting point in our probability space. Since we only have 3 dimensions and they all have to add up to one, I can simply represent this probability space in a 2-D space. The x-axis will represent $P(x_1)$ and the y-axis will be $P(x_2)$. We know that $P(x_3)=1-P(x_1)-P(x_2)$.

First, our starting point:
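A possible starting point, sketched below; the values are an arbitrary assumption, and any point with $p_1 + p_2 \le 1$ would do.

```python
# A starting point in the 2-D probability space (p1, p2), with p3 implied.
p1, p2 = 0.5, 0.25
p3 = 1 - p1 - p2
print(p1, p2, p3)
```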

Since we are walking about our two-dimensional probability space, we need to be able to choose a random jump in the $x_1$ and $x_2$ directions. Let's make it normal with a mean of zero and a standard deviation of 0.2.
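A sketch of one such proposed jump, assuming the starting point above is the current position.

```python
import numpy as np

# Perturb the current point by independent normal(0, 0.2) draws in each direction.
rng = np.random.default_rng()
p1, p2 = 0.5, 0.25                          # current point (assumed from above)
jump = rng.normal(loc=0.0, scale=0.2, size=2)
p1_new, p2_new = p1 + jump[0], p2 + jump[1]
print(p1_new, p2_new)
```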

Now we always have to check whether or not the jump will place us outside of our probability space.
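One way that check might look: a point is inside the space only if all three implied probabilities stay between 0 and 1.

```python
# Boundary check: (p1, p2) is valid only if p1, p2 and p3 = 1 - p1 - p2
# all lie between 0 and 1.
def inside_space(p1, p2):
    return 0 <= p1 <= 1 and 0 <= p2 <= 1 and 0 <= 1 - p1 - p2 <= 1

print(inside_space(0.5, 0.25))   # True
print(inside_space(0.9, 0.4))    # False: p3 would be negative
```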

Now, what we need is to place a weight on whether or not we should jump. Usually this amounts to asking, "Given our data, is the place we are about to jump to more likely than where we are now?"

If it is, then let's find the probability that that happens and roll a random number to see if we move.

If not, let's stay where we are and not move.

The trinomial joint probability mass function is the following $$f(x_1,x_2)=P(X=x_1,Y=x_2)=\dfrac{n!}{x_1!x_2!(n-x_1-x_2)!} p^{x_1}_1 p^{x_2}_2 (1-p_1-p_2)^{n-x_1-x_2}$$

where there are $n$ experiments, with $x_1$ outcomes of the first kind with associated probability $p_1$ (and similarly for $x_2$ and $p_2$).
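For a numerical feel, this PMF can be evaluated directly with scipy; the counts and probabilities below are illustrative assumptions (100 rolls split 33/33/34 under a uniform 1/3 chance for each face).

```python
from scipy.stats import multinomial

# Evaluate the trinomial PMF for the assumed counts and probabilities.
n = 100
counts = [33, 33, 34]
probs = [1/3, 1/3, 1/3]
print(multinomial.pmf(counts, n=n, p=probs))
```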

Because this is uniform, we can just assume $n = x_1+x_2+x_3$ trials with $x_1 = x_2 = n-x_1-x_2$ outcomes, i.e. each face showing up the same number of times.

This helps because our probability to move then reduces quite a bit: the factorial term is the same at both points and cancels, and all three exponents are equal to $x_1$.

$$ P = \left(\dfrac{p_1p_2(1-p_1-p_2)}{p^*_1p^*_2(1-p^*_1-p^*_2)}\right)^{x_1}$$

For our example, let's say that there were 100 trials with $x_1=33$.
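A sketch of that reduced weight in code, using the assumed 100 trials and $x_1=33$; here the numerator uses the proposed point and the denominator the current point (that convention is an assumption).

```python
# Reduced acceptance weight from the formula above.
x1 = 33

def move_weight(p_proposed, p_current):
    num = p_proposed[0] * p_proposed[1] * (1 - p_proposed[0] - p_proposed[1])
    den = p_current[0] * p_current[1] * (1 - p_current[0] - p_current[1])
    return (num / den) ** x1

print(move_weight((0.34, 0.33), (0.5, 0.25)))   # > 1 means the move looks favourable
```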

Now we have to put it all together and record the path.
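A minimal sketch of the pieces assembled: propose a jump, stay put if it leaves the probability space, otherwise move with probability equal to the weight (capped at 1), and record the whole path. The step size, step count and starting point are the assumptions used earlier.

```python
import numpy as np

rng = np.random.default_rng()
x1 = 33

def move_weight(p_proposed, p_current):
    num = p_proposed[0] * p_proposed[1] * (1 - p_proposed[0] - p_proposed[1])
    den = p_current[0] * p_current[1] * (1 - p_current[0] - p_current[1])
    return (num / den) ** x1

path = [(0.5, 0.25)]                     # starting point
for _ in range(10):                      # only 10 steps for now
    current = path[-1]
    proposal = (current[0] + rng.normal(0, 0.2),
                current[1] + rng.normal(0, 0.2))
    p3 = 1 - proposal[0] - proposal[1]
    if not (0 < proposal[0] < 1 and 0 < proposal[1] < 1 and 0 < p3 < 1):
        path.append(current)                             # outside the space: stay
    elif rng.random() < min(1.0, move_weight(proposal, current)):
        path.append(proposal)                            # accept the jump
    else:
        path.append(current)                             # reject and stay

print(path)
```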

As you can see from above, the random walk started and then moved only if it was beneficial to do so. This is only 10 steps; if we extend to 100, you should be able to see some convergence.

The graph seems to be converging to our projected probability of 33%. Let's run out to 1,000 samples and also show the other probabilities.

While it is interesting that the samples may converge, there is a theorem out there that shows this random sampling will converge to the target distribution of 33% each. We can see this by plotting a KDE distribution of the samples for each axis.
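One way such a plot might be produced, assuming seaborn is available and `path` holds the $(p_1, p_2)$ samples recorded by the sampler above.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# KDE of each probability axis, recovering p3 from the other two.
p1_samples = [p[0] for p in path]
p2_samples = [p[1] for p in path]
p3_samples = [1 - a - b for a, b in path]

for samples, label in [(p1_samples, "$P(x_1)$"),
                       (p2_samples, "$P(x_2)$"),
                       (p3_samples, "$P(x_3)$")]:
    sns.kdeplot(samples, label=label)

plt.legend()
plt.show()
```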

Let's look at the random walks in the 2-D probability space.
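A sketch of that view, again assuming `path` holds the recorded $(p_1, p_2)$ samples.

```python
import matplotlib.pyplot as plt

# Trace the walk through the 2-D probability space.
xs = [p[0] for p in path]
ys = [p[1] for p in path]

plt.plot(xs, ys, marker=".", linewidth=0.5)
plt.xlabel("$P(x_1)$")
plt.ylabel("$P(x_2)$")
plt.show()
```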

By bumping it up to 100,000 samples or higher, we get an approximately normal distribution about our desired probability.

It also does not matter where we start in our probability space; eventually the samples will converge.

An Example with unfair dice

Consider that you rolled a D-3 30 times and came up with the following data:

From first impressions, we can see that the probabilities are not uniform (hopefully).

We can change our previous code to take in this data and generate probability distributions using MCMC samples. It should not really matter where our chain starts.

Putting all this code in one cell.
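A minimal sketch of how such a cell might look. The counts below are hypothetical placeholders standing in for the 30 observed rolls; substitute the actual tallies from the data above.

```python
import numpy as np

# Hypothetical counts of faces 1, 2, 3 (placeholder for the real data).
counts = np.array([5, 10, 15])
rng = np.random.default_rng()

def log_weight(p, c):
    # Log of the trinomial likelihood (up to a constant) at the point p.
    return c[0] * np.log(p[0]) + c[1] * np.log(p[1]) + c[2] * np.log(1 - p[0] - p[1])

samples = [np.array([1/3, 1/3])]         # starting point (should not matter much)
for _ in range(5_000):
    current = samples[-1]
    proposal = current + rng.normal(0, 0.2, size=2)
    inside = (0 < proposal[0] < 1 and 0 < proposal[1] < 1
              and proposal[0] + proposal[1] < 1)
    if not inside:
        samples.append(current)          # outside the space: stay
    elif np.log(rng.random()) < log_weight(proposal, counts) - log_weight(current, counts):
        samples.append(proposal)         # accept the jump
    else:
        samples.append(current)          # reject and stay

samples = np.array(samples)
print("means:", samples.mean(axis=0))
print("standard deviations:", samples.std(axis=0))
```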

You should notice that the standard deviations of these probability distributions are quite high.

This is primarily because there are not enough data points for our samples to converge tightly. We are only looking at 30 trials.

If we use the same proportions but increase the number of trials, we should get faster convergence of our samples and thus a smaller standard deviation in our probability distributions.
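For example, keeping the same proportions but pretending ten times as many rolls were observed might look like this; rerunning the sampler above with these counts should give noticeably tighter distributions.

```python
# Same proportions, ten times the trials (e.g. 30 observed rolls -> 300).
bigger_counts = counts * 10
```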