$$ \frac{P(B\cap A)}{P(A)}=P(B|A) $$
so, ...
$$ P(A\cap B) = P(B\cap A) \implies \frac{P(A\cap B)}{P(A)}=P(B|A) \implies P(A\cap B) = P(B|A)P(A) $$

Hence,
$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

Or, more conventionally expressed,
$$ P(\theta | D) = \frac{P(D|\theta)P(\theta)}{P(D)} $$

Here, $\theta$ can be thought of as our belief/hypothesis (for example, a parameter such as a probability), and $D$ as the data set.
Moreover, this can be expressed as
$$ \text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}} $$

The prior, $P(\theta)$, is the distribution describing the probability of our beliefs/hypotheses before we start seeing the data.
The likelihood, or $P(D| \theta)$, is the probability of the data given the belief/hypothesis that we have.
The posterior, $P(\theta|D)$, is the distribution of our beliefs/hypotheses after we see the data.
And the marginal likelihood, $P(D)$, also called the normalizing constant, is the overall probability of the data, averaged over all possible beliefs/hypotheses.
We could rewrite this, for example, as
$$ P(\text{Male} \mid \text{Diabetic}) = \frac{P(\text{Diabetic} \mid \text{Male})\times P(\text{Male})}{P(\text{Diabetic})} $$

So our
- prior is the probability that the robber is Male
- likelihood is the probability that they are Diabetic given that they are Male
- marginal likelihood is the probability that they are Diabetic (regardless of sex)
- posterior is the probability that they are Male given that they are Diabetic
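For instance, with some purely made-up numbers (illustrative assumptions, not real statistics): if half the population is male, 8% of males are diabetic, and 6% of the whole population is diabetic, then

$$ P(\text{Male} \mid \text{Diabetic}) = \frac{0.08 \times 0.5}{0.06} \approx 0.67 $$

so learning that the robber is diabetic raises our belief that they are male from 0.5 to about 0.67.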
This can be read as: given our prior information and the likelihood of the data under it, we obtain our posterior view, scaled by some normalizing constant.
This scaling does not depend on our beliefs; it is purely a property of the data. Hence we can say that our posterior is proportional to the prior times the likelihood.
Or, since the marginal likelihood (i.e. the probability of the observed data) does not depend on $\theta$,
$$ \text{Posterior} \propto \text{Likelihood}\times \text{Prior} $$

Given a certain prior and likelihood, we should therefore be able to compute an unnormalized posterior that is proportional to the true posterior distribution.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats
%matplotlib inline
%config InlineBackend.figure_format='retina'
First, we will simulate an experiment: flip a biased coin a number of times, where the bias is chosen at random.
flips = 10
Heads_prob = np.random.random() # lets figure out a weight
num_heads = stats.binom(flips,Heads_prob).rvs()
num_tails = flips - num_heads
print(num_heads, num_tails, Heads_prob)
7 3 0.48443168382353086
from math import factorial

def nCk(n, k):
    # binomial coefficient "n choose k"
    return factorial(n) / (factorial(n - k) * factorial(k))
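As a quick sanity check, this should agree with SciPy's built-in binomial coefficient:
from scipy.special import comb

# e.g. 10 choose 7 = 120; comb returns a float by default, like nCk
print(nCk(10, 7), comb(10, 7))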
Now let's create a prior distribution over a finite grid of candidate probabilities.
num_prob = 100
prob1 = np.linspace(1/(num_prob+1),num_prob/(num_prob+1),num_prob)
uniform_prob = np.ones(num_prob)*1/num_prob
triangle_prob = np.minimum(prob1,(1-prob1))
triangle_prob = triangle_prob/np.sum(triangle_prob)
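Both priors should sum to (approximately) one, since they are discrete distributions over the grid; a quick check:
print(np.sum(uniform_prob), np.sum(triangle_prob))  # both should be ~1.0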
plt.plot(prob1,uniform_prob)
plt.plot(prob1,triangle_prob)
Remember that
$$ \text{Posterior} \propto \text{Likelihood}\times \text{Prior} $$

or

$$ P(\theta | D) = \frac{P(D|\theta)P(\theta)}{P(D)} $$
So our likelihood is the binomial probability of getting our observed number of heads out of the total flips, evaluated at every candidate probability on the grid.
p_like_prob1 = nCk(flips,num_heads)*prob1**num_heads*(1-prob1)**num_tails
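This hand-rolled likelihood is just the binomial pmf, so it should match SciPy's version evaluated across the whole grid:
# compare against scipy's binomial pmf at every candidate probability
print(np.allclose(p_like_prob1, stats.binom.pmf(num_heads, flips, prob1)))  # True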
plt.plot(prob1, p_like_prob1, '.')
We have to find our marginal likelihood $P(D)$, which is the likelihood summed over the grid and weighted by the prior.
p_norm = np.sum(p_like_prob1*triangle_prob)  # P(D) = sum over theta of likelihood x prior
Now apply Bayes rule. $$ P(\theta | D) = \frac{P(D|\theta)P(\theta)}{P(D)} $$
p_posterior = p_like_prob1*triangle_prob/p_norm
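Since the posterior is a proper distribution over the grid it should sum to one, and we can summarize it with a couple of point estimates (a small illustrative check):
print(np.sum(p_posterior))            # ~1.0
print(prob1[np.argmax(p_posterior)])  # MAP estimate of the heads probability
print(np.sum(prob1 * p_posterior))    # posterior mean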
plt.figure(figsize=(13,6))
plt.stem(prob1,triangle_prob,markerfmt=',')
plt.title('prior')
plt.show()
plt.figure(figsize=(13,6))
plt.stem(prob1,p_like_prob1,markerfmt=',')
plt.title('likelihood')
plt.show()
plt.figure(figsize=(13,6))
plt.stem(prob1,p_posterior,markerfmt=',')
plt.title('posterior')
plt.show()
flips = 100
num_prob = 100
Heads_prob = np.random.random() # lets figure out a weight
num_heads = stats.binom(flips,Heads_prob).rvs()
num_tails = flips-num_heads
prob1 = np.linspace(1/(num_prob+1),num_prob/(num_prob+1),num_prob)
uniform_prob = np.ones(num_prob)*1/num_prob
triangle_prob = np.minimum(prob1,(1-prob1))
triangle_prob = triangle_prob/np.sum(triangle_prob)
prior = triangle_prob
p_like_prob1 = nCk(flips,num_heads)*prob1**num_heads*(1-prob1)**num_tails
p_norm = np.sum(p_like_prob1*prior)  # P(D) = sum over theta of likelihood x prior
p_posterior = p_like_prob1*prior/p_norm # Bayes Theorem
print(Heads_prob)
plt.figure(figsize=(13,6))
plt.stem(prob1,prior,markerfmt=',')
plt.title('prior')
plt.show()
plt.figure(figsize=(13,6))
plt.stem(prob1,p_like_prob1,markerfmt=',')
plt.title('likelihood')
plt.text(1.1, np.max(p_like_prob1)/2, 'D = %sH,%sT' % (num_heads, num_tails))
plt.show()
plt.figure(figsize=(13,6))
plt.stem(prob1,p_posterior,markerfmt=',')
plt.title('posterior')
plt.show()
0.5780159358026402
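With ten times more flips, the posterior should concentrate much more tightly around the true bias; a quick numerical summary (just for illustration):
print(np.sum(prob1 * p_posterior), Heads_prob)  # posterior mean vs the true heads probability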
sns.displot(stats.beta(0.5,0.5).rvs(10000),kind='kde')
Let's create a random walk through our probability space with a simple Metropolis-style MCMC sampler; the walk is restricted to stay between 0 and 1, and each proposed step is accepted with probability given by the likelihood ratio between the proposed and the current point.
p_1 = 0.51
jump=0.2
MCMCp1 = [p_1]
while len(MCMCp1) < 100000:
    p_walk = np.random.normal(0, jump, 1)[0]  # propose a random step
    if p_1 + p_walk > 0 and p_1 + p_walk < 1:
        p1jump = p_1 + p_walk
    else:
        p1jump = p_1  # stay put if the proposal falls outside (0, 1)
    # likelihood ratio between the proposed and the current point
    prob_move = (nCk(flips,num_heads)*(p1jump)**(num_heads)*(1-p1jump)**(flips-num_heads))/(nCk(flips,num_heads)*(p_1)**(num_heads)*(1-p_1)**(flips-num_heads))
    MCMCp1.append(p_1)  # record the current state
    if min(1, prob_move) >= np.random.random():  # accept with probability min(1, ratio)
        p_1 = p1jump
plt.plot(MCMCp1,range(0,len(MCMCp1)),'.-',c='b',alpha=0.11)
plt.xlabel('probability')
plt.ylabel('sample step')
plt.xlim(0,1)
plt.show()
sns.displot(MCMCp1[500:],kind='kde')
plt.vlines(Heads_prob,0,3,color='r')
print(Heads_prob)
0.4368240876323427
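As a rough check, the mean of the post-burn-in MCMC samples (which target the likelihood, i.e. the posterior under a flat prior) should land near the peak of the density above:
print(np.mean(MCMCp1[500:]), Heads_prob)  # MCMC estimate vs the true heads probability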
flips = 10
Heads_prob = np.random.random() # lets figure out a weight
num_heads = stats.binom(flips,Heads_prob).rvs()
p_1 = 0.51
jump=0.3
MCMCp1 = [p_1]
while len(MCMCp1) < 1000:
    p_walk = np.random.normal(0, jump, 1)[0]  # propose a random step
    if p_1 + p_walk > 0 and p_1 + p_walk < 1:
        p1jump = p_1 + p_walk
    else:
        p1jump = p_1  # stay put if the proposal falls outside (0, 1)
    # likelihood ratio between the proposed and the current point
    prob_move = (nCk(flips,num_heads)*(p1jump)**(num_heads)*(1-p1jump)**(flips-num_heads))/(nCk(flips,num_heads)*(p_1)**(num_heads)*(1-p_1)**(flips-num_heads))
    MCMCp1.append(p_1)  # record the current state
    if min(1, prob_move) >= np.random.random():  # accept with probability min(1, ratio)
        p_1 = p1jump
plt.plot(MCMCp1,range(0,len(MCMCp1)),'.-',c='b',alpha=0.11)
plt.xlabel('probability')
plt.ylabel('sample step')
plt.xlim(0,1)
plt.show()
sns.displot(MCMCp1[500:],kind='kde')
plt.vlines(Heads_prob,0,3,color='r')
plt.xlim(0,1)
print(Heads_prob)
0.447982886878346