The binomial distribution is a discrete probability distribution that gives the probability of obtaining each possible number of successes in a fixed number of independent trials, where every trial has exactly two possible outcomes. The distribution is obtained by performing a number of Bernoulli trials.
A Bernoulli trial is assumed to meet each of these criteria :
There must be only 2 possible outcomes.
Each outcome has a fixed probability of occurring. A success has the probability of p, and a failure has the probability of 1 - p.
Each trial is completely independent of all others.
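The three criteria above can be sketched with Python's standard random module. This is only an illustration: the helper names bernoulli_trial and count_successes, and the seed value, are assumptions made for this example and do not come from any library.

```python
import random


def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0


def count_successes(n, p, rng):
    """Run n independent Bernoulli trials and count the successes."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))


rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable
n, p = 6, 0.6
r = count_successes(n, p, rng)
print(f"{r} successes in {n} trials")

# Over many trials the fraction of successes should settle near p.
many = 100_000
fraction = count_successes(many, p, rng) / many
print(f"observed success fraction: {fraction:.3f}")
```

Each call to bernoulli_trial has only two outcomes, uses the same p, and does not depend on earlier calls, which is what the criteria require.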
The binomial random variable represents the number of successes (r) in n successive independent trials of a Bernoulli experiment.
The probability of any one particular sequence of r successes and n-r failures is :
p^r * (1-p)^{n-r}
The number of ways to choose which r of the n trials are the successes is :
\frac{n!}{(n-r)!\ *\ r!}
Hence, the probability mass function (pmf), which gives the total probability of achieving exactly r successes and n-r failures in any order, is :
\frac{n!}{(n-r)!\ *\ r!}\ *\ p^r * (1-p)^{n-r}
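The formula can be written directly in Python with the standard library's math.comb (available from Python 3.8). This is a minimal sketch: the function name binomial_pmf and the values n = 4, p = 0.5 are chosen here purely for illustration.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r successes in n trials) = C(n, r) * p^r * (1 - p)^(n - r)."""
    if not 0 <= r <= n:
        return 0.0  # impossible number of successes
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 4, 0.5  # illustrative values
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]
for r, prob in enumerate(probabilities):
    print(f"{r}\t{prob:.4f}")

# A pmf must sum to 1 over all possible values of r.
total = sum(probabilities)
print(f"total = {total:.4f}")
```

Here math.comb(n, r) computes the n! / ((n-r)! * r!) factor exactly with integers, which avoids evaluating large factorials as floating-point numbers.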
An example illustrating the distribution :
Consider a random experiment of tossing a biased coin 6 times, where the probability of getting a head is 0.6. If 'getting a head' is considered a 'success', then the binomial distribution table contains the probability of r successes for each possible value of r (0 to 6).
r       P(r)
0       0.004096
1       0.036864
2       0.138240
3       0.276480
4       0.311040
5       0.186624
6       0.046656
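As a check on one entry of the table, the pmf formula can be evaluated by hand for r = 4 with n = 6 and p = 0.6:

```latex
P(4) = \frac{6!}{(6-4)!\ *\ 4!}\ *\ 0.6^4 * (1-0.6)^{6-4}
     = 15 * 0.1296 * 0.16
     = 0.31104
```

This agrees with the value listed for r = 4.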
A binomial distribution has a mean equal to np and a variance of np(1-p). For the example above, the mean is 6 * 0.6 = 3.6 and the variance is 6 * 0.6 * 0.4 = 1.44.
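The closed forms np and np(1-p) can be checked against the definitions of mean and variance, that is, by summing over the pmf. The sketch below does this for the coin example (n = 6, p = 0.6) using only the standard library; the variable names are illustrative.

```python
from math import comb

n, p = 6, 0.6

# pmf values for r = 0, 1, ..., n
pmf = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]

# Mean and variance computed from their definitions
mean = sum(r * prob for r, prob in enumerate(pmf))
variance = sum((r - mean) ** 2 * prob for r, prob in enumerate(pmf))

print(f"mean     = {mean:.4f}  (np       = {n * p:.4f})")
print(f"variance = {variance:.4f}  (np(1-p) = {n * p * (1 - p):.4f})")
```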
Using Python to obtain the distribution :
Now, we will use Python to analyse the distribution (using SciPy) and plot the graph (using Matplotlib).
Modules required :
SciPy : SciPy is an open-source Python library used for mathematical, engineering, scientific and technical computing.
Installation :
pip install scipy
Matplotlib : Matplotlib is a comprehensive Python library for plotting static and interactive graphs and visualisations.
Installation :
pip install matplotlib
The scipy.stats module contains various functions for statistical calculations and tests. Its binom object represents the binomial distribution, and the stats() method of scipy.stats.binom can be used to calculate summary statistics of a binomial distribution from the values of n and p.
Syntax : scipy.stats.binom.stats(n, p)
It returns a tuple containing the mean and variance of the distribution in that order.
The scipy.stats.binom.pmf() function is used to obtain the value of the probability mass function for given values of r, n and p. We can obtain the whole distribution by passing all possible values of r (0 to n).
Syntax : scipy.stats.binom.pmf(r, n, p)
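Both calls can be tried together in a short sketch. It assumes NumPy is available (it is installed as a SciPy dependency) and relies on pmf() accepting an array of r values, so that all probabilities come back in one call; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import binom

n, p = 6, 0.6
r = np.arange(n + 1)  # array of r values 0, 1, ..., n

# One probability per entry of r
pmf = binom.pmf(r, n, p)

# moments="mv" (the default) requests the mean and the variance
mean, var = binom.stats(n, p, moments="mv")

print(pmf.round(6))
print(float(mean), float(var))
```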
Calculating distribution table :
Approach :
Define n and p.
Define a list of values of r from 0 to n.
Get mean and variance.
For each r, calculate the pmf and store in a list.
Code : python3
from scipy.stats import binom

# setting the values of n and p
n = 6
p = 0.6

# defining the list of r values
r_values = list(range(n + 1))

# obtaining the mean and variance
mean, var = binom.stats(n, p)

# list of pmf values
dist = [binom.pmf(r, n, p) for r in r_values]

# printing the table
print("r\tp(r)")
for r, prob in zip(r_values, dist):
    print(f"{r}\t{prob}")

# printing mean and variance
print(f"mean = {mean}")
print(f"variance = {var}")
Code : Plotting the graph using the matplotlib.pyplot.bar() function to plot vertical bars. python3
from scipy.stats import binom
import matplotlib.pyplot as plt

# setting the values of n and p
n = 6
p = 0.6

# defining list of r values
r_values = list(range(n + 1))

# list of pmf values
dist = [binom.pmf(r, n, p) for r in r_values]

# plotting the graph
plt.bar(r_values, dist)
plt.show()
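Output : a bar chart with one vertical bar for each value of r (0 to 6), whose height is the corresponding pmf value.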
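When success and failure are equally likely (p = 0.5), the binomial distribution is symmetric about its mean. It is still a discrete distribution, not a normal distribution, but its shape resembles the bell curve of a normal distribution with mean np and variance np(1-p), and the resemblance improves as n grows. Hence, changing the value of p to 0.5, we obtain a symmetric graph that looks similar to a normal distribution plot.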
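To see how closely the p = 0.5 case follows a normal curve, the sketch below compares the binomial pmf with the density of a normal distribution that has the same mean np and variance np(1-p). It uses only the standard library; the helper names binomial_pmf, normal_pdf and max_gap, and the choice of n values, are assumptions made for this illustration.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(r, n, p):
    """Binomial probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def max_gap(n, p):
    """Largest |binomial pmf - normal pdf| over r = 0, 1, ..., n."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(r, n, p) - normal_pdf(r, mu, sigma))
        for r in range(n + 1)
    )


# The gap is small but not zero, and it shrinks as n increases.
for n in (6, 60, 600):
    print(f"n = {n:3d}  largest gap = {max_gap(n, 0.5):.6f}")
```

The gap never reaches zero for a finite n, which is why the binomial distribution is described as approximately, rather than exactly, normal.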