## Probability Distributions

We learn various probability distributions during high school and engineering courses, but we tend to forget them over time. So here I am providing a simple practical scenario for each distribution, with no theory involved.

### Bernoulli Distribution

• When the random variable has just two outcomes
• Example: the probability that a drug/medicine will be approved by the government is p = 0.65
• The probability that it will not be approved is 1 - p = 0.35
• The formulas below assume the probability is known; in real life we estimate it from data:
• Mean = p
• Variance (Sigma Square) = p*(1-p)
• Parameters : p
• Probability evaluation P(x|params) = p if x = 1, (1-p) if x = 0
• MLE : p = n/N, where n = no. of times 1 was observed, N = no. of experiments
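
The Bernoulli MLE above can be sketched in a few lines of Python; this is a minimal illustration, assuming the approval probability p = 0.65 from the example as the simulated ground truth:

```python
import random

random.seed(0)

# Simulate N approval decisions with an assumed true p = 0.65 (as above)
p_true = 0.65
N = 10_000
samples = [1 if random.random() < p_true else 0 for _ in range(N)]

# MLE: p_hat = n / N, where n = no. of times 1 was observed
n = sum(samples)
p_hat = n / N

# Mean and variance follow directly from the estimate
mean = p_hat
variance = p_hat * (1 - p_hat)
print(p_hat, variance)
```

With 10,000 simulated trials, p_hat lands very close to the true 0.65.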

### Binomial Distribution

• When you perform the Bernoulli experiment multiple times and want to see how many times a certain outcome appears.
• For example, you flip a coin (fair or biased) 10 times and ask for the probability that heads appears x (x = 0, 1, …, 10) times.
• Another more practical example :
• Suppose the oil price can increase by 3 bucks or decrease by 1 buck each day
• Probability of an increase is p = 0.65, and of a decrease is 0.35
• What price can we expect after three days?
• Note that (Increase, Increase, Decrease) and (Increase, Decrease, Increase) give the same price.
• From another point of view, it counts the no. of successes in an experiment:
• No of patient responding to treatment
• Binary classification problem
• The formulas below assume the probability is known; in real life we estimate it from data:
• n = no. of times the experiment is performed
• Mean = n*p
• Variance (Sigma Square) = n*p*(1-p)
• Parameters : n, p
• Probability evaluation P(x|params) = nCx * p^x * (1-p)^(n-x)
• MLE
• n = no. of trials, known from the experiment design
• p = k/n, where k = no. of successes observed
• Interestingly, the MLE for the binomial and multinomial distributions is very simple
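
The oil-price example above can be worked out with the binomial PMF; a minimal sketch, where the +3/-1 daily moves and p = 0.65 come from the bullets above:

```python
from math import comb

# Binomial PMF: P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Each day the price goes up 3 with p = 0.65 or down 1 with probability 0.35.
# After n = 3 days, k up-days give a net change of 3*k - 1*(n - k).
n, p = 3, 0.65
expected_change = sum(
    binom_pmf(k, n, p) * (3 * k - (n - k)) for k in range(n + 1)
)
print(expected_change)  # equals n * (3*p - 1*(1-p)) = 3 * 1.6 = 4.8
```

So after three days we expect the price to be about 4.8 bucks higher, matching the shortcut n times the expected one-day change.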

### Normal Distribution

• Very popular distribution
• Observed very often because of central limit theorem (CLT)
• Example :
• % change in Google's stock price from the previous day
• Heights and weights of persons
• Exam scores
• It is good to remember empirical numbers for normal distribution :
• 68 % – within one standard deviation of the mean
• 95 % – within two standard deviations
• 99.7 % – within three standard deviations
• We use the Z score as the distance from the mean in units of standard deviation
• Parameters : μ, σ
• Probability estimation P(x|params) = 1/sqrt(2*pi*σ^2) * exp(-(x-μ)^2/(2*σ^2))
• MLE :
• μ = average(x)
• σ = sqrt(Σ(x - μ)^2 / N)
• Note : the MLE divides by N; dividing by N-1 instead gives the unbiased sample estimate
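
A short sketch of the normal MLE in Python, using the exam-scores scenario above with made-up parameters (μ = 100, σ = 15) for the simulation:

```python
import math
import random

random.seed(1)

# Simulated exam scores from N(mu = 100, sigma = 15) (hypothetical numbers)
data = [random.gauss(100, 15) for _ in range(50_000)]
N = len(data)

# MLE for the normal distribution
mu_hat = sum(data) / N                                            # sample mean
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / N)   # divide by N

# Empirical-rule check: roughly 68% of points fall within one std. deviation
within_1sd = sum(abs(x - mu_hat) <= sigma_hat for x in data) / N
print(round(mu_hat, 1), round(sigma_hat, 1), round(within_1sd, 2))
```

The last line also confirms the 68% empirical number from the bullets above.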

### Poisson Distribution

• mean = variance = λ (average no. of events)
• For a fixed region (of time or space), if we know the average no. of events, we can compute the probability of any no. of events.
• Its PMF (probability mass function) is a skewed curve
• There is just one parameter (lambda)
• While normal has two parameters (u and sigma)
• Bernoulli has just one parameter (p)
• Binomial has two parameters (n and p)
• Poisson distribution can be derived as a special case of the binomial distribution as n -> ∞ with n*p held fixed at λ
• Parameters : λ
• Probability Estimation P(x|params) = λ^x * e^(-λ) / x!
• Which is the probability of observing x events (successes)
• MLE : λ = average (x)
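
A minimal sketch of both ideas above: the binomial limit (n large, n*p = λ) is used to simulate Poisson counts, and the MLE is just the sample average. The λ = 4 used here is an arbitrary assumption:

```python
import math
import random

random.seed(2)

# Poisson PMF: P(X = x) = lambda^x * e^(-lambda) / x!
def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

# Simulate Poisson counts via the binomial limit: n large with n*p = lambda
lam, n = 4.0, 1_000
p = lam / n
counts = [sum(random.random() < p for _ in range(n)) for _ in range(2_000)]

# MLE: lambda_hat is simply the sample mean of the counts
lam_hat = sum(counts) / len(counts)
print(round(lam_hat, 1))
```

With n = 1000 per draw, the simulated counts already behave like Poisson(4) and lam_hat comes out close to 4.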

### Multinomial Distribution

• Binomial distribution has two parameters : n, p
• Multinomial distribution has n, p1, p2, …, pk (one probability per category)
• Probability estimation :
• P(x1, x2, x3 | n, p1, p2, p3) = n!/(x1! x2! x3!)  * p1^x1 * p2^x2 * p3^x3
• The above is the probability of observing event 1 x1 times, event 2 x2 times, and so on
• In binomial we estimate probability of success x times
• From that we can easily determine probability of failure
• Parameter estimation
• n = No of samples (easy)
• p1 = x1/n, p2=x2/n, p3 = x3/n
• Interestingly, the MLE for the binomial and multinomial distributions is very simple
• The derivation of the above is a constrained optimization problem, solved using a Lagrangian (see Wikipedia)
• For example, it models the probability of counts for rolling a k-sided die n times.
• When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution.
• When k is 2 and n is bigger than 1, it is the binomial distribution.
• When k is bigger than 2 and n is 1, it is the categorical distribution.
• It is a k-dimensional distribution
• It is the joint distribution of k count variables
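
A small Python sketch of the formulas above: the multinomial probability for a count vector, plus the MLE p_i = x_i/n estimated from simulated rolls of a fair six-sided die (the die and the small count vector are illustrative choices):

```python
import math
import random

random.seed(3)

# Multinomial PMF: P(x1..xk | n, p1..pk) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk
def multinomial_pmf(xs, ps):
    coef = math.factorial(sum(xs))
    for x in xs:
        coef //= math.factorial(x)
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

# Small worked example: counts (2, 1, 1) in n = 4 draws with ps (0.5, 0.3, 0.2)
# 4!/(2! 1! 1!) * 0.5^2 * 0.3 * 0.2 = 12 * 0.015 = 0.18
print(round(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]), 3))

# MLE from simulated rolls of a fair k-sided die: p_i = x_i / n
k, n = 6, 60_000
counts = [0] * k
for _ in range(n):
    counts[random.randrange(k)] += 1
p_hat = [c / n for c in counts]
```

Each estimated p_hat entry comes out near 1/6, exactly as the closed-form MLE predicts.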

### T Distribution

• It has just one parameter called df (Degrees of Freedom)
• mean = 0
• std. deviation = sqrt(df/(df-2)), defined for df > 2
• As df increases it moves closer and closer to the standard normal curve
• In general it is wider than the bell curve.
• Reason : from the above formula the std. deviation is always greater than 1
• For the standard bell curve, std. deviation = 1
• Area under t distribution is 1
• Parameter : df
• Probability Estimation P(x | params) = check Wikipedia
• In practice df = N - 1, where N is the no. of samples (the convention in a one-sample t-test)
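
The standard-deviation formula above is easy to check directly; this tiny sketch shows it is always above 1 and shrinks toward 1 (the standard normal) as df grows:

```python
import math

# Std. deviation of the t distribution: sqrt(df / (df - 2)), valid for df > 2
def t_std(df):
    return math.sqrt(df / (df - 2))

# Always wider than the standard normal (sd = 1), converging to it as df grows
for df in (3, 10, 30, 100):
    print(df, round(t_std(df), 3))
```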

## Fitting the Distribution

Fitting a distribution means we use some distribution as the model and estimate its parameters. In the case of Gaussian/normal we estimate μ and σ; in the case of Poisson we estimate λ.

## What are probabilistic models?

Models that propagate the uncertainty of the inputs through to the target variables are probabilistic models. Examples are :

• Regression
• Probability Trees
• Monte Carlo Simulations
• Markov chains
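
As one sketch of such a model, here is a minimal Monte Carlo simulation (all numbers are hypothetical) that propagates uncertainty in a daily return through to a 30-day price:

```python
import random

random.seed(4)

# Hypothetical setup: an asset starts at 100 and its daily % change is modeled
# as normal with mean 0.1% and std. deviation 2%. Uncertainty in the daily
# return propagates into uncertainty about the final price.
def simulate_price(days=30, start=100.0):
    price = start
    for _ in range(days):
        price *= 1 + random.gauss(0.001, 0.02)
    return price

# Run many simulated paths and summarize the resulting price distribution
final_prices = [simulate_price() for _ in range(10_000)]
mean_price = sum(final_prices) / len(final_prices)
print(round(mean_price, 1))
```

Instead of a single predicted price, the simulation yields a whole distribution of final prices, which is the sense in which the model is probabilistic.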

## Further References

• stats.stackexchange
• MLE for various distributions : https://onlinecourses.science.psu.edu/stat504/node/28/