We learn various probability distributions in high school and engineering courses, but at times we forget them. So here I am providing a simple practical scenario for each distribution, with no theory involved.

### Bernoulli Distribution

- When the random variable has just two outcomes
- Probability that a drug/medicine will be approved by the government is p = 0.65
- Probability that it will not be approved is 1 - p = 0.35

- The formulas below work when the probability is available; in real life we estimate it from data :
- Mean = p
- Variance (Sigma Square) = p*(1-p)

- Parameters : p
- Probability evaluation P(x|params) = p if x = 1, (1 - p) if x = 0
- MLE : p = n/N, where n = no of times 1 is observed, N = no of experiments
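The mean, variance, and MLE above can be sanity-checked with a short simulation (a minimal Python sketch; the approval probability 0.65 comes from the example above, while the sample size of 100,000 is an arbitrary choice):

```python
import random

random.seed(0)
p_true = 0.65  # approval probability from the example above
N = 100_000

# simulate N Bernoulli trials: 1 = approved, 0 = not approved
samples = [1 if random.random() < p_true else 0 for _ in range(N)]

n_ones = sum(samples)
p_hat = n_ones / N               # MLE: p = n / N
mean = p_hat                     # Mean = p
variance = p_hat * (1 - p_hat)   # Variance = p * (1 - p)
print(round(p_hat, 2), round(variance, 2))
```

With this many trials the estimate lands very close to 0.65, and the variance close to 0.65 * 0.35 = 0.2275.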

### Binomial Distribution

- When you perform the Bernoulli experiment multiple times and count how many times a certain outcome appears.
- For example, you flip a coin (fair or biased) 10 times and ask for the probability that heads appears x (x = 0, 1, ..., 10) times.
- Another more practical example :
- Suppose the oil price can increase by 3 bucks or decrease by 1 buck each day
- Probability of an increase is p = 0.65, of a decrease 0.35
- What price can we expect after three days?
- Note that (Increase, Increase, Decrease) and (Increase, Decrease, Increase) give the same price.
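The oil-price example can be worked out by enumerating all eight three-day paths (a minimal Python sketch using the numbers above; note how paths with the same mix of increases and decreases collapse into the same net change):

```python
from itertools import product

p_up, up, down = 0.65, 3, -1  # daily moves and probability from the example above
price_probs = {}

# enumerate all 2^3 = 8 possible up/down paths over three days
for path in product([up, down], repeat=3):
    change = sum(path)
    prob = 1.0
    for step in path:
        prob *= p_up if step == up else (1 - p_up)
    # (Up, Up, Down) and (Up, Down, Up) land in the same bucket here
    price_probs[change] = price_probs.get(change, 0.0) + prob

expected_change = sum(c * p for c, p in price_probs.items())
print(sorted(price_probs.items()))
print(round(expected_change, 2))  # 4.8
```

Only four distinct net changes are possible (+9, +5, +1, -3), and the expected change is 3 * (3 * 0.65 - 1 * 0.35) = 4.8.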

- From another point of view, it counts the no of successes in an experiment :
- No of patients responding to a treatment
- Binary classification problems

- The formulas below work when the probability is available; in real life we estimate it from data :
- n = no of times the experiment is performed
- Mean = n*p
- Variance (Sigma Square) = n*p*(1-p)

- Example of binomial used in modeling :
- Parameters : n, p
- Probability evaluation P(x|params) = nCx * p^x * (1-p)^(n-x)
- MLE
- n = no of samples = N
- p = x/N, where x = no of successes
- Interestingly MLE for binomial and multinomial distribution is very simple
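A small Python sketch of the formulas above (the 10-flip fair coin is the earlier example; the 7-successes-in-10-trials data for the MLE is made up for illustration):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x | n, p) = nCx * p^x * (1-p)^(n-x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# fair coin, 10 flips: probability of exactly x heads for each x
probs = [binom_pmf(x, 10, 0.5) for x in range(11)]
print(round(probs[5], 4))  # P(5 heads) = 0.2461

# MLE from observed data: p = (no of successes) / (no of trials)
successes, trials = 7, 10
p_hat = successes / trials
print(p_hat)  # 0.7
```

As the text says, the MLE really is just the observed success fraction.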

### Normal Distribution

- Very popular distribution
- Observed very often because of central limit theorem (CLT)
- Example :
- % change in Google's stock price from the previous day
- Heights and weights of persons
- Exam scores

- It is good to remember empirical numbers for normal distribution :
- 68 % – within one standard deviation of the mean
- 95 % – within two standard deviations
- 99.7 % – within three standard deviations
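The 68/95/99.7 numbers can be recomputed from the error function, since for a normal distribution the fraction within k standard deviations of the mean is erf(k/sqrt(2)) (a minimal Python check):

```python
from math import erf, sqrt

def within(k):
    """Fraction of a normal distribution within k std. deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within(k), 1))
# 1 -> 68.3 %, 2 -> 95.4 %, 3 -> 99.7 %
```

The commonly quoted 68/95/99.7 are rounded versions of these values.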

- We use the Z-score as the distance from the mean in units of standard deviation
- Parameters : μ, σ
- Probability estimation P(x|params) = 1/sqrt(2*pi*σ^2) * exp(-(x-μ)^2/(2*σ^2))
- MLE :
- μ = average (x)
- σ = sqrt(Σ(x - μ)^2 / N) (the MLE divides by N; the unbiased sample estimate divides by N - 1)
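A quick check of the normal MLE formulas (a minimal Python sketch; the "true" μ = 10 and σ = 2 are arbitrary made-up values):

```python
import random
from math import sqrt

random.seed(1)
mu_true, sigma_true = 10.0, 2.0  # made-up "true" parameters
data = [random.gauss(mu_true, sigma_true) for _ in range(50_000)]

N = len(data)
mu_hat = sum(data) / N                                       # μ = average(x)
sigma_hat = sqrt(sum((x - mu_hat) ** 2 for x in data) / N)   # MLE divides by N
print(round(mu_hat, 1), round(sigma_hat, 1))
```

With 50,000 samples both estimates land very close to the true values.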

### Poisson Distribution

- mean = variance = λ (the average no of events)
- For a fixed region, if we know the average no of events, we can compute the probability of observing any no of events.
- Its PMF (Probability Mass Function) is a skewed curve
- There is just one parameter (lambda)
- While normal has two parameters (μ and σ)
- Bernoulli has just one parameter (p)
- Binomial has two parameters (n and p)

- Poisson distribution is also used in Poisson regression
- It can be derived as a special case of the binomial distribution as n -> ∞ with n*p held fixed at λ
- Parameters : λ
- Probability Estimation P(x|params) = λ^x * e^(-λ) / x!
- Which is the probability of observing x successes
- Or the probability of observing x events

- MLE : λ = average (x)
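A short sketch of the Poisson formula and its MLE (Python; λ = 4 and the observed counts are made-up values for illustration):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x | λ) = λ^x * e^(-λ) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 4.0  # hypothetical average no of events per region
probs = [poisson_pmf(x, lam) for x in range(30)]
print(round(sum(probs), 6))  # ≈ 1.0, the tail beyond x = 29 is negligible

# MLE from observed event counts: λ = average(x)
counts = [3, 5, 4, 2, 6, 4, 4]
lam_hat = sum(counts) / len(counts)
print(lam_hat)  # 4.0
```

Note the single parameter: the sample mean alone fits the distribution.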

### Multinomial Distribution

- Binomial distribution has two parameters : n, p
- Multinomial distribution has n, p1, p2, ..., pk (with p1 + p2 + ... + pk = 1)

- Probability estimation :
- P(x1, x2, x3 | n, p1, p2, p3) = n!/(x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3
- Above is the probability of observing event 1 x1 times, event 2 x2 times, and so on
- In binomial we estimate the probability of x successes
- From that we can easily determine the probability of failure

- Parameter estimation
- n = no of trials = x1 + x2 + x3 (easy)
- p1 = x1/n, p2=x2/n, p3 = x3/n
- Interestingly MLE for binomial and multinomial distribution is very simple
- Derivation of the above is a constrained optimization problem, solved using a Lagrangian [2]
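The multinomial probability and MLE can be sketched as follows (Python; the fair-die probabilities and the observed counts are made-up values for illustration):

```python
from math import factorial

def multinomial_pmf(xs, ps):
    """P(x1..xk | n, p1..pk) = n!/(x1!*..*xk!) * p1^x1 * .. * pk^xk"""
    n = sum(xs)
    coeff = factorial(n)
    for x in xs:
        coeff //= factorial(x)  # each division is exact
    prob = float(coeff)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

# fair six-sided die rolled 6 times, each face appearing exactly once
print(round(multinomial_pmf([1] * 6, [1 / 6] * 6), 4))  # 6!/6^6 ≈ 0.0154

# MLE: p_i = x_i / n, just the observed fractions
counts = [12, 8, 10, 9, 11, 10]
n = sum(counts)
p_hat = [x / n for x in counts]
print([round(p, 2) for p in p_hat])
```

As with the binomial, the MLE is simply each category's observed fraction.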

- From Wikipedia :
- For example, it models the probability of counts for rolling a k-sided die n times.
- When k is 2 and n is 1, the multinomial distribution is the Bernoulli distribution.
- When k is 2 and n is bigger than 1, it is the binomial distribution.
- When k is bigger than 2 and n is 1, it is the categorical distribution.

- It is k dimensional distribution
- It is joint distribution of k variables

### T Distribution

- It has just one parameter called df (Degrees of Freedom)
- mean = 0
- std. deviation = sqrt(df/(df-2)), defined for df > 2
- As df increases it moves more and more toward standard normal curve
- In general it is wider than the standard bell curve.
- Reason being, from the formula above, its std. deviation is always greater than 1
- For the standard bell curve, std. deviation = 1

- Area under t distribution is 1
- Parameter : df
- Probability Estimation P(x | params) = check Wikipedia
- MLE : df = N - 1, where N is the no of samples
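The std.-deviation formula and its limit can be checked in a couple of lines of Python:

```python
from math import sqrt

def t_std(df):
    """Std. deviation of the t distribution: sqrt(df / (df - 2)), for df > 2."""
    return sqrt(df / (df - 2))

for df in (3, 10, 30, 1000):
    print(df, round(t_std(df), 4))
# always above 1, and approaching 1 (the standard normal) as df grows
```

This is the numeric version of the claim above: the t curve is wider than the standard bell curve but converges to it for large df.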

## Fitting the Distribution?

Fitting a distribution means we are using some distribution as the model and we want to estimate its parameters. In case of Gaussian/Normal we estimate μ and σ; in case of Poisson we estimate λ.

## What are probabilistic models?

Models that propagate uncertainty from inputs to target variables are probabilistic models. Examples are :

- Regression
- Probability Trees
- Monte Carlo Simulations
- Markov chains

## Further References :

[1] MLE for various distributions : https://onlinecourses.science.psu.edu/stat504/node/28/