Thompson sampling is one approach to the multi-armed bandit problem and to the exploration-exploitation dilemma faced in reinforcement learning. The challenge in solving such a problem is that we might end up pulling the same arm again and again. A Bayesian approach helps resolve this dilemma by placing a prior with fairly high variance on each arm, so that arms we know little about still have a real chance of being selected.
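
To make the idea concrete, here is a minimal sketch of Thompson sampling, not a definitive implementation. It assumes Gaussian rewards with a known noise variance and an independent Gaussian prior on each arm's mean. The function name `thompson_gaussian` and all of its parameters (`prior_var`, `noise_var`, and so on) are hypothetical names chosen for illustration.

```python
import random


def thompson_gaussian(true_means, n_rounds=2000, prior_mean=0.0,
                      prior_var=100.0, noise_var=1.0, seed=0):
    """Thompson sampling sketch for Gaussian rewards with known noise variance.

    Assumption (not from the text above): each arm's reward is
    Normal(true_mean, noise_var), and each arm's unknown mean gets an
    independent Normal(prior_mean, prior_var) prior.
    """
    rng = random.Random(seed)
    k = len(true_means)
    post_mean = [prior_mean] * k   # posterior mean of each arm's reward mean
    post_var = [prior_var] * k     # posterior variance; starts high on purpose
    pulls = [0] * k

    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its current posterior.
        samples = [rng.gauss(post_mean[a], post_var[a] ** 0.5) for a in range(k)]
        # Act greedily with respect to the sampled means.
        arm = max(range(k), key=lambda a: samples[a])
        # Simulated environment: observe a noisy reward from the chosen arm.
        reward = rng.gauss(true_means[arm], noise_var ** 0.5)

        # Conjugate Normal-Normal update for the pulled arm only.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm]
                          + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
        pulls[arm] += 1

    return pulls, post_mean


if __name__ == "__main__":
    pulls, post_mean = thompson_gaussian([0.1, 0.5, 0.9])
    print("pulls per arm:", pulls)
    print("posterior means:", [round(m, 2) for m in post_mean])
```

Because every arm starts with a wide posterior, its sampled mean is occasionally the largest even when its current estimate is poor, so the arm still gets tried. As an arm accumulates pulls its posterior variance shrinks, and the sampling shifts toward exploiting the arm that looks best.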


On multivariate Gaussian

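Formulas

Formula for the multivariate Gaussian distribution, where n is the dimension of x:

$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

Formula for the univariate Gaussian distribution:

$$p(x \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

Notes: There is a normalization constant in both equations. Σ being positive definite ensures that the quadratic bowl in the exponent opens downwards. Likewise, σ² being positive ensures that the parabola in the exponent opens downwards.

On Covariance Matrix

Definition of covariance between two vectors:

When we have more than two variable…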