  Thompson sampling is one approach to the multi-armed bandit problem, a standard setting for the exploration-exploitation dilemma in reinforcement learning. The challenge in such a problem is that we might end up pulling the same arm again and again, without ever learning whether another arm pays better. The Bayesian approach helps resolve this dilemma by placing a prior with fairly high variance on each arm's reward, so that arms we know little about still have a chance of being selected. Here is the…
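To make the idea concrete, here is a minimal sketch of Thompson sampling, assuming Bernoulli (0/1) rewards and a Beta(1, 1) prior on each arm. Neither choice comes from the post itself; they are the common textbook setup. The function name `thompson_sampling`, its parameters, and the example payout rates are all hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms (illustrative sketch).

    true_probs: hypothetical payout probability of each arm, used only to
    simulate rewards; the algorithm itself never reads these values directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)

    # Assumption: Beta(1, 1) prior per arm. It is uniform on [0, 1], so it
    # starts out wide (high variance) and every arm gets a chance early on.
    alpha = [1.0] * n_arms  # 1 + number of observed successes
    beta = [1.0] * n_arms   # 1 + number of observed failures
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible payout rate per arm from its current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # Pull the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate a 0/1 reward from the arm's true (hidden) probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success raises alpha, a failure raises beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
```

Because each arm is chosen according to a random draw from its posterior rather than its current average, an arm with few observations keeps a wide posterior and is still selected occasionally. As evidence accumulates, the posteriors narrow and the pulls concentrate on the arm that appears best.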