# Author: architvora


Notes on data science

# Quadratic Programming CVXOPT

This is taken from https://courses.csail.mit.edu/6.867/wiki/images/a/a7/Qp-cvxopt.pdf

## Standard form

CVXOPT's `solvers.qp` expects a quadratic program in the standard form: minimize (1/2) xᵀPx + qᵀx subject to Gx ≤ h and Ax = b, where P is a symmetric positive semidefinite matrix, G and h collect the inequality constraints, and A and b the equality constraints.

## Converting to standard form

To solve your own problem, rewrite it by identifying the matrices P, q, G, h, A, and b of the standard form above; in particular, every inequality must be expressed as "less than or equal" so that it fits Gx ≤ h.

## Python Code
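The post's original code listing did not survive extraction. With CVXOPT installed, a problem in standard form is passed as `cvxopt.solvers.qp(P, q, G, h, A, b)`; as a self-contained stand-in, here is a numpy sketch that solves a small equality-constrained QP in the same standard form by solving its KKT system directly (the specific P, q, A, b are illustrative, not from the original):

```python
import numpy as np

# Equality-constrained QP in the standard form above:
#   minimize (1/2) x^T P x + q^T x   subject to   A x = b
# Illustrative instance: minimize x1^2 + x2^2 subject to x1 + x2 = 1.
P = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([0.0, 0.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# KKT optimality conditions:  [P  A^T] [x  ]   [-q]
#                             [A   0 ] [lam] = [ b]
n, m = P.shape[0], A.shape[0]
KKT = np.block([[P, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-q, b])
x = np.linalg.solve(KKT, rhs)[:n]
print(x)  # optimum at [0.5, 0.5]
```

With CVXOPT the same matrices (wrapped in `cvxopt.matrix`) would go straight into `solvers.qp`.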

Reference:

https://courses.csail.mit.edu/6.867/wiki/images/a/a7/Qp-cvxopt.pdf


# Derivation of Backpropagation

References:

- Pattern Recognition and Machine Learning by Bishop (page 244)
- Andrew Ng's course by deeplearning.ai
- https://sudeepraja.github.io/Neural/

# Python Modules and Packages

- Unlike Java, a Python module can contain multiple classes
- One module is one Python file
- Generally one module implements one complete piece of functionality
- It can contain multiple classes and functions

- A package is marked by an __init__.py file
- This file can be empty
- I like that

- You can add __all__ to specify the list of modules inside the package
- __all__ is consulted when we `from package import *`
- It is the author's responsibility to keep this list updated
- By contrast, Java packages are formed purely from folder paths

- Intra-package references:

from . import echo
from .. import formats
from ..filters import equalizer

- One sample structure

Reference: https://docs.python.org/3/tutorial/modules.html#packages
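The `__all__` behaviour above can be checked end-to-end. This sketch builds a throwaway package on disk (layout borrowed from the Python tutorial's `sound` example; the module names are illustrative) and shows that `import *` only pulls in the modules listed in `__all__`:

```python
import os
import sys
import tempfile

# Create sound/effects with two submodules, but export only "echo".
root = tempfile.mkdtemp()
effects = os.path.join(root, "sound", "effects")
os.makedirs(effects)
open(os.path.join(root, "sound", "__init__.py"), "w").close()  # empty is fine
with open(os.path.join(effects, "__init__.py"), "w") as f:
    f.write("__all__ = ['echo']\n")  # only echo is exported by import *
for name in ("echo", "reverse"):
    with open(os.path.join(effects, name + ".py"), "w") as f:
        f.write("def apply(x):\n    return x\n")

# Make the package importable and run a star import in a fresh namespace.
sys.path.insert(0, root)
ns = {}
exec("from sound.effects import *", ns)
print("echo" in ns, "reverse" in ns)  # echo imported, reverse skipped
```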

# Dynamic Programming for RL

Dynamic Programming is one of the methods for solving a reinforcement learning problem. It assumes that the complete dynamics of the MDP are known, and we are interested in:

- Finding the value function for a given policy (the prediction problem)
- Finding the optimal policy for a given MDP (the control problem)

There are three key procedures:

- Policy Evaluation
  - Takes the probability of each action into account

- Policy Iteration
  - First evaluates the policy
  - Then generates a new policy based on this evaluation

- Value Iteration
  - Policy evaluation is a time-consuming process, so let's run just one iteration of it
  - We don't need to update the policy in each iteration; just keeping the value function updated does the job
  - So we keep updating the value function (choosing the action that maximizes value instead of averaging over action probabilities) until it converges, and then extract a policy from this optimal value function

Policy Evaluation

Policy Iteration

Value Iteration
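The value-iteration loop described above can be sketched on a toy problem (a 4-state chain assumed for illustration; not from the original post). We sweep the Bellman optimality backup until the values converge, then read off the greedy policy:

```python
# Value iteration on a 4-state chain: states 0..3, state 3 is terminal.
# Actions: -1 (left) or +1 (right); entering the terminal state pays reward 1.
gamma = 0.9
states = [0, 1, 2, 3]
V = [0.0] * 4

def step(s, a):
    # Deterministic dynamics, clipped to the chain; reward on reaching state 3.
    s2 = min(max(s + a, 0), 3)
    r = 1.0 if (s2 == 3 and s != 3) else 0.0
    return s2, r

# Repeatedly apply the Bellman optimality backup (no explicit policy needed).
for _ in range(100):
    for s in states[:-1]:  # terminal state keeps V = 0
        V[s] = max(r + gamma * V[s2]
                   for s2, r in (step(s, a) for a in (-1, 1)))

# Extract the greedy policy from the converged value function.
greedy = [max((-1, 1), key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in states[:-1]]
print([round(v, 3) for v in V])  # values grow toward the goal: [0.81, 0.9, 1.0, 0.0]
print(greedy)                    # optimal policy: always move right -> [1, 1, 1]
```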

Here are two slides from David Silver's lecture notes: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf

# OpenAI Gym Environment

OpenAI provides a framework for creating environments and training agents on them. In this post I am pasting a simple notebook for a quick lookup of how to use these environments and what functions are available on the environment object.

I have used an environment available on GitHub by Denny Britz (linked in the references below).
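The environment interface itself is small. As a self-contained illustration (a toy stand-in, not the notebook's actual environment), here is a minimal class following the classic Gym convention of `reset() -> observation` and `step(action) -> (observation, reward, done, info)`:

```python
class ToyEnv:
    """Toy environment mimicking the classic Gym interface:
    reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action +1 moves right, -1 moves left; reaching position 3 ends the episode
        self.pos = min(max(self.pos + action, 0), 3)
        done = self.pos >= 3
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}

# The usual interaction loop, identical in shape to a real Gym rollout.
env = ToyEnv()
obs = env.reset()
done, total = False, 0.0
while not done:
    obs, reward, done, info = env.step(+1)  # always move right
    total += reward
print(obs, total)  # reaches state 3 with total reward 1.0
```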

References:

https://github.com/dennybritz/reinforcement-learning

http://www.wildml.com/2016/10/learning-reinforcement-learning/

# [Example] Lagrange Multiplier With Equality Constraints

#### Stationary Point

Definition of a stationary point, from Wikipedia:

In mathematics, particularly in calculus, a stationary point or critical point of a differentiable function of one variable is a point on the graph of the function where the function’s derivative is zero. Informally, it is a point where the function “stops” increasing or decreasing (hence the name).

**Lagrange multipliers help us find all the stationary points**, which can be local minima, local maxima, global minima, or global maxima. Once we evaluate the objective function at each of these stationary points, we can classify which one is a local/global minimum or maximum.
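As a concrete illustration of this classification step (an example chosen here, not from the original post): extremize f(x, y) = x + y subject to x² + y² = 1. The Lagrange conditions ∇f = λ∇g give 1 = 2λx and 1 = 2λy, so x = y, and the constraint forces x = y = ±1/√2. Evaluating f at each stationary point tells the maximum from the minimum:

```python
import math

# Stationary points of f(x, y) = x + y on the circle x^2 + y^2 = 1,
# obtained from the Lagrange conditions 1 = 2*l*x, 1 = 2*l*y.
stationary = [(1 / math.sqrt(2), 1 / math.sqrt(2)),
              (-1 / math.sqrt(2), -1 / math.sqrt(2))]

# Evaluate the objective at each stationary point to classify them.
values = [x + y for x, y in stationary]
print(values)  # [sqrt(2), -sqrt(2)]: the first is the global max, the second the global min
```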

# Thompson Sampling

Thompson sampling is one approach to the multi-armed bandit problem and to the exploration-exploitation dilemma faced in reinforcement learning.

The challenge in solving such a problem is that we might end up pulling the same arm again and again. The Bayesian approach helps us resolve this dilemma by setting a prior with somewhat high variance.

Here is the code for a two-armed bandit. One arm has a success probability of 40% (bandit 0) and the other of 25% (bandit 1).

We use the Beta distribution to decide which arm to pull. The Beta distribution has two parameters, alpha and beta; higher values of alpha pull the distribution toward 1, and the distribution is always confined between 0 and 1.

Training works as follows: for each piece of feedback we receive, we increment alpha by 1 on success or beta by 1 on failure. To choose an arm, we draw a random sample from each arm's distribution and select the arm with the highest value.
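The original listing is not reproduced in this scrape; a minimal stdlib sketch of the scheme just described might look like this (2000 rounds and the uniform Beta(1, 1) prior are assumptions for illustration):

```python
import random

random.seed(0)

# Two-armed bandit: true success probabilities as stated in the post.
p_true = [0.40, 0.25]
alpha = [1, 1]   # Beta posterior parameters per arm (Beta(1,1) = uniform prior)
beta = [1, 1]
pulls = [0, 0]

for _ in range(2000):
    # Thompson sampling: draw one sample from each arm's Beta posterior
    # and pull the arm whose sample is largest.
    samples = [random.betavariate(alpha[i], beta[i]) for i in (0, 1)]
    arm = samples.index(max(samples))
    pulls[arm] += 1
    # Bernoulli feedback updates the posterior: success -> alpha, failure -> beta.
    if random.random() < p_true[arm]:
        alpha[arm] += 1
    else:
        beta[arm] += 1

print(pulls)  # arm 0 (the higher-payoff arm) ends up pulled far more often
```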

And here are the simulation results. We see that initially both arms are pulled frequently, but gradually arm 1 is pulled less and less, though its pull count never drops straight to zero.

# Deep learning taking off

I recently started Andrew Ng's specialization on deep learning and found these two interesting points:

One is about how the performance of an algorithm changes with the amount of data. Traditional algorithms hit a performance plateau, while deep neural networks keep benefiting from more data.

Also, for small amounts of data, traditional algorithms with good feature engineering may win over neural nets.

The second point is that deep learning requires data, computation, and efficient algorithms. Recent years have seen significant algorithmic advances that increase computational efficiency. For example, moving from sigmoid to ReLU was an algorithmic change that allowed gradients to converge faster.
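The sigmoid-to-ReLU point can be made concrete: the sigmoid's gradient σ(z)(1 − σ(z)) vanishes for large |z|, while ReLU's gradient stays at 1 for any positive input. A small illustrative check (not from the course):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # saturates toward 0 as |z| grows

def d_relu(z):
    return 1.0 if z > 0 else 0.0  # constant gradient on the positive side

# Compare the gradients at increasingly large pre-activations.
for z in (0.0, 2.0, 10.0):
    print(z, d_sigmoid(z), d_relu(z))
```

At z = 10 the sigmoid gradient is already below 1e-4, which is why deep sigmoid networks learn so slowly once units saturate.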

Reference: https://www.coursera.org/learn/neural-networks-deep-learning/home