optimization

Quadratic Programming CVXOPT

This is taken from https://courses.csail.mit.edu/6.867/wiki/images/a/a7/Qp-cvxopt.pdf

 

Standard form

[Figure: QP standard form]
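For reference, the standard form that CVXOPT's solvers.qp expects (using CVXOPT's naming of the matrices P, q, G, h, A, b) is:

\begin{aligned}
\min_x \quad & \tfrac{1}{2}\, x^T P x + q^T x \\
\text{subject to} \quad & G x \preceq h \\
& A x = b
\end{aligned}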

Converting to standard form

[Figure: converting an example problem to standard form]

Python Code
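A minimal sketch of solving a small QP with CVXOPT. The particular problem below (minimize 2x1^2 + x2^2 + x1*x2 + x1 + x2 subject to x1 >= 0, x2 >= 0, x1 + x2 = 1) is illustrative and not necessarily the one worked through in the linked notes:

from cvxopt import matrix, solvers

# Quadratic term P and linear term q (CVXOPT expects double-precision matrices)
P = matrix([[4.0, 1.0], [1.0, 2.0]])   # i.e. P = 2 * [[2, .5], [.5, 1]]
q = matrix([1.0, 1.0])

# Inequality constraints G x <= h  (here: -x1 <= 0 and -x2 <= 0, i.e. x >= 0)
G = matrix([[-1.0, 0.0], [0.0, -1.0]])
h = matrix([0.0, 0.0])

# Equality constraint A x = b  (here: x1 + x2 = 1)
A = matrix([1.0, 1.0], (1, 2))
b = matrix(1.0)

sol = solvers.qp(P, q, G, h, A, b)
print(sol['x'])   # optimal point, approximately [0.25, 0.75]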

 

Reference:

https://courses.csail.mit.edu/6.867/wiki/images/a/a7/Qp-cvxopt.pdf

 

 

Coding

Python Modules and Packages

  • Unlike Java, a Python module can contain multiple classes
    • One module is one Python file
    • Generally, one module provides one complete piece of functionality
    • It can contain multiple classes and functions
  • A package needs to be marked with an __init__.py file
    • This file can be empty
      • I like that
    • You can add __all__ to specify the list of modules inside the package
    • __all__ is used when we do from package import *
    • It is the author’s responsibility to keep this list updated
      • In contrast, Java packages are formed purely from folder paths
    • Intra-package references look like this:
      from . import echo
      from .. import formats
      from ..filters import equalizer
  • One sample structure (see the sketch below)
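For example, here is a layout along the lines of the sound package from the Python tutorial referenced below (the individual module names are just illustrative):

sound/                      # top-level package (marked by __init__.py)
    __init__.py
    formats/                # subpackage for file format conversions
        __init__.py
        wavread.py
        wavwrite.py
    effects/                # subpackage for sound effects
        __init__.py         # e.g. __all__ = ["echo", "surround", "reverse"]
        echo.py
        surround.py
        reverse.py
    filters/                # subpackage for filters
        __init__.py
        equalizer.py
        vocoder.py

With this layout, from . import echo inside sound/effects/surround.py imports a sibling module, and from ..filters import equalizer reaches across subpackages, exactly as in the intra-package examples above.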

 

 

Reference : https://docs.python.org/3/tutorial/modules.html#packages

 

 

Reinforcement learning

Dynamic Programming for RL

Dynamic Programming is one of the methods for solving a reinforcement learning problem. It assumes that the complete dynamics of the MDP are known, and we are interested in:

  • Finding the value function for a given policy (the prediction problem)
  • Finding the optimal policy for a given MDP (the control problem)

 

There are three main procedures:

  • Policy Evaluation
    • Takes the probabilities of actions under the policy into account
  • Policy Iteration
    • First evaluates the current policy
    • Then generates a new, greedy policy based on this evaluation
  • Value Iteration
    • Full policy evaluation is a time-consuming process
    • So let’s do just one evaluation sweep per iteration
    • We don’t need to maintain an explicit policy in each iteration; just keeping the value function updated does the job
    • In other words, keep updating the value function (choosing the action that maximizes value instead of averaging over action probabilities) until it converges, and then select a policy based on this optimal value function

 

Policy Evaluation

[Figure: policy evaluation algorithm]
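A minimal sketch of iterative policy evaluation on a made-up two-state MDP; the transition table P, its (prob, next_state, reward) format and all constants here are assumptions for illustration:

import numpy as np

# Toy MDP for illustration: P[s][a] = list of (prob, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
n_states, n_actions, gamma = 2, 2, 0.9

def policy_evaluation(policy, theta=1e-8):
    """Compute V_pi iteratively; policy[s][a] is the probability of action a in state s."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Expected one-step return, weighted by the policy's action probabilities
            v = sum(policy[s][a] * prob * (reward + gamma * V[s2])
                    for a in range(n_actions)
                    for prob, s2, reward in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:        # stop once the value function has converged
            return V

# Evaluate the uniformly random policy on the toy MDP
random_policy = np.ones((n_states, n_actions)) / n_actions
print(policy_evaluation(random_policy))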

Policy Iteration

[Figure: policy iteration algorithm]
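Continuing the same sketch (reusing P, gamma, n_states, n_actions and policy_evaluation from the snippet above), policy iteration alternates a full evaluation step with a greedy improvement step:

def one_step_lookahead(s, V):
    """Expected return of each action from state s under the current value function V."""
    return np.array([sum(prob * (reward + gamma * V[s2]) for prob, s2, reward in P[s][a])
                     for a in range(n_actions)])

def policy_iteration():
    policy = np.ones((n_states, n_actions)) / n_actions   # start from the random policy
    while True:
        V = policy_evaluation(policy)                      # evaluate the current policy
        stable = True
        for s in range(n_states):
            best_a = int(np.argmax(one_step_lookahead(s, V)))
            if policy[s][best_a] != 1.0:                   # the greedy choice changed
                stable = False
            policy[s] = np.eye(n_actions)[best_a]          # make the policy greedy in s
        if stable:                                         # no state changed: done
            return policy, V

print(policy_iteration())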

Value Iteration

[Figure: value iteration algorithm]
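And a matching sketch of value iteration (again reusing the toy MDP and one_step_lookahead): the value function is updated directly with the max over actions, and a greedy policy is extracted only once it has converged:

def value_iteration(theta=1e-8):
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = np.max(one_step_lookahead(s, V))   # best action value, no explicit policy
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # Extract the greedy policy from the converged (optimal) value function
    policy = np.zeros((n_states, n_actions))
    for s in range(n_states):
        policy[s, int(np.argmax(one_step_lookahead(s, V)))] = 1.0
    return policy, V

print(value_iteration())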

 

Here are two slides from David Silver's lecture on dynamic programming: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf

 

 

References :

http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf

Reinforcement learning

OpenAI Gym Environment

OpenAI provides a framework for creating environments and for training agents on those environments. In this post I am pasting a simple notebook as a quick lookup for how to use these environments and what functions are available on the environment object.
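As a rough sketch of the environment API (the environment id here is illustrative, and the exact reset/step signatures depend on which gym version is installed; newer gym/gymnasium releases return extra values from step):

import gym

env = gym.make("FrozenLake-v0")     # any registered environment id works here

print(env.observation_space)        # e.g. Discrete(16)
print(env.action_space)             # e.g. Discrete(4)

state = env.reset()                 # start a new episode
for t in range(20):
    action = env.action_space.sample()             # sample a random action
    state, reward, done, info = env.step(action)   # classic 4-tuple step API
    if done:                                       # episode finished, start over
        state = env.reset()

env.close()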

I have used an environment available on GitHub from Denny Britz, and here are the references:

References :

https://github.com/dennybritz/reinforcement-learning

http://www.wildml.com/2016/10/learning-reinforcement-learning/

https://gym.openai.com/docs/

optimization

[Example] Lagrange Multiplier With Equality Constraints

 

Stationary Point

Definition of a stationary point from Wikipedia:

In mathematics, particularly in calculus, a stationary point or critical point of a differentiable function of one variable is a point on the graph of the function where the function’s derivative is zero. Informally, it is a point where the function “stops” increasing or decreasing (hence the name).

 

The Lagrange multiplier method helps us find all the stationary points; each of them can be a local minimum, a local maximum, the global minimum or the global maximum. Once we evaluate the objective function at each of these stationary points, we can classify which one is a local/global minimum or maximum.

 

Example
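As a small illustration (my own example, not necessarily the one from the original post), consider maximizing f(x, y) = x + y subject to the equality constraint x^2 + y^2 = 1. Form the Lagrangian and set its partial derivatives to zero:

\mathcal{L}(x, y, \lambda) = x + y - \lambda \left(x^2 + y^2 - 1\right)

\frac{\partial \mathcal{L}}{\partial x} = 1 - 2\lambda x = 0, \qquad
\frac{\partial \mathcal{L}}{\partial y} = 1 - 2\lambda y = 0, \qquad
x^2 + y^2 = 1

\Rightarrow \; x = y = \frac{1}{2\lambda}, \quad 2x^2 = 1 \;\Rightarrow\; x = y = \pm\frac{1}{\sqrt{2}}

Evaluating the objective at the two stationary points classifies them: f(1/√2, 1/√2) = √2 is the maximum on the circle and f(−1/√2, −1/√2) = −√2 is the minimum.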

 

 

Reinforcement learning

Thompson Sampling

 

Thompson sampling is one approach to the multi-armed bandit problem and to the exploration-exploitation dilemma faced in reinforcement learning.

The challenge in solving such a problem is that we might end up pulling the same arm again and again. The Bayesian approach helps us resolve this dilemma by setting a prior with somewhat high variance.

Here is the setup for a two-armed bandit: one arm has a success probability of 40% (bandit 0) and the other 25% (bandit 1); a code sketch follows below.

We are using the beta distribution for deciding which arm to pull. The beta distribution has two parameters, alpha and beta; higher values of alpha pull the distribution towards 1, and the distribution is always confined between 0 and 1.

The way we train is that for each piece of feedback we receive, we increment alpha by 1 if it was a success or beta by 1 in case of failure. For choosing the arm, we draw a random sample from each arm's distribution and select the arm with the highest value.
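A minimal sketch of that procedure (the horizon, random seed and variable names are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)

true_p = [0.40, 0.25]      # true success probabilities: bandit 0 = 40%, bandit 1 = 25%
alpha = [1.0, 1.0]         # Beta(alpha, beta) posterior parameters for each arm
beta = [1.0, 1.0]
pulls = [0, 0]

for t in range(2000):
    # Draw one sample from each arm's Beta posterior and pull the arm with the highest sample
    samples = [rng.beta(alpha[i], beta[i]) for i in range(2)]
    arm = int(np.argmax(samples))
    pulls[arm] += 1

    # Observe the reward and update that arm's posterior
    if rng.random() < true_p[arm]:
        alpha[arm] += 1    # success
    else:
        beta[arm] += 1     # failure

print("pull counts:", pulls)
print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])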

 

And here are the simulation results. We see that initially both arms are pulled frequently, but slowly arm 1 is pulled less and less, though its pull count never drops straight to zero.

 

[Figure: how often each arm is chosen over the course of the simulation]

 

 

Deep Learning

Deep learning taking off

I recently started Andrew Ng’s specialization on deep learning and found these two points interesting:

One is about how the performance of an algorithm changes with the amount of data. Traditional algorithms plateau, whereas deep neural networks keep benefiting from more data.

[Figure: performance vs. amount of data for traditional algorithms and increasingly large neural networks]

 

Also, for small amounts of data, traditional algorithms with good feature engineering may win over neural nets.

The second reason is that deep learning needs data, computation and efficient algorithms, and recent years have seen significant algorithmic advances that increase computational efficiency. For example, the switch from sigmoid to ReLU activations was an algorithmic change that allowed gradient descent to converge faster.
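A tiny illustration of that last point (my own sketch, not from the course): the sigmoid's gradient vanishes for large positive or negative inputs, while ReLU keeps a gradient of 1 for every active unit, which is one reason gradient descent moves faster with ReLU.

import numpy as np

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sigmoid = 1 / (1 + np.exp(-x))
sigmoid_grad = sigmoid * (1 - sigmoid)    # nearly 0 at the tails -> tiny updates
relu_grad = (x > 0).astype(float)         # 1 wherever the unit is active

print("sigmoid gradient:", np.round(sigmoid_grad, 4))
print("relu gradient:   ", relu_grad)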

 

Ref : https://www.coursera.org/learn/neural-networks-deep-learning/home