
Lagrange Multiplier and Constrained Optimization

Lagrange multipliers help us solve constrained optimization problems.

An example would be to maximize f(x, y) subject to the constraint g(x, y) = 0.

The geometric intuition is that at a point on g where f attains a maximum or minimum, the gradients of f and g are parallel:

∇f(x, y) = λ ∇g(x, y)

We need three equations to solve for x, y and λ.

The gradient equation above gives two equations (one for x and one for y); the third is the constraint g(x, y) = 0.

Solving these yields candidate points where f may attain a maximum or minimum; we then evaluate f at each candidate to identify the point of interest.

The Lagrangian is a function that wraps the above into a single equation:

L(x, y, λ) = f(x, y) − λ g(x, y)

We then solve ∇L(x, y, λ) = 0.
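As a concrete sketch (an example of my own, not from the references): maximize f(x, y) = x + y subject to x² + y² − 1 = 0, solving ∇L = 0 symbolically.

    # A minimal sketch (my own example): maximize f(x, y) = x + y
    # subject to g(x, y) = x^2 + y^2 - 1 = 0, by solving grad L = 0.
    import sympy as sp

    x, y, lam = sp.symbols("x y lambda", real=True)
    f = x + y
    g = x**2 + y**2 - 1
    L = f - lam * g

    # grad L = 0 gives three equations in the three unknowns x, y, lambda.
    candidates = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
    for c in candidates:
        print(c, "f =", f.subs(c))
    # Two candidates (±1/√2, ±1/√2); evaluating f picks out the maximum.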

 

Budget Intuition

Let f represent the revenue R and g represent cost C.

Also let g(x, y) = b, where b is the maximum cost we can afford.

Let M* be the optimal value obtained by solving the Lagrangian, and let λ* be the corresponding value of λ.

dM*/db = λ*

So λ* is the rate at which the maximum revenue increases per unit change in the budget b.
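A quick numeric check of this claim, with made-up revenue and cost functions of my own (not from the references): take R(x, y) = x·y with cost x + y = b, where the analytic answer is M*(b) = b²/4 and λ* = b/2.

    # Sketch with hypothetical revenue/cost: R(x, y) = x*y, cost x + y = b.
    # Solving the Lagrangian gives x = y = b/2, M*(b) = b^2/4, lambda* = b/2.
    def M_star(b):
        return b * b / 4.0

    b, eps = 10.0, 1e-5
    lam_star = b / 2.0
    dM_db = (M_star(b + eps) - M_star(b - eps)) / (2 * eps)
    print(dM_db, lam_star)  # both ~5.0: dM*/db matches lambda*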

 

Inequality Constraint

 

Here is the problem of finding the maximum of f(x) subject to an inequality constraint [2]:

maximize f(x) subject to g(x) ≥ 0, with Lagrangian L(x, λ) = f(x) + λ g(x)

g(x) ≥ 0
λ ≥ 0
λ g(x) = 0

The above conditions are known as the Karush-Kuhn-Tucker (KKT) conditions:

First – primal feasibility

Second – dual feasibility

Third – complementary slackness
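To make the three conditions concrete, here is a small sketch (my own example): maximize f(x) = −(x − 2)² subject to g(x) = 1 − x ≥ 0. The constraint is active at the optimum x = 1, and stationarity of L = f + λg fixes λ.

    # Sketch (my own example): maximize f(x) = -(x - 2)^2  s.t.  g(x) = 1 - x >= 0.
    # Stationarity of L(x, lambda) = f(x) + lambda * g(x):  -2(x - 2) - lambda = 0.
    x_opt = 1.0                  # the unconstrained peak x = 2 is infeasible,
    lam = -2 * (x_opt - 2)       # so the constraint is active; lambda = 2

    print("primal feasibility:      g(x) >= 0  ->", 1 - x_opt >= 0)
    print("dual feasibility:        lambda >= 0 ->", lam >= 0)
    print("complementary slackness: lambda*g(x) =", lam * (1 - x_opt))  # 0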

For the problem of finding the minimum of f(x), we minimize the following, with the same KKT conditions:

L(x, λ) = f(x) − λ g(x)

 

This can be extended to an arbitrary number of constraints:

L(x, {λ_j}, {μ_k}) = f(x) + Σ_j λ_j g_j(x) + Σ_k μ_k h_k(x)

subject to equality constraints g_j(x) = 0 and inequality constraints h_k(x) ≥ 0, with μ_k ≥ 0 and μ_k h_k(x) = 0
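As an illustrative sketch (my own example, delegating the generalized Lagrangian to scipy's SLSQP solver, which handles equality and inequality constraints together):

    # Sketch: minimize f(x, y) = x^2 + y^2 under one equality and one
    # inequality constraint, using scipy's SLSQP solver.
    import numpy as np
    from scipy.optimize import minimize

    f = lambda v: v[0] ** 2 + v[1] ** 2
    cons = [
        {"type": "eq",   "fun": lambda v: v[0] + v[1] - 1},  # g(x) = 0
        {"type": "ineq", "fun": lambda v: v[0] - 0.25},      # h(x) >= 0
    ]
    res = minimize(f, x0=np.array([0.0, 0.0]), constraints=cons, method="SLSQP")
    print(res.x)  # ~[0.5, 0.5]; here the inequality ends up inactive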

 

Notes

  • This relates to solving the dual of a problem for convex functions, as in Boyd’s book
  • For unconstrained optimization, computing the Hessian matrix is one way of solving, provided the function is twice differentiable [1]

 

 

References

[0] https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/constrained-optimization/a/lagrange-multipliers-single-constraint

[1] https://www.youtube.com/watch?v=TyfZaZSF5nA

[2] Pattern Recognition and Machine Learning, Christopher M. Bishop


On multivariate Gaussian

Formulas

Formula for the multivariate Gaussian distribution:

p(x; μ, Σ) = 1 / ( (2π)^(n/2) |Σ|^(1/2) ) · exp( −(1/2) (x − μ)^T Σ^(−1) (x − μ) )

Formula for the univariate Gaussian distribution:

p(x; μ, σ²) = 1 / ( √(2π) σ ) · exp( −(x − μ)² / (2σ²) )
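A quick numeric sketch (my own check) evaluating the multivariate formula by hand and comparing it against scipy:

    # Sketch: evaluate the multivariate Gaussian density from the formula
    # above and compare with scipy.stats.multivariate_normal.
    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 1.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    x = np.array([0.5, 0.5])

    d = x - mu
    n = len(mu)
    manual = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
        (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    )
    print(manual, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # agree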

Notes:

  • There is a normalizing constant in both equations
  • Σ being positive definite ensures the quadratic form in the exponent opens downward, so the density has a single peak
  • σ² being positive likewise ensures that the parabola in the exponent opens downward

 

On Covariance Matrix

Definition of covariance between two random variables:

Cov[X, Y] = E[ (X − E[X]) (Y − E[Y]) ] = E[XY] − E[X] E[Y]

When we have more than two variables, we collect the pairwise covariances in matrix form, so the covariance matrix looks like:

Σ_ij = Cov[X_i, X_j], i.e.

Σ = [ Cov[X_1, X_1]  ⋯  Cov[X_1, X_n]
            ⋮         ⋱        ⋮
      Cov[X_n, X_1]  ⋯  Cov[X_n, X_n] ]
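A small sketch (my own) estimating a covariance matrix from samples with numpy and checking the properties discussed below:

    # Sketch: estimate a covariance matrix from data and check that it is
    # symmetric with non-negative eigenvalues (positive semidefinite).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))           # 500 samples of 3 variables
    Sigma = np.cov(X, rowvar=False)         # columns are the variables

    print(np.allclose(Sigma, Sigma.T))      # True: symmetric
    print(np.all(np.linalg.eigvalsh(Sigma) >= 0))  # True: PSD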

  • The formula for the multivariate Gaussian distribution demands Σ be non-singular and symmetric positive definite (Σ^(−1) and |Σ|^(1/2) must exist), which in turn means all eigenvalues of Σ are strictly positive
  • For some data these demands might not be met, e.g., the sample covariance matrix can be singular

 

Derivations

The following derivations are available at [0]:

  • We can prove [0] that when the covariance matrix is diagonal (i.e., the variables are independent), the multivariate Gaussian distribution is simply the product of the univariate Gaussian distributions of the individual variables (see the sketch after figure 1)
  • It was derived that the isocontours (figure 1) are elliptical, with axis lengths proportional to the standard deviations of the individual variables
  • The above holds even when the covariance matrix is not diagonal, and in dimensions n > 2 the isocontours become ellipsoids

Figure 1: elliptical isocontours of a two-dimensional Gaussian density
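The first derivation in the list above is easy to check numerically; a sketch of my own:

    # Sketch: with a diagonal covariance matrix, the joint Gaussian density
    # equals the product of the per-variable univariate densities.
    import numpy as np
    from scipy.stats import multivariate_normal, norm

    mu = np.array([1.0, -2.0])
    var = np.array([0.5, 3.0])              # diagonal entries of Sigma
    x = np.array([0.7, -1.1])

    joint = multivariate_normal(mean=mu, cov=np.diag(var)).pdf(x)
    product = np.prod(norm(loc=mu, scale=np.sqrt(var)).pdf(x))
    print(joint, product)                   # the two values match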

Linear Transformation Interpretation

Theorem: Let X ~ N(μ, Σ) with Σ symmetric positive definite. Then there exists an invertible matrix B such that Z = B^(−1)(X − μ) ~ N(0, I).

This was proved in two steps [0]:

Step 1: Factorize the covariance matrix

Σ = B B^T, for some invertible matrix B

Step 2: Change of variables, applied to the density function

Z = B^(−1)(X − μ)  ⇒  p_Z(z) = |det(B)| · p_X(Bz + μ), which simplifies to the density of N(0, I)
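Read in the sampling direction, the same two steps give a way to draw from N(μ, Σ); a sketch of my own, using a Cholesky factor for B:

    # Sketch: factor Sigma = B B^T (Cholesky), map standard normal draws
    # Z ~ N(0, I) to X = mu + B Z, and check the empirical covariance.
    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.array([1.0, 2.0])
    Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

    B = np.linalg.cholesky(Sigma)           # Step 1: Sigma = B @ B.T
    Z = rng.normal(size=(100_000, 2))       # rows are draws of Z ~ N(0, I)
    X = mu + Z @ B.T                        # Step 2: X = mu + B Z, row-wise

    print(np.cov(X, rowvar=False))          # close to Sigma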

 

References

[0] http://cs229.stanford.edu/section/gaussians.pdf