References :

- Pattern Recognition and Machine Learning by Bishop [Page no 244]
- Andrew NG’s course by deeplearning.ai
- https://sudeepraja.github.io/Neural/

Skip to content
# Tag: maths

Deep Learning
# Derivation Of Backpropagation – 2

optimization
# [Example] Lagrange Multiplier With Equality Constraints

#### Stationary Point

## Example

optimization
# Lagrange Duality

math
# Class 12 Geometry Notes

optimization
# [Theory] Lagrange Multiplier and Constrained Optimization

## Budget Intuition

## Inequality Constraint

### Notes

### References

math
# On multivariate Gaussian

### Formulas

### On Covariance Matrix

### Derivations

### Linear Transformation Interpretation

### References

Notes on data science

References :

- Pattern Recognition and Machine Learning by Bishop [Page no 244]
- Andrew NG’s course by deeplearning.ai
- https://sudeepraja.github.io/Neural/

Advertisements

Definition of stationary point from wikipedia :

In mathematics, particularly in calculus, a stationary point or critical point of a differentiable function of one variable is a point on the graph of the function where the function’s derivative is zero. Informally, it is a point where the function “stops” increasing or decreasing (hence the name).

**Lagrange multiplier helps us to find all the stationary points**, It can be local minima, local maxima, global minima or global maxima. Once we evaluate objective function at each of these stationary point we can classify which one is local/global minima and maxima.

There are three things :

- Original problem (Primal problem)
- Dual function (Function of lagrange multiplier)
- Dual problem

Suppose p is the solution of primal problem and d the dual problem.

If original problem is minimization, we are interested in lower bound (d) such that d<p. We want to find maximum value of d and dual problem becomes maximization problem.

If original problem is maximization, we are interested in finding upper bound (d) such that p<d. We want to find minimum value of d and dual problem become minimization problem.

Dual problem is always convex irrespective of primal problem.

d<p is weak duality, while d=p is strong duality.

If primal problem is convex strong duality generally holds.

Slater’s condition is sufficient tests for convex optimization.

Further reading :

Kanpsack Problem here

http://ocw.nctu.edu.tw/upload/classbfs121109082344836.pdf

Convex Optimization by Stephen Boyd

*Motivation behind these notes is that geometry helps in providing intuitive derivation to machine learning models and optimization scenarios !*

Line in 2D resembles plane in 3D, not the line in 3D.

Concept of distance is essentially projection, It can be either sine (Cross product) or cosine (Dot product)

References :

http://www.ncert.nic.in/index.html

Lagrange multipliers helps us to solve constrained optimization problem.

An example would to maximize f(x, y) with the constraint of g(x, y) = 0.

Geometrical intuition is that points on g where f either maximizes or minimizes would be will have a parallel gradient of f and g

∇ f(x, y) = λ ∇ g(x, y)

We need three equations to solve for x, y and λ.

Solving above gradient with respect to x and y gives two equation and third is g(x, y) = 0

These will give us the point where f is either maximum or minimum and then we can calculate f manually to find out point of interest.

Lagrange is a function to wrap above in single equation.

L(x, y, λ) = f(x, y) – λ g(x, y)

And then we solve ∇ L(x, y, λ) = 0

Let f represent the revenue R and g represent cost C.

Also let g(x, y) = b where b is maximum cost that we can afford.

Let M* be the optimized value obtained by solving lagrangian and value of λ we got was λ*.

So λ is the rate with which maximum revenue will increase with unit change in b.

Here is the problem of finding maximum of f(x) [2]

Above conditions are known as Karush-Kuhn-Tucker (KKT) conditions.

First – primal feasibility

Second – dual feasibility

Third – complementary slackness conditions

For the problem of finding minimum value of f(x) we minimize following with same KKT conditions:

This can be extended to arbitrary number of constraints.

- This is relates to solving the dual of a problem for convex function as we had in Boyd’s book
- For unconstrained optimization computing hessian of a matrix is one way of solving if it is continuous [1]

[1] https://www.youtube.com/watch?v=TyfZaZSF5nA

[2] Pattern Recognition and machine learning by Bishop

Formula for multivariate gaussian distribution

Formula of univariate gaussian distribution

Notes:

- There is normality constant in both equations
- Σ being a positive definite ensure quadratic bowl is downwards
- σ2 also being positive ensure that parabola is downwards

Definition of covariance between two vectors:

When we have more than two variable we present them in matrix form. So covariance matrix will look like

- Formula of multivariate gaussian distribution demands Σ to be singular and symmetric positive semidefinite, which in terms means sigma will be symmetric positive semidefinite.
- For some data above demands might not meet

Following derivations are available at [0]:

- We can prove[0] that when covariance matrix is diagonal (i.e there is variables are independent) multivariate gaussian distribution is simply multiplication of single gaussian distribution of each variable.
- It was derived that shape of isocontours (figure 1) is elliptical and axis length is proportional to individual variance of that variable
- Above is true even when covariance matrix is not diagonal and for dimension n>2 (ellipsoids)

This was proved in two steps [0]:

Step-1 : Factorizing covariance matrix

Step-2 : Change of variables, which we apply to density function

[0] http://cs229.stanford.edu/section/gaussians.pdf