Negative sampling in word2vec

In the previous post we talked about the skip-gram model.


Now let’s say we have a vocabulary of 1,000 words and 300 hidden units. That gives 1,000 × 300 = 300,000 weights in each of the hidden and output layers, which is too many parameters to update on every training example.

The output label during training is a one-hot vector with 999 zeros and a single 1. We randomly select 5 of the zeros and update the weights for only six words (the 5 sampled zeros and the single 1). The more frequent a word is, the higher the probability of it getting selected; the Google paper gives an empirical formula for this, sampling each word in proportion to its unigram frequency raised to the 3/4 power. This is known as negative sampling.
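As a minimal sketch of that sampling distribution (the toy counts and variable names here are my own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unigram counts; in a real setting these come from the training corpus.
counts = {"the": 50, "cat": 10, "sat": 8, "mat": 5}
words = list(counts)
freqs = np.array([counts[w] for w in words], dtype=float)

# Sample negatives with probability proportional to frequency^0.75,
# which damps very frequent words relative to pure unigram sampling.
probs = freqs ** 0.75
probs /= probs.sum()

# Draw 5 negative samples for one (input word, context word) training pair.
negatives = rng.choice(words, size=5, p=probs)
print(negatives)
```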

The above applies to the output layer. In the hidden layer, weights are updated only for the input word, irrespective of whether negative sampling is used.
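Here is a minimal sketch of one training step showing both updates, assuming a vocabulary of V words and d hidden units; the function and variable names are mine. Only the input word’s row of the hidden weights and the six sampled rows of the output weights are touched:

```python
import numpy as np

def sgns_step(W_in, W_out, center, positive, negatives, lr=0.025):
    """One skip-gram negative-sampling update (a toy sketch).

    W_in:  (V, d) input->hidden weights; W_out: (V, d) hidden->output weights.
    center: index of the input word, positive: index of the true context word,
    negatives: indices of the sampled negative words.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = W_in[center]              # hidden activation is just the input word's row
    grad_h = np.zeros_like(h)
    for idx, label in [(positive, 1.0)] + [(n, 0.0) for n in negatives]:
        err = sigmoid(h @ W_out[idx]) - label   # prediction error for this word
        grad_h += err * W_out[idx]
        W_out[idx] -= lr * err * h              # only these 6 output rows change
    W_in[center] -= lr * grad_h                 # only the input word's row changes

V, d = 1000, 300
W_in, W_out = np.random.rand(V, d) * 0.01, np.zeros((V, d))
sgns_step(W_in, W_out, center=3, positive=7, negatives=[12, 45, 101, 512, 900])
```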






[Example] Lagrange Multiplier With Equality Constraints


Stationary Point

Definition of a stationary point from Wikipedia:

In mathematics, particularly in calculus, a stationary point or critical point of a differentiable function of one variable is a point on the graph of the function where the function’s derivative is zero. Informally, it is a point where the function “stops” increasing or decreasing (hence the name).


The Lagrange multiplier method helps us find all the stationary points; each can be a local minimum, a local maximum, the global minimum, or the global maximum. Once we evaluate the objective function at each of these stationary points, we can classify which one is the local/global minimum or maximum.






Lagrange Duality

There are three objects:

  1. Original problem (primal problem)
  2. Dual function (a function of the Lagrange multipliers)
  3. Dual problem


Suppose p is the optimal value of the primal problem and d that of the dual problem.


If the original problem is a minimization, we are interested in a lower bound d such that d ≤ p. We want to find the maximum value of d, so the dual problem becomes a maximization problem.


If the original problem is a maximization, we are interested in an upper bound d such that p ≤ d. We want to find the minimum value of d, so the dual problem becomes a minimization problem.


The dual problem is always a convex optimization problem, irrespective of whether the primal problem is convex.


d ≤ p is weak duality, while d = p is strong duality.

If the primal problem is convex, strong duality generally (though not always) holds.

Slater’s condition is a sufficient test for strong duality in convex optimization: if the problem is convex and has a strictly feasible point, strong duality holds.
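As a quick worked example (my own, not from Boyd), take the convex primal problem: minimize x² subject to x ≥ 1.

L(x, λ) = x² + λ(1 − x), with λ ≥ 0

Dual function: g(λ) = min over x of L(x, λ) = λ − λ²/4 (the minimum is attained at x = λ/2)

Dual problem: maximize g(λ) over λ ≥ 0, solved by λ = 2, giving d = 1

The primal optimum is x = 1 with p = 1, so d = p and strong duality holds; Slater’s condition is satisfied since x = 2 is strictly feasible.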



Further reading:

Knapsack Problem

Convex Optimization by Stephen Boyd



[Theory] Lagrange Multiplier and Constrained Optimization

Lagrange multipliers help us solve constrained optimization problems.

An example would be to maximize f(x, y) subject to the constraint g(x, y) = 0.

The geometric intuition is that at the points on g where f is maximized or minimized, the gradients of f and g are parallel:

∇f(x, y) = λ ∇g(x, y)

We need three equations to solve for x, y, and λ.

Equating the gradient components in x and y gives two equations, and the third is the constraint g(x, y) = 0.

These give us the candidate points where f is either maximum or minimum, and we can then evaluate f at each of them to find the point of interest.

The Lagrangian is a function that wraps the above into a single equation:

L(x, y, λ) = f(x, y) − λ g(x, y)

We then solve ∇L(x, y, λ) = 0.
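Here is a minimal sketch with SymPy, on a toy example of my own (maximize f(x, y) = xy on the line x + y = 1):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y                    # objective
g = x + y - 1                # constraint, g(x, y) = 0
L = f - lam * g              # Lagrangian L(x, y, λ) = f − λ g

# Solve ∇L = 0: two gradient equations plus the constraint itself.
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in stationary:
    print(s, '-> f =', f.subs(s))
# Single stationary point x = y = 1/2 with f = 1/4 (the maximum on the line).
```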


Budget Intuition

Let f represent the revenue R and g represent the cost C.

Also let g(x, y) = b, where b is the maximum cost (budget) we can afford.

Let M* be the optimal value obtained by solving the Lagrangian, and let λ* be the value of λ at that optimum.


So λ* is the rate at which the maximum revenue increases with a unit change in b: dM*/db = λ*.
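As a quick check on the toy example from earlier (revenue f = xy, cost x + y = b, both of my own choosing):

Solving the Lagrangian gives x = y = b/2 and λ* = b/2, so M*(b) = b²/4

dM*/db = b/2 = λ*, as claimed.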


Inequality Constraint


Here is the problem of finding the maximum of f(x) subject to g(x) ≥ 0 [2]. We maximize the Lagrangian

L(x, λ) = f(x) + λ g(x)

subject to

g(x) ≥ 0
λ ≥ 0
λ g(x) = 0

The above conditions are known as the Karush–Kuhn–Tucker (KKT) conditions.

First: primal feasibility, g(x) ≥ 0

Second: dual feasibility, λ ≥ 0

Third: complementary slackness, λ g(x) = 0

For the problem of finding the minimum value of f(x) subject to g(x) ≥ 0, we minimize the following with the same KKT conditions:

L(x, λ) = f(x) − λ g(x)

This can be extended to an arbitrary number of constraints.
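Here is a small sketch of how complementary slackness is used in practice, on a toy example of my own (maximize f(x) = 1 − x² subject to g(x) = x − 1 ≥ 0). Either the constraint is inactive and λ = 0, or it is active and g(x) = 0:

```python
import sympy as sp

x, lam = sp.symbols('x lambda', real=True)
f = 1 - x**2
g = x - 1                      # feasible region: g(x) >= 0
L = f + lam * g                # maximization form: L = f + λ g

# Case 1: inactive constraint, λ = 0 -> stationary point of f itself.
x0 = sp.solve(sp.diff(L, x).subs(lam, 0), x)[0]
print('case λ=0: x =', x0, ', feasible:', g.subs(x, x0) >= 0)   # x = 0, infeasible

# Case 2: active constraint, g(x) = 0 -> x = 1, then solve for λ.
lam0 = sp.solve(sp.diff(L, x).subs(x, 1), lam)[0]
print('case g=0: x = 1, λ =', lam0, ', λ >= 0:', lam0 >= 0)     # λ = 2, dual feasible
# The KKT point is x = 1, λ = 2, with f(1) = 0.
```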




  • This relates to solving the dual of a problem for a convex function, as we had in Boyd’s book
  • For unconstrained optimization, computing the Hessian matrix of the function is one way of classifying its stationary points when it is twice differentiable [1]; see the sketch below
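A minimal sketch of that Hessian test, on a toy function of my own, f(x, y) = x² − y²:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 - y**2                     # stationary point at the origin

H = sp.hessian(f, (x, y))           # [[2, 0], [0, -2]]
print(H, list(H.eigenvals()))
# Mixed-sign eigenvalues -> the stationary point (0, 0) is a saddle,
# neither a minimum nor a maximum.
```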


Formal summary
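Following Bishop’s Appendix E, the general statement for maximizing f(x) subject to J inequality constraints g_j(x) ≥ 0 is:

Maximize L(x, λ) = f(x) + Σ_j λ_j g_j(x)

subject to, for every j:

g_j(x) ≥ 0   (primal feasibility)
λ_j ≥ 0   (dual feasibility)
λ_j g_j(x) = 0   (complementary slackness)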




[2] Pattern Recognition and Machine Learning, Christopher M. Bishop