Maximize a multi-variable function with a constraint
Example:
Maximize
on the set
We can imagine projecting the 2D circle onto the 3D surface of f and looking for the highest point along that projection.
=> Instead, we can use a contour map (contour lines): find the contour line of f that is tangent to the circle.
To find the tangent point, we use the gradient (gradient vectors are perpendicular to the contour lines they pass through).
The gradients of f and g are both perpendicular to their contour lines, and at the tangent point they point in the same direction (their lengths are proportional) =>
We call the proportionality constant $\lambda$ (the Lagrange multiplier), so $\nabla f = \lambda \nabla g$.
Core idea:
Setting these gradients proportional to each other represents the condition that the contour line of one function is tangent to the contour line of the other.
The matrix form is the same as
We still need a third equation to solve for the three unknowns: the constraint, which we've known the whole time.
These three equations characterize our constrained optimization problem.
The top two equations say what is necessary for the contour lines of f and g to be perfectly tangent to each other. The bottom one just says that we have to be on the unit circle.
Solve this example
(Each time you divide by a variable, you're implicitly assuming that it's not zero, so we need to check the zero case separately.)
For these four potential points (x, y), we plug each point into f and see which one is best.
So
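The whole procedure can be sketched with sympy. The notes don't spell out f, so I'm assuming the usual Khan Academy example $f(x, y) = x^2 y$ on the unit circle, which matches the "four potential points" mentioned above:

```python
import sympy as sp

# Assumed running example: f(x, y) = x**2 * y on the unit circle
# x**2 + y**2 = 1 (not stated explicitly in the notes).
x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 * y
g = x**2 + y**2

# Tangency conditions grad f = lam * grad g, plus the constraint
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 1)]
sols = sp.solve(eqs, [x, y, lam], dict=True)

# Plug every candidate point into f and keep the best one
best = max(sols, key=lambda s: f.subs(s))
print(sp.simplify(f.subs(best)))   # maximum value: 2*sqrt(3)/9
```

Dividing the first equation by x is exactly where the "assuming it's not zero" caveat bites; `sp.solve` sidesteps that by returning the x = 0 candidates too.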
Package the equations up as a single function, the Lagrangian (and unpackage them again when we set its gradient to zero): $\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda (g(x, y) - b)$
(b is a constant)
From matrix form to equation form (the top two):
These steps simply set one gradient vector proportional to the other.
And finally the partial derivative of the Lagrangian with respect to the Lagrange multiplier:
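A quick sketch of this packaging/unpackaging in sympy, again under the assumed example $f = x^2 y$, $g = x^2 + y^2 = b$:

```python
import sympy as sp

x, y, lam, b = sp.symbols('x y lam b', real=True)
f = x**2 * y              # assumed example objective
g = x**2 + y**2           # constraint g(x, y) = b
L = f - lam * (g - b)     # the Lagrangian

# Its gradient unpackages back into the original three equations:
dLdx, dLdy, dLdlam = (sp.diff(L, v) for v in (x, y, lam))
# dL/dx = 0  <=>  f_x = lam * g_x   (tangency, x-component)
# dL/dy = 0  <=>  f_y = lam * g_y   (tangency, y-component)
# dL/dlam = -(g - b); setting it to zero recovers the constraint
print(dLdx, dLdy, dLdlam)
```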
- Computers are good at solving "gradient of some function equals zero" problems.
- If you construct the Lagrangian and then compute its gradient, all you're really doing is repackaging the equations only to unpackage them again.
- The reason is that setting a gradient to zero is how you solve unconstrained maximization problems.
- The whole point of the Lagrangian is that it turns our constrained optimization problem involving f, g, and lambda into an unconstrained optimization problem.
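That repackaging is exactly what makes the problem computer-friendly: a generic root finder can attack $\nabla \mathcal{L} = 0$ directly. A sketch with sympy's `nsolve`, under the same assumed example (the starting guess is arbitrary):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
# Lagrangian for the assumed example f = x**2*y, g = x**2 + y**2 = 1
L = x**2 * y - lam * (x**2 + y**2 - 1)
eqs = [sp.diff(L, v) for v in (x, y, lam)]

# Solve grad L = 0 as an *unconstrained* root-finding problem
root = sp.nsolve(eqs, (x, y, lam), (1.0, 0.5, 0.5))
print(root)   # converges to (sqrt(2/3), 1/sqrt(3), 1/sqrt(3)) from this start
```

Different starting guesses find different critical points, so in practice you still compare the candidates' f-values, just as in the hand calculation.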
The example so far hasn't used the value of the variable lambda (it is usually eliminated during the calculation). But it's not just some dummy variable.
Consider a more general problem,
$M^*$ is the maximum of the function f, and b is a constant. (We can pretend f is revenue and g is a budget.)
When you solve this system, you get a solution $(x^*, y^*, \lambda^*)$.
$\lambda^*$ carries information about how much the maximum of f increases if we loosen the constraint (increase b).
Let's consider b as a variable and rewrite the maximum as a function of b:
$$ M^*(b) = f(x^*(b), y^*(b)) $$
Magical fact:
Plug $(x^*, y^*, \lambda^*)$ into the Lagrangian (rather than into f).
$$ \mathcal{L}(x^*, y^*, \lambda^*) = \underbrace{f(x^*, y^*)}_{M^*} - \lambda^* \underbrace{(g(x^*, y^*) - b)}_{=0} $$
$g(x^*, y^*) - b$ must be equal to zero because $x^*$ and $y^*$ have to satisfy the constraint.
$$ \mathcal{L}(x^*(b), y^*(b), \lambda^*(b), b) = f(x^*(b), y^*(b)) - \lambda^*(b)\,(g(x^*(b), y^*(b)) - b) $$
Find its derivative with respect to b (using the multivariable chain rule):
$$ \frac{d\mathcal{L}^*}{db} = \frac{\partial\mathcal{L}}{\partial x} \cdot \frac{dx^*}{db} + \frac{\partial\mathcal{L}}{\partial y} \cdot \frac{dy^*}{db} + \frac{\partial\mathcal{L}}{\partial \lambda} \cdot \frac{d\lambda^*}{db} + \frac{\partial\mathcal{L}}{\partial b} \cdot \frac{db}{db} $$
By the definition of $x^*, y^*, \lambda^*$, the partial derivatives of $\mathcal{L}$ with respect to $x$, $y$, and $\lambda$ vanish at that point, so
$$
\frac{d\mathcal{L}^*}{db} = 0 \cdot \frac{dx^*}{db} + 0 \cdot \frac{dy^*}{db} + 0 \cdot \frac{d\lambda^*}{db} + \frac{\partial\mathcal{L}}{\partial b} \cdot 1 = \frac{\partial\mathcal{L}}{\partial b}
$$
The total derivative of the Lagrangian with respect to b is equal to its partial derivative with respect to b. That means the single-variable derivative of $\mathcal{L}^*$, in which all the variables track their optimal values, ends up being the same as the partial derivative of $\mathcal{L}$, in which all the variables are free to change.
$$ \frac{d\mathcal{L}^*}{db} = \frac{\partial\mathcal{L}}{\partial b} = \lambda^*(b) $$
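This identity can be checked symbolically for the assumed running example. Under $f = x^2 y$ with constraint $x^2 + y^2 = b$, the tangency equations give $y^*(b) = \sqrt{b/3}$, $x^*(b)^2 = 2b/3$, and $\lambda^* = y^*$ (my derivation, not in the notes):

```python
import sympy as sp

b = sp.symbols('b', positive=True)
# Closed-form maximizer for the assumed example f = x**2*y on
# x**2 + y**2 = b: y* = sqrt(b/3), x*^2 = 2*b/3, lambda* = y*.
M = sp.Rational(2, 3) * b * sp.sqrt(b / 3)   # M*(b) = f(x*(b), y*(b))
lam_star = sp.sqrt(b / 3)                    # lambda*(b)

# dM*/db - lambda*(b) simplifies to zero, confirming dM*/db = lambda*
print(sp.simplify(sp.diff(M, b) - lam_star))   # 0
```

So for this example, loosening the budget b by a small amount db raises the achievable maximum by about $\sqrt{b/3}\,db$.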
TBD
Karush-Kuhn-Tucker (KKT) Conditions:
$$
\nabla_x \mathcal{L}(x^*, \alpha^*, \beta^*) = 0 \\
\nabla_\alpha \mathcal{L}(x^*, \alpha^*, \beta^*) = 0 \\
\nabla_\beta \mathcal{L}(x^*, \alpha^*, \beta^*) = 0
$$
$$
\alpha_i^* c_i(x^*) = 0 \\
c_i(x^*) \leq 0
$$
$$
\alpha_i^* \geq 0 \\
h_j(x^*) = 0
$$
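A minimal sketch of the KKT conditions on a toy problem of my own choosing (not from the notes): minimize $x^2$ subject to $c(x) = 1 - x \leq 0$, so the inequality constraint is active at the optimum:

```python
import sympy as sp

# Toy problem: minimize x**2 subject to c(x) = 1 - x <= 0.
x, a = sp.symbols('x a', real=True)
c = 1 - x
L = x**2 + a * c                     # Lagrangian with multiplier a

# Stationarity (dL/dx = 0) plus complementary slackness (a * c = 0)
sols = sp.solve([sp.diff(L, x), a * c], [x, a], dict=True)
print(sols)
# Candidates: (x=0, a=0) violates c(x) <= 0, so it is discarded;
# (x=1, a=2) satisfies stationarity, a*c = 0, c <= 0, and a >= 0,
# so x* = 1 is the KKT point.
```

Complementary slackness is doing the interesting work here: either the constraint is inactive and its multiplier is zero, or the constraint holds with equality and its multiplier may be positive.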
- Khan Academy - Lagrange multipliers and constrained optimization
- Thomas' Calculus, Ch. 14.8, Lagrange Multipliers
- Lagrange Duality
- 李航 - 統計學習方法 (Statistical Learning Methods), Appendix C