Posts
-
A Lagrange Multipliers Refresher, For Idiots Like Me
What Are Lagrange Multipliers?
Lagrange multipliers are a tool for doing constrained optimization. Say we are trying to minimize a function \(f(x)\), subject to the constraint \(g(x) = c\).
\[\begin{aligned} \min &\,\, f(x) \\ \text{subject to} &\,\, g(x) = c \end{aligned}\]
To solve this, you define a new function, the Lagrangian.
\[\mathcal{L}(x, \lambda) = f(x) - \lambda (g(x) - c)\]
The optimum lies at a stationary point of \(\mathcal{L}\) (a point where the gradients with respect to \(x\) and \(\lambda\) are both zero).
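As a quick worked example (a toy problem of my own choosing, nothing special about it): minimize \(x^2 + y^2\) subject to \(x + y = 1\).
\[\mathcal{L}(x, y, \lambda) = x^2 + y^2 - \lambda (x + y - 1)\]
Setting the gradients to zero gives \(2x = \lambda\), \(2y = \lambda\), and \(x + y = 1\), so the constrained minimum is \(x = y = 1/2\) with \(\lambda = 1\).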
This is all true, but it doesn’t explain why it works.
Why Do Lagrange Multipliers Work?
Let’s consider a variant of the problem. We want to minimize \(f(x)\) subject to \(g(x) \ge 0\).
\[\begin{aligned} \min &\,\, f(x) \\ \text{subject to} &\,\, g(x) \ge 0 \end{aligned}\]
Let’s define the following min-max optimization problem.
\[\min_x \max_{\lambda \ge 0} f(x) - \lambda g(x)\]
I claim the solution of this min-max problem is the \(x\) with the smallest \(f(x)\) that still satisfies the constraint \(g(x) \ge 0\). Why?
As written, we first choose an \(x\), then choose the \(\lambda\) that maximizes the objective, so we want to choose an \(x\) that minimizes how much an adversarial \(\lambda\) can hurt us. Suppose we are violating the constraint \(g(x) \ge 0\). Then \(g(x) < 0\), so \(-\lambda g(x) > 0\) for any \(\lambda > 0\), and the adversary can send \(\lambda \to \infty\) to drive the objective value to \(\infty\).
If we are not violating the constraint, then \(-g(x)\) is \(0\) or negative, and since \(\lambda\) is constrained to \(\lambda \ge 0\), the optimal \(\lambda\) is \(\lambda = 0\), giving objective value \(f(x)\).
In other words, this min-max problem (which has no constraints on \(x\)) has the same solution as the original constrained problem.
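To make the inner max concrete, here’s a tiny numerical check. The specific \(f\) and \(g\) below are toy choices of mine, picked purely for illustration.

```python
# Inner maximization over lambda for a fixed x, with toy choices
# f(x) = x^2 and g(x) = x - 1 (i.e. the constraint is x >= 1).
def objective(x, lam):
    f = x ** 2
    g = x - 1
    return f - lam * g

lambdas = [0.0, 1.0, 10.0, 100.0]

# Feasible x = 2: g(2) = 1 > 0, so any lambda > 0 only lowers the
# objective. The max over lambda is at lambda = 0, with value f(2) = 4.
print([objective(2.0, lam) for lam in lambdas])  # [4.0, 3.0, -6.0, -96.0]

# Infeasible x = 0: g(0) = -1 < 0, so the objective grows without bound
# as lambda grows, matching the "send lambda to infinity" argument.
print([objective(0.0, lam) for lam in lambdas])  # [0.0, 1.0, 10.0, 100.0]
```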
This handles the \(g(x) \ge 0\) case. What if we have multiple constraints?
\[\begin{aligned} \min &\,\, f(x) \\ \text{subject to} &\,\, g_1(x) \ge 0 \\ &\,\, g_2(x) \ge 0 \end{aligned}\]
We can define a similar min-max problem by adding a Lagrange multiplier \(\lambda_i\) for each constraint \(i\).
\[\min_x \max_{\lambda_1, \lambda_2 \ge 0} f(x) - \lambda_1 g_1(x) - \lambda_2 g_2(x)\]
By a similar argument, this min-max problem is solved by taking \(x\) to be the solution of the original constrained problem (assuming one exists), with the inner max attained at \(\lambda_1 = 0, \lambda_2 = 0\). If either constraint were violated, we could have driven the corresponding \(\lambda_i\) to \(\infty\), like before.
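More generally, with constraints \(g_i(x) \ge 0\) for \(i = 1, \ldots, n\), the same construction adds one multiplier per constraint.
\[\min_x \max_{\lambda_1, \ldots, \lambda_n \ge 0} f(x) - \sum_{i=1}^n \lambda_i g_i(x)\]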
What if we have a constraint \(g(x) \ge c\)? We can rearrange it to the constraint \(g(x) - c \ge 0\).
\[\begin{aligned} \min &\,\, f(x) \\ \text{subject to} &\,\, g(x) - c \ge 0 \end{aligned}\]
This is solved the same way.
\[\min_x \max_{\lambda \ge 0} f(x) - \lambda (g(x) - c)\]
If we have a constraint \(g(x) \le 0\), we can multiply both sides by \(-1\) (flipping the inequality) to get \(-g(x) \ge 0\). If we have a constraint \(g(x) \le c\), we can rearrange it to \(c - g(x) \ge 0\). This reduces everything to the \(g(x) \ge 0\) case we know how to solve.
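For example, the \(g(x) \le c\) case becomes
\[\min_x \max_{\lambda \ge 0} f(x) - \lambda (c - g(x))\]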
Bringing it back to the equality case, let’s say we have a constraint \(g(x) = c\). How do we reduce this to the previous cases? This equality is the same as the pair of constraints \(g(x) \ge c, g(x) \le c\), which are both satisfied only when \(g(x) = c\). This time, I’ll write the Lagrange multipliers as \(\lambda_+\) and \(\lambda_-\).
\[\begin{aligned} \min &\,\, f(x) \\ \text{subject to} &\,\, g(x) \ge c \\ &\,\, g(x) \le c \end{aligned}\]
\[\min_x \max_{\lambda_+, \lambda_- \ge 0} f(x) - \lambda_+ (g(x) - c) - \lambda_- (c - g(x))\]
Like before, if we ever have \(g(x) \neq c\), then we can choose \(\lambda_+, \lambda_-\) such that the objective value shoots up to \(\infty\). But, since \(g(x) - c\) and \(c - g(x)\) are just negations of each other, we can simplify this further.
\[\min_x \max_{\lambda_+, \lambda_- \ge 0} f(x) - (\lambda_+ - \lambda_-) (g(x) - c)\]
It’s possible to make \(\lambda_+ - \lambda_-\) equal any real number while still satisfying \(\lambda_+ \ge 0, \lambda_- \ge 0\), so let’s just replace \(\lambda_+ - \lambda_-\) with \(\lambda\), and say \(\lambda\) can be any real number instead of only nonnegative ones.
\[\min_x \max_{\lambda} f(x) - \lambda (g(x) - c)\]
Now, we’re back to the equality case we started with. The solution to this must lie at a saddle point where the gradients with respect to \(x\) and \(\lambda\) are both zero.
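Written out, those stationarity conditions are the familiar ones.
\[\nabla_x f(x) = \lambda \nabla_x g(x), \qquad g(x) = c\]
The \(x\)-gradient condition says the gradients of \(f\) and \(g\) are parallel, and the \(\lambda\)-gradient condition recovers the constraint itself.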
Why Do I Like This Explanation?
When I was re-learning Lagrange multipliers a while back, I was upset that all the most popular results were targeted towards people taking their first multivariate calculus course. These explanations exclusively covered the \(g(x) = c\) case, and the constraints I wanted to add were more like \(a \le g(x) \le b\).
It’s a shame that most people’s first introduction to Lagrange multipliers only covers the equality case, because inequality constraints are more general, the concepts needed to understand that case aren’t much harder, and it’s clearer how you’d apply an optimization algorithm to the resulting min-max problem. As with other min-max optimization problems (GANs, actor-critic RL, etc.), alternating gradient updates (descent on \(x\), then ascent on \(\lambda\)) are both simple and work out fine.
This procedure doesn’t guarantee the constraint \(g(x) \ge c\) is satisfied throughout optimization, but it does penalize violations quickly: the \(\lambda\) for that constraint rises, and keeps rising, until the constraint is satisfied again, at which point \(\lambda\) decays back towards \(0\) until it’s needed once more.
As one final detail, handling inequalities requires optimizing over \(\lambda \ge 0\). A common trick is to reparametrize \(\lambda\) as \(\lambda = \exp(w)\) or \(\lambda = \text{softplus}(w)\), then take gradient steps on the unconstrained variable \(w\) instead.
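To show what this looks like in practice, here’s a minimal sketch of the alternating updates in plain Python with hand-computed gradients. The toy problem (\(f(x) = x^2\) subject to \(x \ge 1\)), the step size, and the iteration count are all arbitrary choices for illustration; the exact answer is \(x = 1\) with \(\lambda = 2\).

```python
import math

# Toy problem: minimize f(x) = x^2 subject to g(x) = x - 1 >= 0.
# Lagrangian, with lambda = softplus(w) so that lambda >= 0 automatically:
#   L(x, w) = x^2 - softplus(w) * (x - 1)
# Gradient descent on x, gradient ascent on w.

def softplus(w):
    return math.log1p(math.exp(w))

def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))

x, w = 0.0, 0.0  # arbitrary starting point
lr = 0.01

for _ in range(20000):
    lam = softplus(w)
    # Descent step on x: dL/dx = 2x - lambda.
    x -= lr * (2 * x - lam)
    # Ascent step on w: dL/dw = -(x - 1) * sigmoid(w),
    # using d softplus(w) / dw = sigmoid(w).
    w += lr * (-(x - 1) * sigmoid(w))

print(x, softplus(w))  # should be close to x = 1, lambda = 2
```

Note that because the constraint is active at the optimum, \(\lambda\) settles at \(2\) here instead of decaying to \(0\).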
-
I'm Still Here
I normally like my blog posts to have a coherent theme to them. The problem is that this takes time, which I haven’t had because I’ve been putting it into other side projects instead. I got inspired recently, and figured, screw it, I’ll just write a grab bag blog post to remind people this blog still exists, and go from there.
The first side project I’ve been working on is a history of Dominion Online. Dominion is a card game I was introduced to in 2010, and I took a liking to it right away. I got deep enough into it that I started playing competitively, participating in online tournaments, and so on. These days I don’t play it very often, but I still stick around in the community. The online version of Dominion has a rich, storied history. Many of the older members who’ve seen it all have left Dominion, and I’m realizing I’m one of the few people who both experienced all of it firsthand and want to make sure those stories aren’t forgotten. Hence, the side project. Writing it has been really fun, but also really time-consuming. One restriction I have is that everything needs a primary source, which means searching through 8-year-old forum threads and archive.org snapshots of defunct companies’ sites to get links for everything. I have no idea who will read this, but it’s important to me.
My second side project is running a puzzlehunt themed around My Little Pony. I’ve thought about doing this for a while, and when Hasbro announced the final season of My Little Pony: Friendship is Magic would air this year, I realized it was do or die. I don’t know when the puzzlehunt will be finished, but the hunt has passed the point where it’ll for-sure exist in some form. I’ve gotten a few people roped into constructing and testsolving, the rough puzzle and story structure is in place, and metapuzzles are getting testsolved as I speak. Funnily enough, I’m the only brony out of all the puzzle constructors and testsolvers so far, and I’m still surprised everyone involved is putting up with my nonsense. If you’re interested in testsolving or constructing, email me, and if you’d like to solve it when it’s done, stay tuned.
In AI for Games news, Blizzard announced that AlphaStar will be playing on the live SC2 competitive ladder, to evaluate agent performance in blind trials. AlphaStar can now play as all three races, has more restrictive APM limits than their previous version, and has been trained to only observe information from units within the current camera view. Should be fun - I expect it to beat everyone in blind games by this point.
And today, Facebook and CMU announced they’ve developed Pluribus, a poker bot that can beat pros at 6-player no-limit Texas Hold’em. The Science paper is available here, and the Facebook blog post linked above is both pretty informative and free of all the media hype that will surely appear around this result. I say Facebook and CMU, but it’d be more accurate to credit the authors: Noam Brown and Tuomas Sandholm, the two authors of Libratus, and now Pluribus. Congrats!
I have two predictions about the reception. The first prediction is that there will be a ridiculously widespread misconception that Pluribus is learning to read pro poker players and adapt its strategy to beat them. This isn’t what’s happening - it’s more like Pluribus is playing against imagined versions of itself that are more prone to folding or calling or raising, and using this to learn poker strategies that are less exploitable. The second prediction is that the news media is going to milk the “too dangerous to release the code” angle for all it’s worth. I likely wouldn’t release the code either, but last I heard, making money in online poker has been dicey ever since online poker’s Black Friday. The biggest consequence will likely be that scrubs will now believe all their opponents are using superhuman poker bots, to avoid facing the reality that they suck at poker.
-
Recent RL Papers I've Worked On
I’m a coauthor on two RL papers that went on arXiv recently.
The first is The Principle of Unchanged Optimality in RL Generalization, which I co-wrote with Xingyou Song. This evolved out of discussions we had about generalization for RL, where at some point I realized we were discussing ideas that were both clearly correct and not written down anywhere. So, we wrote them down. It’s a very short paper that draws comparisons between supervised learning generalization and RL generalization, using this to propose a principle RL generalization benchmarks should have: if you change the dynamics, make this observable to your policy, or else your problem isn’t well-founded. It also talks about how model-based RL can improve sample efficiency at the cost of generalization, by constructing setups where modelling environment-specific features speeds up learning in that environment while generalizing poorly to other environments.
The second is Off-Policy Evaluation via Off-Policy Classification. This is a much longer paper, written with Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, and Sergey Levine over many more months. It’s about off-policy evaluation, the problem of evaluating an RL policy without directly running that policy in the final environment. This is a problem I was less familiar with before starting this paper, but I now believe it to be both really important and really understudied. Our aim was to evaluate policies using just a Q-function, without importance sampling or model learning. With some assumptions about the MDP, we can use classification techniques to evaluate policies, and we show this scales up to deep networks for image-based tasks, including evaluation of real-world grasping models. I’m working on a more detailed post about off-policy evaluation and why it’s important, but it needs more polish.
Let me know if you read or have comments on either paper. Both will be presented at ICML workshops, so if you’re going to ICML you can go to those workshops as well - see the Research page for workshop names.