Posts
-
The Other Two Envelope Problem
The two envelope problem is a famous problem from decision theory, at least famous enough to have its own Wikipedia page. It’s an interesting problem, but this post is about a less well-known variation.
Here is the formulation.
There are two envelopes, one with \(A\) and one with \(B\) dollars, \(A \neq B\). \(A\) and \(B\) are unknown positive integers. You are randomly given an envelope, can look inside, then must choose whether to switch envelopes or not.
Does there exist a switching strategy that ends with the larger envelope more than half the time?
At first glance, the answer should be no. Although we know the contents of one envelope, we have no information on how much the other envelope might have. When the other envelope is a black box, how can we make a meaningful decision?
As it turns out, such a strategy does exist, and the small amount of information we have (the amount in the current envelope) is enough to improve our standing, even when the alternative has unknown value.
The Power of Randomness
The trick is to make the switching decision randomly, in a way that biases outcomes to end with the larger envelope. The strategy is as follows:
- Inspect amount \(X\) inside the envelope.
- Flip a fair coin until it lands tails.
- If the coin landed heads at least \(X\) times before the tails, switch envelopes. Otherwise, do not switch.
For analysis, assume without loss of generality that \(A < B\). There are two ways to end with envelope \(B\). Either start with \(A\) and switch, or start with \(B\) and do not switch. The first case happens with probability
\[P[\text{started A, switched}] = \frac{1}{2}\left(\frac{1}{2^{A}}\right)\]The second happens with probability
\[P[\text{started B, did not switch}] = \frac{1}{2}\left(1 - \frac{1}{2^{B}}\right)\]So, the chance to end with \(B\) is
\[\frac{1}{2} + \frac{1}{2^{A+1}} - \frac{1}{2^{B+1}}\]which is indeed larger than \(1/2\) for \(A < B\). \(\blacksquare\)
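As a sanity check, we can simulate the strategy and compare against the closed form. This is a quick sketch: it counts heads before the first tails, so the switch probability is exactly \(2^{-X}\) as in the proof, and the amounts `A` and `B` are arbitrary hypothetical values.

```python
import random

def heads_before_tails():
    """Count how many heads a fair coin shows before the first tails."""
    count = 0
    while random.random() < 0.5:
        count += 1
    return count

def ends_with_larger(a, b):
    """Play one round of the coin-flip strategy; True if we end with max(a, b)."""
    start, other = random.sample([a, b], 2)  # randomly assigned envelope
    # Switch when the coin shows at least `start` heads: probability 2**-start.
    final = other if heads_before_tails() >= start else start
    return final == max(a, b)

A, B = 2, 5  # hypothetical envelope amounts
trials = 200_000
wins = sum(ends_with_larger(A, B) for _ in range(trials)) / trials
predicted = 0.5 + 2 ** -(A + 1) - 2 ** -(B + 1)
print(f"empirical {wins:.4f}, predicted {predicted:.4f}")
```

The empirical win rate should land close to the predicted \(1/2 + 2^{-(A+1)} - 2^{-(B+1)}\).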
Extending to the Reals
This coin flip strategy works when \(A\) and \(B\) are integers, but fails when \(A,B\) are arbitrary reals. The problem is that the number of flips comes from a discrete distribution, so it does not have the granularity to distinguish between values like \(A = 2.3, B = 2.4\).
So, how is this strategy extendable? The way the current strategy works is by creating a decision boundary based on the amount in the seen envelope. The boundary is then applied to the geometric distribution with parameter \(p=1/2\).
Decision boundaries when \(A = 2, B = 4\). The switching strategy gets an advantage when the sampled \(Y\) lies between the two boundaries.
To handle arbitrary reals, we should use a continuous probability distribution instead. Nothing in the previous logic relies on the distribution being discrete, so we can simply drop in a distribution of our choice. The only requirement we need is a strictly positive probability density function over \((-\infty, \infty)\) (we’ll explain why we need this later.) The standard normal distribution \(N(0, 1)\) will do fine.
The modified strategy is
- Inspect amount \(X\) inside the envelope.
- Sample \(Y\) from \(N(0,1)\).
- If \(Y \ge X\), switch. Otherwise, do not switch.
Decision boundaries when \(A=0.5, B=1.5\). Again, the switching strategy gets an advantage when \(Y\) falls between \(A\) and \(B\).
The proof is also similar. For \(A < B\), \(P(Y \ge A) > P(Y \ge B)\). This is why the p.d.f. needs to be positive everywhere. If the p.d.f. is zero over some interval \([a,b]\), it is possible that both \(A, B \in [a,b]\), which gives \(P(Y \ge A) = P(Y \ge B)\). The chance of ending with the larger envelope is
\[\frac{1}{2}P(Y \ge A) + \frac{1}{2}\left(1 - P(Y \ge B)\right) = \frac{1}{2} + \frac{1}{2}\left(P(Y \ge A) - P(Y \ge B)\right) > \frac{1}{2}\]and this strategy generalizes to real valued envelopes. \(\blacksquare\)
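The same kind of check works for real-valued amounts. Here is a minimal sketch, with hypothetical real amounts `A` and `B`:

```python
import random

def ends_with_larger(a, b):
    """One round of the Gaussian threshold strategy; True if we end with max(a, b)."""
    start, other = random.sample([a, b], 2)  # randomly assigned envelope
    y = random.gauss(0.0, 1.0)               # sample Y ~ N(0, 1)
    return (other if y >= start else start) == max(a, b)

A, B = 0.5, 1.5  # hypothetical real-valued amounts
trials = 200_000
wins = sum(ends_with_larger(A, B) for _ in range(trials)) / trials
print(f"win rate {wins:.4f}")  # should exceed 0.5
```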
Extending to Multiple Envelopes
At this point, it’s worth considering whether we can take any lessons from this problem. If we substitute “given an envelope” with “given an action”, and “dollars” with “utility”, this result suggests that even with no information on alternative choices, we could use a random action to improve our current standing. In real life, we often have very little information on how different actions will play out, so this power to bias towards improvement is appealing. However, in real life there are usually much more than two possible actions, so we should first consider whether the two envelope framework is even valid.
When there are \(n\) envelopes instead of \(2\) envelopes, we can no longer aim for only the best envelope. A switch could help us a little, hurt us a little, help us a lot, or hurt us a lot. Because of this, we need to modify the criteria of a good strategy.
There are \(n\) envelopes, which give utilities \(A_1 < A_2 < A_3 < \cdots < A_n\). The \(A_i\) are unknown real numbers. You are randomly given an envelope, can look inside, then must choose whether to switch with another envelope.
Does there exist a switching strategy that improves your utility more often than it worsens your utility?
Once again, it turns out the real valued two envelope strategy still works with appropriate modifications. We still choose whether to swap by seeing if sample \(Y\) is greater than seen utility \(X\). The only difference is choosing which envelope to switch to. From the opener’s perspective, every other envelope appears identical. Thus, if the strategy says to swap, we should pick an alternative envelope at random.
This leaves showing the improvement chance is higher than the worsening chance. If the algorithm does not swap, the utility is unchanged, so it suffices to analyze only situations where we switch envelopes.
For envelope \(i\), the chance of switching is \(P[Y \ge A_i]\). The reward rises for \(n-i\) envelopes and drops for \(i-1\) envelopes. Thus, the chance of improving utility is
\[P[\text{utility rises}] = \sum_{i=1}^n \frac{1}{n} P[Y \ge A_i]\cdot \frac{n-i}{n-1}\]and the chance of worsening utility is
\[P[\text{utility drops}] = \sum_{i=1}^n \frac{1}{n} P[Y \ge A_i]\cdot \frac{i-1}{n-1}\]Envelope \(i\) has \(n-i\) envelopes that are better, and envelope \(n-i+1\) has \(n-i\) envelopes that are worse. Intuitively, every improving term for \(A_i\) should cancel with a worsening term for \(A_{n-i+1}\), but the improving terms have more weight because the switching probability after observing \(A_i\) is higher. This is formalized below.
Take the difference and pair up terms with matching \(P[Y \ge A_i]\) to get
\[P[\text{improves}] - P[\text{worsens}] = \frac{1}{n}\sum_{i=1}^n \frac{n-2i+1}{n-1} P[Y \ge A_i]\]The coefficients of the terms are \(\frac{n-1}{n-1}, \frac{n-3}{n-1}, \ldots, \frac{-(n-3)}{n-1}, \frac{-(n-1)}{n-1}\). Since we only care about whether the difference is positive, multiply by \(2\) to get two copies of the terms, then pair terms by coefficient, matching each coefficient with its negative.
\[2(P[\text{improves}] - P[\text{worsens}]) = \frac{1}{n}\sum_{i=1}^n \frac{n-2i+1}{n-1} (P[Y \ge A_i] - P[Y \ge A_{n+1-i}])\]Consider each term of this summation.
- For \(i < \frac{n+1}{2}\), the coefficient is positive, and \(A_i < A_{n+1-i}\), so \(P[Y \ge A_i] - P[Y \ge A_{n+1-i}]\) is positive. The entire term is positive.
- For \(i = \frac{n+1}{2}\) (possible only when \(n\) is odd), the coefficient is \(0\).
- For \(i > \frac{n+1}{2}\), the coefficient is negative, and \(A_i > A_{n+1-i}\), so \(P[Y \ge A_i] - P[Y \ge A_{n+1-i}]\) is negative. The entire term is positive.
Every term in the summation is nonnegative, and the \(i = 1\) term is strictly positive, so the entire sum is positive, and the probability of increasing utility is higher than the probability of decreasing it. \(\blacksquare\)
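A quick simulation of the \(n\)-envelope version backs this up. The utility values below are arbitrary hypothetical choices:

```python
import random

def one_trial(utilities):
    """Return +1 if the swap strategy improves utility, -1 if it worsens it, 0 otherwise."""
    n = len(utilities)
    i = random.randrange(n)       # start with a uniformly random envelope
    x = utilities[i]
    if random.gauss(0.0, 1.0) < x:
        return 0                  # Y < X: keep the envelope
    j = random.choice([k for k in range(n) if k != i])  # swap target chosen at random
    diff = utilities[j] - x
    return (diff > 0) - (diff < 0)

utilities = [-1.2, 0.3, 0.7, 2.0, 2.5]  # hypothetical A_1 < ... < A_n
trials = 300_000
results = [one_trial(utilities) for _ in range(trials)]
improves = results.count(1) / trials
worsens = results.count(-1) / trials
print(f"improves {improves:.4f}, worsens {worsens:.4f}")
```

The improvement probability should consistently beat the worsening probability, though (as discussed below) this says nothing about the expected size of the change.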
Implications
So, now we have a strategy that is very slightly biased towards increasing utility. But before trying to implement this in real life, it’s worth thinking through all the complications.
- The values \(A_i\) are chosen through some unknown process. Although this strategy is more likely to end on a larger \(A_j\), it is impossible to quantify how much better your utility will be after the swap. The utility might even go down in expectation. Although the chance of improving utility is higher, the potential utility increase may be small while the potential utility decrease is large. If we had more information about how \(A_j\) are generated, then we could argue this in more detail, but at that point this information independent scheme is probably no longer useful.
- The analysis relies on knowing the value of the initial action. It is incredibly likely that we are just as confused about the value of our current action as we are about the values of other actions, which renders the strategy moot.
- The analysis also relies on starting with an action picked uniformly at random out of a pool of possible actions. In reality, this is almost never the case. We usually have reasons to choose to act in one way or another, which places a non-uniform prior on the initial action. Intuitively, if we are already good at making decisions, it is far less likely we can switch to an even better one.
So overall, this is not a life hack. All we have is a surprising decision theory result, and that’s fine too.
(Thanks to the people who helped review earlier drafts.)
-
Why Aren't I Optimizing My Process?
While finding ways to spend time that didn’t involve studying for an entrance exam, an out of context quote popped into my head.
I also believe that the usefulness of software tools is usually greatly underrated. Better tools can act as a significant multiplier on everyone’s productivity.
(From the announcement for the Computation Graph Toolkit. Based on Theano, it’s a cool library for neural net implementation. It’s also written by a PhD student I actually know. Check it out!)
This came after I’d done the bare minimum to organize my life. I cleared out my inboxes across three email accounts. I set up email forwarding to send everything to one master account. For classes and events this semester, instead of memorizing the times, I added them to Google Calendar. As an added bonus, I can check my calendar from my phone and get notifications ahead of every event.
Setting this up took me around 1-2 hours, and it’s saved me far more than that in just a few weeks. The real reward, however, has been the peace of mind. My inbox has unread messages if and only if I have something to check. If I’m not sure about my schedule, I can check my calendar from my phone. If I forget an event, I’ll get a notification with 30 minutes of lead time, and an email notification with 10 minutes of lead time. In the future, I can add HW due dates and job interview times. These are all simple, obvious ideas, and I’m incredibly disappointed I’m behind the curve.
Back when I was applying to colleges, I based one essay around planning for only the short term. The main argument was that no plan survives contact with the real world, so it was better to plan at most a few weeks ahead, and make a new plan when necessary.
Sure, parts of that are true. Uncertainty in the real world is the big reason agile development is a thing. But my god, I was so, so stupid. It’s a cute saying, to say that it’s pointless to plan when reality won’t match your plan, but that means your plan was bad in the first place. It still makes sense to talk about where a company is going in 3 months, or the goals a long-term project should achieve. “No plan survives the real world” doesn’t mean “Don’t plan at all”, it means “Plan in broad strokes, come up with plausible ways to achieve your goals, expect your methods or goals to change along the way, and make sure you keep slack for when things go wrong.”
I’m convinced I wrote that essay because I was too lazy to organize my life, and needed a way to justify it. But there’s no justification for bad organization. I cannot think of a single scenario where messiness is better, because organization is a meta skill that improves literally everything. It means never disappointing someone by forgetting about an appointment. It means knowing where to look when you need something, whether you need a textbook or a bag of flour. It means having a system for note taking that is conducive to learning the material, both while taking the course and when you need to review it two years later. Although there are obviously other factors, the organized and well-regimented are the ones who are going to make the biggest impact in the world.
And I’m not one of them.
I need to start optimizing my process.
-
Perfectly Intelligent Mafia
This is an experiment I set for myself to see how quickly I could summarize the results of an interesting paper I had read. I had the advantage of knowing the paper was very approachable. Still, I need to clarify that this was written to optimize writing time. Not clarity, not length, not style, and certainly not correctness (although it looks correct when I look at it.)
Before writing this entry, I had read the paper and written a few Facebook posts about the strategy, but nothing formal. This entry, including all code, was written in 3 hours, 40 minutes.
Recently, I found a paper by Braverman, Etesami and Mossel (open access here) about strategies in the game of Mafia. As with many other games, assuming perfect play turns it into a math problem, and acting out the strategy in real life ruins a lot of the fun, but at least the math is interesting.
First, let’s lay down some ground rules for Mafia, since there are many house variations.
- There are \(N\) players, where \(M\) players are Mafia. Mafia members
know who each other are. Town members do not know who each other are.
- Some town members are power roles. Common power roles are the cop (can investigate people) and the doctor (can save people from the mafia), but the paper only analyzes the game with the cop.
- The game starts in day, and alternates between day and night.
- Each day, all players have a discussion, then vote on someone to lynch. These votes are public, and a strict majority is required to lynch someone.
- Each night, mafia chooses someone to kill. The cop can choose to investigate a player, and the moderator will tell the cop whether he or she is town or mafia. No one else knows who the cop investigated.
- Whenever someone dies in day or night, the moderator announces whether they were vanilla town, cop, or mafia.
- Town wins if all mafia members get lynched. Mafia wins if mafia are a majority of the town (since at that point, mafia control the lynch and can kill whomever they want in the day.)
For the perfectly intelligent version of Mafia, we also assume the following.
- Players get a fixed amount of computation time to choose who to lynch and who to kill. This ensures the game eventually ends.
- Players are allowed to call for an informal secret vote. Every player makes
a vote in secret. The moderator announces how many votes each player got.
This does not form a binding contract - the lynching is still done through
the open vote, and people can vote differently in the final vote.
- Presumably all players can hand a note to the trusted moderator, or a cryptographic voting scheme is set up beforehand.
- Players can send hidden messages to other players. The only public information
is that a player has sent or received a message from another player, and the message
contents remain hidden.
- This is done through public key crypto, to allow sending messages that only the intended recipient can decrypt.
- It is assumed the computation time to choose who to lynch and kill is not long enough to break the public key.
The paper considers two different games. In the first game, there are no power roles. In the second game, there is exactly one cop. In both cases, we are interested in how many mafia members there need to be for town to win 50% of the time.
No Power Roles
In real life mafia, the winner comes down to whether mafia can act convincingly like a townie, and whether town can determine who is lying.
In perfectly intelligent mafia, a mafia member leaks no information and acts exactly as a townie would. So, it is impossible for town to gain any information on who is mafia and who isn’t. Thus, the best town can do is lynch randomly. Since no town members are special, the best mafia can do is kill people randomly. (In perfectly intelligent mafia, there is no leader of discussion, and there is no person who avoids talking, so no one is more special than anyone else. It’s all very egalitarian.)
This is a disappointingly boring optimal strategy, but it still needs to be implementable despite malicious parties. This can be done as follows.
- Assign each player a number from \(0\) to \(R-1\), where \(R\) is the number of players still alive.
- Every player generates a random number from \(0\) to \(R-1\).
- Every player announces their number simultaneously. Town unanimously lynches the player assigned to the sum of the numbers mod \(R\).
This generates a random number as long as one townie is still alive. Even if everyone else is mafia, no matter what numbers they choose, the resulting sum will be uniformly distributed. The only people who have incentives to not match the town vote are mafia, and anyone who defects is immediately slated for execution the next day. (Defection also only changes the final vote if a majority of players are mafia, at which point the game is over anyways.)
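The protocol itself is only a few lines of code. A sketch, with `choose_lynch_target` as a hypothetical helper name:

```python
import random

def choose_lynch_target(numbers, R):
    """Sum of announced numbers mod R.

    The result is uniform over 0..R-1 as long as at least one announcement
    is uniform, no matter what the other players announce."""
    return sum(numbers) % R

R = 7  # players still alive, assigned numbers 0..R-1
# One honest townie randomizes; the other six players may collude on fixed numbers.
announcements = [random.randrange(R)] + [3] * (R - 1)
print(choose_lynch_target(announcements, R))  # uniform over 0..6 despite the collusion
```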
Under this scheme, town wins half the time when the number of mafia is order \(\sqrt{N}\). There is a decent amount of math to prove this is correct, but we can easily write some simulations to see if we can trust their result. Here is a sample script to do this in Python.
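The original script isn’t reproduced here, but a minimal sketch of the same simulation might look like the following. One simplifying assumption: parity (mafia at least as numerous as town) is treated as a mafia win.

```python
import random

def town_wins(n, m):
    """Simulate one game with n players and m mafia, no power roles.

    Day: lynch a uniformly random living player. Night: mafia kill a random
    townie. Parity is treated as a mafia win (a simplifying assumption).
    """
    town, mafia = n - m, m
    while True:
        # Day: random lynch over all living players.
        if random.randrange(town + mafia) < mafia:
            mafia -= 1
            if mafia == 0:
                return True
        else:
            town -= 1
        if mafia >= town:
            return False
        # Night: mafia kill a townie.
        town -= 1
        if mafia >= town:
            return False

def win_rate(n, m, trials=2000):
    return sum(town_wins(n, m) for _ in range(trials)) / trials

print(win_rate(1000, 11))  # the post reports roughly even odds here
```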
When \(N = 1000\), the 50/50 win chance is at 11 mafia. When \(N = 4000\), the 50/50 win chance is at 21-22 mafia. When \(N = 9000\), the 50/50 win chance is at 32-34 mafia. This falls in line with growing with \(\sqrt{N}\). (If you run the code for yourself, you may want to increase the threshold, since the variability is more than 0.5%)
One Cop To Rule Them All
It turns out the addition of a single cop radically changes the number of mafia needed to make the game fair. In the one-cop game, the number of mafia needs to be proportional to \(N\) instead of \(\sqrt{N}\).
This may be surprising, but remember that a perfectly intelligent player can act exactly like a vanilla townie. The cop can hide perfectly, which means the mafia’s best strategy is still to kill randomly. Because the odds of hitting the cop are so small, the cop has several rounds to collect intel. More interestingly, the power of the cop is not in finding mafia members. It is in verifying players are town.
The strategy from the paper is not guaranteed to be optimal, but it is asymptotically optimal - optimizations may improve the constant factors, but not the growth rate. The strategy is split into three phases.
Phase 1
Assume there are \(\eta N\) mafia, where \(0 < \eta < 1\). For \(\sqrt{\eta}N\) rounds, the cop randomly investigates, town randomly lynches, and mafia randomly kills. If the cop is killed in phase 1, town immediately forfeits.
Phase 2
The cop sends a hidden message to every investigated player that turned up town, containing an ordered list of all investigation results. In the note passing implementation of this, the cop also sends a note to every other player, but that note is blank, to disguise who the investigated players were. In the public key crypto implementation, this is not necessary because all messages are sent in public (but only investigated town can decrypt them.)
The cop then immediately asks to be lynched. This seems unintuitive; why would the cop want to die? However, once again recall that players are perfectly intelligent. On this round, another player could claim to be cop, send the same messages, and it would be impossible to tell who is genuine. The key is that when a player dies, their role and alignment are announced. So, the announcement that town has lynched the cop authenticates the list the cop sent. If multiple people claim cop, they are lynched in order of the claim, since only mafia have motivation to falsely claim cop.
Phase 3
At this phase, there exists a cabal of townies, each of whom knows all members of the cabal and knows those members are town. Additionally, there are more cabal members than mafia, since \(\sqrt{\eta}N > \eta N\) for \(\eta < 1\).
On each day, players call for a secret vote. The leader of the cabal is defined as the townie highest up the investigation list. (If the leader dies, the second highest townie is the new leader.) The leader chooses a person \(p\) not in the cabal, and sends that choice to every other cabal member. Everyone in the cabal votes \(p\) in the secret vote, and townies outside the cabal abstain. In the open vote, every player votes for the winner of the secret vote. Since there are more cabal than mafia, it is impossible for mafia to hijack the secret vote, and thus every lynch avoids the cabal, improving the successful lynch probability massively.
Overall, the game ends when either
- Cabal size \(\le\) mafia size, in which case town forfeits immediately because mafia can hijack the secret vote.
- Every citizen outside the cabal dies, in which case town wins - there are more cabal members than mafia members, and every lynch made by the cabal hits mafia.
- All mafia members die, in which case town wins.
For \(\eta < 1/49\), this strategy works around half the time. Again, there is quite a bit of math to show the cop lives and the cabal stays large enough, but we can sidestep all of that by doing more simulations. Again, here is a Python implementation.
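The original implementation isn’t reproduced here; below is a simplified sketch of the three-phase strategy. Assumptions worth flagging: parity checks during phase 1 are skipped (cop death ends those games anyway), investigations are modeled as never repeating a target, and in phase 3 mafia kill townies at random, since hidden messages keep cabal membership secret.

```python
import random

def town_wins(n, eta):
    """One simulated game of the three-phase cop strategy (a simplified sketch)."""
    mafia = int(eta * n)
    cabal = 0                # players the cop has verified as town
    plain = n - mafia - 1    # unverified townies; the cop is tracked separately

    # Phase 1: sqrt(eta) * n rounds of random lynches, kills, and investigations.
    for _ in range(int((eta ** 0.5) * n)):
        # Day: lynch a uniformly random living player.
        r = random.randrange(1 + cabal + plain + mafia)
        if r < 1:
            return False     # cop lynched: town forfeits
        elif r < 1 + cabal:
            cabal -= 1
        elif r < 1 + cabal + plain:
            plain -= 1
        else:
            mafia -= 1
            if mafia == 0:
                return True
        # Night: mafia kill a uniformly random townie.
        r = random.randrange(1 + cabal + plain)
        if r < 1:
            return False     # cop killed: town forfeits
        elif r < 1 + cabal:
            cabal -= 1
        else:
            plain -= 1
        # Night: the cop investigates someone new; only townies join the cabal.
        if plain + mafia > 0 and random.randrange(plain + mafia) < plain:
            plain -= 1
            cabal += 1

    # Phase 2: the cop publishes the list and is lynched to authenticate it.
    # Phase 3: the cabal controls every lynch; mafia kill townies at random.
    while True:
        if mafia == 0:
            return True
        if cabal <= mafia:
            return False     # mafia can hijack the secret vote
        if plain == 0:
            return True      # every remaining lynch hits mafia
        # Day: lynch a random player outside the cabal.
        if random.randrange(plain + mafia) < mafia:
            mafia -= 1
        else:
            plain -= 1
        # Night: mafia kill a random townie.
        if random.randrange(cabal + plain) < cabal:
            cabal -= 1
        else:
            plain -= 1

def win_rate(n, eta, trials=300):
    return sum(town_wins(n, eta) for _ in range(trials)) / trials

print(win_rate(1000, 0.0155))  # estimate the win probability at one value of eta
```

Bisecting on \(\eta\) with this `win_rate` recovers the kind of search shown in the output below, though the simplifications here may shift the crossover point.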
This code is much more complicated, because we need to account for the detective not investigating the same person twice, and for the mafia killing a cabal member before the detective reveals themselves. The output in one run ended up being
low = 0.000000, high = 0.990000, mid = 0.495000, win chance = 0.000000
low = 0.000000, high = 0.495000, mid = 0.247500, win chance = 0.000000
low = 0.000000, high = 0.247500, mid = 0.123750, win chance = 0.000000
low = 0.000000, high = 0.123750, mid = 0.061875, win chance = 0.024000
low = 0.000000, high = 0.061875, mid = 0.030937, win chance = 0.253700
low = 0.000000, high = 0.030937, mid = 0.015469, win chance = 0.587000
low = 0.015469, high = 0.030937, mid = 0.023203, win chance = 0.398300
low = 0.015469, high = 0.023203, mid = 0.019336, win chance = 0.497000
giving \(\eta \approx 0.019336\), which is below the \(1/49 = 0.0204\) claimed.
Remarks
This strategy can be easily improved. In the implementation above, I did not let the cabal lynch mafia members investigated by the detective. I also did not let the game continue when the detective dies or when the cabal becomes too small, even though it is strictly better for town to keep playing because there is still a small chance of winning.
All this reasoning relies on the existence of secret message passing. If that doesn’t sit well with you, the cop can also choose to not reveal themselves until a majority of alive players are verified town. At this point, the cop reveals themselves and gets lynched. In this case, the cabal is public knowledge. Each day, the cabal lynches a random person not in the cabal, and mafia kills a person in the cabal. By construction, the cabal will always be a majority of living players, so eventually all mafia members will be killed. This is proved to work for a smaller \(\eta\), but is still only fair if the number of mafia is proportional to \(N\).
Potential Followups
The obvious place to go from here is to analyze the doctor, or analyze a multi-cop game. Some of this is done in the original paper, and I recommend reading it if you are interested in these results. There are also plenty of other roles to use. Vigilantes, roleblockers, serial killers…if you try, you might find something interesting falls out.
Practically Perfect In Every Way…
The most important part of this paper is not learning how to play Mafia given perfect intelligence - it’s the existence of the paper itself. Much like the paper showing Nintendo games are NP-hard, it shows that if you look hard enough, you can find interesting problems everywhere.