Posts
-
Perfectly Intelligent Mafia
This is an experiment I set for myself, to see how quickly I could summarize the results of an interesting paper I had read. I had the advantage of knowing the paper was very approachable. Still, I should clarify that this was written to optimize writing time - not clarity, not length, not style, and certainly not correctness (although it looks correct on rereading.)
Before writing this entry, I had read the paper and written a few Facebook posts about the strategy, but nothing formal. This entry, including all code, was written in 3 hours, 40 minutes.
Recently, I found a paper by Braverman, Etesami, and Mossel (open access here) about strategies in the game of Mafia. As with many other games, assuming perfect play turns it into a math problem, and acting out the strategy in real life ruins a lot of the fun, but at least the math is interesting.
First, let’s lay down some ground rules for Mafia, since there are many house variations.
- There are \(N\) players, where \(M\) players are Mafia. Mafia members
know who each other are. Town members do not know who each other are.
- Some town members are power roles. Common power roles are the cop (can investigate people) and the doctor (can save people from the mafia), but the paper only analyzes the game with the cop.
- The game starts in day, and alternates between day and night.
- Each day, all players have a discussion, then vote publicly on someone to lynch. A strict majority is required to lynch someone.
- Each night, mafia chooses someone to kill. The cop can choose to investigate a player, and the moderator will tell the cop whether he or she is town or mafia. No one else knows who the cop investigated.
- Whenever someone dies in day or night, the moderator announces whether they were vanilla town, cop, or mafia.
- Town wins if all mafia members get lynched. Mafia wins if mafia make up at least half of the living players (since at that point, mafia control the lynch and can kill whomever they want.)
For the perfectly intelligent version of Mafia, we also assume the following.
- Players get a fixed amount of computation time to choose who to lynch and who to kill. This ensures the game eventually ends.
- Players are allowed to call for an informal secret vote. Every player makes
a vote in secret. The moderator announces how many votes each player got.
This does not form a binding contract - the lynching is still done through
the open vote, and people can vote differently in the final vote.
- Presumably all players can hand a note to the trusted moderator, or a cryptographic voting scheme is set up beforehand.
- Players can send hidden messages to other players. The only public information
is that a player has sent or received a message from another player, and the message
contents remain hidden.
- This is done through public key crypto, to allow sending messages that only the intended recipient can decrypt.
- It is assumed the computation time to choose who to lynch and kill is not long enough to break the public key.
The paper considers two different games. In the first game, there are no power roles. In the second game, there is exactly one cop. In both cases, we are interested in how many mafia members there need to be for town to win 50% of the time.
No Power Roles
In real-life Mafia, the winner comes down to whether mafia can act convincingly like townies, and whether town can determine who is lying.
In perfectly intelligent Mafia, a mafia member leaks no information and acts exactly how a townie would act. So, it is impossible for town to gain any information on who is mafia and who isn’t. Thus, the best town can do is lynch randomly. Since no town members are special, the best mafia can do is kill people randomly. (In perfectly intelligent Mafia, there is no leader of discussion, and there is no person who avoids talking, so no one is more special than anyone else. It’s all very egalitarian.)
This is a disappointingly boring optimal strategy, but it still needs to be implementable despite malicious parties. This can be done as follows.
- Assign each player a number from \(0\) to \(R-1\), where \(R\) is the number of players still alive.
- Every player generates a random number from \(0\) to \(R-1\).
- Every player announces their number simultaneously. Town unanimously lynches the player whose number equals the sum of all announced numbers mod \(R\).
This generates a random number as long as one townie is still alive. Even if everyone else is mafia, no matter what numbers they choose, the resulting sum will be uniformly distributed. The only people who have incentives to not match the town vote are mafia, and anyone who defects is immediately slated for execution the next day. (Defection also only changes the final vote if a majority of players are mafia, at which point the game is over anyways.)
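A short Python sketch of why this works (the `joint_lynch` helper and the specific colluding numbers below are made up for illustration): even if every other player colludes on fixed numbers, a single honest townie drawing uniformly makes the final sum mod \(R\) uniform.

```python
import random
from collections import Counter

def joint_lynch(choices, r):
    # The lynch target is the sum of all announced numbers, mod r.
    return sum(choices) % r

rng = random.Random(0)
R = 5
adversarial = [3, 3, 4]  # numbers fixed in advance by colluding players
counts = Counter(
    joint_lynch(adversarial + [rng.randrange(R)], R) for _ in range(100000)
)
# Every target should come up about 1/R of the time.
for target in range(R):
    assert abs(counts[target] / 100000 - 1 / R) < 0.01
```

Shifting a uniform number by any constant (mod \(R\)) leaves it uniform, which is all the colluders can do.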
Under this scheme, town wins half the time when the number of mafia is order \(\sqrt{N}\). There is a decent amount of math to prove this is correct, but we can easily write some simulations to see if we can trust their result. Here is a sample script to do this in Python.
```python
import random

Ns = [1000, 4000, 9000]

def simulate_game(n, m):
    # Simulate n players with m mafia, return True on town win
    while n > 2 * m:
        # Lynch random player
        if random.randint(0, n - 1) < m:
            # Mafia lynched
            m -= 1
            if m == 0:
                return True
        # Mafia kills a random player as well. Two total deaths
        n -= 2
    return False

def get_win_odds(n, m):
    TRIALS = 10000
    return sum(simulate_game(n, m) for _ in range(TRIALS)) / float(TRIALS)

def find_5050(n):
    threshold = 0.005
    # Binary search to find m, since town's win probability strictly decreases with m
    low = 1
    high = n
    while True:
        mid = (low + high) // 2
        per = get_win_odds(n, mid)
        print('low = %d, high = %d, mid = %d, win chance = %f' % (low, high, mid, per))
        if per < 0.5 - threshold:
            high = mid
        elif per > 0.5 + threshold:
            low = mid
        else:
            return mid

for N in Ns:
    print('N = %d' % N)
    print(find_5050(N))
```
When \(N = 1000\), the 50/50 win chance is at 11 mafia. When \(N = 4000\), the 50/50 win chance is at 21-22 mafia. When \(N = 9000\), the 50/50 win chance is at 32-34 mafia. This falls in line with growing with \(\sqrt{N}\). (If you run the code for yourself, you may want to increase the threshold, since the variability is more than 0.5%)
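As a quick consistency check on those numbers: if the fair point really grows like \(\sqrt{N}\), the ratio \(m / \sqrt{N}\) should be roughly constant across the three board sizes (using midpoints of the reported ranges).

```python
import math

# 50/50 mafia counts from the simulation runs above
thresholds = {1000: 11, 4000: 21.5, 9000: 33}
ratios = [m / math.sqrt(n) for n, m in thresholds.items()]

# All three ratios land near 0.35, consistent with sqrt(N) scaling.
assert max(ratios) - min(ratios) < 0.05
```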
One Cop To Rule Them All
It turns out the addition of a single cop radically changes the number of mafia needed to make the game fair. In the one-cop game, the number of mafia needs to be proportional to \(N\) instead of \(\sqrt{N}\).
This may be surprising, but remember that a perfectly intelligent player can act exactly like a vanilla townie. The cop can hide perfectly, which means the mafia’s best strategy is still to kill randomly. Because the odds of hitting the cop are so small, the cop has several rounds to collect intel. More interestingly, the power of the cop is not in finding mafia members. It is in verifying players are town.
The strategy from the paper is not guaranteed to be optimal, but it is optimal asymptotically - there may be different constant factors with optimizations. The strategy is split into three phases.
Phase 1
Assume there are \(\eta N\) mafia, where \(0 < \eta < 1\). For \(\sqrt{\eta}N\) rounds, the cop randomly investigates, town randomly lynches, and mafia randomly kills. If the cop is killed in phase 1, town immediately forfeits.
Phase 2
The cop sends a hidden message to every investigated player that turned up town, containing an ordered list of all investigation results. In the note passing implementation of this, the cop also sends a note to every other player, but that note is blank, to disguise who the investigated players were. In the public key crypto implementation, this is not necessary because all messages are sent in public (but only investigated town can decrypt them.)
The cop then immediately asks to be lynched. This seems unintuitive; why would the cop want to die? However, once again recall that players are perfectly intelligent. On this round, another player could claim to be the cop, send the same kind of messages, and it would be impossible to tell who is genuine. The key is that when a player dies, their role and alignment are announced. So, the announcement that town has lynched the cop is the authentication for the list sent. If multiple people claim cop, they are lynched in order of their claims, since only mafia members have motivation to falsely claim cop.
Phase 3
At this phase, there exists a cabal of townies, each of whom knows all members of the cabal and knows all those members are town. Additionally, there are more cabal members than mafia, since \(\sqrt{\eta}N > \eta N\) for \(\eta < 1\).
On each day, players call for a secret vote. The leader of the cabal is defined as the townie highest up the investigation list. (If the leader dies, the next townie on the list becomes the new leader.) The leader chooses a person \(p\) that is not in the cabal, and sends that choice to every other cabal member. Everyone in the cabal votes \(p\) in the secret vote, and all other players abstain. In the open vote, every player votes for the winner of the secret vote. Since there are more cabal members than mafia, it is impossible for mafia to hijack the secret vote, and thus every lynch avoids the cabal, massively improving the odds that each lynch hits mafia.
Overall, the game ends when either
- Cabal size \(\le\) mafia size, in which case town forfeits immediately because mafia can hijack the secret vote.
- Every citizen outside the cabal dies, in which case town wins - there are more cabal members than mafia members, and every lynch made by the cabal hits mafia.
- All mafia members die, in which case town wins.
For \(\eta < 1/49\), this strategy wins for town at least half the time. Again, there is quite a bit of math to show the cop lives and the cabal stays large enough, but we can sidestep all of that with more simulations. Here is a Python implementation.
```python
import random

N = 1000

def simulate_game(n, eta):
    # Simulate n players with eta * n mafia, return True on town win
    m = int(round(eta * n))
    t = int(round(eta ** 0.5 * n))
    players = set(range(n))
    town = set(range(m, n))
    mafia = set(range(m))
    investigated = set()
    investigated.add(n - 1)  # Hack to avoid the detective investigating themselves

    # Phase 1
    for _ in range(t):
        if len(mafia) >= len(town):
            return False
        # Lynch random player
        ra = random.choice(tuple(players))
        if ra == n - 1:
            # Detective lynched
            return False
        elif ra < m:
            # Mafia lynched
            mafia.remove(ra)
        else:
            town.remove(ra)
        players.remove(ra)
        # Detective investigation
        investigated.add(random.choice(tuple(players - investigated)))
        # Mafia kill. Must happen after the detective investigates, since the
        # detective can investigate a player who gets killed the same night.
        ki = random.choice(tuple(town))
        if ki == n - 1:
            # Detective killed
            return False
        town.remove(ki)
        players.remove(ki)

    # Phase 2
    investigated.remove(n - 1)   # Undo investigation hack
    cabal = investigated & town  # Investigated townies who are still alive
    town.remove(n - 1)           # The detective reveals and is lynched

    # Phase 3
    while len(cabal) > len(mafia):
        if len(mafia) == 0:
            return True
        if len(town) == 0:
            # Caught by the case above, but this speeds up the endgame.
            return True
        if len(mafia) >= len(town):
            return False
        # Cabal lynch: a random living player outside the cabal
        lyn = random.choice(tuple((town | mafia) - cabal))
        if lyn < m:
            mafia.remove(lyn)
        else:
            town.remove(lyn)
        # Mafia kill
        ki = random.choice(tuple(town))
        town.remove(ki)
        if ki in cabal:
            cabal.remove(ki)
    # Cabal too small while mafia live, failed.
    return False

def get_win_odds(n, eta):
    TRIALS = 10000
    return sum(simulate_game(n, eta) for _ in range(TRIALS)) / float(TRIALS)

def find_5050(n):
    threshold = 0.005
    # Binary search to find eta, since town's win probability strictly decreases with eta
    low = 0
    high = 0.99
    while True:
        mid = (low + high) / 2.0
        per = get_win_odds(n, mid)
        print('low = %f, high = %f, mid = %f, win chance = %f' % (low, high, mid, per))
        if per < 0.5 - threshold:
            high = mid
        elif per > 0.5 + threshold:
            low = mid
        else:
            return mid

print(find_5050(N))
```
This code is much more complicated, because we need to account for the detective never investigating the same person twice, and for the mafia killing a cabal member before the detective reveals themselves. The output in one run ended up being
```
low = 0.000000, high = 0.990000, mid = 0.495000, win chance = 0.000000
low = 0.000000, high = 0.495000, mid = 0.247500, win chance = 0.000000
low = 0.000000, high = 0.247500, mid = 0.123750, win chance = 0.000000
low = 0.000000, high = 0.123750, mid = 0.061875, win chance = 0.024000
low = 0.000000, high = 0.061875, mid = 0.030937, win chance = 0.253700
low = 0.000000, high = 0.030937, mid = 0.015469, win chance = 0.587000
low = 0.015469, high = 0.030937, mid = 0.023203, win chance = 0.398300
low = 0.015469, high = 0.023203, mid = 0.019336, win chance = 0.497000
```
giving \(\eta \approx 0.019336\), which is below the \(1/49 \approx 0.0204\) claimed.
Remarks
This strategy can be easily improved. In the implementation above, I did not let the cabal lynch mafia members investigated by the detective. I also did not let the game continue when the detective dies or when the cabal becomes too small, even though it is strictly better for town to keep playing because there is still a small chance of winning.
All this reasoning relies on the existence of secret message passing. If that doesn’t sit well with you, the cop can also choose to not reveal themselves until a majority of alive players are verified town. At this point, the cop reveals themselves and gets lynched. In this case, the cabal is public knowledge. Each day, the cabal lynches a random person not in the cabal, and mafia kills a person in the cabal. By construction, the cabal will always be a majority of living players, so eventually all mafia members will be killed. This is proved to work for a smaller \(\eta\), but is still only fair if the number of mafia is proportional to \(N\).
Potential Followups
The obvious place to go from here is to analyze the doctor, or analyze a multi-cop game. Some of this is done in the original paper, and I recommend reading it if you are interested in these results. There are also plenty of other roles to use. Vigilantes, roleblockers, serial killers…if you try, you might find something interesting falls out.
Practically Perfect In Every Way…
The most important part of this paper is not learning how to play Mafia given perfect intelligence - it’s the existence of the paper itself. Much like the paper showing Nintendo games are NP-hard, it shows that if you look hard enough, you can find interesting problems everywhere.
-
Simulating a Biased Coin With a Fair One
Suppose you need to simulate an event that happens \(p = 1/4\) of the time, and all you have is a fair coin. Well, that’s easy enough. Flip the coin twice. Report success if you get 2 heads, and report failure otherwise.
Suppose \(p = 1/3\) instead. That may not be a power of 2, but there is still a simple solution. Flip the fair coin twice. Report success on HH, report failure on HT or TH, and try again on TT. Each iteration takes \(2\) coin flips, and there is a \(3/4\) probability of halting, giving \(8/3\) expected coin flips.
Now, suppose you need to simulate \(p = 1/10^{100}\). Well, nothing is especially wrong with the current scheme. We need \(\lceil\log_2(10^{100})\rceil = 333\) coin flips to get enough outcomes. Then, label \(1\) case as success, \(10^{100} - 1\) as failure, and the rest as retries. For analyzing runtime, these numbers are starting to get messy, but we can at least ballpark the expected number of flips. The probability of halting is \(> 1/2\), since by construction \(2^{332} < 10^{100} < 2^{333}\), meaning the number of retry cases is less than half of all cases. Thus, the expected number of flips is at most \(666\). Yes, that is…
(•_•) ( •_•)>⌐■-■ (⌐■_■)
a devilish upper bound, but things could be worse.
(I’m not sorry.)
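For a general rational \(p\) with denominator \(d\), this scheme plans \(k = \lceil \log_2 d \rceil\) flips per round, halts with probability \(d / 2^k\), and so expects \(k \cdot 2^k / d\) flips total. A small sketch (the `expected_flips` helper is mine, not from any source):

```python
from fractions import Fraction
from math import ceil, log2

def expected_flips(d):
    # k flips per round; d of the 2^k outcomes decide the trial and the
    # rest are retries, so a round halts with probability d / 2^k.
    k = max(1, ceil(log2(d)))
    return Fraction(k * 2 ** k, d)

assert expected_flips(4) == 2                # p = 1/4: never retries
assert expected_flips(3) == Fraction(8, 3)   # matches the 8/3 above
assert expected_flips(10 ** 100) < 666       # the devilish bound holds
```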
But, what if you need to simulate \(p = 1 / 10^{10^{100}}\)? Or simulate \(p = 1/\pi\)? In the first case, the number of flips needed is enormous. In the second, the scheme fails completely - there is no way to simulate irrational probabilities.
Easy Optimizations
We’ll get back to the irrational case later.
Recall how we simulated \(p=1/4\). The scheme reports success only on HH. So, if the first flip lands tails, we can halt immediately, since whatever way the second coin lands, we will report failure anyways. The revised algorithm is
- Flip a coin. If tails, report failure.
- Flip a coin. Report success on heads and failure on tails.
This gives \(3/2\) expected flips instead of \(2\) flips.
As another example, consider \(p = 1/5\). First, plan to flip 3 coins. Divide the 8 outcomes into 1 success case, 4 failure cases, and 3 retry cases. These can be distributed as
- success: HHH
- failure: THH, THT, TTH, TTT
- retry: HHT, HTH, HTT
If the first coin is T, we can stop immediately. If the first two coins are HT, we can retry immediately. The only case left is HH, for which we need to see the third flip before deciding what to do.
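Here is a simulation sketch of that early-exit scheme for \(p = 1/5\) (the function name and structure are mine). Per round it expects \(1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} + 3 \cdot \frac{1}{4} = 1.75\) flips and halts with probability \(5/8\), so the expected total is \(1.75 \cdot 8/5 = 2.8\) flips, down from \(4.8\) with no early exit.

```python
import random

def sample_one_fifth(rng):
    # Early-exit version of the p = 1/5 scheme above. Returns
    # (outcome, flips used). T -> failure after one flip; HT -> retry
    # after two flips; HHT -> retry; HHH -> success.
    flips = 0
    while True:
        flips += 1
        if rng.random() < 0.5:   # first flip is T: all four failure cases
            return False, flips
        flips += 1
        if rng.random() < 0.5:   # HT: both of HTH, HTT are retries
            continue
        flips += 1
        if rng.random() < 0.5:   # HHT: the last retry case
            continue
        return True, flips       # HHH: the lone success case

rng = random.Random(0)
results = [sample_one_fifth(rng) for _ in range(200000)]
success_rate = sum(ok for ok, _ in results) / len(results)
mean_flips = sum(f for _, f in results) / len(results)
assert abs(success_rate - 0.2) < 0.01   # simulates p = 1/5
assert abs(mean_flips - 2.8) < 0.1      # ~2.8 expected flips
```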
(Some of you may be getting flashbacks to prefix-free codes. If you haven’t seen those, don’t worry, they won’t show up in this post.)
With clever rearrangement, we can bundle outcomes of a given type under as few unique prefixes as possible. This gives some improvement for rational \(p\), but still does not let us simulate irrational \(p\). For this, we need to switch to a new framework.
A New Paradigm
I did not come up with this method - I discovered it from here. I wish I had come up with it, for reasons that will become clear.
Let \(p = 0.b_1b_2b_3b_4b_5\ldots\) be the binary expansion of \(p\). Proceed as follows.
- Flip a coin until it lands on heads.
- Let \(n\) be the number of coins flipped. Report success if \(b_n = 1\) and report failure otherwise.
The probability the process stops after \(n\) flips is \(1/2^n\), so the probability of success is
\[P[success] = \sum_{n \,:\, b_n = 1} \frac{1}{2^n} = p\]

Regardless of \(p\), it takes \(2\) expected flips for the coin to land heads. Thus, any biased coin can be simulated in \(2\) expected flips. This beats out the other scheme, works for all \(p\) instead of only rational \(p\), and best of all you can compute bits \(b_i\) lazily, making this implementable in real life and not just in theory.
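Here is a sketch of that lazy scheme (the helper names are mine). A rational \(p\) via `Fraction` stands in for a lazy bit stream; for something like \(1/\pi\) you would swap `bit_of` for a routine that streams the digits of an irrational number.

```python
import random
from fractions import Fraction

def bit_of(p, n):
    # The n-th digit after the binary point of p in [0, 1), computed
    # on demand: shift left n places and take the parity.
    return int(p * 2 ** n) % 2

def biased_flip(p, rng):
    # Flip a fair coin until it lands heads; if that took n flips,
    # report success exactly when b_n = 1.
    n = 1
    while rng.random() < 0.5:   # tails: flip again
        n += 1
    return bit_of(p, n) == 1

rng = random.Random(0)
trials = 100000
hits = sum(biased_flip(Fraction(1, 3), rng) for _ in range(trials))
assert abs(hits / trials - 1 / 3) < 0.01   # 1/3 = 0.010101..._2
```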
Slick, right? This idea may have been obvious to you, but it certainly wasn’t to me. After thinking about the problem more, I eventually recreated a potential chain of reasoning to reach the same result.
Bits and Pieces
(Starting now, I will use \(1\) interchangeably with heads and \(0\) interchangeably with tails.)
Consider the following algorithm.
- Construct a real number in \([0,1]\) by flipping an infinite number of coins, generating a random number \(0.b_1b_2b_3\ldots\), where \(b_i\) is the outcome of coin flip \(i\). Let this number be \(x\).
- Report success if \(x \le p\) and failure otherwise.
This algorithm is correct as long as the numbers generated follow a uniform distribution over \([0, 1]\). I won’t prove this, but for an appeal to intuition: any two bit strings of length \(k\) are generated with the same probability \(1/2^k\), and the numbers they represent are evenly distributed over the interval \([0, 1]\). As \(k \to\infty\), this approaches a uniform distribution.
Assuming this all sounds reasonable, the algorithm works! Only, there is the small problem of flipping \(\infty\)-ly many coins. However, similarly to the \(p = 1/4\) case, we can stop flipping coins as soon as it is guaranteed the number will fall inside or outside \([0, p]\).
When does this occur? For now, limit to the case where \(p\) has an infinite binary expansion. For the sake of an example, suppose \(p = 0.1010101\ldots\). There are 2 cases for the first flip.
- The coin lands \(1\). Starting from \(0.1\), it is possible to fall inside or outside \([0,p]\), depending on how the next flips go. The algorithm must continue.
- The coin lands \(0\). Starting from \(0.0\), it is impossible to fall outside \([0,p]\). Even if every coin lands \(1\), \(0.0111\ldots_2 = 0.1_2 < p\). (This is why \(p\) having an infinite binary expansion is important - it ensures there will be another \(1\) down the line.) The algorithm can halt and report success.
So, the algorithm halts unless the coin flip matches the \(1\) in \(p\)’s expansion. If it does not match, it succeeds. Consider what happens next. Starting from \(0.1\), there are 2 cases.
- Next coin lands \(1\). Since \(0.11 > p\), we can immediately report failure.
- Next coin lands \(0\). Similarly to before, the number may or may not end in \([0,p]\), so the algorithm must continue.
So, the algorithm halts unless the coin flip matches the \(0\) in \(p\)’s expansion. If it does not match, it fails. This gives the following algorithm.
- Flip a coin until the \(n\)th coin fails to match \(b_n\).
- Report success if \(b_n = 1\) and failure otherwise.
Note this is essentially the same algorithm as mentioned above! The only difference is the ending condition. Instead of halting on heads, the algorithm halts if the random bit does not match the “true” bit. Both happen \(1/2\) the time, so the two algorithms are equivalent.
(If you wish, you can extend this reasoning to \(p\) with finite binary expansions. Just make the expansion infinite by expanding the trailing \(1\) into \(0.1_2 = 0.0111\ldots_2\).)
Here’s a sample run of the algorithm told in pictures. The green region represents the possible values of \(x\) as bits are generated. Initially, any \(x\) is possible.
The first generated bit is \(0\), reducing the valid region to
This still overlaps \([0,p]\), so continue. The second bit is \(1\), giving
This still overlaps, so continue. The third bit is \(1\), giving
The feasible region for \(x\) no longer intersects \([0,p]\), so the algorithm reports failure.
CS-minded people may see similarities to binary search. Each bit chooses which half of the feasible region we move to, and the halving continues until the feasible region is a subset of \([0,p]\) or disjoint from \([0,p]\).
Proving Optimality
This scheme is very, very efficient. But, is it the best we can do? Is there an algorithm that does better than \(2\) expected flips for general \(p\)?
It turns out that no, \(2\) expected flips is optimal. More surprisingly, the proof is not too bad.
For a given \(p\), any algorithm can be represented by a computation tree. That tree encodes whether the algorithm succeeds, fails, or continues, based on the next bit and all previously generated bits.
Two sample computation trees for \(p = 3/4\).
With the convention that the root is level \(0\), children of the root are level \(1\), and so on, let the weight of a node be \(1/2^{\text{level}}\). Equivalently, the weight is the probability the algorithm reaches that node.
For a given algorithm and \(p\), the expected number of flips is the expected number of edges traversed in the algorithm’s computation tree. On a given run, the number of edges traversed is the number of vertices visited, ignoring the root. By linearity of expectation,
\[E[flips] = \sum_{v \in T, v \neq root} \text{weight}(v)\]

To be correct, an algorithm must end at a success node with probability \(p\). Thus, the sum of weights for success nodes must be \(p\). For \(p\) with infinitely long binary expansions, no finite tree can achieve this: if the tree had finite depth \(d\), any leaf node weight would be a multiple of \(2^{-d}\), and the sum of success node weights would have at most \(d\) digits after the binary point.
Thus, the computation tree must be infinitely deep. Any node where the algorithm continues has two children, one per coin outcome, so every level of the tree (except the root) must have at least 2 nodes. Thus, a lower bound on the expected number of flips is
\[E[flips] \ge \sum_{k=1}^\infty 2 \cdot \frac{1}{2^k} = 2\]

and as we have shown earlier, this lower bound is achievable. \(\blacksquare\)
(For \(p\) with finite binary expansions, you can do better than \(2\) expected flips. Proving the optimal bound for that is left as an exercise.)
In Conclusion…
Every coin in your wallet is now an arbitrarily biased bit generating machine that runs at proven-optimal efficiency. If you run into a bridge troll who demands you simulate several flips of a coin that lands heads \(1/\pi\) of the time, you’ll be ready. If you don’t, at least you know one more interesting thing.
-
Things I Did In My Khan Academy Internship
- Met some really cool people, both interns and full-timers.
- Ate freshly baked bread with self-churned butter. I approve of all company traditions that involve eating delicious things.
- Got to discuss Lockhart’s Lament, the zone of proximal development, and other metacognitive ideas.
- Participated in acroyoga.
- Saw fantastic Jurassic Park cosplays in celebration of Jurassic World. There’s a dinosaur costume in the office, it’s great.
- Discovered a depressing fact with Sal over lunch - an ice cream Snickers has fewer calories than a regular Snickers. So from a certain point of view, ice cream is healthier than Snickers.
- In my first week, improved quality of life in the code deploy process.
- Successfully nerdsniped three engineers for two hours with a puzzle from a puzzle hunt.
- Met a software developer who has played Hatoful Boyfriend, the pigeon dating simulator. Pigeons forever! Okosan is best bird!
- Built an automated backup system for Khan Academy’s mailing list data.
- Wrote some unit test infrastructure that made it easier to test behavior on Khan Academy endpoints.
- Found and fixed a cron job that had been silently broken for 9 months, which involved some of the best debugging I’ve ever done.
- Added ASCII art of an Arcanine to the codebase, although with less alliteration.
- Did some real, honest to God machine learning for the annual Healthy Hackathon. We computed similarities in a sparse user-video matrix. Yes, we were trying to be Netflix. No, we did not replicate Netflix in a week. Yes, it was still really cool.
- Drank far, far too much tea. It’s not my fault the office stocks unsweetened tea, and it’s also not my fault the office is in walking distance of several milk tea places.
- Speaking of which, got into a nuanced discussion about milk tea linguistics. By my understanding, “pearl milk tea” is the standard around Palo Alto, “boba” is the standard almost everywhere else, and “bubble tea” is used interchangeably with “boba” on the East Coast.
- Also got into a discussion about cloning ethics with other interns in a Chinese restaurant. We then realized we were a real life version of the dining philosophers problem. Luckily, we did not deadlock because we had sufficiently many chopsticks.
- Played D&D for the first time. I shot an arrow right into somebody’s face, it was wicked. Somewhat less wickedly, a snake almost Total Party Killed us.
- Taught two interns how to play Dominion, and totally kicked their butts while doing it.
- In return, got my butt totally kicked in Medici, Ticket to Ride, Settlers of Catan, and Tzol’kin.
- Actually won a few games of Coup!
- …after losing many more games of Coup.
- Also, gained a new appreciation for the Coup metagame. I got to pretend to be a Captain that was pretending to be an Ambassador! And I almost got away with it too.
- For the talent show, failed to solve a Rubik’s Cube blindfolded. I still won a prize: a heat pack in the shape of a heart.
- Had a really, really good time.