Posts

  • Lessons Learned After Six Months of Blogging

    Just over six months ago, I rebooted my personal site and started blogging. So far, I’ve been pleasantly surprised at the response. It’s encouraging to know people like reading my own brand of stupidity.

    As celebration, I decided to reflect on the lessons I’ve learned. The part of me that loves all things meta wouldn’t forgive myself if I skipped a chance to write a blog post about my blog.

    ***

    1. Just Write. It’ll Get Better

    Of my four most recent posts, not counting this one, three are among my favorites. That’s not a coincidence.

    If you’re worried you can’t write well, write anyways. Everybody cringes when they read their past writing, me included. It’s all part of the process.

    The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pounds of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot — albeit a perfect one — to get an “A”. Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work-and learning from their mistakes — the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.”

    (David Bayles, Art & Fear)

    Or said more succinctly,

    Sucking at something is the first step to becoming sorta good at something.

    (Jake The Dog, Adventure Time)

    2. Your Writing Environment Matters. Take The Time To Find One That Works For You

    My old site ran on Blogger. It worked, but I quickly lost motivation to update it. Blogger abstracts its functionality behind lots of tools, and I got sick of it. To create posts, I had to open Blogger’s site and write in their Microsoft Word-esque editor. To change the site layout, I could either use a drag and drop interface, or edit raw HTML. Plus, the Google+ tie-in was vaguely annoying.

    In contrast, Github Pages has been a complete joy. It’s transparent, I have total control over my site’s design, and best of all I get free edit history thanks to Git. I know it’s incredibly nerdy to say this, but getting to write my blog posts in Markdown using Vim is fantastic.

    So far, my blog uses images, audio clips, code highlighting, Youtube embeds, spoiler buttons, typeset math (from KaTeX), comments (from Disqus), and analytics (from Google Analytics). Not only does it all work, I know how it all works, which is a huge plus.

    If you’re familiar with Git, it’s surprising how natural blogging in it feels. I can switch between different drafts on a whim, and can move from coding to blogging with ease if I need to take a break from debugging. The only issue I’ve had so far is a conflict between KaTeX and the Markdown rendering engine, but even that wasn’t so bad. After figuring out the problem, I added a hotfix the same day.

    If you don’t like your writing environment, you aren’t going to write. There are lots of obstacles between having an idea and writing about it. Your environment shouldn’t be one of them.

    3. It’ll Take Longer Than You Think. No, Longer Than That

    I never planned to have an update schedule. As a student, I knew my work loads could shoot up in an instant, and I didn’t want to feel obligated to write something when I didn’t have the time to do it properly.

    What I didn’t expect was how long writing would take. Without fail, every post I write takes longer to finish than I think it will. This is even after I try accounting for the planning fallacy. As said by Douglas Hofstadter,

    Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.

    (from Gödel, Escher, Bach)

    Proving Hofstadter’s Law wouldn’t be so bad if I didn’t do most of my writing at midnight. To show why it’s awful, here are the commits for my first technical post.

    Commits for coins branch

    Pictured: my slow descent into madness

    • Initial work saved Thursday, 2 AM
    • Completed a first draft Friday night, working from 9 PM to midnight.
    • Started revising Saturday night. Started at 11 PM, hoped to finish by 2 AM. Still working at 3:40 AM.
    • Final launch at Sunday 6 PM. Attempted to fix my sleep schedule for Monday, failed miserably.

    I have literally lost sleep to blogging. It’s silly, it’s stupid, and I don’t see it changing anytime soon. In fact, I’m doing some late night writing right now. Check the commits for this post. You should find a commit at 1 AM, another at 2 AM, a few 3 AM commits mixed in…

    Commits for half-year branch

    Pictured: my descent into madness, complete

    It’s okay to get frustrated at your writing speed. Everyone wishes they could write it well the first time, but no one does. Try not to let it bother you, and keep going. I find that taking my rage out in commit messages helps.

    More things. All the things. man whose going to read these commits anyways (Sat Dec 19 03:36:46 2015)

    Time to in medias res this shit up (Sat Dec 26 16:50:37 2015)

    Too lazy to make good commit message (Sun Jan 3 18:28:00 2016)

    URGH JUST SHIP IT. (Sun Jan 10 11:50:03 2016)

    Incidentally, this is why you should consider a CFAR workshop. Reading about logical fallacies doesn’t actually stop you from making them. I haven’t done one because the cost is scary, but I’ve considered it.

    4. Don’t Worry About Your Ideas. Worry About Your Time

    In the past 6 months, I’ve written fourteen posts. Fifteen, if you count this one. That averages to just over one post every 2 weeks.

    I have blog-worthy ideas more often than once every 2 weeks. When I started this, I was worried I’d run out of things to say, but the opposite has turned true; I have too many ideas, and need to filter them aggressively.

    I consider myself a slow person in all aspects of life. I rarely solve a problem at first glance, unless I’ve seen a problem that’s very very similar. I almost never ask questions in lecture, because I’m usually so lost I can’t articulate what I’m confused about. And when it comes to writing, I have to phrase an idea once in the head and thrice on the page before I’m satisfied.

    Being slow isn’t bad, but it baffles me that web serials like Worm and blogs like Slate Star Codex can update over once a week. I certainly can’t do that. It requires a level of discipline I don’t have yet.

    I think people undervalue how impressive sticking to a schedule is. We’d all be a lot better off if we valorized organization more.

    5. Your Writing Style Is Your Speaking Style. Embrace It

    When I started this blog, I chose to keep it colloquial. It’s my blog, and it’s my voice. If I want to use contractions or swear a bit, I will, formal writing conventions be damned. If my posts drag on, that’s part of who I am, I won’t hide it.

    I don’t want to feel obligated to write a certain way, or present only certain parts about myself. I write things I’ll want to read later. If other people read them too, that’s cool.

    Once I started writing like how I’d talk if I had the time to phrase things just right, I started noticing patterns. I start sentences with “and” or “but” or “because”, since I like the abruptness of the conjunction. I say “A and B and C” instead of “A, B, and C”, because sometimes I don’t like the mental pauses from the commas.

    I use sudden paragraph breaks to emphasize specific sentences. I insert parallel structure to connect disparate ideas.

    Parts of this are just for good style, but style isn’t a list of conventions you have to follow. It’s how you say the things you want to say.

    When you write, you’re trying to say something. So actually say it. Don’t couch it in formal writing. If that means you add Internet memes everywhere, then so be it.

    There’s a time and place for formal writing. Blog posts are neither the time nor the place.

    6. Do Peer Review Whenever You Can

    I usually don’t ask friends to read my posts before I share them. I want to work on my revision skills, and it feels cheap to outsource it to other people.

    However, sometimes I decide I want more polish. I’ve asked for feedback on three posts: The Other Two Envelope Problem, “How Do You Feel About Grad School?”, and A Gentle Introduction to Secure Computation

    Every post improved by an order of magnitude.

    It’s hard to describe how much peer review helps. After reading so many drafts, your brain starts blurring the words together. You miss the forest for the trees, and getting another reader helps immensely with untangling this.

    Even the best writers send drafts to editors and friends. There’s no shame in it. What’s more surprising is how little I valued it before now. It took me until my secure computation post to begin appreciating it, and I think I still don’t value it enough.

    7. Effort Doesn’t Drive Quality, Experience Does

    This one surprised me. Consider two posts, one about Doctor Whooves Adventures, and the other about AlphaGo.

    The first was very difficult for me to write. I didn’t know why I liked Doctor Whooves Adventures, and even after that post I find it hard to explain why I hope the series isn’t dead. Over two weeks, I agonized on how to express my love for the show. I went back and forth between different excerpts and ideas, and eventually posted a final draft to a lukewarm response. Looking back, I can see why.

    In comparison, the post on AlphaGo was written in a single night. I wanted to strike while the iron was hot, since I knew interest in Go would be fleeting. Thanks to prior experience with the research areas involved, I got the gist of the paper in a single read, and thanks to thinking about these topics for over a year, I was able to quickly articulate my feelings on the result.

    On review, my strongest posts are almost all technical. It’s what I have the most experience writing. I’ve been writing proofs since high school, gave a math talk in 11th grade, and did two research presentations last semester. Of course my technical posts are going to be better than my non-technical posts, I’ve had years of practice for the former.

    That’s not saying all my non-technical writing is bad. My best post is still my non-technical think piece about grad school. What I’m saying is that if you’re going to write about something unfamiliar, don’t be surprised if it takes you ages to churn out something mediocre. The first draft of my graduate school post was awful, and it took a month of revising over winter break to get it to that state.

    8. Try Not To Get Hung Up On Popularity

    We all want other people to care about us, even when we give ourselves reasons why they shouldn’t. Writing is no exception. In fact, it’s an extra dangerous area. When you share a personal story or spend hours thinking about the easiest way to explain something, you can’t help but grow attached to your work. It’s like a little baby, and watching your baby get ignored is like a punch to the gut.

    It’s obvious, but it bears repeating: popular things are not always good, and good things are not always popular. I’m a bit obsessed with checking my blog views, more than is healthy, and I need to remember to divorce my feelings about a post from its view count.

    What people think about your work matters. But don’t let it rule everything about you. According to Google Analytics, my three most viewed posts are:

    The first stuck on Hacker News for a day, and the last got shared to the Mystery Hunt subreddit. Note that both reasons for the high view count are only tangentially related to the actual quality.

    If I cared only about view count, I wouldn’t bother writing about a My Little Pony made-for-TV movie, because I know most people won’t be interested. But I care about it, and that’s why I did it.

    (For the record, both my AlphaGo post and my Khan Academy post deserve more views than they have. You really shouldn’t read that MLP post unless you’re interested in me commentating a nerd-snipe.)

    ***

    We’ll see how I feel about these lessons in the future. If there’s one constant in the universe, it’s that everyone thinks their past self is an idiot.

    As for the next few months? I plan to start flushing my queue of non-technical post ideas. On the short list are

    • A post about Gunnerkrigg Court
    • A post on effort icebergs and criticism
    • A post about obsessive fandoms
    • A post detailing the hilariously incompetent history of the Dominion Online implementation

    I may also try writing some short stories. I haven’t done creative writing since high school, but after reading so much web fiction, I’m up for giving it a shot.

    At my current rate, four posts will take two months. With planning fallacy, maybe three. Knowing that I still overshoot my estimates…six months? Hopefully I can do more than that though.

    I’ll close with a quote, something as “me” as I can find.

    SP: Do you see now, Doctor? This land is divided, and there is no fixing it.

    Doctor: Oh, I do see…I see that this cannot stand. I see that things are wrong here. Things are very, very wrong, and I don’t like it very much. I see that there is pain here, and I don’t like that either. I see that you’ve grown complacent. I think everyone has. There is nothing worse for life than complacency, and I shall not have it. I might not know many things about this land of myth and magic, but I know that Twilight is on the other side of this conflict and I know that this conflict is causing misery. I’m going to correct both those things.

    (Doctor Whooves Adventures, “Bells of Fate”)

    Keep going. Keep trying. Because whether it works out or not, it’s sadder to not try at all.

    Onwards and upwards!

    Comments
  • A Gentle Introduction to Secure Computation

    In Fall 2015, I took CS 276, a graduate level introduction to theoretical cryptography. For the final project, I gave a short presentation on secure computation.

    This blog post is adapted and expanded from my presentation notes. The formal proofs of security rely on a lot of background knowledge, but the core ideas are very clean, and I believe anyone interested in theoretical computer science can understand them.

    What is Secure Computation?

    In secure computation, each party has some private data. They would like to learn the answer to a question based off everyone’s data. In voting, this is the majority opinion. In an auction, this is the winning bid. However, they also want to maintain as much privacy as possible. Voters want to keep their votes secret, and bidders don’t want to publicize how much they’re willing to pay.

    More formally, there are \(n\) parties, the \(i\)th party has input \(x_i\), and everyone wants to know \(f(x_1, x_2, \cdots, x_n)\) for some function \(f\). The guiding question of secure computation is this: when can we securely compute functions on private inputs without leaking information about those inputs?

    This post will focus on the two party case, showing that every function is securely computable with the right computational assumptions.

    Problem Setting

    Following crypto tradition, Alice and Bob are two parties. Alice has private information \(x\), and Bob has private information \(y\). Function \(f(x,y)\) is a public function that both Alice and Bob want to know, but neither Alice nor Bob wants the other party to learn their input.

    problem setup

    The classical example is Yao’s Millionaire Problem. Alice and Bob want to know who has more money. It’s socially unacceptable to brag about your wealth, so neither wants to say how much money they have. Let \(x\) be Alice’s wealth, and \(y\) be Bob’s wealth. Securely computing

    \[f(x,y) = \begin{cases} \text{Alice} & \text{if } x > y \\ \text{Bob} & \text{if } x < y \\ \text{same} & \text{if } x = y \end{cases}\]

    lets Alice and Bob learn who is richer.

    Before going any further, I should explain what security guarantees we’re aiming for. We’re going to make something secure against semi-honest adversaries, also known as honest-but-curious. In the semi-honest model, Alice and Bob never lie, following the specified protocol exactly. However, after the protocol, Alice and Bob will analyze the messages received to try to extract information about the other person’s input.

    Assuming neither party lies is naive and doesn’t give a lot of security in real life. However, it’s still difficult to build a protocol that leaks no unnecessary information, even in this easier setting. Often, semi-honest protocols form a stepping stone towards the malicious case, where parties can lie. Explaining protocols that work for malicious adversaries is outside the scope of this post.

    Informal Definitions

    To prove a protocol is secure, we first need to define what security means. What makes secure computation hard is that Alice and Bob cannot trust each other. Every message can be analyzed for information later, so they have to be very careful in what they choose to send to one another.

    Let’s make the problem easier. Suppose there was a trusted third party named Faith. With Faith, Alice and Bob could compute \(f(x,y)\) without directly communicating.

    • Alice sends \(x\) to Faith
    • Bob sends \(y\) to Faith
    • Faith computes \(f(x,y)\) and sends the result back to Alice and Bob

    ideal world

    This is called the ideal world, because it represents the best case scenario. All communication goes through Faith, who never reveals the input she received. She only replies with what the parties want to know, \(f(x,y)\).

    If this is the best case scenario, we should define security by how close we get to that best case. This gives the following.

    A protocol between Alice and Bob is secure if it is as secure as the ideal world protocol between Alice, Bob, and Faith.

    To make this more precise, we need to analyze the ideal world’s security.

    Suppose we securely computed \(f(x,y) = x+y\) in the ideal world. At the end of communication, Alice knows \(x\) and \(x+y\), while Bob knows \(y\) and \(x + y\). If Alice computes

    \[(x+y) - x\]

    she learns Bob’s input \(y\). Similarly, if Bob computes

    \[(x+y) - y\]

    he learns Alice’s input \(x\). Even the ideal world can leak information. In an extreme case like this, possibly even the entire input! If Alice and Bob want to know \(f(x,y) = x + y\), they have to give up any hope of privacy.

    This is an extreme example, but most functions leak something about each party. Going back to the millionaire’s problem, suppose Alice has \(10\) dollars and Bob has \(8\) dollars. They securely compute who has the most money, and learn Alice is richer. At this point,

    • Alice knows Bob has less than \(10\) dollars because she’s richer.
    • Bob knows Alice has more than \(8\) dollars because he’s poorer.

    However, there is still some privacy: neither knows the exact wealth of the other.

    Since \(f(x,y)\) must be given to Alice and Bob, the best we can do is learn nothing more than what was learnable from \(f(x,y)\). This gives the final definition.1

    A protocol between Alice and Bob is secure if Alice learns only information computable from \(x\) and \(f(x,y)\), and Bob learns only information computable from \(y\) and \(f(x,y)\)

    (Side note: if you’ve seen zero-knowledge proofs, you can alternatively think of a protocol as secure if it is simulatable from \(x, f(x,y)\) for Alice and \(y, f(x,y)\) for Bob. In fact, this is how you would formally prove security. If you haven’t seen zero-knowledge proofs, don’t worry, they won’t show up in later sections, and we’ll argue security informally.)

    Armed with a security definition, we can start constructing the protocol. Before doing so, we’ll need a few cryptographic assumptions.

    Cryptographic Primitives

    All of theoretical cryptography rests on assuming some problems are harder than other problems. These problems form the cryptographic primitives, which can be used for constructing protocols.

    For example, one cryptographic primitive is the one-way function. Abbreviated OWF, these functions are easy to compute, but hard to invert, where easy means “doable in polynomial time” and hard means “not doable in polynomial time.” Assuming OWFs exist, we can create secure encryption, pseudorandom number generators, and many other useful things.

    No one has proved one-way functions exist, because proving that would prove \(P \neq NP\), and that’s, well, a really hard problem, to put it mildly. However, there are several functions that people believe to be one-way, which can be used in practice.

    For secure computation, we need two primitives: oblivious transfer, and symmetric encryption.

    Oblivious Transfer

    Oblivious transfer (abbreviated OT) is a special case of secure computation. Alice has two messages \(m_0, m_1\). Bob has a bit \(b\). We treat oblivious transfer as a black box method where

    • Alice gives \(m_0, m_1\)
    • Bob gives bit \(b\), \(0\) or \(1\)
    • If \(b = 0\), Bob gets \(m_0\). Otherwise, he gets \(m_1\). In both cases, Bob does not learn the other message.
    • Alice does not learn which message Bob received. She only knows Bob got one of them.

    Oblivious transfer

    You can think of the OT box as a trusted third party, like Faith. Letting Alice send messages without knowing which message was received will be key to the secure computation protocol. I don’t want to get into the details of implementing OT with only Alice and Bob, since they aren’t that interesting. If you’re curious, Wikipedia has a good example based on RSA.

    Symmetric Encryption

    Symmetric encryption allows two people to send encrypted messages that cannot be decrypted without the secret key. Formally, a symmetric encryption scheme is a pair of functions, encrypter \(E\) and decrypter \(D\), such that

    • \(E_k(m)\) is the encryption of message \(m\) with key \(k\).
    • \(D_k(c)\) is the decryption of ciphertext \(c\) with key \(k\).
    • Decrypting with the same secret key gives the original message, or \(D_k(E_k(m)) = m\),
    • Given just ciphertext \(c\), it is hard to find a key \(k\) and message \(m\) such that \(E_k(m) = c\). (Where again, hard means not solvable in polynomial time.)

    The above is part of the standard definition for symmetric encryption. For the proof, we’ll need one more property that’s less conventional.

    • Decrypting \(E_k(m)\) with a different key \(k'\) results in an error.

    With all that out of the way, we can finally start talking about the protocol itself.

    Yao’s Garbled Circuits

    Secure computation was first formally studied by Andrew Yao in the early 1980s. His introductory paper demonstrated protocols for a few examples, but did not prove every function was securely computable. That didn’t happen until 1986, with the construction of Yao’s garbled circuits.

    To prove every function was securely computable, Yao proved every circuit was securely computable. Every function can be converted to an equivalent circuit, and we’ll see that working in the circuit model of computation makes the construction simpler.

    A Quick Circuit Primer

    For any function \(f(x,y)\), there is some circuit \(C\) such that \(C(x,y)\) gives the same output. That circuit \(C\) is made up of logic gates and wires connecting them. Each logic gate has two input wires and one output wire, and computes a simple Boolean function, like AND or OR.

    The wires in a circuit can be divided into three categories: inputs for the circuit, outputs for the circuit, and intermediate wires between gates.

    example circuit

    Garbling Circuits

    Here is a very obviously insecure protocol for computing \(f(x,y)\)

    • Alice sends circuit \(C\) to Bob.
    • Alice sends her input \(x\) to Bob.
    • Bob evaluates the circuit to get \(f(x,y)\)
    • Bob sends \(f(x,y)\) back to Alice.

    First protocol

    This works, but Alice has to send \(x\) to Bob.

    This is where garbling comes in. Instead of sending \(C\), Alice will send a garbled circuit \(G(C)\). Instead of sending \(x\), she’ll send \(x\) encoded in a certain way. It will be done such that Bob can evaluate \(G(C)\) with Alice’s encoded input, without learning what Alice’s original input \(x\) was, and without exposing any wire values except for the final output.2

    Number the wires of the circuit as \(w_1, w_2, \ldots, w_n\). For each wire \(w_i\), generate two random encryption keys \(k_{i,0}\) and \(k_{i,1}\). These will represent \(0\) and \(1\).

    A garbled circuit is made of garbled logic gates. Garbled gates act the same as regular logic gates, except they operate on the sampled encryption keys instead of bits.

    Suppose we want to garble an OR gate. It has input wires \(w_1,w_2\), and output wire \(w_3\).

    OR gate

    Internally, an OR gate is implemented with transistors. However, at our level of abstraction we treat the OR gate as a black box. The only thing we care about is that it follows this behavior.

    • Given \(w_1 = 0, w_2 = 0\), it sets \(w_3 = 0\).
    • Given \(w_1 = 0, w_2 = 1\), it sets \(w_3 = 1\).
    • Given \(w_1 = 1, w_2 = 0\), it sets \(w_3 = 1\).
    • Given \(w_1 = 1, w_2 = 1\), it sets \(w_3 = 1\).

    This is summarized in the function table below.

    \[\begin{array}{ccc} w_1 & w_2 & w_3 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{array}\]

    A garbled OR gate goes one step further.

    Garbled OR gate

    It instead takes \(k_{1,0}, k_{1,1}\) for \(w_1\), and \(k_{2,0}, k_{2,1}\) for \(w_2\). Treating these keys as bits \(0\) or \(1\), the output is \(k_{3,0}\) or \(k_{3,1}\), depending on which bit the gate is supposed to return. You can think of it as replacing the OR gate with a black box that acts as follows.

    • Given \(w_1 = k_{1,0}, w_2 = k_{2,0}\), it sets \(w_3 = k_{3,0}\).
    • Given \(w_1 = k_{1,0}, w_2 = k_{2,1}\), it sets \(w_3 = k_{3,1}\).
    • Given \(w_1 = k_{1,1}, w_2 = k_{2,0}\), it sets \(w_3 = k_{3,1}\).
    • Given \(w_1 = k_{1,1}, w_2 = k_{2,1}\), it sets \(w_3 = k_{3,1}\).
    \[\begin{array}{ccc} w_1 & w_2 & w_3 \\ k_{1,0} & k_{2,0} & k_{3,0} \\ k_{1,0} & k_{2,1} & k_{3,1} \\ k_{1,1} & k_{2,0} & k_{3,1} \\ k_{1,1} & k_{2,1} & k_{3,1} \end{array}\]

    By replacing every logic gate with a garbled gate, we can evaluate the circuit using random encryption keys instead of bits.

    We implement this using symmetric encryption. For every possible pair of input keys, the correct output is encrypted using both those keys. Doing this for all possible inputs gives the garbled table,

    \[\begin{array}{c} E_{k_{1,0}}(E_{k_{2,0}}(k_{3,0})) \\ E_{k_{1,0}}(E_{k_{2,1}}(k_{3,1})) \\ E_{k_{1,1}}(E_{k_{2,0}}(k_{3,1})) \\ E_{k_{1,1}}(E_{k_{2,1}}(k_{3,1})) \end{array}\]

    With one secret key for each wire, Bob can get the correct output by attempting to decrypt all 4 values. Suppose Bob has \(k_{1,0}\) and \(k_{2,1}\). Bob first decrypts with \(k_{1,0}\). He’ll decrypt exactly two values, and will get an error on the other two.

    \[\begin{array}{c} E_{k_{2,0}}(k_{3,0}) \\ E_{k_{2,1}}(k_{3,1}) \\ \text{error} \\ \text{error} \end{array}\]

    Decrypting the two remaining messages with \(k_{2,1}\) gives

    \[\begin{array}{c} \text{error} \\ k_{3,1} \\ \text{error} \\ \text{error} \end{array}\]

    getting the key corresponding to output bit \(1\) for wire \(w_3\).

    (To be fully correct, we also shuffle the garbled table, to make sure the position of the decrypted message doesn’t leak anything.)

    Creating the garbled table for every logic gate in the circuit gives the garbled circuit. Informally, here’s why the garbled circuit can be evaluated securely.

    • For a given garbled gate, Bob cannot read any value in the garbled table unless he has a secret key for each input wire, because the values are encrypted.
    • Starting with the keys for the circuit input, Bob can evaluate the garbled gates in turn. Each garbled gate returns exactly one key, and that key represents exactly the correct output bit. When Bob gets to the output wires, he must get the same output as the original circuit.
    • Both \(k_{i,0}\) and \(k_{i,1}\) are generated randomly. Given just one of them, Bob has no way to tell whether the key he has represents \(0\) or \(1\). (Alice knows, but Alice isn’t the one evaluating the circuit.)

    Thus, from Bob’s perspective, he’s evaluating the circuit by passing gibberish as input, propagating gibberish through the garbled gates, and getting gibberish out. He’s still doing the computation - he just has no idea what bits he’s actually looking at.

    (The one minor detail left is key generation for the output wires of the circuit. Instead of generating random keys, Alice uses the raw bit \(0\) or \(1\), since Alice needs Bob to learn the circuit output.)

    The Last Step

    Here’s the new protocol, with garbled circuits

    • Alice garbles circuit \(C\) to get garbled circuit \(G(C)\)
    • Alice sends \(G(C)\) to Bob.
    • Alice sends the keys for her input \(x\) to Bob.
    • Bob combines them with the input keys for \(y\), and evaluates \(G(C)\) to get \(f(x,y)\)
    • Bob sends \(f(x,y)\) back to Alice.

    Second protocol

    There’s still a problem. How does Bob get the input keys for \(y\)? Only Alice knows the keys created for each wire. Bob could give Alice his input \(y\) to get the right keys, but then Alice would learn \(y\).

    One solution is to have Alice send both keys for each of \(y\)’s input wires to Bob. For each wire, Bob can then pick the key corresponding to the correct bit of \(y\). That way, Bob doesn’t have to give Alice \(y\), but can still run the circuit.

    However, this has a subtle bug: by giving Bob both keys for his input wires, Alice lets Bob run the circuit with an arbitrary \(y\). Letting Bob evaluate \(f(x,y)\) many times gives Bob more information. Going back to the millionaire’s problem, let \(x = 10\) and \(y = 8\). At the end, Bob learns Alice is richer, meaning \(x > 8\). But, if he later evaluates \(f(x, 9)\), he’ll learn \(x > 9\), which is more than he should know.

    Thus, to be secure, Bob should only get to run the circuit on exactly \(x,y\). To do so, he needs to get the keys for \(y\), and only the keys for \(y\), without Alice learning which keys he wants. If only there was a way for Alice to obliviously transfer those keys…

    Alright, yes, that’s why we need oblivious transfer. Using oblivious transfer to fill in the gap, we get the final protocol.

    • Alice garbles circuit \(C\) to get garbled circuit \(G(C)\)
    • Alice sends \(G(C)\) to Bob.
    • Alice sends the keys for her input \(x\) to Bob.
    • Using oblivious transfer, for each of Bob’s input wires, Alice sends \(k_{i,y_i}\) to Bob.
    • With all input keys, Bob can evaluate the circuit to get \(f(x,y)\)
    • Bob sends \(f(x,y)\) back to Alice.

    Final protocol

    This lets Alice and Bob compute \(f(x,y)\) without leaking their information, and without trusted third parties.

    Conclusion

    This protocol is mostly a theoretical exercise. However, there has been recent work to make it fast enough for practical use. It’s still tricky to use, because the desired function needs to be converted into an equivalent circuit.

    Despite this difficulty, Yao’s garbled circuits are still a very important foundational result. In a post-Snowden world, interest in cryptography is very high. There’s lots of use in designing protocols with decentralized trust, from Bitcoin to secure cloud storage. It’s all very interesting, and I’m sure something cool is just around the corner.

    Thanks to the many who gave comments on drafts of this post.

    Footnotes

    1. By “information computable”, we mean “information computable in polynomial time”. 

    2. There’s a very subtle point I’m glossing over here. When \(f(x,y)\) is converted to a circuit \(C\), the number of input wires is fixed. To create an equivalent circuit, Alice and Bob need to share their input lengths. For a long time, this was seen as inevitable, but it turns out it’s not. When I asked a Berkeley professor about this, he pointed me to a recent paper (2012) showing input length can be hidden. I haven’t read it yet, so for now assume this is fine. 

    Comments
  • Go: More Complicated Than Go Fish

    As something “hot off the presses” (hot off the keyboard?), this is a bit unpolished.

    Yesterday, Google DeepMind announced they had developed an AI called AlphaGo that reached expert level in Go, beating the European Go champion.

    It got picked up by, well, everybody. I don’t want to catalog every article, so here’s a sample: MIT Technology Review, Wired, Ars Technica, and The New York Times. Posts from friends and famous AI researchers have been filling my news feed, and it’s all along the lines of “Well done” or “Holy shit” or “I for one welcome our new robot overlords.”

    Although computer Go isn’t one of my main interests, Monte Carlo Tree Search (MCTS) is. The biggest application of MCTS is in computer Go, and when you spend two semesters researching MCTS, you end up reading a few computer Go papers along the way. So, I figure my perspective is at least slightly interesting.

    Prior Work

    My first reaction was that AlphaGo was very impressive, but not surprising.

    This is at odds with some of the reporting. The original blog post says that

    Experts predicted it would be at least another 10 years until a computer could beat one of the world’s elite group of Go professionals.

    The MIT Technology Review repeats this with

    Google’s AI Masters the Game of Go a Decade Earlier Than Expected

    Wired has an actual quote by said expert, Rémi Coulom.

    In early 2014, Coulom’s Go-playing program, Crazystone, challenged grandmaster Norimoto Yoda at a tournament in Japan. And it won. But the win came with caveat: the machine had a four-move head start, a significant advantage. At the time, Coulom predicted that it would be another 10 years before machines beat the best players without a head start.

    I entirely understand why it sounds like this result came out of nowhere, but it didn’t. Giant leaps are very rare in research. For problems like computer Go, it’s more like trying to punch through a thick wall. At first, the problem is unapproachable. Over the years, people make incremental progress, chipping away bit by bit, building prototype chisels, exploring blueprints for hypothetical pile drivers, and so on. Eventually, all the right ideas collect by the wall, some group puts the pieces together in the right way, and finally breaks down the wall.

    Then reporters only talks about the final breakthrough, and it sounds like complete magic to outsiders.

    This isn’t a dig at science reporting. You have to be steeped in a research field to see the signs of a breakthrough, and asking reporters to do that for every field is unreasonable. Even then, people close to breakthroughs are often wrong. I’m sure there are plenty of professors who were (and maybe still are) pessimistic on neural nets and optimistic on SAT solvers. It’s also often in the interest of authors to play up the strengths of their research for articles.

    Back to Go. I wasn’t surprised because DeepMind released a preprint of their previous Go paper on December 2014. (Incidentally, Clark and Storkey released a similar paper 10 days before them, with worse results. The day before Google announced AlphaGo, Facebook AI Research updated a preprint for their computer Go work. Funny how that goes.)

    Before this paper, the best approach was Monte Carlo Tree Search. In MCTS, the algorithm evaluates board positions by playing many random games, or rollouts. Using previous outcomes, it focuses the search on more promising moves. Progress in MCTS strategies came from tweaking the rollout policy. A stronger rollout player gives more accurate results, but because MCTS runs hundreds of thousands of rollouts, even small increases in computing cost get magnified.

    What made DeepMind’s first paper so impressive was that it achieved good performance with zero search. Pass in the board, get a move in a few milliseconds. Compared to the 5 second search time other programs used, this was radically different. It didn’t beat existing bots, but it didn’t have to.

    The approach was simple. Using supervised learning, they trained a convolutional neural net to predict the next move from a database of expert level games. With a big enough CNN, they got to 55.2% accuracy, which beat all prior move predictors.

    At the end of their 2014 paper, there’s a short section on combining their CNN with search. They didn’t get very far, besides showing it was an obviously good idea if it could be made computationally feasible. Ever since that paper, it’s felt like the last thing computer Go needed was a combination of heavy duty neural net move predictors with MCTS, done such that computation time didn’t explode.

    With AlphaGo, that’s finally happened. So no, these results aren’t that crazy. There was a clear roadmap of what needed to happen, and people figured it out.

    To be fair to Rémi Coulom, his quoted remarks came before DeepMind’s preprint was sent to the computer-go mailing list. I’m sure he’s been following this closely, and it’s an unfortunate citation without context.

    The New Stuff

    I know I’ve been a bit down on AlphaGo, but I don’t want that to fool anybody. AlphaGo is exceptionally impressive. If you have access, I highly recommend reading the Nature paper on your own. It’s very well written, and the key ideas are conveyed well.

    If you don’t have access, here’s the quick summary.

    • Using the supervised dataset of expert level games. they train two networks. One is a small rollout network, used for rollouts in the MCTS. The other is a large CNN, similar to the Dec 2014 paper. This is called the policy network.
    • The policy network is further refined through self-play and reinforcement learning. (The paper calls this the RL policy network.)
    • Finally, they train a value network, which predicts the value of a board if both players play according to the RL policy network.
      • The value network gives an approximation of rollout values with the larger network, which is too expensive to do during real life play. It’s an approximation, but it’s thousands of times faster.

    The first part that piqued my interest was how many networks were trained for AlphaGo. It makes sense: different tasks in MCTS are better suited to different network architectures, so you might as well use a different neural net for each one.

    However, I found this side remark much more interesting.

    Using no search at all, the RL policy network won 85% of games against Pachi, [the strongest open source Go program]. In comparison, the previous state-of-the-art, based only on supervised learning of convolutional networks, won 11% of games against Pachi and 12% against a slightly weaker program, Fuego.

    Before reading the paper, I thought all the improved performance came from just search. The reality is much more exciting, because it suggests there’s still a gap we’re failing to acheive for CNN-based agents. Furthermore, we can bridge this gap by bootstrapping from the supervised data to unsupervised self-play data.

    Let me repeat that: this paper shows that when labelled data is lacking something about the problem, we can still learn from unlabeled data.

    This is huge. This is the kind of thing that makes people want to do research.

    The one dream of anybody in ML is unsupervised learning. It’s very hard, but it’s so much cooler when it works, and the potential of unsupervised learning feels unbounded. Supervised learning will always be held back in part by whether there’s a large enough labeled dataset to capture all the information for a problem. Labels need to be provided by humans, so constructing large datasets takes a lot of manpower.

    By contrast, unsupervised learning is held back only by the sheer difficulty of learning when you don’t know what the answer is supposed to be. This is an idea problem, not a dataset problem. It’s easy to get more unsupervised data; just look around the world more!

    The AlphaGo paper doesn’t do unsupervised learning, but it’s a big step towards that direction, and using supervised learning to pre-train unsupervised learning sounds like a good idea anyways. If you look at the results table, the final policy network (the one before self-play refinement) got 55.4% move prediction accuracy, compared to 55.2% move accuracy from the December 2014 paper. The improved performance doesn’t come from a better supervised CNN, it all comes from self-play. Remember that the 85% win rate against Pachi doesn’t even use search. I’m sure plenty of people are investigating whether a similar approach works on problems besides Go.

    Compared to this, interfacing the neural net with search feels less important. There are some neat ideas here: train a special small network for rollouts, pre-seed initial value estimates with the heavy-duty net (which you can afford because you only need it once per node), and combine the approximate value from the policy network with the true value from the weaker rollout network. From an engineering perspective, they’re doing a crazy asynchronous CPU + GPU architecture that sounds like a nightmare to manage. There are plenty of ideas here, but to me the story is proving the capability of unsupervised learning.

    Final Caveats

    I’ll close with the few downsides I see on skimming the paper.

    • This still uses a lot of computation time. When they announced results for running AlphaGo on one machine, they mean a machine with 48 CPUs and 8 GPUs. The version used against the European Go champion Fan Hui runs on a distributed cluster of 1202 CPUs and 176 GPUs. Let’s just say this is out of the reach of the average enthusiast and move on.
    • The reported 5-0 record against Fan Hui was in formal games, one game per day. Each day, Fan also played an informal game, with shorter time controls. In those games, AlphaGo won 3-2.
    • This takes a lot of training time. From the methods section:

      Updates were applied asynchronously on 50 GPUs using DistBelief […] Training [for the supervised network] took around 3 weeks for 340 million training steps.

      Then, for self-play refinement,

      The [RL policy] network was trained in this way for 10,000 mini-batches of 128 games, using 50 GPUs for one day.

      And finally, for the value network,

      The value network was trained for 50 million mini-batches of 32 positions, using 50 GPUs, for one week.

      That adds up to a month, even on 50 GPUs. So, for anyone looking to try something similar, make sure to use a small problem, and keep a computing cluster handy.

    Comments