Posts
-
The Friendship Paradox And You
The friendship paradox is a cute rule of thumb. Unlike other rules of thumb, it actually has some mathematical justification behind it.
The paradox states that on average, your friends have more friends than you do. At first glance, this may seem strange, because it can’t be true for everybody. Someone has to be more popular than everybody else. And that’s true - somebody has to be on top. That’s why the statement says on average. A small fraction of people are more popular than their friends, and a large fraction are less popular than their friends.
To justify why this could be true, let’s model friendship as an undirected graph. In this graph, people are vertices, and an edge connects two people if they’re friends with one another.
Now let’s introduce some notation. \(V\) is the set of all vertices, \(n\) is the number of vertices, \(v\) is a single vertex, and \(d_v\) is the degree of vertex \(v\). The average number of friends a person has is
\[\text{Average number of friends} = \frac{\sum_{v \in V} d_v}{n}\]Alright. But what’s the average number of friends that someone’s friends have? To count this, let’s imagine that every person \(v\) creates \(d_v\) lists, one for each of their friends. Every list is titled with that friend’s name. Say their friend is named \(u\), for sake of example. On that list, \(v\) writes down all of \(u\)’s friends.
The average number of friends that \(v\)’s friends have is the average length of the lists that \(v\) created, not counting the title. The average number of friends that someone’s friends have is the average length of all these lists.
Each \(v\) creates \(d_v\) lists, giving a total of \(\sum_{v \in V} d_v\) lists. There are \(d_v\) lists titled \(v\), because we get one such list whenever a friend of \(v\) creates lists. Each of those lists has \(d_v\) names on it. Thus, each person \(v\) contributes \(d_v^2\) to the total length.
Overall, this gives
\[\text{Average number of friends of friends} = \frac{\sum_{v \in V} d_v^2}{\sum_{v \in V} d_v}\]Now, apply the Cauchy-Schwarz inequality. This inequality states that for any two vectors \(a\) and \(b\), their dot product is at most the product of their norms. We’ll use the version where we square both sides.
\[\langle a, b \rangle^2 \le \|a\|^2\|b\|^2\]Let \(a\) be the vector of all ones, and \(b\) be the vector of degrees \(d_v\). Since there are \(n\) vertices, we get
\[\left(\sum_{v \in V} d_v\right)^2 \le n \sum_{v \in V} d_v^2\]which rearranges to
\[\frac{\sum_{v \in V} d_v}{n} \le \frac{\sum_{v \in V} d_v^2}{\sum_{v \in V} d_v}\]The left hand side is the average number of friends, and the right hand side is the average number of friends of friends. That concludes the proof. \(\blacksquare\)
At a high level, the friendship paradox happens because the popularity of popular people spreads through the network - they have lots of friends, each of whom sees that some of their friends are popular.
Importantly, this only says something about the average. Arguing anything more requires making assumptions about how people interact and how friendships work.
A natural extension is to ask whether a similar result holds in directed graphs. A lot of relationships aren’t symmetric, so if a similar result holds, it makes the principle more applicable.
It turns out such a result does exist. Let’s model a directed edge as a producer-consumer relationship. There’s an edge from \(u\) to \(v\) if \(u\) produces something that \(v\) consumes.
People both produce things and consume things, represented by out-edges and in-edges respectively. Let \(d_{v,out}\) and \(d_{v,in}\) be the number of out-edges and in-edges for \(v\).
Let’s consider the average number of outgoing edges. This is the average number of things that people produce.
\[\frac{\sum_{v \in V} d_{v,out}}{n}\]For a given \(v\), let’s compare this to the number of things produced by content producers that \(v\) follows. For each incoming edge, create a list for the source of that edge, writing down every consumer of that source. (This list will be the endpoint of every outgoing edge from that source.)
From here we can apply similar logic. Each person creates one list for each outgoing edge, giving \(\sum_v d_{v,out}\) lists total. Each \(v\) is the title of \(d_{v,out}\) lists, one for each consumer. Each of those lists will have \(d_{v,out}\) items on it. All together, the average across all \(v\) is
\[\frac{\sum_{v \in V} d_{v,out}^2}{\sum_{v \in V} d_{v,out}}\]which we can once again apply Cauchy-Schwarz too. The conclusion?
On average, the content producers you follow make more things than you do.
I call this the producer view, because you’re always counting the edges that leave each vertex. We can also take the consumer view, counting the edges that are entering each vertex instead. By performing a similar argument, you get this conclusion instead.
On average, the people who follow your work follow more things than you do.
Both views are valid, and give different interpretations of the same graph.
Again, this argument only says something about the average, and you need assumptions about graph connectivity to argue anything stronger. In fact, despite its mathematical underpinnings, I would hesitate on treating the friendship paradox as a truth about the world. I see it more like a principle, that’s useful for flavoring different arguments, but not strong enough to hold an argument on its own.
* * *
In the derivation above, the only requirement was that we could model interactions as a graph.
There’s a branch of mathematics called category theory. I don’t know it very well, but the impression I get is that you let objects represent something, you let arrows represent some relation between objects, you draw arrows between different objects, and then you interpret all of mathematics as special cases of those objects and arrows. This lets you do things like explain finance by drawing a bunch of arrows.
For some reason I know \(\epsilon > 0\) fans of category theory are going to read this post, so as an homage to them, let’s make a bunch of wild claims about society by generating different interpretations of vertices and edges.
Let vertices be Twitter accounts. An edge connects \(u\) to \(v\) if \(u\) follows \(v\). In the producer view, on average the accounts you follow have more followers than you. In the consumer view, the people who follow you are more likely to follow more people than you do.
Let vertices be people. Instead of friendship, say there’s an edge from \(u\) to \(v\) if \(u\) has a crush on \(v\). To the disappointment of many people, crushes aren’t symmetric. In the producer view, on average the people you crush on have more admirers than you do. In the consumer view, on average people who have crushes on you have crushes on more people than you do. I don’t know if this makes anyone feel better about their love life, but there you go?
Again, let vertices be people. This time, there is an edge from \(u\) to \(v\) if \(u\) writes something that \(v\) reads. In the producer view, on average your readership is smaller than the readerships of writers you follow. In the consumer view, on average your readership reads more things than you do. Now, not everybody writes, but we could substitue writing with any form of communication. Blogs, articles, Facebook posts, speeches, Youtube videos, research papers, memes…
In general, any prolific person not only makes lots of things, they become well-known for making lots of things. Their reputation both precedes them and outgrows them.
* * *
I like the friendship paradox a lot. Why?
Well, for one, it’s great for addressing imposter syndrome issues. For example, sometimes I feel like I should be writing more. When I poke at the feeling, it often turns into this.
- I should write more.
- Why do I think that? It’s partly because I read cool things from people that write more than I do.
- But by friendship paradox, it’s expected that those writers are writing more than me.
- So hey, maybe I shouldn’t feel too bad.
More importantly, the friendship paradox touches on another, more important idea: the things you see don’t have to reflect reality. If you base your assumptions of popularity on the popularity of your friends, on average you’ll come up short. If you base your assumptions of productivity by things you read online, it’s easier to see evidence of productivity from people who are very productive. If you base your views of somebody by things you hear them say, it’s warped by the chances you would have heard their views in the first place. And so on down.
Not all of these are applications of the friendship paradox, but it’s easy to forget about these things, and thinking about the paradox is a nice reminder.
-
Two Years Later
I’ll be honest: I completely forgot to prepare a post for my two year blogging anniversary until today. I’m vaguely disgusted that “two year blogging anniversary” is a valid English phrase, but sometimes languages are silly like that.
Ironically this is because I’ve been working on a larger post that’s been taking up most of my writing time. Hopefully I get that done soon.
I do feel I’ve been less productive this year than last year, blogging wise. Let’s see if that’s backed up empirically. Now, word count is a pretty bad metric for writing time, because in my experience most of writing is rewriting. But it’s the best metric I’ve got. Here’s a list of all posts I wrote this year, along with their word count.
1244 2016-09-07-contradictions.markdown 1936 2016-10-18-politics.markdown 1750 2016-11-08-right-questions.markdown 895 2016-11-12-hybrid-argument.markdown 1384 2017-01-20-mh-2017.markdown 931 2017-01-25-research-comm.markdown 592 2017-02-12-default-arguments.markdown 4035 2017-02-22-wasserstein-gan.markdown 1827 2017-03-18-we-have-a-problem.markdown 834 2017-04-19-acss.markdown 1925 2017-04-26-perils-batch-norm.markdown 3443 2017-06-27-hyperparam-spectral.markdown 1613 2017-08-12-openai-dota-2.markdown 22409 total
Here’s the same list for last year.
542 2015-08-20-things-i-did-in-my-khan-academy-internship.markdown 1744 2015-08-23-simulating-a-biased-coin-with-a-fair-one.markdown 2592 2015-08-25-perfectly-intelligent-mafia.markdown 599 2015-09-01-optimizing-process.markdown 1580 2015-09-09-the-other-two-envelope-problem.markdown 1988 2015-09-24-how-an-audio-play-about-a-time-traveling-pony-turned-me-into-a-fanboy.markdown 1703 2015-10-11-humor-proposal.markdown 1871 2015-11-02-friendship-games-geometry.markdown 465 2015-11-29-mlp-season-5-review.markdown 6376 2016-01-03-grad-school.markdown 1231 2016-01-20-mysteryhunt.markdown 1784 2016-01-27-deepmind-go.markdown 3101 2016-02-11-secure-computation.markdown 2647 2016-02-20-half-year.markdown 4537 2016-03-17-alphago-lsd.markdown 1353 2016-03-22-primes-pi.markdown 1323 2016-06-13-hillary-google.markdown 817 2016-07-03-email-time.markdown 714 2016-07-04-fruit.markdown 1361 2016-07-17-ml-sleep.markdown 1016 2016-08-03-pokemon-go-equality.markdown 39344 total
So, I wrote around 17,000 fewer words this year. What changed?
* * *
The most important change is that I’m been working full time the entire year. It’s been a lot harder for me to get side projects done than it was in college. This is a well-known experience with well-known explanations, but let me explain anyways.
In college, I didn’t have designated work times. I went to class, I had lunch and dinner, and the rest of the day was a blend between work and free time. It was very easy for me to work for 20 minutes, switch to playing Dustforce for 40 minutes, blog for 10 minutes, flowing back and forth depending on mood until 3 AM. Then I would wake up the next morning and start the process over again.
It was even easier for me to do this because I didn’t have many group projects, and I had very few homework groups or study groups. Overall there was no accountability on my schedule, and the blending was a natural consequence of that.
This wasn’t a bad thing. I wasn’t very focused, but at any time there was a chance I could switch to being productive, and the activation barrier between everything was much lower. Blogging slotted itself very naturally into my day. (Well, more like into my night.)
All that has changed. I have meetings, coworkers, managers. I’m working on a project with collaborators across three offices. Coordination is a lot easier if everybody’s working times overlap - one of my collaborators works from London, and he’s deliberately shifted his schedule forward because the only good meeting time is his evenings (which are our mornings in California).
It’s the same problem that most companies have, and there’s a shared solution: a homogenized schedule with similar working hours. When I was a student, I worked more, but the work was spread out over the entire week, like maple syrup on pancakes. Sure, sometimes I didn’t finish work until 2 AM. But I had relaxed enough during the afternoon to make this feel okay.
In a job, all the work gets compressed into a single block. I do take breaks, and I’m lucky enough to work at a company that doesn’t care how long your break is…as long as you get your work done. That nagging feeling of getting my work done means I don’t take a lot of breaks.
The end result is that my energy and motivation gets spent at work in one big burst. By the time I get home it’s hard to do much more than browse the Internet and play games.
I’ve thought a bit about whether I can reproduce the college working schedule, so to speak, but there are benefits to working similar hours to everybody else. Overall I don’t think there’s a lot of value in changing how I work.
* * *
Okay, time for more uplifting news! In last year’s post, I mentioned a few posts I said I would write this year! Let’s see how I did.
The post about Gunnerkrigg Court is going to happen. I swear it will.
Uhhhh, that post didn’t happen. Okay, next!
I wrote the start of a really, really stupid short story about dinosaurs. […] I hope that story sees the light of day, because the premise is so stupid that it might just wrap back around to brilliant.
That one didn’t happen either. Alright, what else did I say I was going to work on.
Oh, I said I was going to write a post about AI safety! That one is in progress! And not done! A bit far from done, in fact, but let’s round that up to a win. If failure is success rounded down, then surely success is failure rounded up.
To be serious for a bit: yes, I am a bit disappointed I haven’t written posts that I said I wanted to write. But I’m not actually that disappointed. It feels like I’m spent about as much time writing as I wanted to spend this year.
Well, maybe I’ve written a bit less than I wanted to.
Let’s say I’m more at peace and leave it at that.
In terms of life trajectory: I didn’t apply to PhD programs, mostly out of laziness. I should have applied, just to keep the option open, but I don’t see myself going to academia any time soon, unless my research interests or work drastically change.
I think I’ve changed less this year compared to last year. Again, this is a common sentiment. I found it easy to grow every year of college - it just happened naturally. I’ve been finding it harder to do so after leaving that environment.
When I think about what I would regret the most, it’s something like this: the worry that five years from now, I’ll look back on my life, and have no idea where five years of my life went. Something kind of like this.
Apple Bloom has sunk into a psychotic depression, because she knows this is how it all starts. Pretty soon they’ll all be living in different cities, and they’ll only see each other at New Year’s. […] Sure, they’ll start talking about things they did back in the day, like “Oh hey, remember that time we were happy and life felt like it had meaning?” And after a few drinks, it’ll feel like there’s still a connection between them, and they’ll share all their online handles and say they should get together sometime. […] Then the next day when they sober up, they’ll get right back to never getting around to it.
(Totally Legit Recap, “On Your Marks”. Warning: NSFW humor about My Little Pony.)
It’s that worry that pushes me to work on things, even when I have reasons not to.
Will pre-emptively thinking about quarter-life crises stop me from having a quarter-life crisis? Eh, maybe. We’ll find out.
I think I’ve rambled long enough. Two years down. Here’s to two more.
-
Emergency Post Because OpenAI Just Revealed Their Dota 2 1v1 Bot And I Have Opinions
See title!
The International is the tournament for Dota 2, and OpenAI just demoed their bot that went undefeated against several pro players in 1v1..
Is this awesome? Yes.
Does this represent a step forward in what machine learning can do? Maybe. That’s where it gets more complicated.
* * *
Let’s start with a broader question: why do machine learning researchers make game AIs?
I claim they do so for two reasons.
- To prove it’s possible for computers to beat humans at the game.
- To use the game as an environment for developing new machine learning techniques.
Now, there’s a healthy intersection between gamers and ML researchers, so these objectives certainly aren’t mutually exclusive. I bring it up because a number of game AIs are very tailored to the game they’re trying to solve. To judge whether a game AI is impressive or not, you need to know details behind how it works.
For example, there’s a Pokemon speedrun bot that tries to complete Pokemon Red as quickly as possible. It’s achieved record times, but it didn’t learn how to do it. Rather, a human designed a flowchart of what the bot should do, and it follows it until it achieves success.
Similarly, a while back there was some buzz about a computer that beat an expert fighter pilot at an aerial combat simulator. It rather predictably spawned many alarmist headlines. Read the details, however, and you find it’s mostly a rule-based AI (if this then that), learned through genetic algorithms, with some expert knowledge added to the training process. Still cool, but a lot of domain specific knowledge went into the problem.
Neither of these are super impressive from the standpoint of seeing what machine learning is capable of, because they both rely on a human providing specific domain knowledge.
In contrast, AlphaGo was impressive because it both solved a new game, and did so by combining supervised pretraining + reinforcement learning + Monte Carlo Tree Search in a way that hadn’t been done before. Generally, the best games for research are ones where developing new techniques is necessary to achieve superhuman performance, and Go was an example of this.
Now, DeepMind cares about StarCraft, because real-time is hard, partial information is hard, the wide variety of units means you have lots to learn, and most importantly the long-term strategy and planning is something that RL systems tend to lack.
Dota 2 fills a similar niche - also real-time, also partial information, also has tons of heroes, and also requires long-term planning to play well. With current approaches, we can’t solve StarCraft, and we can’t solve Dota 2. We need new ideas to teach computers to play these games at a competent level.
That’s why people care.
* * *
So far, OpenAI has not released many details behind the bot. Speculating on what the bot means for the field would be massively irresponsible, until more info comes out to provide context.
That being said…speculating how the bot might work is also massively fun. So let’s get into it!
First of all, the bot most likely only know how to play 1v1, Shadow Fiend vs Shadow Fiend. Rule of thumb: for demos and for papers, assume that if something isn’t mentioned or shown, it isn’t proven to work yet. This is not a strike against the work! There’s no reason to make the problem more difficult if you aren’t even sure the problem is solvable.
Next, from the header:
The bot learned the game from scratch by self-play, and does not use imitation learning or tree search.
So most likely, it’s reinforcement learning. Restricting the game to only 1v1 Shadow Fiend gives you a ton of nice things.
- The model only needs to understand how a single character in the game works.
- The same policy can be used for both players.
In many ways, this reminds me of the Super Smash Brothers Melee bot that beat several pro players - it can beat pros, but only if both players are Captain Falcon, and only if the players are P1 and P2, and only if they play on Battlefield. Cool? Yes. Limited? Also yes.
The existence of that SSBM bot makes me suspect there weren’t many breakthroughs behind this bot, just some grunt work on getting the system together and getting an off-the-shelf RL algorithm to train. I don’t play Dota (used to play League), but from what I’ve heard 1v1 laning is much more about character micro. I can see RL learning to deal with that - it’s the macro level play that I expect to be hard.
Assuming it’s an RL approach, there are several follow-up questions.
- How is game state getting represented?
- How are game actions getting represented?
- How often does the AI output actions?
- How is reward defined?
- What reinforcement learning algorithm is used?
- How long does it take to train?
The impressiveness of the achievement depends on the answer to these questions. Once again, we don’t know details yet. But I’m going to throw out some guesses. Hopefully they aren’t completely wrong, I’ll update when more information comes out.
-
State representation: I assume OpenAI is collaborating with Valve on this. Valve has an official Dota 2 API, and I assume they’re using it, because why wouldn’t you? I mean, you could try to learn from raw pixels, but that would be ridiculously impressive, almost too much so. It’s much more plausible that it’s getting exact positions and health of all units within its vision, which certainly helps on the micro front. Human players have all that info too, but humans don’t have the attentional bandwidth to know everything down to the precision that an API could give you.
-
Action representation: I would guess raw (x, y) locations for mouse clicks, the skill hotkeys, and special actions for clicking on units. In theory you only need mouse clicks, but if you’re already getting information from the game API, you might as well use the API for targeting functionality too. That way your model doesn’t have to how to learn how to click on units, it can focus on learning when clicking units is good and when clicking units is bad.
-
Action frequency: No idea. They say that their APM is comparable to an average human player, which gives some indication of how often the network can send commands. I’m mostly curious on whether there are limits on APM, or if the neural net is slow enough to give human-level APM automatically.
This is one way real time problems can be more difficult - in the latency between polling for an action, computing an action, and sending it back to the game, the environment could be in a very different state. On the other hand, humans have the same problem - we have to process the game visuals, decide on an action, then send the command to our hands. Clearly we can deal with it fine, so maybe I’m overestimating how important this is.
Aside: in StarCraft, you can make much better usage of APM because you have an entire army to command. Let a StarCraft bot do thousands of actions per minute and you can get ridiculous things like this. In Dota 2, because there’s just one controllable character, there’s a cap on how much APM can improve your gameplay. This is mentioned in the blog post too, but I want to emphasize it. Whenever people post StarCraft bots, people invariable say they’ll wait for human level APM. In Dota 2’s case, I think it actually won’t matter that much - what matters is precision of movement and knowledge of health bars down to a single hit point. That’s where computers have the advantage.
- Reward definition: Here’s where I get really curious. I assume the reward isn’t just “1 if you win, -1 if you lose.” If it is, that would be insane. I mean, it’s not of the question - I think that barely kisses the realm of possibility. But I’d expect that the reward is shaped, giving partial reward based on EXP, gold, or health. (In RL, we say a reward is shaped if it gives small reward for doing things you expect to be useful. Reward shaping is one of the big ways to add domain knowledge to your problem.)
Generally, shaped rewards are easier to learn from than the pure “win/loss” signal, but only if the shaping is done well, and tuning the reward can be a huge pain. I’d actually rather have it be the win/loss reward, because if it was, it would be even cooler that the bot can learn so many complex behaviors from purely win/loss.
It could even be using the system from the Learning from Human Preferences paper - I could see it being useful for given partial signal to behaviors that are hard to define.
-
Learning algorithm: Probably PPO. It works well, and it supports continuous control, which may be a better fit for mouse movement.
-
Training time: “Two weeks real time”, except that says basically nothing for how many machines they used, whether they were able to run the game faster than real time, etc. For context, DeepMind says they got up to 2000 frames per second in a single thread for their StarCraft API. Let’s just say it took “a while” and leave it at that.