Posts
-
I Got Fruit Opinions
Back in 2008, Randall Munroe, creator of xkcd, put up this comic.
In a perfect storm of bikeshedding, this comic was by far his most controversial ever. He got more email about it than about his endorsement of Obama. It seemed that everyone had a different opinion about fruit, and wanted to tell Randall exactly how wrong he was.
For a while, I’ve wanted to know what the average fruit opinion is. One day, I decided to stop being lazy and make a survey for this.
Survey Design
Participants rated each fruit’s difficulty and tastiness on a scale from 1 to 10. I shared the survey among my Facebook friends, the xkcd subreddit, and the SampleSize subreddit. One of my friends reshared the survey to the UC Berkeley Computer Science group.
Along the way, I got some comments about the survey’s design. My surveyed population is biased towards people who frequent those subreddits, and placing the xkcd comic at the start of the survey biases participants. It’s true, but in my defense, I wanted to give context for why I was conducting this survey, and the survey is about fruit. Literally fruit. It wasn’t supposed to be a big deal.
Analysis
In total, I got over 1300 responses. Before going into results, fun fact: everyone’s fruit opinion is indeed unique. No two responses had the exact same fruit rating. To be fair, it would be more surprising if this weren’t true, since I asked for ratings on over 20 different fruits. In fact, it would have been incredibly strange if I did find two people with the exact same rating.
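As a rough sanity check on that claim, here is a back-of-the-envelope sketch using the birthday-problem approximation. The assumptions are mine and deliberately crude: exactly 20 fruits, two 1-to-10 ratings per fruit, and every possible answer sheet equally likely. Real responses cluster around popular opinions, so the true chance of a duplicate is higher than this, but the starting point is astronomically small.

```python
import math

def collision_probability(num_responses: int, num_possible: float) -> float:
    """Birthday-problem approximation: P(at least two identical responses)."""
    pairs = num_responses * (num_responses - 1) / 2
    # 1 - exp(-pairs / N); expm1 keeps precision when the ratio is tiny.
    return -math.expm1(-pairs / num_possible)

# Assumptions (not taken from the survey data): 20 fruits, 2 ratings per
# fruit, 10 choices per rating, all answer sheets equally likely.
NUM_POSSIBLE = 10.0 ** (20 * 2)
p = collision_probability(1300, NUM_POSSIBLE)
print(f"{p:.2e}")
```

Under those assumptions the probability comes out around 8e-35, which is why a duplicate would have been the strange result.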
Results
The five most difficult fruits are
- Pineapple
- Pomegranate
- Honeydew
- Grapefruit
- Cantaloupe
Pineapple and pomegranate are the clear losers here. Honeydew, grapefruit, and cantaloupe are all roughly tied in difficulty.
The five easiest fruits are
- Seedless Grapes
- Blueberries
- Strawberries
- Bananas
- Red Apples
No surprises here.
Now, onto taste. The five worst tasting fruits are
- Lemons
- Grapefruit
- Tomatoes
- Honeydew
- Cantaloupe
As a fellow grapefruit hater, I approve. As a fan of cantaloupe, I question people’s judgment.
The five best tasting fruits are
- Strawberries
- Seedless Grapes
- Peaches
- Blueberries
- Mango
Personally, I’m surprised seedless grapes was 2nd, while seeded grapes was 13th. It’s not a small difference either - here are the two histograms.
Seems like it was hard to disentangle the taste of the fruit from its difficulty.
Finally, controversial fruits. I measured this by fitting a two-dimensional normal distribution to the cloud of replies for each fruit, so difficulty and tastiness are fitted simultaneously. A fruit is controversial if its fitted distribution has large variance.
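For the curious, a minimal sketch of what such a fit can look like in NumPy. This is not the actual analysis code. The ratings below are made up, and since the post doesn't say how "large variance" is reduced to one number, the trace of the covariance (total variance) is an assumption. The determinant would be another reasonable choice.

```python
import numpy as np

def fit_gaussian(ratings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit a 2D normal by moments: sample mean and 2x2 sample covariance.

    `ratings` has shape (n_responses, 2): column 0 difficulty, column 1 taste.
    """
    mean = ratings.mean(axis=0)
    cov = np.cov(ratings, rowvar=False)  # unbiased (n - 1) estimate
    return mean, cov

def controversy(cov: np.ndarray) -> float:
    """Assumed scalar summary: total variance (trace of the covariance)."""
    return float(np.trace(cov))

# Made-up ratings for two hypothetical fruits, for illustration only.
agreed = np.array([[2, 8], [2, 9], [3, 8], [2, 8], [3, 9]], dtype=float)
divisive = np.array([[1, 1], [9, 10], [5, 2], [2, 9], [8, 5]], dtype=float)

for name, data in [("agreed", agreed), ("divisive", divisive)]:
    mean, cov = fit_gaussian(data)
    print(name, mean, round(controversy(cov), 2))
```

The fruit whose replies are scattered across the whole grid gets the larger score, which is the behavior the ranking relies on.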
The five least controversial fruits are
- Strawberries
- Seedless Grapes
- Oranges
- Blueberries
- Tangerines
By definition, this shouldn’t be surprising.
The five most controversial fruits are
- Tomatoes
- Grapefruit
- Mango
- Watermelon
- Lemons
Yep, the tastiness of tomatoes was definitely controversial - this is almost like a uniform distribution, compared to the histograms for grapes. Also, congratulations to mango, for being both controversial and in the top five tastiest fruits.
Graphs
I had to bend the numbers a little bit here. It turns out the average tastes for each fruit are a lot closer than their average difficulty. Stretching the taste-axis exaggerates the relative difference between fruits without changing their ranking.
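To see why stretching is harmless for the ranking, here is a small sketch. The exact transform used for the plot isn't stated, so the min-max rescale below is an assumption, and the taste averages are made up.

```python
import numpy as np

def stretch(values: np.ndarray, low: float = 1.0, high: float = 10.0) -> np.ndarray:
    """Linearly rescale so the smallest value maps to `low`, the largest to `high`."""
    span = values.max() - values.min()
    return low + (values - values.min()) * (high - low) / span

# Made-up average taste scores, bunched together like the real ones were.
taste = np.array([6.1, 6.8, 7.4, 6.5, 7.9])
stretched = stretch(taste)

# An increasing linear map never reorders points, so the ranking is unchanged.
assert (np.argsort(taste) == np.argsort(stretched)).all()
print(stretched)
```

In Matplotlib the same visual effect usually comes from just narrowing the axis limits, which leaves the underlying numbers alone.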
Here’s the same plot overlaid onto the original comic, if you want to compare and contrast.
Finally, here’s the plot with confidence ellipses from the fitted normal distributions, to visualize controversy. Don’t read too much into the absolute size of each ellipse. Only a small fraction of responses actually lie within each ellipse, but drawing a 90% confidence ellipse would have made the ellipses overlap more than they already do.
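Here is a sketch of how an ellipse like that can be derived from a fitted covariance matrix. It is not the plotting code linked in this post. The function name and the example covariance are made up for illustration. The math is standard: for a 2D normal, the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - coverage).

```python
import math
import numpy as np

def ellipse_params(cov: np.ndarray, coverage: float) -> tuple[float, float, float]:
    """Return (width, height, angle_degrees) of the ellipse that contains
    `coverage` of the probability mass of a 2D normal with covariance `cov`."""
    # Chi-square quantile for 2 degrees of freedom, in closed form.
    scale = -2.0 * math.log(1.0 - coverage)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = eigvecs[:, 1]                   # direction of largest variance
    angle = math.degrees(math.atan2(major[1], major[0]))
    width = 2.0 * math.sqrt(scale * eigvals[1])   # full length of major axis
    height = 2.0 * math.sqrt(scale * eigvals[0])  # full length of minor axis
    return width, height, angle

# Hypothetical covariance: difficulty varies more than taste, no correlation.
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
print(ellipse_params(cov, 0.9))  # a 90% ellipse
print(ellipse_params(cov, 0.1))  # a much smaller 10% ellipse
```

The three returned values line up with the `width`, `height`, and `angle` arguments of `matplotlib.patches.Ellipse`. Shrinking `coverage` shrinks every ellipse by the same factor, so relative sizes between fruits stay comparable even when the absolute size is arbitrary.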
For the technically inclined, a CSV of responses can be downloaded here, and the code used to create these graphs can be seen here (requires NumPy and Matplotlib).
-
The Minor Annoyance of Email
I started work about a month ago. It’s been a whirlwind, and I still don’t feel settled down yet, but things are starting to fall into place.
One of my first tasks at work is to reflect on my research interests, find other people’s research interests, and figure out a meaningful project that lies in both. That means sending lots of emails to figure out what people are working on and what they’re interested in.
Instant messaging apps like Slack may be the new kid on the block, but email is still the lifeblood of a company’s communication. It’s asynchronous, it’s free, everyone has it, and filters give you huge control over how to manage it. Email is here to stay, as annoying as it may be.
Let’s say I send an email. Three days later, I haven’t gotten a reply. Any of the following could be true.
- They read the email, and don’t want to reply to it.
- They read the email, and are planning to reply to it when they’re less busy.
- They read the email, and have forgotten to reply to it.
- They haven’t read the email because they’re on vacation.
What I should do in response depends on which of these is true. If they’re busy or on vacation, I should wait until they get to me. If they’ve forgotten, I should send a reminder. If they don’t want to reply, I should either send an email to convince them it’s worth replying, or stop sending emails to respect their decision.
However, all I observe is no reply in my inbox. So, which one’s true?
Information theory 101: if there are several reasonable hypotheses that all lead to the same observation, and you want to figure out which one is true, you are screwed.
So far, my rule of thumb is to model people as busy and working in good faith. That means being patient with email replies, sending reminders for important requests, and dropping email threads if I don’t get a response to a polite request.
This has worked well enough, but can we do better?
Let’s suppose people do this.
- They go on vacation. Before leaving, they set up an auto-reply saying they’re on vacation, and will be back in a week.
- They want to reply to an email, but it’ll take time to respond properly, and other work has higher priority. They decide to send a quick reply explaining the situation, promising to look at it more thoroughly later.
By doing so, they change the observation that the sender observes, letting them make a more informed decision.
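The argument can be made concrete with Bayes' rule. This is a toy sketch: the prior and the likelihoods are numbers I made up, and the hypothesis names are just labels for the four cases listed earlier.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: P(h | obs) is proportional to P(obs | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Made-up prior over why there is no reply yet.
prior = {"unwilling": 0.2, "busy": 0.4, "forgot": 0.3, "vacation": 0.1}

# If nobody signals anything, three days of silence is certain under every
# hypothesis, so the observation carries no information.
silence_no_signals = {h: 1.0 for h in prior}
print(posterior(prior, silence_no_signals))  # same as the prior

# If busy people usually send a holding reply and vacationers usually set an
# auto-reply, silence becomes unlikely under those two hypotheses.
silence_with_signals = {"unwilling": 1.0, "busy": 0.2, "forgot": 1.0, "vacation": 0.1}
print(posterior(prior, silence_with_signals))
```

With equal likelihoods the posterior is just the prior. Once some hypotheses predict a visible signal, silence rules them mostly out, and the probability mass moves to the hypotheses that still predict silence.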
If you want to be treated differently depending on how busy you are, you have to tell people how busy you are. Otherwise, you run into the problem Philip Guo observed: several people will make polite requests for your time, saying no to any polite request makes you feel like an asshole, and saying yes to all of them consumes all your time.
Which brings me to the most insidious pair of scenarios. The pair which has no easy way to change the observation.
- They don’t want to reply to my email.
- They forgot to reply to my email.
I’ve almost never gotten an email verifying the first one. Instead, they don’t send a reply, and hope you’ll read the signs. I’m guilty of this too. Just ask all the people in my LinkedIn inbox.
I think people don’t want to hurt the sender’s feelings, or want to have plausible deniability. But, it’s self-evident to me that not all my emails deserve meaningful replies. I work with busy people, who don’t have time for everything they’re asked to do. Not getting a reply isn’t an insult to my request. All it means is that they have limited bandwidth and I didn’t make it through the pipe.
The issue is that if people are interested, but forgot to follow up, or don’t know how to follow up, you end up reading the wrong signs, and think they aren’t interested at all. In work contexts, this isn’t so bad. In romantic contexts…things can get complicated.
I’ve mentioned similar ideas to some of my friends, and they don’t think the same way. I think these ideas all flow from my social awkwardness. I can navigate how social interactions work (somewhat), but that doesn’t mean I have to like it.
I want communication to work differently. There’s definitely room for more polite “No”s. But communication doesn’t change unless all parties involved agree it should work differently. At a societal level, these shifts come incredibly slowly. At best, we can write down our ideas, share them with other people, and see where that goes.
Until then, all those LinkedIn emails are going to collect virtual dust.
-
Hillary, Google, and a Rant
Turns out the secret to getting more blog posts out of me is to royally piss me off.
There’s a viral video claiming Google manipulated search results in favor of Hillary. I’ve seen several people watch this and say, “Yeah, seems likely.” In contrast, my reaction was to laugh at the video, and I did, before I found out how many people believed it. Then I was just mad.
I may regret this, but here’s why I find the whole affair preposterous.
The Argument
While researching for a report about the last presidential primaries, SourceFed found that Google’s autocomplete suggestions for search queries involving Hillary differed from Yahoo’s and Bing’s. Google’s autocompletes were both more positive and less common than Yahoo’s and Bing’s suggestions. In contrast, a few autocompletes tried for Sanders and Trump matched across all search engines.
They then examine the relationship between Hillary and Eric Schmidt, to give Google a motive for biasing search results.
My Rebuttal
Autocompletes are Not Biased Just for Hillary
These are results found by Snopes. They have similar results for Trump. So, right off the bat it seems like the video didn’t try enough search queries.
When there is more uncertainty in what search the user is making, Google biases towards a more positive result. This is something that happens whenever the search includes anyone’s name. It holds for Sanders, and for Trump, and for Hillary, with no preference for any of them. See Google’s explanation here.
Note this is not the same as Google always having positive search results. If someone searches “Bernie Sanders so”, it’s likely it will get completed to “Bernie Sanders socialist”, since Sanders self-identifies as a democratic socialist. On the other hand, he does not identify as communist, making “Bernie Sanders communist” a rarer phrase. That’s why Google autocompletes “socialist” but not “communist”.
The only interesting thing here is that “socialist” has better sentiment than “communist”, and that it isn’t seen as an offensive enough word to avoid.
It Would Be Too Difficult to Hide The Conspiracy
Let’s set aside motive. Because actually, I don’t give a shit about the motive. I don’t care how Hillary and Eric Schmidt are related. A good motive increases the likelihood a person does something, but it doesn’t magically make that something happen.
So to play devil’s advocate, let’s ignore motive, and assume Google has been manipulating search results for Hillary. In such a world, what also has to be true?
Code acting specifically for Hillary had to be added in by a programmer. That code then made it past code review. No one on the entire search team noticed this, or anyone who did decided to stay quiet. Let’s say the team is on the order of 100 engineers. Search is a core product of Google and the company has over 50,000 employees, so I think this is a close enough guess.
This is a lot of people! And nobody acted as a whistleblower? Note that Sanders outraised Clinton in Silicon Valley, so there must be a sizable contingent of Sanders fans, who would certainly report this if it actually existed.
If the conspiracy is true, then none of those 100 engineers reported it, despite having direct access to Google’s codebase. And, despite a sizable fraction of them being Bernie supporters. AND, despite them knowing search very well because it is their literal full-time job.
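To put rough numbers on that, here is a toy calculation. The 100 engineers is my estimate from above. The per-person reporting probabilities are hypothetical, and so is the assumption that each engineer decides independently, which real coworkers certainly don't.

```python
def prob_nobody_reports(num_people: int, p_report: float) -> float:
    """P(no one reports), assuming each person decides independently."""
    return (1.0 - p_report) ** num_people

# 100 engineers is the estimate from the post; the per-person probabilities
# are hypothetical values chosen only to show how fast the product shrinks.
for p in (0.01, 0.05, 0.10):
    silence = prob_nobody_reports(100, p)
    print(f"p_report={p:.2f} -> silence probability {silence:.4f}")
```

Even if each engineer had only a 5% chance of speaking up, total silence would hold less than 1% of the time under these assumptions.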
This is incredibly preposterous. I’ve read many arguments for why Google would bias their autocomplete results, but I’ve yet to see anyone explain how a company could hide a conspiracy to influence the presidential election. Scandals are really, really hard to cover up - think of how many government scandals the public finds out about.
The Part Where I Go Off The Rails and Rant and Maybe Piss People Off
I could stop this post here, because all I needed to do was cast reasonable doubt on this video. But just for kicks, let me explain why I got trolled into writing this post.
SourceFed claims the conspiracy was so subtle that they only realized it after all the presidential primaries ended. Why? Why now, instead of after other big election days? It could be random chance, but I have another explanation.
Here’s my claim: while Bernie had a chance at becoming the nominee, people focused on ways Bernie could win. Once Bernie lost in California, instead of grieving in silence, they turned to finding reasons Bernie deserved to win. Because that’s how the narrative goes! The system is corrupt, it’s fighting against the One True Candidate at every turn. It doesn’t even have to be a conscious decision. A subconscious bias towards believing Bernie was cheated makes people susceptible to ballooning a small difference in autocomplete results into the extraordinary claim that Google is in cahoots with Hillary. They didn’t ask if their evidence was strong enough to back their claim, because it fit the storyline.
This is symptomatic of a trend I’ve noticed among the most intolerable Bernie supporters. I am not talking about the Bernie supporters who like his policies and really wanted him to win the nomination. The majority of Bernie supporters I interact with are reasonable people who don’t let their love for Bernie blind them to the realities of the situation.
No, I’m talking about the die hard fans. The ones who have deified Bernie practically to the level of the Second Coming. Here comes Bernie Sanders, our Lord and Savior, descending from the mountaintops of Vermont! Wielding pen and paper, he’ll fight the establishment at every turn! Where establishment = exactly every part of politics working against Bernie Sanders.
To these people, the idea that Hillary won because a large percentage of voters agreed with her platform is ridiculous. Instead, they’d prefer to believe Hillary bought all her votes, as if everyone who voted for her was a mindless drone or sheep. Because she’s establishment, and establishment is evil. Establishment is the bane of our existence. They are the Enemy, they hold all the keys. Whoever the establishment backs wins, which is why the two nominees are Hillary Clinton and Jeb Bush. YEP. THAT IS DEFINITELY WHAT HAPPENED.
Believing the system is corrupt is fine. Wanting to decrease money’s influence in politics is an admirable goal. Voting for a candidate you believe in is the whole point of democracy. And grieving when a candidate loses just shows how much you cared about electing the person you believed would make the world a better place.
But if you’re going to revel in tribalism, if you’re going to shift the boundaries whenever it lets you avoid getting mad about something you like, or if you’re going to decide someone’s intelligence based on which candidate they like, I will find it very hard to respect you. I will find it very hard to tolerate you. If you refuse to be civil in debates, I will ignore you.
In contrast, if you understand that people are messy, political beliefs are a maze of dead ends, and arguments where people respect each other are doable, then holy shit, we should keep in touch.
If you can’t do that? Well, then no. We shouldn’t keep in touch. It’s not worth my time, and you’ll find plenty of people to get mad at instead of me.
Oh, and if you make a conspiracy video with horrible justification, I’ll feel free to laugh at you. It’s not nice, in fact it’s downright mean, but I find it a hell of a lot easier than thinking about a world where people will believe something because an eight minute video pandered to them in the right way.