Who is AI for right now? There are obvious use cases. Image generation for people who want filler art for work presentations, or just to mess around. Coding assistance for people who code, vibe coding for people who don’t. Speech-to-text for automatic captioning, and text-to-speech for grinding out TikTok videos that read Reddit comments to farm engagement. But also…math contest solving? Science Olympiad questions? Who is that for?

I think the easy answer to this question is that right now, AI is for the AI developers. People working in AI respect math and science contests, and doing well on them is high status, so they test LLMs on those contests, even if most people do not care. It’s why so many LLM developers are focusing on code. It’s why we get announcement posts advertising that a new model is good at answering TypeScript questions. Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy. Easy not in the sense that improving coding ability is easy, but in the sense that coding is easy to evaluate. In this era, if it isn’t easy to evaluate, you have no hope. Should we be surprised that people who write code can tell which models are good at coding? Did we focus on code because it was the best thing to do, or because it was the closest thing that seemed approachable?

People have argued that teaching models to reason in math, code, and science domains generalizes to reasoning elsewhere. To be clear, those arguments are bearing fruit. Yet it doesn’t feel like the only path we could have taken. We say we want our AI to be truth-seeking, and then define truth-seeking as passing unit tests.

Recently, I sharpened a #2 pencil and took the history section of “Humanity’s Last Exam.” Consisting of 3,000 extremely difficult questions, the test is intended for AI, not me. According to its creators and contributors, Humanity’s Last Exam will tell us when artificial general intelligence has arrived to supersede human beings, once a brilliant bot scores an A. […] Of the thousands of questions on the test, a mere 16 are on history. By comparison, over 1,200 are on mathematics. This is a rather rude ratio for a purported Test of All Human Knowledge.

Asking Good Questions is Harder Than Giving Great Answers, by Dan Cohen

If CGP Grey videos have taught me anything, it’s that we should have been asking the historians what it means for something to be true. Or the philosophers, if the point was to build rigorous arguments in natural language. The story of philosophy is people writing increasingly nitpicky arguments attacking imperfect, imprecise definitions of how to view the world. Not that that’s my cup of tea, but if people in another universe had gotten reasoning to work with a DeepConfucius or OpenAristotle model instead, I wouldn’t have been surprised.

People talk about \(p(doom)\), but not \(p(good)\) or \(p(thrive)\). It’s as if people assume \(p(good)\) is exactly \(1 - p(doom)\), that any case outside the worst case will be amazing for the world. I don’t think people make this assumption explicitly, it’s just in the way people talk. No one would seriously think this on reflection, right?

I’m working in AI because it pays well and is potentially really good for the world. The x-risk questions are worth considering. But I would claim that even if the existential risk questions are overblown, AI is not going to be good for the world unless we (the field) proactively work for it. I think that requires more engagement with outsiders, and more intentional choices about which domains to focus on, than we’ve managed so far.

When OpenAI first announced their video generation model, Sora, they gave closed beta access to a few filmmakers to see what they made of it. The result was that it didn’t help much: the model didn’t understand the filmmakers, because their vocabulary wasn’t in the data and the OpenAI devs never spotted that as an issue.

With cinematic shots, the ideas of ‘tracking’, ‘panning’, ‘tilting’ or ‘pushing in’ are all not terms or concepts captured by metadata. As much as object permanency is critical for shot production, so is being able to describe a shot, which Patrick noted was not initially in SORA. “Nine different people will have nine different ideas of how to describe a shot on a film set. And the (OpenAI) researchers, before they approached artists to play with the tool, hadn’t really been thinking like filmmakers.” Shy Kids knew that their access was very early, but “the initial version about camera angles was kind of random.” Whether or not SORA was actually going to register the prompt request or understand it was unknown as the researchers had just been focused on image generation. Shy Kids were almost shocked by how much the OpenAI was surprised by this request. “But I guess when you’re in the silo of just being researchers, and not thinking about how storytellers are going to use it… SORA is improving, but I would still say the control is not quite there. You can put in a ‘Camera Pan’ and I think you’d get it six out of 10 times.”

Actually Using Sora, by Mike Seymour

To be fair, I assume Sora is better at this now, given that the article is from last year. And OpenAI deserves props for asking for feedback in the first place, because many do not.

I’ve lurked in a few fan art communities in my time. The artists did not know much about AI, but once they learned, they quickly decided they did not want it. Still, we automate art because we can, and because it’s easier than learning to automate the plumbing. Do the plumbers want AI? Genuinely, I don’t know. Maybe someone should ask them.

Things move fast in AI, and we’ve speedrun the journey from cute toy to symbol of capitalism and big tech in record time. If people expect their benevolent AGI to cure cancer, stop aging, and open up new vistas of knowledge, it’d be real cool if we made sure that’s actually something we focus on. Because right now, I’m not convinced we are. It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the ways that AI developers can measure it, leaving behind those who are not like AI developers. We’ll follow the path of easy-to-generate text rather than text that encourages a better life, take the road of whatever gives the best PR, and eventually marvel when we create the final victory of capital over labor. It doesn’t feel like a great path to me. I don’t know how much power I have to change that path. I just feel some pull to make it a little better.