Posts

Aug 18, 2024
Nine Years Later
Sorta Insightful turns nine years old today!

Highlights

I took a break from writing puzzles this year. That’s led to a lot more free time. Puzzle writing has been one of my main recent hobbies, but I’ve found the problem is that I can’t do low key puzzlehunt writing. I either go all-in or I don’t, and when I go all-in, it takes up enough headspace that I have trouble making any other major life decisions. This year was a year where I needed to do other things. I wanted to change things up.

And I did change things up! Some of it was silent backend migrations (both Universal Analytics and my time tracker got deprecated this year), but the most notable change is that I switched career trajectories to AI safety.

I try to keep my Twitter / X usage low, but I retweet all posts whenever I write them, and I’ve noticed my switch into AI safety post has had significantly more Twitter engagement than my other posts. I chalk this up to “the alignment mafia” - a distributed set of people who reshare and promote anything supporting their views on AGI. Listen, I appreciate the support, but I haven’t done anything yet! Chill.

The time it took me to navigate my career change was much less than the time I’ve spent on puzzle writing in the past. I expected to fill that void with more blog writing, but that’s not what happened. So, what filled the void in its place?

Video games. It was video games. A quick review of some of them:

Hi-Fi Rush

Hi-Fi Rush starts simple but gets great once you unlock all the battle mechanics and get into the rhythm based flow. It’s very colorful, never takes itself that seriously, and is filled with fantastic set pieces.

Islands of Insight

Islands of Insight is…fine? It pitched itself as a puzzle-based MMO, where the highlight is a huge set of handmade Nikoli-style logic puzzles in an explorable world. The puzzles are good, but the game is poorly optimized, the social MMO aspects are pretty minimal, and the huge skill tree and unlockables feel like they’re just there to encourage higher engagement, rather than being more fun. The puzzles are great though, and if that’s good enough for you, I had fun with that.

Undertale Yellow

Undertale Yellow is a fantastic fan game, that’s been in development for 7 years and comes out feeling like a canon entry made by Toby Fox. I have small nitpicks about the plot and lack of variety in pacifist boss strategies, but the overall package works. I would have gladly paid money for this, but it’s free. If you liked Undertale check it out.

Hades 1

I bought the first Hades three years ago and never installed it. When Hades 2 went into early access, it was a great excuse to play the first one. I pushed up to Epilogue, all weapon aspects, and 21 heat before setting it down. At some point, it does get easy to fall into the builds you know are overpowered, but that’s the fate of every roguelike. What Hades 1 does well is make you feel overpowered when you get a good run going - and then killing you regardless if you don’t respect the bosses enough. It just does a lot of things right. I don’t like how grindy the story progression gets by the end, where you’re just waiting for a character to provide a dialogue option, but most players will not reach that point.

Incremental Games

Incremental games were a mistake, do not recommend. If you’re unfamiliar with the genre, Cookie Clicker is the prototypical example. You start with a small number of resources, then achieve exponentially larger resources through increasingly complicated progression and automation systems. I tried one of the highly recommended ones, and no joke, it took up 4 months of my life. If you are the kind of person who likes theorycrafting stat builds to maximize damage throughput, incremental games are a game where you only do that. Except, instead of maximizing damage, you’re minimizing time-to-unlock the next progression layer. I was spending 3 hours a day to push my game to a state where I could minimize how much time I’d need to wait to unlock the next layer, and if I didn’t spend those 3 hours I would have had to wait days for progress instead. It felt a lot like machine learning work, where you try to get everything done so that you can let the model train overnight and inspect it next morning. The experience was interesting, but I wouldn’t go through it again.

Statistics

Posts

I wrote 9 posts this year, up from 6 last year.

Time Spent

I spent 139 hours, 56 minutes writing for my blog this year, around 0.75x as much as last year.

View Counts

These are view counts from August 18, 2023 to today.
```
300   2023-08-18-eight-years.markdown  
247   2023-09-05-efnw-2023.markdown  
628   2023-11-25-neopoints-making-guide.markdown  
15837 2024-01-11-ai-timelines-2024.markdown  
1939  2024-01-21-mh-2024.markdown  
5076  2024-03-23-crew-battle.markdown  
826   2024-04-30-puzzlehunting-201.markdown  
8641  2024-07-08-tragedies-of-reality.markdown  
3923  2024-08-06-switching-to-ai-safety.markdown  
```
This continues to be a reminder that my view counts are heavily driven by who reshares things. I expected the AI related posts to be popular, but the post on the math of Smash Bros. crew battles is a big outlier. I shared it to the Smash subreddit, someone from there shared it to Hacker News, and that’s why it has so many views. (By the way, I’ve made some minor edits since that post went live, including a proof sketch for the final conjecture. Check it out if you missed it.)

Based on Twitter views, I can also see there’s a 6% clickthrough rate from my tweet saying I was joining an AI safety team to people actually reading the blog post.

Posts in Limbo

Is this a good time to confess that I never look back at the list of in-progress posts I write each year? I just write up the list, never read it, then go “oh yeah, that” when I reread my old posts to prepare the next anniversary post.

I’m no longer sure I get any value from sharing half-baked ideas, and may cut this in the future.

Post about Dominion:

Odds of writing this year: 5%
Odds of writing eventually: 10%

I haven’t touched my draft of this in a long, long time. I’m realizing it’s the kind of thing that could form a good Youtube essay (big fan of the History of the 2002-2005 Yu-Gi-Oh! meta video), but longform video content is not my skillset and not something I’m interested in getting better at.

Post about Dustforce:

Odds of writing this year: 20%
Odds of writing eventually: 60%

One hypothesis of the world is that positive and negative reinforcement plays a much larger role in behavior than people think it does. I’m partial to this, because I can tell I’ve played less Dustforce recently, almost entirely because of my personal laptop developing sticky key issues that make it just a bit more annoying to play. The game is still great, but it’s not a game you want to play with a keyboard that eats 5% of your jumps at random. This also affected my blogging motivation. Imagine trying to write longform text when your E key randomly sends 0 E presses or 2 E presses each time you touch it. So far, blogging has been saved by either writing on my work laptop or plugging in an external keyboard. Neither of those solutions work for Dustforce, since I won’t install it on my work laptop, and my external keyboards have key ghosting issues.

My personal laptop is getting pretty worn down, and I’m going to need to replace it before I travel for Mystery Hunt this year. (Pro tip: gaming laptops tend to prioritize good initial specs over long-term reliability.) One more thing to change - those video games aren’t gonna play themselves.
Comments
Aug 6, 2024
I'm Switching Into AI Safety
You can read the title. As of last month, I’ve been winding down my existing robotics projects, and have switched to the AI safety team within Google DeepMind. Surprise!

It took me a while to start working on this post, because my feelings on AI safety are complicated, but changing jobs is a big enough life update that I have to write this post. Such is the life of a blogger.

The Boring Personal Reasons

I’ve been working on the robotics team for 8 years now, and I just felt like I needed to mix it up. It was a little unsettling to realize I had quietly become one of the most senior members of the team, and that I had been there longer than my manager, my manager before that, and my manager before that. Really, this is something I thought about doing three years ago, but then my puzzlehunt team won MIT Mystery Hunt, meaning we had to write next year’s Mystery Hunt. Writing Mystery Hunt took up all of my 2022, and recovering from it took up much of my 2023. (That and Tears of the Kingdom, but let’s not talk about that.)

Why change fields, rather than change within robotics? Part of me is just curious to see if I can. I’ve always found the SMBC “Seven Years” comic inspiring, albeit a bit preachy.

(Edited from original comic)

I believe I have enough of a safety net to be okay if I bomb out.

When discussing careers with someone else, he said the reason he wasn’t switching out of robotics was because capitalism rewards specialization, research especially so. Robotics was specialized enough to give him comparative advantages over more general ML. That line of argument makes sense, and it did push against leaving robotics. However, as I’ve argued in my previous post, I expect non-robotics fields to start facing robotics-style challenges, and believe that part of my experience will transfer over. I’m also not starting completely from zero. I’ve been following AI safety for a while. My goal is to work on projects that can leverage my past expertise, while I get caught up.

The Spicier Research Interests Reasons

The current way robot agents are trained can broadly be grouped into control theory, imitation learning, and reinforcement learning. Of those, I am a fan of reinforcement learning the most, due to its generality and potential to exceed human ability.

Exceeding human ability is not the current bottleneck of robot learning.

Reinforcement learning was originally a dominant paradigm in robot learning research, since it led to the highest success rates. Over the years, most of its lunch has been eaten by imitation learning methods that are easier to debug and show signs of life earlier. I don’t hate imitation learning, I’ve happily worked on several imitation learning projects, it’s just not the thing I’m most interested in. Meanwhile, there are some interesting applications of RL-style ideas to LLMs right now, from its use in RLHF to training value functions for search-based methods like AlphaProof.

When I started machine learning research, it was because I found learning and guiding agent behavior to be really interesting. The work I did was in a robotics lab, but I always cared more about the agents than the robots, the software more than the hardware. What kept me in robotics despite this was that in robotics, you cannot cheat the real world. It’s gotta work on the real hardware. This really focused research onto things that actually had real world impact, rather than impact in a benchmark too artificial to be useful. (Please, someone make more progress on reset-free RL.)

Over the past few years, software-only agents have started appearing on the horizon. This became an important decision point for me - where will the real-world agents arrive first? Game playing AIs have been around forever, but games aren’t real. These LLM driven systems…those were more real. In any world where general robot agents are created, software-only agents will have started working before then. I saw a future where more of my time was spent learning the characteristics of the software-hardware boundary, rather than improving the higher-level reasoning of the agent, and decided I’d rather work on the latter. If multimodal LLMs are going to start having agentic behaviors, moving away from hardware would have several quality of life benefits.

One view (from Ilya Sutskever, secondhand relayed by Eric Jang) is that “All Successful Tech Companies Will be AGI Companies”. It’s provocative, but if LLMs are coming to eat low-level knowledge work, the valuable work will be in deep domain expertise, to give feedback on whether our domain-specific datasets have the right information and whether the AI’s outputs are good. If I’m serious about switching, I should do so when it’s early, because it’ll take time to build expertise back up. The right time to have max impact is always earlier than the general public thinks it is.

And, well, I shouldn’t need to talk about impact to justify why I’m doing this. I’m not sitting here crunching the numbers of my expected utility. “How do we create agents that choose good actions” is just a problem I’m really interested in.

(I tried to get an LLM to make this for me and it failed. Surely this is possible with the right prompt, but it’s not worth the time I’d spend debugging it.)

The Full Spice “Why Safety” Reasons

Hm, okay, where do I start.

There’s often a conflation between the research field of AI safety and the community of AI safety. It is common for people to say they are attacking the field when they are actually attacking the community. The two are not the same, but are linked enough that it’s not totally unreasonable to conflate them. Let’s tackle the community first.

I find interacting with the AI safety community to be worthwhile, in moderation. It’s a thing I like wading into, but not diving into. I don’t have a LessWrong account but have read posts sent to me from LessWrong. I don’t read Scott Alexander but have read a few essays he’s written. I don’t have much interaction with the AI Alignment Forum, but have been reading more of it recently. I don’t go to much of the Bay Area rationalist / effective altruism / accelerationist / tech bro / whatever scene, but I have been to some of it, mostly because of connections I made in my effective altruism phase around 2015-2018. At the time, I saw it as a movement I wasn’t part of, but which I wanted to support. Now I see it as a movement that I know exists, where I don’t feel much affinity towards it or hatred against it. “EA has problems” is a statement I think even EAs would agree with, and “Bay Area rationalism has problems” is something rationalists would agree with too.

The reason AI safety the research topic is linked to that scene is because a lot of writing about the risks of AGI and superintelligence originate from those rationalist and effective altruist spaces. Approving of one can be seen as approving the other. I don’t like that I have to spill this much digital ink spelling it out, but that is not the case here. Me thinking AI safety is important is not an endorsement for or against anything else in the broader meme space it came from.

Is that clear? I hope so. Let’s get to the other half. Why do I think safety is worth working on?

* * *

The core tenets of my views on AI safety are that:
1. It is easy to have an objective that is not the same as the one your system is optimizing, either because it is easier to optimize a proxy objective (negative log likelihood vs 0-1 classification accuracy), or because your objective is hard to describe. People run into this all the time.
2. It’s easy to have a system that generalizes poorly because you weren’t aware of some edge case of its behavior, due to insufficient eval coverage, poor model probing, not asking the right questions, or more. This can either be because the system doesn’t know how to handle a weird input, or because your data is not sufficient to define the intended solution.
3. The way people solve this right now is to just…pay close attention to what the model’s doing, use humans in the loop to inspect eval metrics, try small examples, reason about how trustworthy the eval metrics are, etc.
4. I’m not sold our current tooling scales to better systems, especially superhuman systems that are hard to judge, or high volume systems spewing millions of reviewable items per second.
5. I’m not sold that superhuman systems will do the right thing without better supervision than we can currently provide.
6. I expect superhuman AI in my lifetime.
7. The nearest-term outcomes rely on the current paradigm making it to superhuman AI. There’s a low chance the current paradigm gets all the way there. The chance is still higher than I’m comfortable with.
In so far as intelligence can be defined as the ability to notice patterns, pull together disparate pieces of information, and overall have the ability to get shit done, there is definitely room to be better than people. Evolution promotes things that are better at propagating or replicating, but it works slow. The species that took over the planet (us) is likely the least intelligent organism possible that can still create modern civilization. There’s room above us for sure.

I then further believe in the instrumental convergence theory: that systems can evolve tendencies to stay alive even if that is not directly what their loss function promotes. You need a really strong optimizer and model for that to arise, so far models do not have that level of awareness, but I don’t see a reason that wouldn’t happen. At one point, I sat down and went through a list of questions for a “P(doom)” estimate - the odds you think AI will wreck everything. How likely do you think transformative AI is by this date, if it exists how likely is it to develop goal-seeking behavior, if it has goals how likely are they to be power-seeking, if it’s power-seeking how likely is it to be successful, and so on. I ended up with around 2%. I am the kind of person who thinks 2% risks of doom are worth looking at.

Anecdotally, my AI timelines are faster than the general public, and slower than the people directly working on frontier LLMs. People have told me “10% chance in 5 years” is crazy, in both directions! There is a chance that alignment problems are overblown, existing common sense in LLMs will scale up, models will generalize intent correctly, and OSHA / FDA style regulations on deployment will capture the rare mistakes that do happen. This doesn’t seem that likely to me. There are scenarios where you want to allow some rule bending for the sake of innovation, but to me AI is a special enough technology that I’m hesitant to support a full YOLO “write the regulations if we spill too much blood” strategy.

I also don’t think we have to get all the way to AGI for AI to be transformative. This is due to an argument made by Holden Karnofsky, that if a lab has the resources to train an AI, it has the resources to run millions of copies of that AI at inference time, enough to beat humans due to scale and ease of use rather than effectiveness. (His post claims “several hundred millions of copies” - I think this is an overestimate, but the core thesis is correct.)

So far, a number of alignment problems have been solved by capitalism. Companies need their AIs to follow user preferences enough for their customers to use them. I used to have the view that the best thing for alignment would be getting AI products into customer’s hands in low stakes scenarios, to get more data in regimes where no mistake was too dangerous. This happened with ChatGPT, and as I’ve watched the space evolve, I…wish there was more safety research than there has been. Capitalism is great at solving the blockers to profitability, but it’s also very willing to identify economic niches where you can be profitable while ignoring the hard problems. People are too hungry for the best models to do due diligence. The level of paranoia I want people to have about LLMs is not the level of paranoia the market has.

Historically, AI safety work did not appeal to me because of how theoretical it was. This would have been in the early 2010s, but it was very complexity theory based. Bounded Turing machines, the AIXI formulation, Nash equilibria, trying to formalize agents that can take actions to expand their action space, and so on. That stuff is my jam, but I was quite pessimistic any of it would matter. I would like someone to be trying the theory angle, but that someone isn’t me. There is now a lot of non-theory work going on in AI safety, which better fits my skill set. You can argue whether that work is actually making progress on aligning superhuman systems, but I think it is. I considered Fairness in Machine Learning too, but a lot of existing literature focuses on fairness in classification problems, like algorithmic bias in recidivism predictors and bank loan models. Important work, but it didn’t have enough actions, RL, or agent-like things to appeal to me. The claims of a war between the fairness and alignment communities feel overblown to me. The average person I’ve met from either is not interested in trying to make a person “switch sides”. They’re just happy someone’s joining to make the field larger, because there is so much work to do, and people have natural inclinations towards one or another. Even if the sociologies of the fields are quite different, the fundamentals of both are that sometimes, optimization goes wrong.

I’m aware of the arguments that most AI safety work so far has either been useless or not that different from broader AI work. Scaling laws came from safety-motivated people and are the core of current frontier models. RLHF developments led to InstructGPT, then ChatGPT. Better evaluation datasets to benchmark models led to faster hill climbing of models without corresponding safety guarantees. Most recently, there’s been hype about representation engineering, an interpretability method that’s been adopted enthusiastically…by the open-source community, because it enables better jailbreaks at cheaper cost. Those who don’t think safety matters brands this as typical Silicon Valley grandstanding, where people pretend they’re not trying to make money. Those who care about safety a lot call this safetywashing, the stapling of “safety” to work that does not advance safety. But…look, you can claim people are insincere and confused about anything. It is a nuclear weapon of an argument, because you can’t convince people it’s wrong in the moment, you just continue to call them insincere or confused. It can only be judged by the actions you take afterwards. I don’t know, I think most people I talk about safety with are either genuine, or confused rather than insincere. Aiming for safety while confused is better than not aiming at all.

So, that’s what I’m doing. Aiming for safety. It may not be a permanent move, but it feels right based on the current climate. The climate may turn to AI winter, and if it does I will reconsider. Right now, it is very sunny. I’d like it if we didn’t get burned.
Comments
Jul 8, 2024
The Tragedies of Reality Are Coming for You
In 2023, I was at an ML conference. The night was young, the drinks were flowing, and the topic turned to a question: “if you could take any machine learning subfield, and give all their resources to a different one, what are you killing and what are you boosting?”

I don’t remember what they boosted, but one person said they’d kill robotics. When I pushed them on it, they said robotics progress is too slow and nothing happens relative to everything else.

I think they were correct that robotics progress was slower than software-only machine learning subfields. But I would also add two things:
- The reason robot learning progress is slower is because it’s very hard to do anything without tackling the hard problems.
- The hard problems of robotics are not unique to robotics.
One of the very common refrains in robotics is “reality is messy”. I would extend it to reality is complicated, relative to code, and in robotics you’re often pushing a messy reality into an abstraction nice enough for code to act on it. As a field, computer science has spent decades creating nice abstraction layers between hardware and software. Code describes how to drive electricity to my hard drive, processor, and monitor, reliably enough that I don’t have to even think about it.

From xkcd

There’s a lot of benefit to doing this! Once you’ve done the hard work, and moved your progress towards acting in abstract logical space, everything’s easier. Code and data is incredibly replicable. I have copies of the file representing the draft of this blog post synced across 3 devices, and don’t even think about it.

However, to quote Joel Spolsky, all abstractions are leaky to some degree, and I’ve found those leaks tend to be bigger in robotics. There are many ways for things to go wrong that have nothing to do with the correctness of your code.

Is that because of something fundamental to the subject? A little bit. A lot of robot hardware is more experimental than a laptop or Linux server. Consumer robots aren’t as big an industry yet. “Experimental” tends to mean “weird, easier to reach failure states”.

But, I don’t think the hardware is the primary driver of friction. It’s reality that’s the friction. Benjamin Holson put it really well in his “Mythical Non-Roboticist” post:

The first kind of hard part is that robots deal with the real-world, imperfectly sensed and imperfectly actuated. Global mutable state is bad programming style because it’s really hard to deal with, but to robot software the entire physical world is global mutable state, and you only get to unreliably observe it and hope your actions approximate what you wanted to achieve.

Robotics research relies on building new bridges between reality and software, but that happens outside of robotics too. Any software that interfaces with reality will have imperfect knowledge of that reality. Any software that tries to affect real world change has to deal with reality’s global mutable state. Any software whose actions depend on what’s going on in reality invites adversarial sources of noise and complexity.

Game AI is an instructive example here. Chess AIs are reliably superhuman. However, some superhuman Go AIs are beatable if you play in a specific way, as discovered by Wang and Gleave et al, ICML 2023. Adversarial techniques found a strategy legible enough for humans to reproduce.

In Appendix G.2 one of our authors, a Go expert, was able to learn from our adversary’s game records to implement this [cyclic] attack without any algorithmic assistance. Playing in standard human conditions on the online Go server KGS they obtained a greater than 90% win rate against a top ranked KataGo bot that is unaffiliated with the authors. The author even won giving the bot 9 handicap stones, an enormous advantage: a human professional with this handicap would have a virtually 100% win rate against any opponent, whether human or AI. They also beat KataGo and Leela Zero playing with 100k visits each, which is normally far beyond human capabilities. Other humans have since used cyclic attacks to beat a variety of other top Go AIs.

Meanwhile, a few years ago OpenAI created a system that defeated the reigning world champions at Dota 2. After making the system available to the public to test its robustness, a team engineered a strategy that achieved a 10 game win streak.

Based on this, one pessimistic view you could hold is that connecting even a simple “reality” of a 19 x 19 Go board or Dota 2 is enough additional complexity to make robust behavior challenging. I think this view is unfair, because neither of these systems had robustness as a top-level objective, but I do think they’re an interesting case study.

There’s been a recent wave of hype around LLMs - what they can do, where they can apply. Implicit in all of this is the belief that LLMs can make significant changes to how people interface with technology, in their work and in their leisure. In other words, that LLMs will change how we mediate with reality. I’m actually on board this hype wave, to be specific I suspect foundation models are overhyped short-term and underhyped long term. However, that means all the messiness of reality is coming for a field that historically does a bad job at considering reality. At the same ML conference where this person said robotics was a waste of resources, I mentioned that we were experimenting with foundation models in real robots. I was told this seemed a bit scary, and I reassured them it was a research prototype. But I also find LLMs generating and executing software a little scary, and thought it was interesting they implicitly worried about one but not the other. Silicon Valley types have a bit of a paradox to them. They both believe that software can power amazing transformational startups, and that their software doesn’t merit contemplation or introspection. I consider the world of bits to be as much a part of reality as the world of atoms. Operating on a different level, but a part of it nonetheless.

I’ve noticed (with some schadenfreude) that LLM practitioners keep discovering pain points that robotics has hit before. “We can’t reproduce these training runs because it’s too capital-intensive.” Yeah, this has been a discussion point in robotics for at least a decade. “I can’t get Bing to gaslight me about Avatar 2’s release date, since it keeps pulling up news articles written about itself, and self-corrects before generation.” We’re now in a world where any publicly available Internet text can irrecoverably influence retrieval-augmented generation. Welcome to globally mutable state. Every time I see someone claim there’s a regression in ChatGPT’s behavior, I’m reminded of the conspiracies I and others have come up with to explain sudden, inexplicable drops in robot performance, and whether the problem is the model, the environment, or us extrapolating too much from anecdata.

There’s a saying that “all robot demos lie”, and people are discovering all LLM demos lie too. I think this is fundamentally impossible to avoid, because of the limitations of human attention. What’s important is evaluating the type, size, and importance of the lie. Did they show how it could generalize? Did they mention how cherry-picked the examples were? These questions become more complicated once you connect reality into the mix. Sure, Messi’s looked like a good player so far, but “can he do it on a cold rainy night in Stoke”?

What makes it complicated is that the answer to these questions isn’t always “no”. Messi could do it on a cold rainy night in Stoke. He was good enough. And that makes it hard, because being correct on a “yes” matters much more than being correct on a “no”. As LLMs get better, as AI becomes more common in daily life - we, as a society, will need to get increasingly good at deciding if the models have proven themselves. One of my main worries about the future is that we get bad at evaluating if the models have proven themselves. But, I expect roboticists to be ahead of the curve. We were complaining about evaluation years before claims of LLMs gaming common benchmarks. We were trying to get enough data to capture the long tail of self-driving long before “we need better data coverage” became the rallying cry of foundation model pretraining teams. Machine learning has lived in a bubble that was the envy of roboticists and chemists and biologists and neuroscientists, and as it starts to actually work, we’ll all be running into the same walls of reality that others have dealt with for years and years. These challenges can be overcome, but it’ll be hard. Welcome to the real world. Welcome to the pain.
Comments