Posts

  • My 2022 r/place Adventure

    Every April Fool’s, Reddit runs a social experiment. I’ve always had fun checking them out, as part of my journey from “people are too hard, I’m just going to do math and CS”, to “people are hard, but like, in a really interesting way?” The shift was realizing that culture is a decentralized distributed system of human interaction, and Reddit’s April Fool’s experiments are a smaller, easier-to-understand microcosm of that.

    Some have been pretty bad. Second was trash. I know people who liked The Button a lot, but r/place is easily Reddit’s most popular experiment. Each user can place one pixel every 5 minutes on a canvas shared by the entire Internet. Alone, it’s hard to do anything. But working together, you can create all kinds of pretty pixel art…that can then get griefed by anyone who wants to. It’s all anarchy.

    I didn’t contribute to r/place in 2017, but for the 2022 run I figured I would chip in a few pixels to the SSBM and My Little Pony projects, then mostly spectate. And yeah, that’s how it started on the first day! The MLP subreddit put out a Rainbow Dash template on March 31, and when I checked in, I saw their location was right in the path of the ever-expanding Ukrainian flag.

    Before...

    ...and After

    The irony of people fighting for r/place land while a real-world land grab was going on was not lost on me. I deliberately skipped joining the coordination Discord, because I had no interest in getting involved more than the surface level, but from what I heard, the MLP r/place Discord decided to play the long game. They believed that although the Ukraine flag was taking over space now, and although most subreddits supported Ukraine in the Russia-Ukraine conflict, the flag owners would eventually face too much pressure and would have to make concessions to allow some art within its borders. The MLP Discord wanted to maintain their provisional claim formed by Rainbow Dash’s torso, to be expanded later. They were correct that the Ukraine flag would make concessions. They were wrong that My Little Pony would get to keep it. There’s a whole saga of alliance and betrayal there, which eventually forced MLP to change locations, but I did not follow it and I’m sure someone else will tell that story. The one I want to tell is much smaller.

    For about 7 years, I’ve been a fan of Dustforce, an indie platformer. When people ask what video games I’ve played recently, I always tell them I’ve been playing Dustforce, and they’ve never heard of it. It’s one of my favorite games of all time. Maybe I’ll explain why in another post, but the short version is that Dustforce is the SSBM of platformers. Super deep movement system, practically infinite tech skill ceiling, levels that are hard but satisfying to finish, and a great replay system that always gives you tools to get better. It’s not very big, but it had representation in 2017’s r/place.

    Dustman

    Lots of big communities have little interest in r/place, and lots of little communities have outsized presence in r/place. You don’t need to be big, you just need a subset with enough r/place engagement and organizational will. Dustforce is tiny (speedrun livestreams get at most 30 viewers), but we made it onto the canvas 5 years ago, and I believed we could totally make it in 2022. The Dustforce Discord talked about doing something for r/place, but hadn’t done anything yet, so I made a pixel art template in hopes it would get the ball rolling.

    Dustkid

    Dustkid, a character in Dustforce. We’ll be seeing her a lot.

    Pixel art is not my strong suit. To make this, I downloaded the favicon for dustkid.com, then translated the pixels to the r/place color palette. After scanning existing r/place pixel art, I realized our target image was somewhat big for our community size, so I prepared a smaller version instead. Any representation is better than none.

    Dustkid, smaller
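
    As an aside, the palette translation step is easy to script. Here’s a minimal sketch with Pillow - PALETTE is a hypothetical subset of the 2022 colors, and the filenames are made up:

    ```python
    from PIL import Image

    # Hypothetical subset of the r/place 2022 palette, as RGB tuples.
    PALETTE = [(255, 255, 255), (0, 0, 0), (255, 69, 0), (36, 80, 164), (255, 214, 53)]

    def nearest(color):
        # Nearest palette color by squared distance in RGB space.
        return min(PALETTE, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, color)))

    img = Image.open("dustkid_favicon.png").convert("RGB")
    template = Image.new("RGB", img.size)
    for x in range(img.width):
        for y in range(img.height):
            template.putpixel((x, y), nearest(img.getpixel((x, y))))
    template.save("dustkid_template.png")
    ```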

    I wasn’t interested in organizing r/place for Dustforce long term, but I’m a big believer that you get movements going by decreasing the initial effort required. Having a template gives even casually interested people a way to contribute their 1 pixel. I’m happy to report this plan succeeded! Although it was a long journey to get there. To be honest, this is primarily the work of other community members who picked up where I left off.

    * * *

    We didn’t want to take pixels away from established territory. I don’t think we could have even if we wanted to. Our best odds were to find a small pocket of space that didn’t have art yet. After some scouting, I proposed taking (1218, 135) to (1230, 148), a region right underneath the Yume Nikki character that looked uncontested. I also argued that even though we would mostly coordinate over Discord, it was important to make a post on the r/dustforce subreddit. Since r/place was a Reddit event, we needed a land claim on Reddit for discoverability. With that claim made, we started placing pixels.

    The start of Dustkid

    The beginnings of the Dustkid hat

    As we continued, we noticed a problem pretty quickly. We had picked the same spot as r/Taiwan.

    Uh oh

    Someone tracked down the r/Taiwan Reddit post. Their aim was to add a flag of the World Taiwanese Congress. For Dustforce, this was baaaaaad news. Flags historically have a lot of power in r/place. Their simplicity makes it easy for randoms to contribute pixels (it’s easy to fix errors in solid blocks of color), and patriotism is a good mobilizer to get more people to chip in. We were right in the middle of their flag, and it would take up all the available free space in that section. Maybe we could have negotiated living on their flag, but I highly doubt r/Taiwan would have agreed, and no one from r/dustforce even bothered asking.

    r/Taiwan flag

    Alright, back to square one. Other people in the community proposed alternate spots, without much luck. Any spot that opened up was quickly taken by other communities or bots. By and large, people on r/place will respect existing artwork, so once we got something down there was a good chance we could protect it. The problem was that in the time it took us to make something recognizable, other groups or bots would place pixels faster. By the end of Day 2, we had nothing. All our previous efforts were run over. We were simply outmatched, and I didn’t think we’d make it.

    * * *

    On the last day, Reddit doubled the size of the canvas once more, saying it would be the final expansion. With a new block of free space, that expansion represented our best chance of getting something into r/place. It’s now or never.

    First, the expansion had added new colors to the palette, so we adjusted the template to be more game-accurate.

    New template

    By this point I was content to leave organizing to others. They first reached out to the Celeste community, asking if we could fit Dustkid into their banner, seeing as how both Dustforce and Celeste are momentum-based precision platformers. After some discussion, the Celeste folks felt it would clash too much. This wasn’t entirely fruitless, however - during that discussion, they invited Dustforce to the r/place Indie Alliance. It was exactly what it sounded like: an r/place alliance between indie game communities.

    I joined the Indie Alliance Discord to stay in the loop, but quickly found it was too fast for me. Lots of shitposts, lots of @everyone pings, and lots of panicking whenever a big Twitch streamer went live and tried to force their will onto the r/place canvas. There were even accusations of spies and saboteurs trying to join the alliance. I never quite understood how having a spy in r/place would help things. In a real fight, there’s fog of war, it takes time to mobilize forces, and intelligence on troop movements or new weapons can be a decisive edge. But in r/place, there’s no fog of war because the entire canvas is public, people can “attack” (place pixels) anywhere they want with no travel time, and everyone knows how to find r/place bots if they want to. There’s no real benefit to knowing an r/place attack is coming, nor is there much benefit in knowing a place will be defended - by default, everything is defended.

    Our eventual target was a space near the Celeste banner, at the time occupied by AmongUs impostors.

    Target location

    They laid out the argument: AmongUs crewmates were scattered all throughout r/place, usually to fill up space without compromising the overall artwork. Since there were so many crewmates, the odds that any specific patch was fiercely defended were quite low. That meant AmongUs space looked more defended than it really was. If we blitzed the space fast enough, we likely wouldn’t face retribution, since the AmongUs people probably wouldn’t care. We went for it.

    Dustkid, round 2

    We did run slightly afoul of r/avali, a subreddit for a furry species. Our art template and their art template overlapped by 1 pixel, and we both really wanted that pixel. In a dumb parody of the Israel-Palestine conflict, they wanted both an orange and an indigo border around the Avali, our pixel threatened the indigo border, and they really didn’t like the aesthetic of having an indigo border everywhere besides that pixel.

    Conflict

    Indie game-Furry mascot conflict, April 2022

    Now, if r/avali had decided to fight, we would have lost. Luckily, we figured out a solution before it came to that. If we shifted the Dustkid head diagonally down-right 1 pixel, it would resolve the dispute. Plus, we’d sit more symmetrically between the Rocket League bot logo to our north and the Mona Lisa to our south. We let r/avali know, and did the migration.

    Conflict resolution

    With that, we made it! We even had time to adjust our template and fill in more space with Dustforce pixel art, adding the S+ icon we had last time r/place happened.

    Final image

    It was surprisingly low on drama. Everyone nearby was friendly. The Go subreddit r/baduk to our right could have conflicted, but they recognized our land claim and adjusted their template so that it wouldn’t clash. We eventually made a heart connecting the two. A Minecraft wolf invaded the space where we planned our pixel art expansion, but we found it was all created by one user (!), so we offered to adopt and relocate the wolf to a separate corner, which they were happy with. RLbot to our north even offered to give us more space, since they had abandoned their logo a while back. We never took them up on that offer, since by the time we were done tuning things in our corner, r/place had ended.

    We really lucked out on our location, and were never the target of a community big enough to trample over us. The theory was that we were close to big art pieces like the Mona Lisa and One Piece, and although those art pieces weren’t looking to expand, no one wanted to challenge them, and this gave us protection by proxy. Perhaps you could call Dustforce a vassal state, but I’m not sure they even cared about us.

    With Dustkid settled, I went back to helping clean up the My Little Pony art, which had been griefed enough that they had added a counter for “# of times we’ve rebuilt” to their template.

    My Little Pony

    (It’s the 22 in the middle of the right border, just above Rarity)

    I also helped a bit when someone from the Indie Alliance got raided. But that was small fry compared to the work beforehand. I was happy to be done with r/place. It really took up more of my attention and time than I expected it to.

    There’s this old quote from The Sandman. “Everybody has a secret world inside of them.” Nowhere is that more true than in r/place. The adventure to put up a 15x15 Dustkid head was just one piece of the overall 2000x2000 canvas. There’s all sorts of complexity,

    Zoom 1

    that gets lost,

    Zoom 2

    as you take in the bigger picture.

    Zoom 3

  • The Dawn of Do What I Mean

    Boy, last week was busy for deep learning. Let’s start with the paper I worked on.

    SayCan is a robot learning system that we’ve been developing for about the past year. The paper is here, and it builds on a lot of past work we’ve done in conditional imitation learning and reinforcement learning.

    Suppose you have a robot that can do small tasks we describe in natural language, like “pick up the apple” or “go to the trash can”. If you have these low-level tasks, you can chain them together into more complex tasks. If I want the robot to throw away the apple, I could say, “pick up the apple, then go to the trash can, then place it in the trash can”, and assuming those three low-level tasks are learned well, the robot will complete the full task.

    Now, you wouldn’t want to actually say “pick up the apple, then go to the trash can, then place it in the trash can”. That’s a lengthy command to give. Instead, we’d like to just say “throw away the apple” and have the rest be done automatically. Well, in the past few years, large language models (LLMs) have shown they can do well at many problems, as long as you can describe the input and output with just language. And this problem of mapping “throw away the apple” to “pick / go to trash / place” fits that description exactly! With the right prompt, the language model can generate the sequence of low-level tasks to perform.

    Diagram of the SayCan model

    This, by itself, is not enough. Since the LLM is not aware of the robot’s surroundings or capabilities, using it naively may generate sentences the robot isn’t capable of performing. This is handled with a two-pronged approach.

    1. The language generation is constrained to the skills the robot can (currently) perform.
    2. Each generated instruction is scored based on a learned value function, which maps the image + language to the estimated probability the robot can complete the task.

    You can view this as the LLM estimating the best-case probability an instruction helps the high-level goal, and the value function acting as a correction to that probability. They combine to pick a low-level task the robot can do that’s useful towards the high-level goal. We then repeat the process until the task is solved.
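
    In code, the selection loop is something like the sketch below - my simplification, not code from the paper, with llm_score and value as placeholders for the real LLM scoring and learned value function:

    ```python
    import random

    def llm_score(instruction, skill, history):
        # Placeholder: the LLM's probability that `skill` is a useful
        # next step for `instruction`, given the skills executed so far.
        return random.random()

    def value(observation, skill):
        # Placeholder: learned estimate of P(robot completes `skill` | observation).
        return random.random()

    def saycan(instruction, skills, get_observation, execute, max_steps=10):
        history = []
        for _ in range(max_steps):
            obs = get_observation()
            # Usefulness (LLM) times feasibility (value function), maximized
            # over the constrained set of skills the robot can currently perform.
            skill = max(skills, key=lambda s: llm_score(instruction, s, history) * value(obs, s))
            execute(skill)
            history.append(skill)
            if skill == "done":  # assume a terminate skill signals completion
                break
        return history
    ```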

    This glosses over a lot of work on how to learn the value function, how to learn the policy for the primitive tasks, prompt engineering for the large language model, and more. If you want more details, feel free to read the paper! My main takeaway is that LLMs are pretty good. The language generation is the easy part, while the value function + policy are the hard parts. Even assuming that LLMs don’t get better, there is a lot of slack left for robot capabilities to get better and move towards robots that do what you mean.

    * * *

    LLMs are not the bottleneck in SayCan, but they’re still improving (which should be a surprise to no one). As explained in the GPT-3 paper, scaling trendlines showed room for at least 1 order of magnitude, and recent work suggests there may be more.

    DeepMind put out a paper for their Chinchilla model. Through more careful investigation, they found that training corpus size had not kept pace with parameter count. By using almost 5x more training data (300 billion tokens → 1.4 trillion tokens), they reduced model size by 4x (280B parameters → 70B parameters) while achieving better performance at a comparable compute budget.

    Chinchilla extrapolation curve

    Estimated compute-optimal scaling, using larger datasets and fewer parameters than previous scaling laws predicted.
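
    As a rough sanity check, a common approximation for transformer training compute is C ≈ 6ND FLOPs, where N is parameter count and D is training tokens. Plugging in Gopher’s and Chinchilla’s numbers (my arithmetic, not the paper’s):

    ```latex
    C_{\text{Gopher}}     \approx 6 \cdot (280 \times 10^9) \cdot (300 \times 10^9)    \approx 5.0 \times 10^{23}
    C_{\text{Chinchilla}} \approx 6 \cdot (70 \times 10^9)  \cdot (1.4 \times 10^{12}) \approx 5.9 \times 10^{23}
    ```

    Roughly the same compute, spent very differently between parameters and data.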

    Meanwhile, Google Brain announced their PaLM language model, trained with 540B parameters on 780 billion tokens. That paper shows something similar to the GPT-2 → GPT-3 shift. Performance increases on many tasks that were already handled well, but on some tasks, there are discontinuous improvements, where the increase in scale leads to a larger increase in performance than predicted from small scale experiments.

    PaLM result curves

    Above is Figure 5 of the PaLM paper. Each plot shows model performance on a set of tasks where PaLM’s performance vs model size is log-linear (left), “discontinuous” (middle), or relatively flat (right). I’m not sure the flat examples are even that flat, they look slightly under log-linear at worst. Again, we can say that loss will go down as model size goes up, but the way that loss manifests in downstream tasks doesn’t necessarily follow the same relationship.

    The emoji-to-movie and joke explanation results are especially interesting to me. They feel qualitatively better in a way that’s hard to describe, combining concepts with a higher level of complexity than I expected.

    Emoji movie explanation

    Neither of these works has taken the full 1 order of magnitude suggested by prior work, and neither indicates we’ve hit a ceiling on model scaling. As far as I know, no one is willing to predict what qualitatively new capabilities we’ll see from the next large language model, or whether we’ll see any at all. This is worth emphasizing - people genuinely don’t know. Before seeing the results of the PaLM paper, I think you could argue that language models would have more trouble learning math-based tasks, and the results corroborate this (both navigate and mathematical_induction from the figure above are math-based). You could also have predicted that at least one benchmark would get qualitatively better. I don’t see how you could have predicted that english_proverbs and logical_sequence in particular would improve faster than their power law curve.

    The blog post for the Chinchilla model notes that given the PaLM compute budget, they expect you could match it with 140B params if you used a dataset of 3 trillion tokens of language. In other words, there’s room for improvement without changing the model architecture, as long as you crawl more training data. I don’t know how hard that is, but it has far less research uncertainty than anything from the ML side.
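
    The same C ≈ 6ND approximation makes this concrete (again, my arithmetic, not theirs):

    ```latex
    C_{\text{PaLM}} \approx 6 \cdot (540 \times 10^9) \cdot (780 \times 10^9) \approx 2.5 \times 10^{24}
    C_{\text{140B}} \approx 6 \cdot (140 \times 10^9) \cdot (3 \times 10^{12}) \approx 2.5 \times 10^{24}
    ```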

    Let’s just say it’s not a good look for anyone claiming deep learning models are plateauing.

    * * *

    That takes us to DALL·E 2.

    DALL-E 2 generations

    On one hand, image generation is something that naturally captures the imagination. You don’t have to explain why it’s cool, it’s just obviously cool. Similar to language generation, progress here might overstate the state of the field, because it’s improving things we naturally find interesting. And yet, I find it hard to say this doesn’t portend something.

    From a purely research standpoint, I was a bit out of the loop on what was state-of-the-art in image generation, and I didn’t realize diffusion-based image synthesis was outperforming autoregressive image synthesis. Very crudely, the difference between the two is that diffusion gradually updates the entire image towards a desired target, while autoregressive generation draws each image patch in sequence. Empirically, diffusion has been working better, and some colleagues told me that it’s because diffusion better handles the high-dimensional space of image generation. That seems reasonable to me, but, look, we’re in the land of deep learning. Everything is high-dimensional. Are we going to claim that language is not a high-D problem? If diffusion models are inherently better in that regime, then diffusion models should be taking over more of the research landscape.
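
    To make the crude distinction concrete, here is a control-flow sketch of the two generation loops - the next_patch_model and denoiser arguments are placeholders, not real networks:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def autoregressive_generate(next_patch_model, n_patches):
        # Draw one patch at a time, each conditioned on all patches so far.
        patches = []
        for _ in range(n_patches):
            patches.append(next_patch_model(patches))
        return np.stack(patches)

    def diffusion_generate(denoiser, shape, n_steps=50):
        # Start from pure noise and repeatedly nudge the ENTIRE image
        # towards the data distribution, a little at each step.
        x = rng.normal(size=shape)
        for t in reversed(range(n_steps)):
            x = denoiser(x, t)
        return x

    # Toy usage: a "denoiser" that just shrinks the noise each step.
    sample = diffusion_generate(lambda x, t: 0.9 * x, shape=(64, 64, 3))
    ```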

    Well, maybe they are. I’ve been messing around with Codex a bit, and would describe it as “occasionally amazing, often frustrating”. It’s great when it’s correct and annoying when it’s not. Almost-correct language is amusing. Almost-correct code is just wrong, and I found it annoying to continually delete bad completions when trying to coax the model to generate better ones. There was a recent announcement of improving Codex to edit and insert text, instead of just completing it. It’s better UX for sure, and in hindsight, it’s likely using the same core technology DALL-E uses for image editing.

    Edit examples

    We’re taking an image and dropping a sofa in it, or we’re taking some text and changing the sentence structure. It’s the same high-level problem, and maybe it’s doing diffusion-based generation under the hood.

    * * *

    Where does this leave us?

    In general, there is a lot of hype and excitement about models with a natural language API. There is a building consensus that text is a rich enough input space to describe our intentions to ML models. It may not be the only input space, but it’s hard to see anything ignoring it. If you believe the thesis that language unlocked humanity’s ability to share complex ideas in short amounts of time, then computers learning what to do based on language should be viewed as a similar sea change in how we interact with ML models.

    It feels like we are heading for a future where more computer systems are “do what I mean”, where we hand more agency to models that we believe have earned the right to that agency. And we’ll do so as long as we can convince ourselves that we understand how these systems work.

    I don’t think anyone actually understands how these systems work. All the model disclosure analysis I’ve read feels like it’s poking the outside of the model and cataloging how the black box responds, without any generalizable lesson aside from “consider things carefully”. Sure, that’s fine for now, but that approach gets harder when your model is capable of more things. I hope people are paying attention.

  • MIT Mystery Hunt 2022

    This has spoilers for MIT Mystery Hunt 2022. Spoilers are interspersed and will not be labeled ahead of time.

    So, teammate won Mystery Hunt.

    Since the winner of MIT Mystery Hunt has to write the next year’s hunt, things will be…interesting for the next year. I still need to decide what level of involvement I’ll have in that, but to be safe this post isn’t going to do any speculation about the future of Hunt.

    This year was the first year where teammate didn’t do open signups. In previous years, teammate was around 60-80 people, and based on a team survey after Mystery Hunt 2021, people generally felt the team was growing too big. So for 2022, we did a closed roster of dedicated team members, which was defined as people who helped write Teammate Hunt, or had hunted with teammate in 2021 and at least one previous year. I think this is always a tricky thing to navigate, since the natural state of hunt teams is to grow larger over time, and people will always end up close to whatever line you draw.

    We had 53 people this year, which puts us around the size of Left Out. And this year was a really close race. Left Out was basically tied with us all through Saturday, and Death & Mayhem was ahead of us until they slowed down around Sunday 4 AM. (We…kept going.)

    I don’t think we made any big changes from last year with regards to Sheets norms or remote solving tools. Based on what Wei-Hwa said in the afterparty Discord, I think the reason teammate edged out Left Out was that we were more gung ho about going for the win, whereas Left Out was more uncertain and didn’t fully let go of the brakes until the end. More specifically, Left Out saved all 3 free answers until they were sure about things, hammering them all on Sci-Ficisco. We had used one free answer in New You City, and two free answers in Howtoona, based on our observation that it would be easy to backsolve after we finished the meta. That likely sped up our meta solves and let us make up for our slightly slower puzzle solving average (relative to Left Out and Death & Mayhem).

    I’m not sure how this compares to other teams, but teammate has always had a decent number of people who really like the meta-level strategizing of how to get from the start of Hunt to the end as effectively as possible, and team policy has always been that it’s okay to hunt this way. This means that, for example, you never need to ask permission to backsolve, and every year we’ve had a puzzle that got sniped from someone who was about to finish their forward solve. Depending on viewpoint, this is either fine or horrifying. But, well, it does work, and it can be fun to improvise a good speedrun route.

    Thoughts on Hunt

    Good hunt.

    Fun theming, many puns, solid puzzles, good art, and cool that there was an intermediate tier of accomplishment between the first round and Pen Station. I would have preferred the team interactions to not be recorded ahead of time, since I felt that made it easier to skip the story, but I realize that it’s logistically a lot harder to make that happen while also running New You City.

    So still. Good hunt.

    Pre-Hunt

    Our pre-hunt socials were mostly different groups of 2-4 people solving Star Rats, and I think literally every group backsolved The Rescuers. This probably says something about what puzzles we tend to start working on first. We also played some escape room games and other board games.

    Like last year, we had a #conspiracies Discord channel, where I claimed that it would be important that MLK day was a full moon. Nope, that didn’t matter, but we did find a Scriptnotes episode that seemed too good to be true. It mentioned puzzles, palindromes, and “anagrams of MATE”, and a tweet made on January 17, which was MLK day this year. Based on that and the mention of crossword alignment charts, it really seemed like it was seeding puzzle content. It wasn’t, and in retrospect, the date of the podcast (2 weeks after winning Hunt) should have been a sign that it probably wasn’t important.

    For this hunt, we did three optional in-person hubs, with one in the Bay Area, one in Seattle, and one in Boston. This was a bit risky, but I went to the Bay Area one since I felt it was within my risk tolerance. No one was flying in, everyone attending had gotten a booster, masks were required, it was in the South Bay where people generally take COVID pretty seriously, and you had to show a negative rapid test the day of. I’m not too sure how much the in-person hub helped solving, because we still needed to join voice chat to talk to people outside the Bay Area, but it definitely helped on the physical puzzles.

    I had been playing through the Ace Attorney series prior to Hunt. Right now, I’m on the 2nd case of Apollo Justice, which features a noodle shop called Eldoon’s Noodles. The characters mention it’s an almost-palindrome in game, and even talk about “Team Meat” when you inspect it more closely, which added up to an eerie pre-hunt coincidence. And then the first round was called The Investigation! Please, try to convince me this wasn’t a work of the divine.

    Ace Attorney

    The Investigation

    The Missing Piece - First puzzle I worked on in Hunt. I recognized the Palindrome name badge theming right away. I did find the 2009 Star Rats badge during hunt, which was a bit of a trip, but I figured there was no way Palindrome would make a puzzle in the first round depend on previous real-world badges. We never figured out what the years meant - after we counted the lanyard overlaps, we were pretty sure that was correct and figured we had just skipped a step somewhere.

    Where the Wild Things Are - This was the first physical puzzle, and mostly got claimed by the Boston hub, but the Bay Area hub got the jigsaw puzzle, which ended up being one of the longer ones. With no patterned edges, pairing pieces up was pretty tough. We figured out all of the mechanics, but our data was incomplete and the meta got solved before we could patch it up.

    The Ministry

    Harold and the Purple Crayon - Fun dataset, although I came in after most of the IDing was done and mostly threw around extraction ideas with other people. I don’t love that you’re actually supposed to throw out the ones that don’t fit, but it is neat to see the final step.

    Oxford Children’s Dictionary - We pretty quickly decided that “one side will be regular definitions, and one side will be jank definitions”, but it took us a long time to determine exactly what form of jank it would be. Once we got a few examples, it was pretty fun, although we ended up skipping about half of the letters because of their difficulty, solving from “??? budweiser? + ??? seashore? ???”.

    The Talking Tree - Pretty cute puzzle, and somehow these diagrams were a lot easier to fill out by intuition than the ones in the similar puzzle from Silph Puzzle Hunt. I did think it was funny that one person solving had written a phonetics puzzle for Teammate Hunt. Also I just realized how relevant the title is, it’s literally talking syntax trees.

    Dinotopia - I view puzzles as creating order out of chaos. To borrow an analogy from Alex Rosenthal’s talk, it is a coincidence that the number of piano keys matches the number of constellations, but once you know this coincidence exists, you kind of have to make a puzzle that pretends this coincidence is vitally important. Sometimes you have to do a bunch of work to make the contrived coincidence work, but sometimes the pieces are already there and it feels like you’re just discovering it.

    In this sense, it’s really cool that the writing system referenced in this puzzle is inherently ambiguous if you arrange the symbols in the right way, and that parsing out readable text is a satisfying challenge. I mostly figured out the a-has of what we were supposed to do and how to extract, then got lunch while watching other people do all the work, which is really a 10/10 solve experience, would recommend.

    The Ministry - When we unlocked The Ministry, we had 25/25 feeders thanks to backsolves. We noticed “bit = binary” right away, and I mentioned that COLORFUL HEAD could be a predicate for “starts with ROYGBIV”, at which point we quickly inferred all 5 mechanics. Since this was the only puzzle we had left, I got to witness the terrifying sight of Sheets locusts descending on the 5 x 25 bit matrix. I think we filled out all the bits in 30 seconds. Very scary.

    Fruit Around - We didn’t pre-solve the “hungry caterpillar” connection before The Ministry, but we had a few minutes between solving The Ministry and doing the interaction with Palindrome, and we guessed “bookwyrm = very hungry caterpillar” in that gap, so Fruit Around was pretty straightforward for us.

    In general, the construction of The Ministry is pretty impressive. I didn’t even realize that the meta answers were also semantic descriptions of the bookwyrm until after we finished Hunt. There are a lot of layers of constraints going on and it’s nice that the circle closes and it all comes together.

    Noirleans

    I mostly missed this round, aside from…

    Curious and Determined - We ended up having to backsolve this puzzle, since we didn’t see the way to map letters to each clue. However, the realization that “wait it’s these colors” was fantastic, and it was funny we said “The Shawshank Redemption isn’t really any color aside from blue and orange, in the same way that every movie poster is blue and orange. I guess the main character’s named Red…wait a second.”

    Lake Eerie

    Large-scale Anthropomorphism - after IDing a few of the animals, we dumped all of the animals into Google at once, and got search results for taikyoku shogi, which translates to “ultimate chess”, a terrifyingly complicated game played on a 36x36 board. We IDed the appropriate adjectives, guessed the Chinese (well, Japanese) numerals were cluing rows, and figured it was a taikyoku shogi chess problem from hell. That seemed like exactly the level of ridiculousness to expect from Mystery Hunt.

    Taikyoku Shogi

    Now, if you solved the puzzle yourself, you might have noticed one issue - the game that’s used is actually dai shogi, which is played on a 15x15 board. Most piece movement is the same, but notably, the king in taikyoku shogi is allowed to move up to 2 spaces, which made it significantly harder to decide on the best reply. When we got the king out of checkmate according to taikyoku rules, we figured we had made an error somewhere.

    Interestingly, whether you interpret it as taikyoku shogi or dai shogi, you extract FIC first, and we said “it’s probably something like FICTION or FICTIONAL or FICKLE” many times. We tried the first two, came back after someone told us we were using the wrong game, then got stuck with weird letters from a non-optimal mate where the falcon didn’t capture anything. After a closer reading of the rules, we figured out the igui rule to take a piece without moving out of the pin, extracted a K, and realized it was FICKLE an hour after we talked about guessing it. On one hand, we could have saved a lot of grief, but we also found the most unique part of the shogi logic, so I’m not too upset in the end.

    The Graveyard - If I remember correctly, this was a meta that was only unlocked partway through the round. I took a look when it opened, stared a bit, tried to think of appropriately eerie connections, then said “hang on, aren’t these the Pacman ghost colors?” I confirmed the year lined up, someone else found the Japanese ghost names, and then we went on a backsolve spree. It was a little fuzzy; I believe we ended up solving from ROFA, TIONS, and a penciled-in IRAPP on the group we had 3/4 on. I do think it’s a little odd that the Ghostbusters group was ordered by credits, when the names appear after 1/2/3/4 letters of the string.

    The Quest Coast

    A Number of Games - When this unlocked, I figured I would have to work on it because I’d done some combinatorial game theory before. But then it turned out teammate is just a bunch of nerds, because lots of us had done combinatorial game theory before. I drifted off of this puzzle in favor of…

    Something Command - I’m heavily biased, but this is my favorite puzzle from Hunt. We did the math on the Eldrazi one first, in case the indexing was based on how much over lethal you could get. After getting lethal exactly with 2 different lines (attacking with Nettle Drone or not), we believed the extraction would only be based on the missing card name, solving at 4/7 correct cards. I think the puzzle is doable if you don’t know MTG, but knowing MTG definitely made it faster, and it was cool that you could intuit your way to figuring out the missing card even if you didn’t exactly work out all the math.

    Sorcery for Dummies - Cool puzzle - I mostly came in after all the individual letters were IDed, to try to figure out paths for each monster. This ended up overlapping with technical difficulties that took down interactive puzzles though, and during the downtime I started on another puzzle. Once the backend came back up, I decided I’d rather finish what I started, and this puzzle was completed by the time I looked at it again.

    Once Upon a Time in the Quest - When I went to bed around 3 AM Saturday morning, we hadn’t made much progress on this meta. I woke up at 7 AM, earlier than I planned to. I was going to go back to sleep, but I saw a message asking about Dinotopia’s mechanics. The overnight crew had broken into the Quest meta. I got on voice chat, described Dinotopia, then went back to sleep for an hour. Waking up to more forward solves in Quest, I read over the work, saw that THE DARK hadn’t been done yet, considered clue phrases that needed to start with “THE DARK” instead of just “DARK”, proposed “The Dark Knight”, and was pleasantly surprised when that was correct. That got us to 6/8 of the Step 3s, which was enough to wheel-of-fortune the answer. I still think it’s a bit weird that there’s an extra IT’S at the start, but maybe it makes it easier to find good words or get to a nice round length.

    New You City

    Does Any Kid Still Do This Anymore? - I did this as my last puzzle before sleeping for Saturday, so it was mostly about grinding trigrams in a daze. I got Antioch in the first one from Nutrimatic, and decided that was fake until we did a few more examples. I remember saying, “I can confirm that kids still take the BART.” That’s technically true, but the meaning of the title became clearer after we got the main a-ha.

    Bad Beginnings - We didn’t get this until after we won Hunt, but I’m mentioning it here anyways because the dataset used is great and it’s worth revisiting.

    Proof by Induction - I came in at the end, after all the ideas were figured out, to help ID some of the missing lines. I hadn’t heard of this language before and it’s a pretty good one.

    Recipeoria

    Sunday Dinner - Okay, I didn’t actually work on this puzzle, but I clutched the finale. We have a channel where people can crowdsource help on anything, and the call for help was “we have a cluephrase EIGHT PAST, about a NYT Sunday crossword with a food theming”. I tossed some terms into Google and said “SPAGHETTI?” It was right. I didn’t understand the clue at all. (They later explained it was an anagram and wordplays.com must have indexed a cryptic that used it before.)

    Heartford

    Somehow I saw nothing in this round, not even the meta.

    Whoston

    Rotten Little Scamps - I pitched in a bit to some of this puzzle, which sparked an ask for “does anyone have an Icelandic Nutrimatic”, which might be the best request I heard all Hunt. We did not find an Icelandic Nutrimatic.

    Reference Point

    You Took the Fifth - I got tagged to work on this puzzle the moment it opened. More of a word puzzle than an Ace Attorney puzzle, but still good, and it was cool once we figured out the reason it was presented as an objection.lol. It did take us quite a while to decide on the right interpretation of each line, but I believe that was part of the intended difficulty.

    On Second Thought - Now that I think about it, I haven’t seen the phonetic step in very many puzzles before, but it didn’t feel too bad when we were working on this. The puzzle is kinda ISISy, but not in a bad way.

    Diced Turkey Hash - Something Command was my favorite puzzle of the Hunt, but Diced Turkey Hash was by far the most memorable. This was the second physical puzzle, and I do think it’s a shame it unlocked so late. We figured out the binary grid pretty quickly, as well as the Mayan numerals and Tarot symbols. A bit more work got us the Dzongkha numbers as well. From there, we figured that “face-to-face” was a clue indicating we should take a walk between adjacent faces of the d20s. Based on the given text, we started identifying mappings between pairs of dice, noticing more relevant details every time we reread it. It’s impressively dense, and the “director’s chair” realization was great.

    We struggled with it a lot, but given it was one of the least solved puzzles in Hunt, it wasn’t just us. Our difficulties came from two rabbit holes. First, the flavor about going on a journey really felt like we needed to trace a path along both pairs of dice, but the final extraction only relies on the mapping between the two dice, which could be determined without looking at the face topology. The given text about the journey definitely helped us confirm we were doing the right thing, but we also managed to translate it into a reasonably unique path that visited each face once. The most extreme argument was when we did the Tarot dice. We aimed for a path through all prime numbers in descending order, taking the shortest path when possible. At one point, the two primes were on completely opposite faces, and there was no unique shortest path, but I argued that “be a walker, not a rider - but do not wait” could be interpreted as “take the shortest path that does not go through The Chariot”, which left only one path on the d20. This was not the intention of that text, but it did lead us to writing down the correct pairing (via a circuitous route).

    The other rabbit hole is something Palindrome definitely did not predict. We had rolled the dice, and didn’t spot anything weird about which face they landed on, but then we decided to try rolling them into water, which would magnify any weight difference. To our surprise, there was a consistent face that would point up. Given this didn’t show up at all when rolling onto a table, I was not a believer, until I tried for myself. I’ve recreated a video below.

    I was still not convinced, and argued that other d20s might show the same behavior, even if they were fair. One person drove home and back to bring control group d20s, while we recruited help from remote solvers.

    Discord screenshot

    All the control group d20s landed on different faces between trials. And thus our solve group was split in twain - we could not deny the dice were weighted, but did it matter to the puzzle? Or was it just a manufacturing defect?

    Arguments for it mattering:

    • This could be a reason this was a physical puzzle.
    • The dice feel high quality, which is evidence against manufacturing defects.
    • Previous Mystery Hunt puzzles have used gimmicked dice before.
    • The given text mentioned “an ocean of blue” - perhaps this was a hint to use water to figure out the dice were weighted.

    Arguments against:

    • The distribution of faces when rolled onto a table is not clearly weighted (tested via the eye test of rolling the dice many, many times - a proper goodness-of-fit test, sketched after this list, would have been more convincing). It would be very easy to miss the weighting unless you thought of the test we did.
    • Nothing about the puzzle presentation suggested the die would be weighted.
    • The gimmicked dice from the previous Mystery Hunt puzzle were more clearly gimmicked than our dice.
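
    For what it’s worth, the statistically respectable version of the eye test is a chi-square goodness-of-fit test. Here’s a sketch with scipy, using simulated stand-in data where the real recorded rolls would go:

    ```python
    import random
    from collections import Counter
    from scipy.stats import chisquare

    # Stand-in data: swap in the rolls you actually recorded.
    rolls = random.choices(range(1, 21), k=200)

    counts = [Counter(rolls).get(face, 0) for face in range(1, 21)]
    stat, p = chisquare(counts)  # null hypothesis: all 20 faces equally likely
    print(f"chi2 = {stat:.1f}, p = {p:.3f}")  # small p is evidence of a weighted die
    ```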

    No one was making much progress convincing anybody else, so we decided to settle it by seeing whether the virtual version of Diced Turkey Hash left any way for a virtual solver to learn the dice were weighted. The only problem was that it wouldn’t unlock for another hour. So, we spun our wheels a bit, until the virtual version unlocked and we learned that no, the virtual version only showed the faces of each die, and the weighting did not matter in the slightest.

    It wasn’t real, it certainly wasn’t intentional, but it was a very entertaining argument, so thanks Palindrome, and thanks to the sponsor HRT for manufacturing slightly unfair dice.

    Reference Desk - We random anagrammed the answer, and just could not figure out the ordering step when trying to backsolve. Without the ordering idea, backsolving was pretty much impossible, so we just left the round incomplete. This likely contributed to our lower total solve count in the end.

    Howtoona

    How to Install a Handle - The moment this unlocked, I scrolled down the page, and said “YOOOOO IT’S MATHDANIEL SQUIRREL”. I had actually emailed the relevant dataset to Ryan North a week before Hunt thanks to this Dinosaur Comics strip, so it was fresh in my mind. I can confirm that he didn’t know about the bracket before, but does now.

    How to Make the Right Move - The pair of us that first looked at this puzzle immediately figured out the puzzle mashup, then tagged other people to work on it. I credit working on the Sleeping Beauty meta from Inception Hunt. I also believe we sent an errata request about board 9, claiming it was impossible. It wasn’t, that board was just too bigbrain. We got it eventually.

    How to Require Some More Assembly (Picture Puzzle 3) - I got to come in to save extraction, which is always fun. I saw a completed grid and a highlighted entry, interpreted it slightly differently than the group that filled out the grid, and then we solved. Although it took us a while, because we read “invert Y” as “flip across the Y-axis”, which made the final image look more like a vampire bat than the actual answer.

    How to Do Quality Reviews - Helped with initial data entry, but then it hit the part of the puzzle that was not very parallelizable and I drifted back to…

    How to Find a Component - I opened the meta at the start of the round, didn’t see how to do anything, and left. A while later, I opened our meta sheet again, and saw someone had written “these numbers are the alphabetical permutation of the given words”. Seeing that all our answers were 9 letters long with unique letters, I wrote up my theories for how I wanted the meta to work, then tried starting from 3/9. We didn’t have enough pieces to place anything, but I suspected the mechanic was very constraining on the answers. I was right. After applying the constraints that the answer was 9 letters, all letters were unique, and numbers in each column should be unique, my Scrabble dictionary was reduced to just 226 words.
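
    The first two constraints alone cut a dictionary down fast. A sketch, where words.txt stands in for whatever Scrabble dictionary you have on hand (the column-uniqueness check depends on puzzle data I’m omitting):

    ```python
    # Keep only 9-letter words where all 9 letters are distinct (isograms).
    with open("words.txt") as f:
        words = [line.strip().upper() for line in f]

    candidates = [w for w in words if len(w) == 9 and len(set(w)) == 9]
    print(f"{len(candidates)} candidates remain")
    ```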

    A teammate and I spent the next 30 minutes looking for thematic backsolves on abandoned puzzles, and failed to backsolve anything. In retrospect, most of the puzzle answers were not very thematic, and I suspect this was intentional. The constraints are really strong if you squeeze them for everything you can.

    Around this time, we were down to 1 manuscrip, and win-comm (the group watching overall strategy) proposed using the free answer in Howtoona. We had gone up to 5/9 Howtoona answers, and our estimate was that we’d need 1 more answer to crack the meta open. However, multiple puzzles in Howtoona were moving forward without getting stuck, and if one of those got finished, we could instead have an extra feeder in Sci-Fi. We knew that teammate was in contention to win (since we had gotten a phone call saying so), but we also knew we were not in the lead on unlock progress (because one puzzle had an errata issued 30 minutes before we unlocked it). After some discussion, we said to give us 15-20 minutes to solve the meta, then ask again. About 20 minutes later, we were still at 5/9, increasingly confident we’d need a 6th answer, and unwilling to backsolve for it because our candidate word list was too big. Given that, we pulled the trigger and redeemed our manuscrip on a Howtoona puzzle no one was looking at…right as the 6th feeder was forward solved. At 7/9 it was pretty trivial to backsolve the remaining answers and we finished the meta within 5 minutes.

    I think the win-Hunt play would have been to use the manuscrip 20 minutes before we did, and the max-fun play would have been to use it on a Sci-Fi puzzle and let people finish their Howtoona puzzles. Instead we did something in-between that was somewhat unsatisfying on both ends. It wasn’t a perfect call, but it was a tricky decision, given that teammate had agreed we were going to both have fun and go for the win. I think the decision we came to was acceptable.

    In case you were wondering, that previous section is what I mean by “meta-level strategizing”. A conversation like that tends to happen every year.

    The Plot Device

    (This is wildly non-chronological, because I worked on this throughout the Hunt, but this felt like the approximate time where I did the most work on it.)

    Narnia Beeswax - We got our first Plot Device solve in the middle of this puzzle. My understanding is that the group working on Order of Apparitions was getting really fed up with the “cluephrase” they had, and tried guessing the whole thing in frustration. In Narnia, we confirmed that we could submit single answers, and good thing too - it took us so many tries to solve EQUALLY SNUG. In general, the answer guess timeout was more annoying in this round, and we often spent 5 minutes locked out of guessing.

    A Crying Shamus - I was not good enough to help on any of the cryptics, but I was good enough at searching all the words with “mystery” and “detective” to figure out the answer ordering. I was also good enough at reading last letters to finish the puzzle. For whatever reason the last letters popped out for me more than the first ones.

    Synonym Toast - Just a neat word puzzle in general, although we ran into an errata on enumeration. I think we explained the issue to Palindrome quite poorly, because it took a few iterations for us to explain what we thought the error was. Sorry!

    Step by Step Ladder - We had tried to presolve the Plot Device meta by building a word ladder, so when we unlocked the puzzle that was actually a word ladder, we sped through it pretty quickly (10 minutes based on our solve info).

    Sci-Ficisco

    I missed everything from this round aside from the meta, but I’ve heard good things about both Replicator Droid and Lists of Large Integers.

    Communicating With the Aliens - When we got to this puzzle, it was about 3:30 AM on the West Coast, and I was listening to ideas about lines of symmetry without really having much input. I felt I needed some sleep but also knew we were really close to finishing, so I decided to take a 30 minute nap and hope we’d have 1 or 2 more feeders by the time I woke up. I set one alarm (and 5 backup alarms), woke up at 4 AM, physically got out of bed at 4:20 AM (nice), and tried again. It took us an embarrassingly long time to reorder our puzzles in the given round order, and after doing so it also took a while to get just 1 of our answers in the grid. We had a lot of competing theories about what the symbols meant, none of which let us place the DES MOINES answer. Eventually, we got it in, and once we noticed how the three-dot symbols lined up, the rest was pretty fast. I decided to start the Nutrimatic query while most people were placing words, and mentioned a lot of the regexes were matching phrases starting with FIRST, which oriented us to the right pun idea.

    Endgame

    We entered endgame knowing that the coin hadn’t been found yet, but we also knew we were probably very close with other teams, given we were behind on puzzle unlocks at one point. Our best hope was to assume we had leapfrogged teams on the metas and could finish endgame before they caught up.

    Battery Pack - Unlike other teams, we did not pre-solve the Plot Device meta, but it was pretty clear what to do once we saw the shape on the meta page. We complicated it for ourselves a bit by assuming it would mix-and-match across all answers, rather than coming pre-grouped.

    The Tollbooth - It took us about 1.5 hours to finish this puzzle, and I’m really curious what the solve time was for other finishing teams. We split into three groups: one to find books, one to solve book titles, and one to solve the printer’s devilry clues. We got most of the data, and assumed the extraction would be based on how the books paired with the printer’s devilry clues. When we failed to notice anything special from that pairing, we tried increasingly conspiratorial ideas, including “take random letters out of each word to spell a punny phrase”, which got surprisingly far on building a reasonable yet totally incorrect answer.

    With the entire Hunt as a potential data source, we tried a bunch of things, like the number of lines on each PDF, or the map on the Pen Station round page. After what felt like an eternity, someone who decided “borders” was important noticed the leaves on the PDF. We did the indexing, and won Hunt.

    Personally, it didn’t really feel real. For me, it felt like, “oh, we won, wooooo, I’m going to sleep.” Even now, Mystery Hunt 2023 feels like this thing that doesn’t exist yet, and it’s hard for me to have an opinion on it before it becomes more solid.

    Maybe I’m a little jaded, but winning Mystery Hunt didn’t light any big fires of motivation for me. I’ve already exorcised all my puzzle demons and written the puzzles I had to write. The ones left in my puzzle ideas file are generally uninspiring and don’t feel good enough for Mystery Hunt. After writing puzzles fairly continuously for 3 years (MLP: Puzzles are Magic into Teammate Hunt 2020 into Teammate Hunt 2021), I have a better sense of how easy it is for me to let puzzles consume all my free time and destroy my ability to make small talk or start any other projects that need concentrated effort. Sure, making puzzles is rewarding, but lots of things are rewarding, and I feel I need to set stricter boundaries on the time I allocate to this way of life - boundaries that are likely to get pushed the hardest by working on Mystery Hunt of all things.

    The opportunity to write for Mystery Hunt doesn’t come around very often, which makes me think I should go for it, but I’m not expecting to write anything super crazy. Hunt is Hunt, and I am cautiously optimistic that I have enough experience with the weight of expectations to get through the writing process okay.
