Jay Bailey's Shortform
post by Jay Bailey · 2022-08-01T02:05:42.506Z · LW · GW · 6 comments
comment by Jay Bailey · 2022-08-01T02:05:42.748Z · LW(p) · GW(p)
Speedrunners have a tendency to totally break video games in half, sometimes in the strangest and most bizarre ways possible. I feel like some of the more convoluted video game speedrun / challenge run glitches out there are actually a good way to build intuition for what high optimisation pressure (like that imposed by a relatively weak AGI) might look like, even at regular human or slightly superhuman levels. ("Slightly superhuman" here meaning a group of smart people achieving what no single human could.)
Two that I recommend:
https://www.youtube.com/watch?v=kpk2tdsPh0A - A tool-assisted run, where the inputs are programmed frame by frame by a human and executed by a computer. It exploits idiosyncrasies in Super Mario 64's code that no human could ever use unassisted in order to reduce the number of times the A button needs to be pressed in a run. I wouldn't be surprised if this guy knows more about SM64's code than the devs at this point.
https://www.youtube.com/watch?v=THtbjPQFVZI - A glitch using outside-the-game hardware considerations to improve consistency on yet another crazy in-game glitch. Also showcases just how large the attack space is.
These videos are also just incredibly entertaining in their own right, and not ridiculously long, so I hypothesise that they're a great resource to send to more skeptical people who understand the idea of AGI but are systematically underestimating the difference between "bug-free" (the program will not have bugs during normal operation) and "secure" (the program will not have bugs even when deliberately pushed towards narrow states designed to create them).
For a more serious overview, you could probably find obscure hardware glitches and the like to teach the same lesson.
comment by Dagon · 2022-08-01T16:03:53.368Z · LW(p) · GW(p)
I'm not sure I agree that it's a useful intuition pump for the ways an AGI can surprisingly optimize things. They're amusing, but fundamentally based on out-of-game knowledge about the structure of the game. Unless you're positing a simulation hypothesis, AND that the AGI somehow escapes the simulation, it's not really analogous.
comment by gwern · 2022-08-01T16:38:35.738Z · LW(p) · GW(p)
> They're amusing, but fundamentally based on out-of-game knowledge about the structure of the game.
Evolutionary and DRL methods are famous for finding exploits and glitches model-free, from within the game. Chess endgame databases are another example.
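A minimal sketch of that pattern, with an entirely made-up toy "game" and scoring rule (every name and number below is illustrative, not taken from the thread): a blind hill-climbing search over action sequences, given nothing but the returned score, converges on an unintended scoring exploit rather than the intended way to play.

```python
import random

def play(actions):
    """Intended game: walk right from x=0 to x=10 for a 100-point finish.
    A sloppy, uncapped 'style bonus' for jumping is the exploitable bug."""
    x, score = 0, 0
    for a in actions:
        if a == "right":
            x += 1
        elif a == "jump":
            score += 3          # uncapped bonus -- the bug
        if x >= 10:
            return score + 100  # intended win condition ends the run
    return score

def hill_climb(length=50, generations=5000):
    """Mutate one action at a time, keeping any change that doesn't lower
    the score. The search never sees the game's code, only play()'s output."""
    best = [random.choice(["right", "jump"]) for _ in range(length)]
    best_score = play(best)
    for _ in range(generations):
        candidate = list(best)
        candidate[random.randrange(length)] = random.choice(["right", "jump"])
        s = play(candidate)
        if s >= best_score:     # accept ties so the search can drift
            best, best_score = candidate, s
    return best, best_score

if __name__ == "__main__":
    policy, score = hill_climb()
    print("score:", score, "| jumps:", policy.count("jump"))
    # Typically scores far above the ~100 points of simply finishing,
    # by piling up the uncapped jump bonus before reaching the flag.
```

The point is just that the exploit is found purely from score feedback inside the game, with no out-of-game knowledge.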
comment by Jay Bailey · 2023-03-03T20:42:45.310Z · LW(p) · GW(p)
A frame that I use, which a lot of people I speak to seem to find A) interesting and B) novel, is that of "Idiot Units".
An Idiot Unit is the length of time it takes before you think your past self was an idiot. This is pretty subjective, of course, and you'll need to decide what that means for yourself. Roughly, I consider my past self to be an idiot if they have substantially different aims or are significantly less effective at achieving them. Personally, my Idiot Unit is about two years - I can pretty reliably look back in time and think that, compared to year T, Jay at year T-2 had worse priorities or was a lot less effective at pursuing his goals somehow.
Not everyone has an Idiot Unit. Some people believe they were smarter ten years ago, or haven't really changed their methods and priorities in a while. Take a minute and think about what your Idiot Unit might be, if any.
Now, if you have an Idiot Unit for your own life, what does that imply?
Firstly, hill-climbing heuristics should be upweighted compared to long-term plans. If your Idiot Unit is U, any plan that takes more than U time means that, after U time, you're following a plan that was designed by an idiot. Act accordingly.
That said, a recent addition I have made to this: you should still make long-term plans. It's important to know which of your plans are stable under Idiot Units, and you only get that by making those plans. I don't disagree with my past self about everything. For instance, I got a vasectomy at 29, because not wanting kids had been stable for me for at least ten years, so I don't expect more Idiot Units to change this.
Secondly, if you must act according to long-term plans (a college/university degree takes longer than U for me, especially since U tends to be shorter when you're younger), try to pick plans that preserve or increase optionality. I want to give Future Jay as many options as possible, because he's smarter than me. When Past Jay decided to get a CS degree, he had no idea about EA or AI alignment. But a CS degree is a very flexible investment, so when I decided to do something good for the world, I had a ready-made asset to use.
Thirdly, longer-term investments in yourself (provided they aren't too specific) are good. Your impact will be larger a few years down the track, since you'll be smarter then. Try asking what a smarter version of you would likely find useful and seek to acquire that. Resources like health, money, and broadly-applicable knowledge are good!
Fourthly, the lower your Idiot Unit is, the better. It means you're learning! Try to preserve it over time - Idiot Units naturally grow with age, so if yours stands still, you're actually making progress.
I'm not sure if it's worth writing up a whole post on this with more detail and examples, but I thought I'd get the idea out there in Shortform.
comment by Jay Bailey · 2023-09-27T01:09:56.417Z · LW(p) · GW(p)
One of the core problems of AI alignment is that we don't know how to reliably get goals into the AI - there are many possible goals that are sufficiently correlated with doing well on training data that the AI could wind up optimising for a whole bunch of different things.
Instrumental convergence claims that a wide variety of goals will lead to convergent subgoals such that the agent will end up wanting to seek power, acquire resources, avoid death, etc.
These claims do seem a bit...contradictory. If goals are really that inscrutable, why do we strongly expect instrumental convergence? Why won't we get some weird thing that happens to correlate with "don't die, keep your options open" on the training data, but falls apart out of distribution?