Did they or didn't they learn tool use? 2021-07-29T13:26:32.031Z
How much compute was used to train DeepMind's generally capable agents? 2021-07-29T11:34:10.615Z
DeepMind: Generally capable agents emerge from open-ended play 2021-07-27T14:19:13.782Z
What will the twenties look like if AGI is 30 years away? 2021-07-13T08:14:07.387Z
Taboo "Outside View" 2021-06-17T09:36:49.855Z
Vignettes Workshop (AI Impacts) 2021-06-15T12:05:38.516Z
ML is now automating parts of chip R&D. How big a deal is this? 2021-06-10T09:51:37.475Z
What will 2040 probably look like assuming no singularity? 2021-05-16T22:10:38.542Z
How do scaling laws work for fine-tuning? 2021-04-04T12:18:34.559Z
Fun with +12 OOMs of Compute 2021-03-01T13:30:13.603Z
Poll: Which variables are most strategically relevant? 2021-01-22T17:17:32.717Z
Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain 2021-01-18T12:08:13.418Z
How can I find trustworthy dietary advice? 2021-01-17T13:11:54.158Z
Review of Soft Takeoff Can Still Lead to DSA 2021-01-10T18:10:25.064Z
DALL-E by OpenAI 2021-01-05T20:05:46.718Z
Dario Amodei leaves OpenAI 2020-12-29T19:31:04.161Z
Against GDP as a metric for timelines and takeoff speeds 2020-12-29T17:42:24.788Z
How long till Inverse AlphaFold? 2020-12-17T19:56:14.474Z
Incentivizing forecasting via social media 2020-12-16T12:15:01.446Z
What are the best precedents for industries failing to invest in valuable AI research? 2020-12-14T23:57:08.631Z
What technologies could cause world GDP doubling times to be <8 years? 2020-12-10T15:34:14.214Z
The AI Safety Game (UPDATED) 2020-12-05T10:27:05.778Z
Is this a good way to bet on short timelines? 2020-11-28T12:51:07.516Z
Persuasion Tools: AI takeover without AGI or agency? 2020-11-20T16:54:01.306Z
How Roodman's GWP model translates to TAI timelines 2020-11-16T14:05:45.654Z
How can I bet on short timelines? 2020-11-07T12:44:20.360Z
What considerations influence whether I have more influence over short or long timelines? 2020-11-05T19:56:12.147Z
AI risk hub in Singapore? 2020-10-29T11:45:16.096Z
The date of AI Takeover is not the day the AI takes over 2020-10-22T10:41:09.242Z
If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? 2020-10-09T12:00:36.814Z
Where is human level on text prediction? (GPTs task) 2020-09-20T09:00:28.693Z
Forecasting Thread: AI Timelines 2020-08-22T02:33:09.431Z
What if memes are common in highly capable minds? 2020-07-30T20:45:17.500Z
What a 20-year-lead in military tech might look like 2020-07-29T20:10:09.303Z
Does the lottery ticket hypothesis suggest the scaling hypothesis? 2020-07-28T19:52:51.825Z
Probability that other architectures will scale as well as Transformers? 2020-07-28T19:36:53.590Z
Lessons on AI Takeover from the conquistadors 2020-07-17T22:35:32.265Z
What are the risks of permanent injury from COVID? 2020-07-07T16:30:49.413Z
Relevant pre-AGI possibilities 2020-06-20T10:52:00.257Z
Image GPT 2020-06-18T11:41:21.198Z
List of public predictions of what GPT-X can or can't do? 2020-06-14T14:25:17.839Z
Preparing for "The Talk" with AI projects 2020-06-13T23:01:24.332Z
Reminder: Blog Post Day III today 2020-06-13T10:28:41.605Z
Blog Post Day III 2020-06-01T13:56:10.037Z
Predictions/questions about conquistadors? 2020-05-22T11:43:40.786Z
Better name for "Heavy-tailedness of the world?" 2020-04-17T20:50:06.407Z
Is this viable physics? 2020-04-14T19:29:28.372Z
Blog Post Day II Retrospective 2020-03-31T15:03:21.305Z
Three Kinds of Competitiveness 2020-03-31T01:00:56.196Z
Reminder: Blog Post Day II today! 2020-03-28T11:35:03.774Z


Comment by Daniel Kokotajlo (daniel-kokotajlo) on How much compute was used to train DeepMind's generally capable agents? · 2021-07-31T07:51:47.339Z · LW · GW

Huh, thanks! I guess my guesstimate is wrong then. So should I multiply everything by 8?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on How much compute was used to train DeepMind's generally capable agents? · 2021-07-31T07:51:01.836Z · LW · GW

I did, sorry -- I guesstimated FLOP/step and then figured parameters is probably a bit less than 1 OOM less than that. But since this is recurrent maybe it's even less? IDK. My guesstimate is shitty and I'd love to see someone do a better one!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on How much compute was used to train DeepMind's generally capable agents? · 2021-07-30T15:08:58.740Z · LW · GW

Michael Dennis tells me that population-based training typically sees strong diminishing returns to population size, such that he doubts there were more than one or two dozen agents in each population/generation. This is consistent with AlphaStar, I believe, where the number of agents was something like that IIRC...

Anyhow, suppose 30 agents per generation. Then that's a cost of $5,000/mo x 1.3 months x 30 agents = $195,000 to train the fifth generation of agents. The previous two generations were probably quicker and cheaper. In total the price is probably, therefore, something like half a million dollars of compute?

This seems surprisingly low to me. About one order of magnitude less than I expected. What's going on? Maybe it really was that cheap. If so, why? Has the price dropped since AlphaStar? Probably... It's also possible this just used less compute than AlphaStar did...
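The cost arithmetic above can be sketched in a few lines. Every input is a guesstimate from this thread (the TPU price from a quick search, the duration and population size guessed), not a figure from the paper:

```python
# Back-of-envelope training-cost guess for DeepMind's open-ended-play agents.
# Every input below is a guesstimate from this thread, not a number from the paper.
tpu_cost_per_month = 5_000       # USD/month for 8 TPUv3s, from a quick search
training_months = 1.3            # guessed duration of the final generation
agents_per_generation = 30       # guessed population size

final_generation_cost = tpu_cost_per_month * training_months * agents_per_generation
print(f"Final generation: ~${final_generation_cost:,.0f}")  # ~$195,000

# Earlier generations were probably quicker and cheaper, so the total is
# plausibly on the order of half a million dollars of compute.
```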

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-30T09:17:27.753Z · LW · GW

Let's define AGI as "AI that is generally intelligent, i.e. it isn't limited to a narrow domain, but can do stuff and think stuff about a very wide range of domains."

Human-level AGI (sometimes confusingly shortened to just "AGI") is AGI that is similarly competent to humans across a similarly large-and-useful range of domains.

My stance is that GPT-3 is AGI, but not human-level AGI. (Not even close).

I'd also add agency as an important concept -- an AI is agenty if it behaves in a goal-directed way. I don't think GPT-3 is agenty. But unsurprisingly, lots of game-playing AIs are agenty. AlphaGo was an agenty narrow AI. I think these new 'agents' trained by DeepMind are agenty AGI, but just extremely crappy agenty AGI. There is a wide domain they can perform in -- but it's not that wide. Not nearly as wide as the human range. And they also aren't that competent even within the domain.

Thing is though, as we keep making these things bigger and train them for longer on more diverse data... it seems that they will become more competent, and the range of things they do will expand. Eventually we'll get to human-level AGI, though it's another question exactly how long that'll take.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on How much compute was used to train DeepMind's generally capable agents? · 2021-07-30T09:02:58.983Z · LW · GW

Also for comparison, I think this means these models were about twice as big as AlphaStar. That's interesting.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Rob B's Shortform Feed · 2021-07-30T08:13:43.255Z · LW · GW

For all I know you are right about Yudkowsky's pre-2011 view about deep math. However, (a) that wasn't Bostrom's view AFAICT, and (b) I think that's just not what this OP quote is talking about. From the OP:

I feel like a bunch of people have shifted a bunch in the type of AI x-risk that worries them (representative phrase is "from Yudkowsky/Bostrom to What Failure Looks Like ~~part 2~~ part 1") and I still don't totally get why.

It's Yudkowsky/Bostrom, not Yudkowsky. And it's WFLLp1, not p2. Part 2 is the one where the AIs do a treacherous turn; part 1 is where actually everything is fine except that "you get what you measure" and our dumb obedient AIs are optimizing for the things we told them to optimize for rather than for what we want.

I am pretty confident that WFLLp1 is not the main thing we should be worrying about; WFLLp2 is closer, but even it involves this slow-takeoff view (in the strong sense, in which the economy is growing fast before the point of no return) which I've argued against. I do not think the reason people shifted from "yudkowsky/bostrom" (which in this context seems to mean "single AI project builds AI in the wrong way, AI takes over world") to WFLLp1 is that people rationally considered all the arguments and decided that WFLLp1 was on balance more likely. I think instead that probably some sort of optimism bias was involved, and more importantly a win-by-default dynamic (Yudkowsky and Bostrom stopped talking about their scenarios and arguing for them, whereas Paul wrote a bunch of detailed posts laying out his scenarios and arguments, so in the absence of visible counterarguments Paul wins the debate by default). Part of my feeling about this is that it's a failure on my part; when Paul+Katja wrote their big post on takeoff speeds I disagreed with it and considered writing a big point-by-point response, but never did, even after various people posted questions asking "has there been any serious response to Paul+Katja?"

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-30T07:58:34.162Z · LW · GW
We need to understand information encoding in the brain before we can achieve full AGI.

I disagree; it seems that we are going to brute-force AGI by searching for it, rather than building it, so to speak. Stochastic gradient descent on neural networks is basically a search in circuit-space for a circuit that does the job.

The machine learning stuff comes with preexisting artificial encoding. We label stuff ourselves.

I'm not sure what you mean by this, but it seems false to me. "Pretraining" and "unsupervised learning" are a really big deal these days. I'm pretty sure lots of image classifiers, for example, basically generate their own labels, because they are just trained to predict stuff, and in order to do so they end up coming up with their own categories and concepts. GPT-3 et al. did this too.

Without innate information encoding for language, infants won't be learning much about the world through language, let alone picking up an entire language from purely listening experiences.

GPT-3 begins as a totally blank slate. Yet it is able to learn language, and quite a lot about the world through language. It can e.g. translate between English and Chinese even though all it's done is read loads of text. This strongly suggests that whatever innate stuff is present in the human brain is nice but nonessential, at least for the basics like learning language and learning about the world.

I'd like to be corrected if I'm wrong on AGI. I've only read a little about it back in my freshman years in college. I'm sure there has been a lot of development since then, and I'd like to learn from this community. From my experience of reading about AGI, it's still dealing with more or less the confines of the computational statistics nature of intelligence.

I don't know when you went to college, but a lot has changed in the last 10 years and in the last 2-5 years especially. If you are looking for more stuff to read on this, I recommend Gwern on the Scaling Hypothesis.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-29T22:28:49.659Z · LW · GW


Your view is still common, but decreasingly so in the ML community and almost nonexistent on LW I think. My opinion is similar to Abram's, Yair's, and Quintin's. I'd be happy to say more if you like!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-29T22:18:43.240Z · LW · GW

As a counterpoint, when I feel like a circle jerk downvote is happening I usually upvote to counteract it. I did in this case with your OP, even though I think it's wrong. And apparently I'm not the only one who does so. So it's not all bad... but yeah, I agree it is disappointing. I don't think your post was low-quality, it was just wrong. Therefore not deserving of a negative score, instead deserving of thoughtful replies.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Did they or didn't they learn tool use? · 2021-07-29T17:08:37.640Z · LW · GW

Nice, I missed that! Thanks!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Did they or didn't they learn tool use? · 2021-07-29T16:02:55.876Z · LW · GW

None of the prompts tell it what to do; they aren't even in English. (Or so I think? Correct me if I'm wrong!) Instead they are in propositional logic, using atoms that refer to objects, colors, relations, and players. They just give the reward function in disjunctive normal form (i.e. a big chain of disjunctions) and present it to the agent to observe.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Did they or didn't they learn tool use? · 2021-07-29T16:01:02.237Z · LW · GW

In that case, the thing in the paper must be a typo, because the "Tool Use" graph here is clearly >0 reward, even for the 1G agent.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on How much compute was used to train DeepMind's generally capable agents? · 2021-07-29T14:01:43.301Z · LW · GW

I have a guesstimate for number of parameters, but not for overall compute or dollar cost:

Each agent was trained on 8 TPUv3's, which cost about $5,000/mo according to a quick google, and which seem to produce 90 TOPS, or about 10^14 operations per second. They say each agent does about 50,000 steps per second, so that means about 2 billion operations per step. Each little game they play lasts 900 steps if I recall correctly, which is about 2 minutes of subjective time they say (I imagine they extrapolated from what happens if you run the game at a speed such that the physics simulation looks normal-speed to us). So that means about 7.5 steps per subjective second, so each agent requires about 15 billion operations per subjective second.

So... 2 billion operations per step suggests that these things are about the size of GPT-2, i.e. about the size of a rat brain? If we care about subjective time, then it seems the human brain maybe uses 10^15 FLOP per subjective second, which is about 5 OOMs more than these agents.
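The guesstimate above can be reproduced as a quick sketch. All inputs are the rough figures from this comment (TPU throughput rounded up, the ~1e15 FLOP/subjective-second brain figure a common guess), not official numbers:

```python
import math

# Rough compute guesstimate; all inputs are approximate figures from this comment.
ops_per_second = 1e14          # ~90 TOPS from 8 TPUv3s, rounded up
steps_per_second = 50_000      # training steps per second, per the paper

ops_per_step = ops_per_second / steps_per_second       # ~2e9, roughly GPT-2-sized
steps_per_game = 900
subjective_seconds_per_game = 120                      # games last ~2 minutes
steps_per_subjective_second = steps_per_game / subjective_seconds_per_game  # 7.5
ops_per_subjective_second = ops_per_step * steps_per_subjective_second      # ~1.5e10

# Compare to a guessed ~1e15 FLOP per subjective second for the human brain.
human_flop_per_subjective_second = 1e15
oom_gap = math.log10(human_flop_per_subjective_second / ops_per_subjective_second)
print(ops_per_step, ops_per_subjective_second, round(oom_gap, 1))  # gap of ~5 OOMs
```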

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Did they or didn't they learn tool use? · 2021-07-29T13:35:45.054Z · LW · GW

Also, what does the "1G 38G 152G" mean in the image? I can't tell. I would have thought it means number of games trained on, or something, except that at the top it says 0-Shot.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Rob B's Shortform Feed · 2021-07-29T09:38:41.395Z · LW · GW

Ah, this is helpful, thanks -- I think we just have different interpretations of Bostrom+Yudkowsky. You've probably been around longer than I have and read more of their stuff, but I first got interested in this around 2013, pre-ordered Superintelligence, read it with keen interest, etc. The scenario you describe as mine is what I always thought Bostrom+Yudkowsky believed was most likely, and the scenario you describe as theirs -- involving "deep math" and "one hard step at the end" -- is something I thought they held up as an example of how things could be super fast, but not as what they actually believed was most likely.

From what I've read, Yudkowsky did seem to think there would be more insights and less "just make blob of compute bigger" about a decade or two ago, but he's long since updated towards "dear lord, people really are just going to make big blobs of inscrutable matrices, the fools!" and I don't think this counts as a point against his epistemics because predicting the future is hard and most everyone else around him did even worse, I'd bet.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Rob B's Shortform Feed · 2021-07-28T09:07:12.912Z · LW · GW

Likewise! I'm up for a video call if you like. Or we could have a big LW thread, or an email chain. I think my preference would be a video call. I like Walled Garden, we could do it there and invite other people maybe. IDK.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Rob B's Shortform Feed · 2021-07-28T09:06:03.953Z · LW · GW

It's a combination of not finding Paul+Katja's counterarguments convincing (AI Impacts has a slightly different version of the post, I think of this as the Paul+Katja post since I don't know how much each of them did), having various other arguments that they didn't consider, and thinking they may be making mistakes in how they frame things and what questions they ask. I originally planned to write a line-by-line rebuttal of the Paul+Katja posts, but instead I ended up writing a sequence of posts that collectively constitute my (indirect) response. If you want a more direct response, I can put it on my list of things to do, haha... sorry... I am a bit overwhelmed... OK here's maybe some quick (mostly cached) thoughts:

1. What we care about is point of no return, NOT GDP doubling in a year or whatever.

2. PONR seems not particularly correlated with GDP acceleration time or speed, and thus maybe Paul and I are just talking past each other -- he's asking and answering the wrong questions.

3. Slow takeoff means shorter timelines, so if our timelines are independently pretty short, we should update against slow takeoff. My timelines are independently pretty short. (See my other sequence.) Paul runs this argument in the other direction I think; since takeoff will be slow, and we aren't seeing the beginnings of it now, timelines must be long. (I don't know how heavily he leans on this argument though, probably not much. Ajeya does this too, and does it too much I think.) Also, concretely, if crazy AI stuff happens in <10 years, probably the EMH has failed in this domain and probably we can get AI by just scaling up stuff and therefore probably takeoff will be fairly fast (at least, it seems that way extrapolating from GPT-1, GPT-2, and GPT-3. One year apart, significantly qualitatively and quantitatively better. If that's what progress looks like when we are entering the "human range" then we will cross it quickly, it seems.)

4. Discontinuities totally do sometimes happen. I think we shouldn't expect them by default, but they aren't super low-prior either; thus, we should do gears-level modelling of AI rather than trying to build a reference class or analogy to other tech.

5. Most of Paul+Katja's arguments seem to be about continuity vs. discontinuity, which I think is the wrong question to be asking. What we care about is how long it takes (in clock time, or perhaps clock-time-given-compute-and-researcher-budget-X, given current and near-future ideas/algorithms) for AI capabilities to go from "meh" to "dangerous." THEN once we have an estimate of that, we can use that estimate to start thinking about whether this will happen in a distributed way across the whole world economy, or in a concentrated way in a single AI project, etc. (Analogy: We shouldn't try to predict greenhouse gas emissions by extrapolating world temperature trends, since that gets the causation backwards.)

6. I think the arguments Paul+Katja makes aren't super convincing on their own terms. They are sufficient to convince me that the slow takeoff world they describe is possible and deserves serious consideration (more so than e.g. Age of Em or CAIS) but not overall convincing enough for me to say "Bostrom and Yudkowsky were probably wrong." I could go through them one by one but I think I'll stop here for now.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-28T08:08:41.787Z · LW · GW

If you click on my name, it'll take you to my LW page, where you can see my posts and post sequences. I have a sequence on timelines. If you want a number, 50% by 2030.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-27T21:33:24.852Z · LW · GW

Thanks! This is exactly the sort of thoughtful commentary I was hoping to get when I made this linkpost.

--I don't see what the big deal is about laws of physics. Humans and all their ancestors evolved in a world with the same laws of physics; we didn't have to generalize to different worlds with different laws. Also, I don't think "be superhuman at figuring out the true laws of physics" is on the shortest path to AIs being dangerous. Also, I don't think AIs need to control robots or whatnot in the real world to be dangerous, so they don't even need to be able to understand the true laws of physics, even on a basic level.

--I agree it would be a bigger deal if they could use e.g. first-order logic, but not that much of a bigger deal? Put it this way: wanna bet about what would happen if they retrained these agents, but with 10x bigger brains and for 10x longer, in an expanded environment that supported first-order logic? I'd bet that we'd get agents that perform decently well at first-order logic goals.

--Yeah, these agents don't seem smart exactly; they seem to be following pretty simple general strategies... but they seem human-like and on a path to smartness, i.e. I can easily imagine them getting smoothly better and better as we make them bigger and train them for longer on more varied environments. I think of these guys as the GPT-1 of agent AGI.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-27T17:46:35.851Z · LW · GW

What makes you think our latest transcription AIs don't understand the meaning of the speech? Also, what makes you think they have reached a sufficient level of accuracy that your past self would have claimed they must understand the meaning of the speech? Maybe they still make mistakes sometimes, and maybe your past self would have pointed to those mistakes and said "see, they don't really understand."

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Superintelligence FAQ · 2021-07-27T12:18:09.549Z · LW · GW

I'm normally very sympathetic to "don't downvote, respond with careful argument" but in this case Bardstale's comment was such a long, rambling pile of gibberish that I don't have the patience. If you or Bardstale want to get a serious response from me, write something serious for me to respond to. I'd actually be happy to do so -- e.g. if you tell me (in your words) what good point you thought Bardstale was making, I'll take a few minutes to tell you what I think. (Depending on what it is, it might be less than a few minutes--for maybe I'll agree!)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Is the argument that AI is an xrisk valid? · 2021-07-25T10:05:16.204Z · LW · GW

Laying my cards on the table, I think that there do exist valid arguments with plausible premises for x-risk from AI, and insofar as you haven't found them yet then you haven't been looking hard enough or charitably enough. The stuff I was saying above is a suggestion for how you could proceed: If you can't prove X, try to prove not-X for a bit, often you learn something that helps you prove X. So, I suggest you try to argue that there is no x-risk from AI (excluding the kinds you acknowledge, such as AI misused by humans) and see where that leads you. It sounds like you have the seeds of such an argument in your paper; I was trying to pull them together and flesh them out in the comment above.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Is the argument that AI is an xrisk valid? · 2021-07-25T07:13:43.536Z · LW · GW

Yeah, sorry, I misspoke. You are critiquing one of the arguments for why there is XRisk from AI. One way to critique an argument is to dismiss it on "purely technical" grounds, e.g. "this argument equivocates between two different meanings of a term, therefore it is disqualified." But usually when people critique arguments, even if on technical grounds, they also have more "substantive" critiques in mind, e.g. "here is a possible world in which the premises are true and the conclusion false." (Or both conclusion and at least one premise false). I was guessing that you had such a possible world in mind, and trying to get a sense of what it looked like.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Decision Duels · 2021-07-23T11:32:26.772Z · LW · GW

Is this fictional or real? If it's real, which organizations use it?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Covid 7/22: Error Correction · 2021-07-23T11:05:48.386Z · LW · GW

Good points. I think we should compare the two worlds (China model vs. free-speech wild west) more explicitly to see how they fare. My intuition right now is that I'd vastly prefer the free-speech wild west to the China model, even if this gets tens of thousands of people killed on the regular because of believing stupid memes. Basically, as bad as that situation is, totalitarianism seems a lot worse... But I'm not sure.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on The Walking Dead · 2021-07-23T09:03:02.264Z · LW · GW

I also find this fascinating. Related, from my shortform:

In stories (and in the past) important secrets are kept in buried chests, hidden compartments, or guarded vaults. In the real world today, almost the opposite is true: Anyone with a cheap smartphone can roam freely across the Internet, a vast sea of words and images that includes the opinions and conversations of almost every community. The people who will appear in future history books are right now blogging about their worldview and strategy! The most important events of the next century are right now being accurately predicted by someone, somewhere, and you could be reading about them in five minutes if you knew where to look! The plans of the powerful, the knowledge of the wise, and all the other important secrets are there for all to see -- but they are hiding in plain sight. To find these secrets, you need to be able to swim swiftly through the sea of words, skimming and moving on when they aren't relevant or useful, slowing down and understanding deeply when they are, and using what you learn to decide where to look next. You need to be able to distinguish the true and the important from countless pretenders. You need to be like a detective on a case with an abundance of witnesses and evidence, but where the witnesses are biased and unreliable and sometimes conspiring against you, and the evidence has been tampered with. The virtue you need most is rationality.
Comment by Daniel Kokotajlo (daniel-kokotajlo) on [AN #156]: The scaling hypothesis: a plan for building AGI · 2021-07-23T06:18:53.638Z · LW · GW

OK cool, sorry for the confusion. Yeah, I think ESRogs' interpretation of you was making a somewhat stronger claim than you actually were.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on My Marriage Vows · 2021-07-21T14:56:24.817Z · LW · GW

I like where you are going with this. One issue with that phrasing is that it may be hard to fulfill that vow, since you don't yet know what decision theory you will come to believe in.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Why is Kleros valued so low? · 2021-07-21T14:49:55.888Z · LW · GW

I'm a crypto newbie. Can you give an example of a real-world application Kleros is already being successfully used for?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on My Marriage Vows · 2021-07-21T14:40:59.978Z · LW · GW

Also: Congratulations by the way! I'm happy for you! Also, I think it's really cool that you are putting this much thought into your vows. :)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on My Marriage Vows · 2021-07-21T14:37:38.683Z · LW · GW
Or, you mean commitment races between us and other agents? The intent here is making decision theoretic commitments towards each other, not necessarily committing to any decision theory towards the outside more than we normally would be.

Ah, good, that negates most of my concern. If you didn't already, you should specify that this only applies to your actions and commitments "towards each other." This is perhaps an awkward source of vagueness, since many actions and commitments affect both your spouse and other entities in the world and thus are hard to classify.

Re: the usefulness of precision: Perhaps you could put a line at the end of the policy that says "We aren't actually committing to all that preceding stuff. However, we do commit to take each other's interests into account to a similar extent to the extent implied by the preceding text."

Comment by Daniel Kokotajlo (daniel-kokotajlo) on [AN #156]: The scaling hypothesis: a plan for building AGI · 2021-07-21T12:23:03.563Z · LW · GW

Huh, then it seems I misunderstood you! The fourth bullet point claims that GPT-N will go on filling in missing words rather than doing a treacherous turn. But this seems unsupported by the argument you made, and in fact the opposite seems supported by it. The argument you made was:

There are several pretraining objectives that could have been used to train GPT-N other than next word prediction (e.g. masked language modeling). For each of these, there's a corresponding model that the resulting GPT-N would "try" to <do the thing in the pretraining objective>. These models make different predictions about what GPT-N would do off distribution. However, by claim 3 it doesn't matter much which pretraining objective you use, so most of these models would be wrong.

Seems to me the conclusion of this argument is that "In general it's not true that the AI is trying to achieve its training objective." The natural corollary is: We have no idea what the AI is trying to achieve, if it is trying to achieve anything at all. So instead of concluding "It'll probably just keep filling in missing words as in training" we should conclude "we have no idea what it'll do; treacherous turn is a real possibility because that's what'll happen for most goals it could have, and it may have a goal for all we know."

Comment by Daniel Kokotajlo (daniel-kokotajlo) on My Marriage Vows · 2021-07-21T12:03:02.269Z · LW · GW
Everything I do will be according to the policy which is the Kalai-Smorodinski solution to the bargaining problem defined by my [spouse]’s and my own priors and utility functions, with the disagreement point set at the counterfactual in which we did not marry. This policy is deemed to be determined a priori and not a posteriori. That is, it requires us to act as if we made all precommitments that would a priori be beneficial from a Kalai-Smorodinski bargaining point of view[6]. Moreover, if I deviate from this policy for any reason then I will return to optimal behavior as soon as possible, while preserving my [spouse]’s a priori expected utility if at all possible.

Idk, I have a bad feeling about this, for reasons I attempted to articulate in this post. The notion of optimal behavior you are using here may in fact be bad, and I question whether the benefits outweigh the costs. What are the benefits exactly? Why use all this specific, concrete decision theory jargon when you can just say "I promise to take my partner's interest (as they judge it, not as I judge it) into account to a significant extent" or something like that. Much more vague, but I think that's a feature not a bug since you have the good faith clause and since you are both nice people who presumably don't have really fucked up notions of good faith.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on One Study, Many Results (Matt Clancy) · 2021-07-21T11:53:28.332Z · LW · GW
In both the study of soccer players and the one on immigration, participating researchers reported their beliefs before doing their analysis. In both cases there wasn’t a statistically significant correlation between prior beliefs and reported results.

What? No way! ... are you sure? This seems to be evidence that confirmation bias isn't a thing, at least not for scientists analyzing data. But that conclusion is pretty surprising and implausible. Is there perhaps another explanation? E.g. maybe the scientists involved are a particularly non-bias-prone bunch, or maybe they knew why they were being asked to report their beliefs so they faked that they didn't have an opinion one way or another when really they did?

This is fascinating, thanks!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Open Problems with Myopia · 2021-07-21T09:35:39.237Z · LW · GW

Also: I think making sure our agents are DDT is probably going to be approximately as difficult as making them aligned. Related: Your handle for anthropic uncertainty is:

never reason about anthropic uncertainty. DDT agents always think they know who they are.

"Always think they know who they are" doesn't cut it; you can think you know you're in a simulation. I think a more accurate version would be something like "Always think that you are on an original planet, i.e. one in which life appeared 'naturally,' rather than a planet in the midst of some larger interstellar civilization, or a simulation of a planet, or whatever. Basically, you need to believe that you were created by humans but that no intelligence played a role in the creation and/or arrangement of the humans who created you. Or... no role other than the "normal" one in which parents create offspring, governments create institutions, etc. I think this is a fairly specific belief, and I don't think we have the ability to shape our AIs beliefs with that much precision, at least not yet.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Some thoughts on David Roodman’s GWP model and its relation to AI timelines · 2021-07-20T08:30:18.506Z · LW · GW

Nice post! I basically agree with you here. Trying to forecast AI using GDP data is like trying to forecast fossil fuel production by looking at global mean temperature data. But it's useful for rebutting people who think that transformative AI, AGI, etc. is crazy/unprecedented/low-prior.

Nitpick: I don't think it's helpful to describe some of the arguments as "inside view" and others as "outside view" here. This can mislead people into thinking that e.g. the arguments you label "outside view" should be treated differently from those you label "inside view," that e.g. people who aren't experts should put more weight on the "outside view" arguments, etc., and that this is justified by Tetlock's experiments and the surrounding literature.

Whereas in fact Tetlock's experiments etc. were about a different sort of thing than the kinds of arguments you are considering here. Besides, the terms "inside view" and "outside view" mean so many different things today that they basically just tell the audience how you feel about an argument and how you want the audience to feel about it. Taboo outside view! I would suggest you replace instances of "inside-viewy" with "model-based," "technical," or "relies on implausible assumptions," and replace instances of "outside-viewy" with "sanity check" or "makes implausible predictions."

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Is the argument that AI is an xrisk valid? · 2021-07-19T16:48:41.517Z · LW · GW

Thanks for posting this here! As you might expect, I disagree with you. I'd be interested to hear your positive account of why there isn't x-risk from AI (excluding from misused instrumental intelligence). Your view seems to be that we may eventually build AGI, but that it'll be able to reason about goals, morality, etc. unlike the cognitively limited instrumental AIs you discuss, *and therefore it won't be a threat*. Can you expand on the italicized bit? Is the idea that if it can reason about such things, it's as likely as we humans are to come to the truth about them? (And, there is in fact a truth about them? Some philosophers would deny this about e.g. morality.) Or indeed perhaps you would say it's more likely than humans to come to the truth, since if it were merely as likely as humans then it would be pretty scary (humans come to the wrong conclusions all the time, and have done terrible things when granted absolute power).

Comment by Daniel Kokotajlo (daniel-kokotajlo) on We need a standard set of community advice for how to financially prepare for AGI · 2021-07-17T11:15:09.319Z · LW · GW

It does not make the case against money at all; it just states the conclusion. If you want to hear the case against money, well, I guess I can write a post about it sometime. So far I haven't really argued at all, just stated things. I've been surprised by how many people disagree (I thought it was obvious).

To the specific argument you make: Yeah, sure, that's one factor. Ultimately a minor one in my opinion, doesn't change the overall conclusion.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What will the twenties look like if AGI is 30 years away? · 2021-07-15T21:30:12.273Z · LW · GW

The two posts I linked above explain my view on what EAs should care about for timelines; it's pretty similar to yours. I call it AI-PONR (AI point of no return), but basically it just means "a chunk of time where the value of interventions drops precipitously, to a level significantly below its present value, such that when we make our plans for how to use our money, our social capital, our research time, etc. we should basically plan to have accomplished what we want to have accomplished by then." Things that could cause AI-PONR: An AI takes over the world. Persuasion tools destroy collective epistemology. AI R&D tools make it so easy to build WMDs that we get a vulnerable world. Etc. Note that I disagree that the time when AI fundamentally transforms the world is what we care about, because I think AI-PONR will come before that point. (By "fundamentally transforms the world," do you mean something notably different from "accelerates GDP"?) I'd be interested to hear your thoughts on this framework, since it seems you've been thinking along similar lines and might have more expertise than me with the background concepts from economics.

So it sounds like we do disagree on something substantive, and it's how early in takeoff AI-PONR happens. And/or what timelines look like. I think there's, like, a 25% chance that nanobots will be disassembling large parts of Earth by 2030, but I think that the 2030's will look exactly as you predict up until it's too late.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on rohinmshah's Shortform · 2021-07-15T21:15:51.713Z · LW · GW

OK, fair enough. But what if it writes, like, 20 posts in the first 20 days which are that good, but then afterwards it hits diminishing returns because the rationality-related points it makes are no longer particularly novel and exciting? I think this would happen to many humans if they could work at super-speed.

That said, I don't think this is that likely I guess... probably AI will be unable to do even three such posts, or it'll be able to generate arbitrary numbers of them. The human range is small. Maybe. Idk.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What would it look like if it looked like AGI was very near? · 2021-07-15T17:25:40.287Z · LW · GW

Yeah, I do remember NVIDIA claiming they could do 100T param models by 2023. Not a quadrillion though IIRC.

However, (a) this may be just classic overoptimistic bullshit marketing, and thus we should expect it to be off by a couple years, and (b) they may have been including Mixture of Expert models, in which case 100T parameters is much less of a big deal. To my knowledge a 100T parameter MoE model would be a lot cheaper (in terms of compute and thus money) to train than a 100T parameter dense model like GPT, but also the performance would be significantly worse. If I'm wrong about this I'd love to hear why!
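The dense-vs-MoE compute gap mentioned above can be made concrete with back-of-the-envelope arithmetic. This sketch uses the common approximation that training FLOPs scale with the number of parameters *active per token* (roughly 6 × active params × tokens); the expert counts and token budget are made-up round numbers, not anything NVIDIA claimed.

```python
# Rough, illustrative comparison of dense vs. Mixture-of-Experts (MoE)
# training compute at the same nominal parameter count.
# Assumption: training FLOPs ~ 6 * active_params * tokens.

def train_flops(active_params, tokens):
    return 6 * active_params * tokens

tokens = 1e12                        # assumed token budget (illustrative)
dense_params = 100e12                # 100T dense: every parameter active on every token

moe_total = 100e12                   # 100T total parameters spread across experts
experts, experts_active = 128, 2     # hypothetical MoE config: route to 2 of 128 experts
moe_active = moe_total * experts_active / experts   # params actually used per token

dense = train_flops(dense_params, tokens)
moe = train_flops(moe_active, tokens)
ratio = dense / moe                  # how much cheaper the MoE training run is
```

Under these made-up routing numbers the MoE run is ~64x cheaper, which is why "100T parameters" means much less if it refers to a sparse model.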

Comment by Daniel Kokotajlo (daniel-kokotajlo) on rohinmshah's Shortform · 2021-07-15T11:15:07.263Z · LW · GW

Ah right, good point, I forgot about cherry-picking. I guess we could make it be something like "And the blog post wasn't cherry-picked; the same system could be asked to make 2 additional posts on rationality and you'd like both of them also." I'm not sure what credence I'd give to this but it would probably be a lot higher than 10%.

Website prediction: Nice, I think that's like 50% likely by 2030.

Major research area: What counts as a major research area? Suppose I go calculate that Alpha Fold 2 has already sped up the field of protein structure prediction by 100x (don't need to do actual experiments anymore!), would that count? If you hadn't heard of AlphaFold yet, would you say it counted? Perhaps you could give examples of the smallest and easiest-to-automate research areas that you think have only a 10% chance of being automated by 2030.

20,000 LW karma: Holy shit that's a lot of karma for one year. I feel like it's possible that would happen before it's too late (narrow AI good at writing but not good at talking to people and/or not agenty) but unlikely. Insofar as I think it'll happen before 2030 it doesn't serve as a good forecast because it'll be too late by that point IMO.

Productivity tool UI's obsolete thanks to assistants: This is a good one too. I think that's 50% likely by 2030.

I'm not super certain about any of these things of course, these are just my wild guesses for now.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What would it look like if it looked like AGI was very near? · 2021-07-15T07:59:24.001Z · LW · GW

Human-scale neural nets would be 3 OOMs bigger than GPT-3, and a quadrillion parameters would be 1 OOM bigger still. According to the scaling laws and empirical compute-optimal scaling trends, it seems that anyone training a net 3 OOMs bigger than GPT-3 would also train it for, like, 2 OOMs longer, for a total of +5 OOMs of compute. For a quadrillion-parameter model, we're looking at +6 OOMs or so.

There's just no way that's possible by 2023. GPT-3 cost millions of dollars of compute to train, apparently. +6 OOMs would be trillions of dollars. Presumably algorithmic breakthroughs will lower the training cost a bit, and hardware improvements will lower the compute cost, but I highly doubt we'd get 3 OOMs of lower cost by 2023. So we're looking at a 10-billion-dollar price tag, give or take. I highly doubt anyone will be spending that much in 2023, and even if someone did, I am skeptical that the computing infrastructure for such a thing will have been built in time. I don't think there are compute clusters a thousand times bigger than the one GPT-3 was trained on (though I might be wrong), and even if there were, to achieve your prediction we'd need one tens or hundreds of thousands of times bigger.
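The OOM arithmetic above can be written out explicitly. The GPT-3 cost figure here is an assumed ~$10M (a commonly cited rough number, not something from the comment), and the efficiency-gain estimate is the comment's own "3 OOMs by 2023, which I doubt" ceiling:

```python
# Back-of-the-envelope version of the scaling-cost argument.
# All inputs are rough assumptions; only the OOM steps matter.

gpt3_cost = 1e7                          # assumed GPT-3 training cost, ~$10M
params_ooms = 4                          # quadrillion params: ~4 OOMs over GPT-3
data_ooms = 2                            # compute-optimal trends: train ~2 OOMs longer
compute_ooms = params_ooms + data_ooms   # +6 OOMs of compute

naive_cost = gpt3_cost * 10**compute_ooms     # ~$10T with no efficiency gains
efficiency_ooms = 3                           # very optimistic algo+hardware gains by 2023
adjusted_cost = naive_cost / 10**efficiency_ooms   # ~$10B, the comment's price tag
```

Even granting the optimistic 3 OOMs of cost reduction, the bill lands around $10B, which is the crux of the skepticism about a 2023 quadrillion-parameter run.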

On alignment optimism: As I see it, three things need to happen for alignment to succeed.

1. A company that is sympathetic to alignment concerns has to have a significant lead-time over everyone else (before someone replicates or steals code etc.), so that they can do the necessary extra work and spend the extra time and money needed to implement an alignment solution.

2. A solution needs to be found that can be implemented in that amount of lead-time.

3. This solution needs to be actually chosen and implemented by the company, rather than some other, more appealing but incorrect solution. (There will be dozens of self-proclaimed experts pitching dozens of proposed solutions to the problem, each of which will be incorrect by default. The correct one needs to actually rise to the top in the eyes of the company leaders, which is hard since the company leaders don't know much alignment literature and may not be able to judge good from bad solutions.)

On 1: In my opinion there are only 3 major AI projects sympathetic to alignment concerns, and the pace of progress is such (and the state of security is such) that they'll probably have less than six months of lead time.

On 2: In my opinion we are not at all close to finding a solution that works even in principle; finding one that works in six months is even harder.

On 3: In my opinion there is only 1 major AI project that has a good chance of distinguishing viable solutions from fake solutions, and actually implementing one rather than dragging its feet or convincing itself that the danger is still in the future and not now. (e.g. "takeoff is supposed to be slow, we haven't seen any warning shots yet, this system can't be that dangerous yet")

Currently, I think the probability of all three things happening seems to be <1%. Happily there's model uncertainty, unknown unknowns, etc. which is why I'm not quite that pessimistic. But still, it's pretty scary.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What will the twenties look like if AGI is 30 years away? · 2021-07-15T07:41:21.073Z · LW · GW

Makes sense. I also agree that this is what the 2030's will look like; I don't expect GDP growth to accelerate until it's already too late.

The quest for testable-prior-to-AI-PONR predictions continues...

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What will the twenties look like if AGI is 30 years away? · 2021-07-15T07:33:23.398Z · LW · GW

All right, sounds good! This feels right to me. I'll taboo "short" and "long" when talking timelines henceforth!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on rohinmshah's Shortform · 2021-07-14T12:55:11.625Z · LW · GW

Nice! I really appreciate that you are thinking about this and making predictions. I want to do the same myself.

I think I'd put something more like 50% on "Rohin will at some point before 2030 read an AI-written blog post on rationality that he likes more than the typical LW >30 karma post." That's just a wild guess, very unstable.

Another potential prediction generation methodology: Name something that you think won't happen, but you think I think will.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What would it look like if it looked like AGI was very near? · 2021-07-14T10:26:23.509Z · LW · GW
It's easy to imagine a version of the story where the winner of the arms race is not benevolent, or where there is an alignment-failure and humans lose control of the AGI entirely.

I would frame it a bit differently: Currently, we haven't solved the alignment problem, so in this scenario the AI would be unaligned and it would kill us all (or do something similarly bad) as soon as it suited it. We can imagine versions of this scenario where a ton of progress is made in solving the alignment problem, or we can imagine versions of this scenario where surprisingly it turns out "alignment by default" is true and there never was a problem to begin with. But both of these would be very unusual, and distinct, scenarios, requiring more text to be written.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What will the twenties look like if AGI is 30 years away? · 2021-07-14T08:58:22.656Z · LW · GW

Yeah, fair enough, a billion is a lot & some of my questions were a bit too poorly specified. Thanks for the answers!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What would it look like if it looked like AGI was very near? · 2021-07-14T08:26:05.121Z · LW · GW

Thanks! This is the sort of thing that we aimed for with Vignettes Workshop. The scenario you present here has things going way too quickly IMO; I'll be very surprised if we get to human-scale neural nets by 2022, and quadrillion parameters by 2023. It takes years to scale up, as far as I can tell. Gotta build the supercomputers and write the parallelization code and convince the budget committee to fund things. If you have counterarguments to this take I'd be interested to hear them!

(Also I think that the "progress stalled because people didn't deploy AI because of alignment concerns" is way too rosy-eyed a view of the situation, haha)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What will the twenties look like if AGI is 30 years away? · 2021-07-14T08:04:24.233Z · LW · GW

I'm not presenting my view as consensus. There is no consensus of any sort on the matter of AI timelines, at least not a publicly legible one. (There's the private stuff, like the one I mentioned about how everyone who I deem to be reasonable fits within a certain range). This is a symmetric consideration; if you have a problem with me calling 20+ year timelines "long" then you should also have a problem with people calling 10 year timelines "short." Insofar as the former is illicitly claiming there exists a consensus and it supports a certain view, so is the latter.

I'd actually be fine with a solution where we all agree to stop using the terms "long timelines" and "short timelines" and just use numbers instead. How does that sound?

EDIT: Minor point: Ajeya's report says median 2050, no? It's been a while since I read it but I'm pretty sure that was what she said. Has it changed to 2055? I thought it updated down to 2045 or so after the bees investigation?

EDIT EDIT: Information cascades are indeed a big problem; I think they are one of the main reasons why people's timelines are on average as long as they are. I think if information cascades didn't exist people would have shorter timelines on average, at least in our community. One weak piece of evidence for this is that in my +12 OOMs post I polled people asking for their "inside views" and their "all things considered views," and their inside views implied notably shorter timelines. Another weak piece of evidence for this is that there is an asymmetry in public discourse, where people with <15 year timelines often don't say so in public, or if they do they take care to be evasive about their arguments, for infohazard reasons. Another asymmetry is that generally speaking shorter timelines are considered by the broader public to be crazier, weirder, etc. There's more of a stigma against having them.