Posts

Relevant pre-AGI possibilities 2020-06-20T10:52:00.257Z · score: 16 (7 votes)
Image GPT 2020-06-18T11:41:21.198Z · score: 30 (14 votes)
List of public predictions of what GPT-X can or can't do? 2020-06-14T14:25:17.839Z · score: 20 (10 votes)
Preparing for "The Talk" with AI projects 2020-06-13T23:01:24.332Z · score: 60 (21 votes)
Reminder: Blog Post Day III today 2020-06-13T10:28:41.605Z · score: 14 (2 votes)
Blog Post Day III 2020-06-01T13:56:10.037Z · score: 33 (7 votes)
Predictions/questions about conquistadors? 2020-05-22T11:43:40.786Z · score: 7 (2 votes)
Better name for "Heavy-tailedness of the world?" 2020-04-17T20:50:06.407Z · score: 28 (6 votes)
Is this viable physics? 2020-04-14T19:29:28.372Z · score: 55 (22 votes)
Blog Post Day II Retrospective 2020-03-31T15:03:21.305Z · score: 17 (8 votes)
Three Kinds of Competitiveness 2020-03-31T01:00:56.196Z · score: 33 (10 votes)
Reminder: Blog Post Day II today! 2020-03-28T11:35:03.774Z · score: 16 (3 votes)
What are the most plausible "AI Safety warning shot" scenarios? 2020-03-26T20:59:58.491Z · score: 35 (12 votes)
Could we use current AI methods to understand dolphins? 2020-03-22T14:45:29.795Z · score: 9 (3 votes)
Blog Post Day II 2020-03-21T16:39:04.280Z · score: 38 (10 votes)
What "Saving throws" does the world have against coronavirus? (And how plausible are they?) 2020-03-04T18:04:18.662Z · score: 25 (12 votes)
Blog Post Day Retrospective 2020-03-01T11:32:00.601Z · score: 26 (6 votes)
Cortés, Pizarro, and Afonso as Precedents for Takeover 2020-03-01T03:49:44.573Z · score: 115 (54 votes)
Reminder: Blog Post Day (Unofficial) 2020-02-29T15:10:17.264Z · score: 29 (5 votes)
Response to Oren Etzioni's "How to know if artificial intelligence is about to destroy civilization" 2020-02-27T18:10:11.129Z · score: 29 (9 votes)
What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world? 2020-02-26T14:19:27.197Z · score: 37 (22 votes)
Blog Post Day (Unofficial) 2020-02-18T19:05:47.140Z · score: 46 (16 votes)
Simulation of technological progress (work in progress) 2020-02-10T20:39:34.620Z · score: 20 (11 votes)
A dilemma for prosaic AI alignment 2019-12-17T22:11:02.316Z · score: 43 (12 votes)
A parable in the style of Invisible Cities 2019-12-16T15:55:06.072Z · score: 28 (12 votes)
Why aren't assurance contracts widely used? 2019-12-01T00:20:21.610Z · score: 33 (11 votes)
How common is it for one entity to have a 3+ year technological lead on its nearest competitor? 2019-11-17T15:23:36.913Z · score: 53 (15 votes)
Daniel Kokotajlo's Shortform 2019-10-08T18:53:22.087Z · score: 5 (2 votes)
Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann 2019-10-07T19:52:19.266Z · score: 49 (14 votes)
Soft takeoff can still lead to decisive strategic advantage 2019-08-23T16:39:31.317Z · score: 116 (51 votes)
The "Commitment Races" problem 2019-08-23T01:58:19.669Z · score: 76 (31 votes)
The Main Sources of AI Risk? 2019-03-21T18:28:33.068Z · score: 78 (33 votes)

Comments

Comment by daniel-kokotajlo on Daniel Kokotajlo's Shortform · 2020-07-01T21:46:53.545Z · score: 10 (5 votes) · LW · GW

For fun:

“I must not step foot in the politics. Politics is the mind-killer. Politics is the little-death that brings total obliteration. I will face my politics. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the politics has gone there will be nothing. Only I will remain.”

Makes about as much sense as the original quote, I guess. :P

Comment by daniel-kokotajlo on How do takeoff speeds affect the probability of bad outcomes from AGI? · 2020-06-30T11:34:49.054Z · score: 4 (4 votes) · LW · GW

Nice! I'd add human cloning, human genetic engineering, and eugenics to the list of scary technologies that humanity decided not to pursue. As far as I can tell, the reasons we aren't pursuing them are almost entirely ethical, i.e. if the Nazis had won the war they would probably have gone full steam ahead.

If we expect significant changes to the state of the world during takeoff, it makes it harder to predict what kinds of landscape the AI researchers of that time will be facing.

Shameless plug for this tool for speculating about what kinds of changes to the world might happen prior to advanced AGI.

Comment by daniel-kokotajlo on A reply to Agnes Callard · 2020-06-28T23:02:53.454Z · score: 3 (2 votes) · LW · GW

I signed the petition myself. I think our disagreement is smaller than it seems. I think partly my concern is that this is a symmetric weapon, but partly it's simply what I said: There is a slippery slope; we would do well to think about fences. Does our culture have a clearly defined fence on this slope already? If so, I'm not aware of it.

Comment by daniel-kokotajlo on A reply to Agnes Callard · 2020-06-28T10:11:28.250Z · score: 3 (10 votes) · LW · GW

Yeah IDK. There's a slippery slope from "This is the interface this institution has with the world, so of course we should use it" to "Our enemies use tactic X, so there's nothing wrong with us using tactic X too," which then becomes "Our enemies used tactic X once, so now we are justified in using it a lot." We need to find a Schelling fence or avoid the slope entirely.

Here is a brainstorm of suggestions:

--Our petition should have a clause acknowledging how terrible it is for the NYT to bow to mobs of enraged internet elites, but saying that it would be hypocritical of them to choose now as their moment to grow a spine. At least this gets the right ideas across.

--We also take some action to encourage them to grow a spine, so that they become more resistant to this tactic in general in the future. That way, we are using tactic X now while making X less viable for everyone in the future.

--We don't do our petition at all, since that's an example of the tactic we dislike, but instead we do some tactic we like, such as challenging the NYT to a third-party moderated public debate on the matter, or simply raising tons of awareness about what's happening, with the goal of convincing third parties of the rightness of our cause rather than the goal of directly influencing the NYT.

--We take some steps to make our petition a non-mob. Like, maybe we require that everyone who signs it restate it in their own words or something, or that everyone who signs it be someone initially skeptical who changed their mind as a result of hearing both sides.

On the question of whether we should have one:

Comment by daniel-kokotajlo on GPT-3 Fiction Samples · 2020-06-26T10:10:37.173Z · score: 2 (1 votes) · LW · GW

On the bright side, according to OpenAI's scaling laws paper, GPT-3 is about the size at which scaling was predicted to start breaking down. So maybe GPT-4 won't actually be better than GPT-3. I'm not counting on it, though.

Comment by daniel-kokotajlo on [META] Building a rationalist communication system to avoid censorship · 2020-06-23T20:28:24.475Z · score: 5 (3 votes) · LW · GW

The problem with that is that people who don't use throwaway accounts might be tarred by association, e.g. "I googled their name and it looks like they are a LWer; haven't you heard LW is a haven for white supremacists now? Here's a screenshot of a post on LW that got upvoted instead of removed..."

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-23T00:53:40.266Z · score: 2 (1 votes) · LW · GW

Sounds good. Thanks! My current opinion is basically not that different from yours.

Comment by daniel-kokotajlo on The affect heuristic and studying autocracies · 2020-06-22T16:43:00.098Z · score: 2 (1 votes) · LW · GW

Thanks for writing this post -- it's relevant to my work somewhat, and more importantly I really enjoyed reading about your experience with this project.

Comment by daniel-kokotajlo on List of public predictions of what GPT-X can or can't do? · 2020-06-21T23:57:19.982Z · score: 2 (1 votes) · LW · GW

Thanks! Maybe we could get around the BPE encoding by doing it with sentences instead of words? Like, "Please scramble the word order in the following sentences: I ate a nice juicy steak. = steak nice a juicy ate I. Happy people usually sleep well at night. = people sleep well usually at night happy. Orange juice is like wine for amateurs. = " Or would that be less impressive for some reason?
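
For concreteness, here is a minimal sketch in Python of the test I have in mind -- no model calls, and the helper names and grading rule are just my own illustration:

# Sketch: build the few-shot "scramble the word order" prompt from the comment above,
# and grade a completion by checking it uses exactly the target sentence's words, reordered.
EXAMPLES = [
    ("I ate a nice juicy steak.", "steak nice a juicy ate I."),
    ("Happy people usually sleep well at night.", "people sleep well usually at night happy."),
]
TARGET = "Orange juice is like wine for amateurs."

def build_prompt(examples, target):
    parts = ["Please scramble the word order in the following sentences:"]
    for original, scrambled in examples:
        parts.append(original + " = " + scrambled)
    parts.append(target + " = ")
    return " ".join(parts)

def is_word_scramble(original, completion):
    # True iff the completion uses exactly the original's words, in a different order.
    norm = lambda s: [w.strip(".").lower() for w in s.split()]
    return sorted(norm(original)) == sorted(norm(completion)) and norm(original) != norm(completion)

print(build_prompt(EXAMPLES, TARGET))
print(is_word_scramble(TARGET, "juice wine like is for amateurs Orange."))  # expected: True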

Comment by daniel-kokotajlo on Image GPT · 2020-06-21T23:49:22.420Z · score: 2 (1 votes) · LW · GW

Right. The use case I had in mind for self-driving cars was the standard "You see someone walking by the edge of the street; are they going to step out into the street or not? It depends on e.g. which way they are facing, whether they just dropped something into the street, ... etc." That seems like something where pixel-based image prediction would be superior to e.g. classifying the entity as a pedestrian and then adding a pedestrian token to your 3D model of your environment.

Comment by daniel-kokotajlo on How to analyze progress, stagnation, and low-hanging fruit · 2020-06-21T23:46:34.387Z · score: 2 (1 votes) · LW · GW

I'm not sure I agree. Propulsively landing rockets, especially orbital-class rockets, seems pretty freaking new and awesome. Making an electric car that is actually good... well, it doesn't require anything mind-bendingly new, but it requires a ton of small innovations adding up, many more small innovations than normally occur in product development cycles. As for money, Musk has less money than Bezos, for example, but it's SpaceX, not Blue Origin, that's revolutionizing the industry. And of course the established companies have way more money than either Musk or Bezos. I think really it's what I said it was: The ability to attract and motivate top talent.

Would you agree that if Starship ends up working, then SpaceX will have developed new technology in the relevant sense?

Comment by daniel-kokotajlo on Image GPT · 2020-06-21T19:36:42.101Z · score: 2 (1 votes) · LW · GW

How hard do you think it would be to do Image GPT but for video? That sounds like it could be pretty cool to see. It could probably be used to create some pretty trippy shit. Once it gets really good it could be used in robotics. Come to think of it, isn't that sorta what self-driving cars need? Something that looks at a video of the various things happening around the car and predicts what's going to happen next?

Comment by daniel-kokotajlo on Image GPT · 2020-06-21T14:50:46.039Z · score: 4 (2 votes) · LW · GW

Wow, this is really good and really funny! I don't know if it counts as knowing how to write an email to ask for a job. On the one hand it knows like 99% of it... but on the other hand even the first letter comes across as immature.

Comment by daniel-kokotajlo on Image GPT · 2020-06-21T14:44:14.888Z · score: 2 (1 votes) · LW · GW

OK, thanks. I don't find that hard to believe at all.

Comment by daniel-kokotajlo on Image GPT · 2020-06-21T13:43:47.894Z · score: 2 (1 votes) · LW · GW

Thanks, I agree AlphaStar doesn't seem to have it. What do you think about GPT's arithmetic and anagram stuff? Also, you say that AI Dungeon uses GPT-3, but their "About" page still says they use GPT-2. Anyhow, I now think I was too confident in my original claim, and am further revising downwards.

Comment by daniel-kokotajlo on [deleted post] 2020-06-20T18:30:16.147Z

OK, sure, will do.

Comment by daniel-kokotajlo on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-20T18:04:44.582Z · score: 3 (2 votes) · LW · GW

Because you think it'll be caught in time, etc. Yes. I think it will probably be caught in time too.

OK, so yeah, the solution isn't quite as cheap as simply "Shout this problem at AI researchers." It's gotta be more subtle and respectable than that. Still, I think this is a vastly easier problem to solve than the normal AI alignment problem.

Comment by daniel-kokotajlo on [deleted post] 2020-06-20T13:39:42.541Z

Maybe this should be merged or deleted, given that I made a linkpost already?

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-19T23:01:26.959Z · score: 2 (1 votes) · LW · GW
It did. Part of me thought it was better not to comment, but then I figured the entire point of the post was how to do outreach to people we don't agree with, so I decided it was better to express my frustration.

Well said. I'm glad you spoke up. Yeah, I don't want people to rationalize their way into thinking AI should never be developed or released either. Currently I think people are much more likely to make the opposite error, but I agree both errors are worth watching out for.

I don't know of a standard reference for that claim either. Here is what I'd say in defense of it:

--AIXItl was a serious proposal for an "ideal" intelligent agent. I heard the people who came up with it took convincing, but eventually agreed that yes, AIXItl would seize control of its reward function and kill all humans.

--People proposed Oracle AI, thinking that it would be safe. Now AFAICT people mostly agree that there are various dangers associated with Oracle AI as well.

--People sometimes said that AI risk arguments were founded on these ideal models of AI as utility maximizers or something, and that they wouldn't apply to modern ML systems. Well, now we have arguments for why modern ML systems are potentially dangerous too. (Whether these are the same arguments rephrased, or new arguments, is not relevant for this point.)

--In my personal experience at least, I keep discovering entirely new ways that AI designs could fail, which I hadn't thought of before. For example, Paul's "The Universal Prior is Malign." Or oracles outputting self-fulfilling prophecies. Or some false philosophical view on consciousness or something being baked into the AI. This makes me think maybe there are more which I haven't yet thought of.

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-19T22:49:03.603Z · score: 3 (2 votes) · LW · GW

Ah, that sounds much better to me. Yeah, maybe the cheapest EU lies in trying to make these worlds more likely. I doubt we have much control over which paradigms overtake ML, and I think that the intervention I'm proposing might help make the first and second kinds of world more likely (because maybe, with a month of extra time to analyze their system, the relevant people will become convinced that the problem is real).

Comment by daniel-kokotajlo on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-19T22:16:10.401Z · score: 5 (3 votes) · LW · GW

I agree. However, in my case at least, the 1/million probability is not for that reason but for much more concrete reasons, e.g. "It's already happened at least once, at a major AI company, for an important AI system; yes, in the future people will probably be paying more attention, but that only changes the probability by an order of magnitude or so."

Isn't the cheap solution just... being more cautious about our programming, to catch these bugs before the code starts running? And being more concerned about these signflip errors in general? It's not like we need to solve Alignment Problem 2.0 to figure out how to prevent signflip. It's just an ordinary bug. Like, what happened already with OpenAI could totally have been prevented with an extra hour or so of eyeballs poring over the code, right? (Or, more accurately, by whoever wrote the code in the first place being on the lookout for this kind of error?)

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-19T22:10:47.328Z · score: 3 (2 votes) · LW · GW

Thanks, that was an illuminating answer. I feel like those three worlds are decently likely, but that if those worlds occur purchasing additional expected utility in them will be hard, precisely because things will be so much easier. For example, if safety concerns are part of mainstream AI research, then safety research won't be neglected anymore.

Comment by daniel-kokotajlo on Image GPT · 2020-06-19T18:40:51.608Z · score: 2 (1 votes) · LW · GW

Thanks for sharing that. Now having watched the video, I am updating towards that position. I'm now only something like 80% confident that reasoning isn't a roadblock. I look forward to learning whether GPT-3 can do word scrambling tasks.

Comment by daniel-kokotajlo on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-19T18:08:48.106Z · score: 2 (1 votes) · LW · GW

Unfortunately I didn't have a specific credence beforehand. I felt like the shift was about an order of magnitude, but I didn't peg the absolute numbers. Thinking back, I probably would have said something like 1/3000 give or take 1 order of magnitude. The argument you make pushes me down by an order of magnitude.

I think even a 1 in a million chance is probably way too high for something as bad as this. Partly for acausal trade reasons, though I'm a bit fuzzy on that. It's high enough to motivate much more attention than is currently being paid to the issue (though I don't think it means we should abandon normal alignment research! Normal alignment research probably is still more important, I think. But I'm not sure.) Mainly I think that the solution to this problem is very cheap to implement, and thus we do lots of good in expectation by raising more awareness of this problem.

Comment by daniel-kokotajlo on List of public predictions of what GPT-X can or can't do? · 2020-06-19T15:59:22.672Z · score: 2 (1 votes) · LW · GW

Now that you have access to GPT-3, would you mind seeing whether it can scramble words? I'm dying to know whether this prediction was correct or not.

Comment by daniel-kokotajlo on What is meant by Simulcra Levels? · 2020-06-19T13:14:46.536Z · score: 2 (1 votes) · LW · GW

I agree that's a fun irony, but I don't think it's a perfect irony -- e.g. if I had actually read Baudrillard and tried to represent his thought in my answer, that would be a more perfect instance of the phenomenon he is talking about than what actually happened. I wasn't talking about Baudrillard's or anyone else's concept, but only about my own, and said so. So I was dealing with the territory itself, so to speak.

Comment by daniel-kokotajlo on If AI is based on GPT, how to ensure its safety? · 2020-06-19T10:34:30.300Z · score: 9 (5 votes) · LW · GW

One would hope that GPT-7 would achieve accurate predictions about what humanoids do because it is basically a human. Its algorithm is "OK, what would a typical human do?"

However, another possibility is that GPT-7 is actually much smarter than a typical human in some sense--maybe it has a deep understanding of all the different kinds of humans, rather than just a typical human, and maybe it has some sophisticated judgment for which kind of human to mimic depending on the context. In this case it probably isn't best understood as a set of humans with an algorithm to choose between them, but rather as something alien and smarter than humans that mimics them in the way that e.g. a human actress might mimic some large set of animals.

Using Evan's classification, I'd say that we don't know how training-competitive GPT-7 is but that it's probably pretty good on that front; GPT-7 is probably not very performance-competitive because even if all goes well it just acts like a typical human; GPT-7 has the standard inner alignment issues (what if it is deceptively aligned? What if it actually does have long-term goals, and pretends not to, since it realizes that's the only way to achieve them? though perhaps they have less force since its training is so... short-term? I forget the term) and finally I think the issue pointed to with "The universal prior is malign" (i.e. probable environment hacking) is big enough to worry about here.

In light of all this, I don't know how to ensure its safety. I would guess that some of the techniques Evan talks about might help, but I'd have to go through them and refamiliarize myself with them.

Comment by daniel-kokotajlo on Image GPT · 2020-06-19T10:08:04.508Z · score: 5 (3 votes) · LW · GW

Thanks for pointing this out--funnily enough, I actually read the OpenAI thing last year and thought it was cool, but then forgot about it by the time this came out! (The thing from a decade ago I hadn't heard of)

Comment by daniel-kokotajlo on If AI is based on GPT, how to ensure its safety? · 2020-06-19T00:11:21.378Z · score: 4 (2 votes) · LW · GW

Oh, OK. So perhaps we give it a humanoid robot body, so that it is as similar as possible to the humans in its dataset, and then we set up the motors so that the body does whatever GPT-7 predicts it will do, and GPT-7 is trained on datasets of human videos (say) so if you ask it to bring the coffee it probably will? Thanks, this is much clearer now.

Comment by daniel-kokotajlo on What is meant by Simulcra Levels? · 2020-06-19T00:07:05.705Z · score: 2 (1 votes) · LW · GW

I'm not sure what you mean. If you are asking why the name "simulacra" was chosen for this concept, I have no idea.

Comment by daniel-kokotajlo on What is meant by Simulcra Levels? · 2020-06-18T21:54:50.278Z · score: 8 (4 votes) · LW · GW

My answer:

Consider a 2x2 grid. On the top row we have "naive deontological strategies." On the bottom row we have "consequentialist strategies." On the left we have "Truth." On the right we have "Teams."

Level 1: Top left: Naive deontological + Truth = You assert the statement if you think it is true, and not otherwise.

Level 3: Top right: Naive deontological + Teams = You assert the statement if you identify as part of the team associated with the statement, and not otherwise. (In some cases the statement is associated with being part of any team other than a certain team, i.e. the statement roughly means "I'm not part of team X." In this case you assert the statement if you identify with some opposing team, and not otherwise.)

Level 2: Bottom left: Consequentialist + Truth = You assert the statement if you desire your listener to think you think it is true, and not otherwise. Lying is a special case of this, but you need not be lying to be doing this.

Level 4: Bottom right: Consequentialist + Teams = You assert the statement if you desire your listener to think you identify as part of the team associated with the statement, and not otherwise. (Or, the corresponding thing in case the statement is anti-team-X.)

I haven't thought this through that much or compared it to the "primary texts" so I would bet that my interpretation is at least somewhat different from that of others.

There's an obvious tendency for communities operating at level 1 to devolve into level 2, and from 3 to 4.

There's a less obvious tendency for communities operating at level 2 to devolve into level 3, or so people claim, and I find this somewhat plausible.

Comment by daniel-kokotajlo on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-18T21:34:33.662Z · score: 14 (8 votes) · LW · GW

This has lowered my credence in such a catastrophe by about an order of magnitude. However, that's a fairly small update for something like this. I'm still worried.

Maybe some important AI will learn faster than we expect. Maybe the humans in charge will be grossly negligent. Maybe the architecture and training process won't be such as to involve a period of dumb-misaligned-AI prior to smart-misaligned-AI. Maybe some unlucky coincidence will happen that prevents the humans from noticing or correcting the problem.

Comment by daniel-kokotajlo on If AI is based on GPT, how to ensure its safety? · 2020-06-18T21:01:44.536Z · score: 6 (3 votes) · LW · GW

Can you give more details on how it works? I'm imagining that it has some algorithm for detecting whether a command has been fulfilled, and it is rewarded partially for accurate predictions and partially for fulfilled commands? That means there must be some algorithm that detects whether a command has been fulfilled? How is that algorithm built or trained?

Comment by daniel-kokotajlo on [AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID · 2020-06-18T20:46:29.284Z · score: 4 (3 votes) · LW · GW
They are both particularly critical of the idea that we can get general intelligence by simply scaling up existing deep learning models, citing the need for reasoning, symbol manipulation, and few-shot learning, which current models mostly don’t do

Huh. GPT-3 seems to me like something that does all three of those things, albeit at a rudimentary level. I'm thinking especially about its ability to do addition and anagrams/word letter manipulations. Was this interview recorded before GPT-3 came out?

Comment by daniel-kokotajlo on Image GPT · 2020-06-18T17:19:13.541Z · score: 12 (6 votes) · LW · GW

I'm neither claiming that just the architecture is reasoning, nor that the architecture would work for any task. I'm also not saying GPT is a general intelligence. I agree that GPT-3 and iGPT are separate things. However, what happens with one can be evidence for what is going on inside the other, given that they have the same architecture.

What I'm thinking is this: The path to AGI may involve "roadblocks," i.e. things that won't be overcome easily, i.e. things that won't be solved simply by tweaking and recombining our existing architectures and giving them orders of magnitude more compute, data, etc. Various proposals have been made for possible roadblocks, in the form of claims about what current methods cannot do: Current methods can't do long-term planning, current methods can't do hidden-information games, current methods can't do reasoning, current methods can't do common sense, etc.

Occasionally something which is hypothesized to be a roadblock turns out not to be. E.g. it turns out AlphaStar, OpenAI Five, etc. work fine with hidden information games, and afaik this didn't involve any revolutionary new insights but just some tweaking and recombining of existing ideas along with loads more compute.

My claim is that the GPTs are evidence against reasoning and common sense understanding being roadblocks. There may be other roadblocks. And probably GPT isn't "reasoning" nearly as well or as comprehensively and generally as we humans do. Similarly, its common sense isn't as good as mine. But it has some common sense, and it's improving as we make bigger and bigger GPTs.

One thing I should say as a caveat is that I don't have a clear idea of what people mean when they say reasoning is a roadblock. I think reasoning is a fuzzy and confusing concept. Perhaps I am wrong to say this is evidence against reasoning being a roadblock, because I'm misunderstanding what people mean by reasoning. I'd love to hear someone explain carefully what reasoning is and why it's likely a roadblock.

Comment by daniel-kokotajlo on Image GPT · 2020-06-18T14:10:46.848Z · score: 5 (3 votes) · LW · GW

It's not clear to me that Stuart was saying we won't be able to use deep learning to write a job application letter--rather, perhaps he just meant that deep learning folks typically seem to think that we'll be able to do this via supervised learning, but they are wrong because we'll never have enough data. Idk. You might be right.

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-18T13:19:08.393Z · score: 2 (1 votes) · LW · GW

Thanks for the thoughtful pushback! It was in anticipation of comments like this that I put in hedging language like "I think" and "perhaps." My replies:


This seems a bit like writing the bottom line first?
Like, AI fears in our community have come about because of particular arguments.  If those arguments don't apply, I don't see why one should strongly assume that AI is to be feared, outside of having written the bottom line first.

1. Past experience has shown that even when particular AI risk arguments don't apply, often an AI design is still risky; we just haven't thought of the reasons why yet. So we should make a pessimistic meta-induction and conclude that even if our standard arguments for risk don't apply, the system might still be risky--we should think more about it.

2. I intended those two "perhaps..." statements to be things the person says, not necessarily things that are true. So yeah, maybe they *say* the standard arguments don't apply. But maybe they are wrong. People are great at rationalizing, coming up with reasons to get to the conclusion they wanted. If the conclusion they want is "We finally did it and made a super powerful impressive AI, come on come on let's take it for a spin!" then it'll be easy for them to fool themselves into thinking their architecture is sufficiently different as to not be problematic, even when it is just a special case of the architecture in the standard arguments.

Points 1 and 2 are each individually sufficient to vindicate my claims, I think.

It also seems kind of condescending to operate under the assumption that you know more about the AI system someone is creating than the person who's creating it knows?  You refer to their safety strategy as "amateur", but isn't there a chance that having created this system entitles them to a "professional" designation?  A priori, I would expect that an outsider not knowing anything about the project at hand would be much more likely to qualify for the "amateur" designation.

3. I'm not operating under the assumption that I know more about the AI system someone is creating than the person who's creating it knows. The fact that you said this dismays me, because it is such an obvious straw man. It makes me wonder if I touched a nerve somehow, or had the wrong tone or something, to raise your hackles.

4. Yes, I refer to their safety strategy as amateur. Yes, this is appropriate. AI safety is related to AI capabilities, but the two are distinct sub-fields, and someone who is great at one could be not so great at the other. Someone who doesn't know the AI safety literature, and who does something to make their AI safe, probably deserves the title of amateur. I don't claim to be a non-amateur AI scientist, and whether I'm a non-amateur AI safety person is irrelevant because I'm not going to be one of the people in The Talk. I do claim that e.g. someone like Paul Christiano or Stuart Russell is a professional AI safety person, whereas most AI scientists are not.

This isn't obvious to me.  One possibility is that there will be some system which is safe if used carefully, and having a decent technological lead gives you plenty of room to use it carefully, but if you delay your development too much, competing teams will catch up and you'll no longer have space to use it carefully.  I think you have to learn more about the situation to know for sure whether a month of delay is a good thing.

5. I agree that this is a possibility. This is why I said "say it buys us a month"; I meant that to be an average over the various possibilities. In retrospect I was unclear; I should have clarified that it might not be a good idea to delay at all, for the reasons you mention. I agree we have to learn more about the situation; in retrospect I shouldn't have said "I think it would be better for these conversations to end X way" (even though that is what I think is most likely) but rather found some way to express the more nuanced position.

6. I agree with everything you say about overconfidence, echo chambers, etc., except that I don't think I was writing the bottom line first in this case. I was making a claim without arguing for it, but then I argued for it in the comments when you questioned it. It's perfectly reasonable (indeed necessary) to have some unargued-for claims in any particular finite piece of writing.

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-17T17:58:26.123Z · score: 2 (1 votes) · LW · GW

Interesting. I'd love to hear more about the sorts of worlds conditioned on in your (b). For my part, the worlds I described in the original post seem both the most likely and also not completely hopeless--maybe with a month of extra effort we can actually come up with a solution, or else a convincing argument that we need another month, etc. Or maybe we already have a mostly-working solution by the time The Talk happens and with another month we can iron out the bugs.

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-17T14:05:33.036Z · score: 2 (1 votes) · LW · GW
Though the world this points at is pretty scary (a powerful AI system ready to go, only held back by the implementors buying safety concerns), the intervention does seem cheap and good.

By scary, do you mean (or mean to imply) unlikely?

I think that if AI happens soon (<10 years) it'll likely happen at an org we already know about, so 1 is feasible. If AI doesn't happen soon, all bets are off and 1 will be very difficult.

Comment by daniel-kokotajlo on How to analyze progress, stagnation, and low-hanging fruit · 2020-06-16T13:03:46.361Z · score: 4 (3 votes) · LW · GW

Elon Musk is perhaps another piece of evidence for this. Turns out spaceflight, vehicles, and perhaps tunneling and brain-machine interfaces too can all be revolutionized if you get the right team of people working on it. Instead of just saying Elon is amazing, we could say: There are lots of low-hanging fruit to be picked outside computing because computing has sucked up so much of the talent. Elon is good at finding those fruits and attracting talent to work on picking them.

Comment by daniel-kokotajlo on Simulacra Levels and their Interactions · 2020-06-15T14:39:26.917Z · score: 23 (15 votes) · LW · GW

Thanks Zvi, I'd read a bunch of posts on simulacra before but didn't really get it, nor the usefulness of it, until now. The thing that helped the most was laying out the different kinds of people (Oracle, Sage, Lawyer, Drone, etc.).

Comment by daniel-kokotajlo on Preparing for "The Talk" with AI projects · 2020-06-14T14:53:55.730Z · score: 2 (1 votes) · LW · GW

Sure, I'd be happy to talk. Note that I am nowhere near the best person to talk to about this; there are plenty of people who actually work at an AI project, who actually talk to AI scientists regularly, etc.

Comment by daniel-kokotajlo on List of public predictions of what GPT-X can or can't do? · 2020-06-14T14:50:28.953Z · score: 7 (4 votes) · LW · GW

Sorry, I edited to include the timestamp.

I think what he means is: Do the reverse of the word-unscrambling test that they already did in the paper. So, prompt the model with something like this:

Scramble the following words:
Large --> egLar
Orange --> ngareO
Book --> koBo
Snow -->

And see if it answers with a string of four letters, S, n, o, and w, but not in that order.
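
For what it's worth, here is a tiny Python sketch of how one might grade the answer automatically (the function name and the exact grading rule are my own, just for illustration):

def is_letter_scramble(word, completion):
    # True iff the completion uses exactly the word's letters, in a different order.
    answer = completion.strip()
    return sorted(answer.lower()) == sorted(word.lower()) and answer.lower() != word.lower()

print(is_letter_scramble("Large", "egLar"))  # True -- same letters, different order
print(is_letter_scramble("Snow", "Snow"))    # False -- letters are still in the original order
print(is_letter_scramble("Snow", "wonS"))    # True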

Comment by daniel-kokotajlo on Coronavirus as a test-run for X-risks · 2020-06-14T09:50:26.692Z · score: 3 (2 votes) · LW · GW

Well, if the takeoff is sufficiently fast, by the time people freak out it will be too late. The question is, how slow does the takeoff need to be, for the MNM effect to kick in at some not-useless point? And what other factors does it depend on, besides speed? It would be great to have a better understanding of this.

Comment by daniel-kokotajlo on Coronavirus as a test-run for X-risks · 2020-06-13T23:16:52.896Z · score: 6 (3 votes) · LW · GW

Nice post! I admit I myself underestimated the ferocity of the public lockdowns in March, and totally didn't predict the R0=1 control system phenomenon. So I'm convinced.

I'd love to see more thought about how the MNM effect might look in an AI scenario. Like you said, maybe denials and assurances followed by freakouts and bans. But maybe we could predict what sorts of events would trigger the shift?

There's a theory which I endorse which goes something like "Change only happens in a crisis. The leaders and the people flail around and grab whatever policy solutions happen to be lying around in prestigious places, and implement them. So, doing academic policy work can be surprisingly impactful; even if no one listens to you now, they might when it really matters."

Comment by daniel-kokotajlo on Blog Post Day II Retrospective · 2020-06-10T22:57:54.963Z · score: 5 (3 votes) · LW · GW

FYI another one is happening in a few days. :)

Comment by daniel-kokotajlo on Blog Post Day II Retrospective · 2020-06-10T22:57:20.916Z · score: 5 (3 votes) · LW · GW

OK, fyi it's happening in a few days. :)

Comment by daniel-kokotajlo on Blog Post Day II Retrospective · 2020-06-10T22:57:04.724Z · score: 5 (3 votes) · LW · GW

OK, fyi it's happening in a few days.

Comment by daniel-kokotajlo on TurnTrout's shortform feed · 2020-06-10T22:54:12.979Z · score: 4 (2 votes) · LW · GW

The people at NeurIPS who reviewed the paper might notice if resubmission occurred elsewhere? Automated tools might help with this, by searching for specific phrases.
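
As a toy illustration of the kind of automated check I have in mind (the function names and the 8-word window are made up for this sketch), one could flag candidate papers that share long exact phrases with the withheld submission:

def shingles(text, n=8):
    # All n-word phrases ("shingles") in the text, lowercased.
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(withheld_text, candidate_text, n=8):
    # Exact n-word phrases appearing in both texts; a non-empty result is worth a human look.
    return shingles(withheld_text, n) & shingles(candidate_text, n)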

There's been talk of having a Journal of Infohazards. Seems like an idea worth exploring to me. Your suggestion sounds like a much more feasible first step.

Problem: Any entity with halfway decent hacking skills (such as a national government, or clever criminal) would be able to peruse the list of infohazardy titles, look up the authors, cyberstalk them, and then hack into their personal computer and steal the files. We could hope that people would take precautions against this, but I'm not very optimistic. That said, this still seems better than the status quo.

Comment by daniel-kokotajlo on [deleted post] 2020-06-08T12:57:35.403Z

Old saying: "He who would sacrifice essential liberty to achieve security deserves neither and will lose both."

I think something like this is typically true with rationality and effective altruism:

"They who imbibe falsehoods in order to win, eventually lose."

It isn't always true, of course. You give some counterexamples above. But it's true often enough, I think, to be worth making into a litany.