Blog Post Day II Retrospective 2020-03-31T15:03:21.305Z · score: 15 (7 votes)
Three Kinds of Competitiveness 2020-03-31T01:00:56.196Z · score: 31 (9 votes)
Reminder: Blog Post Day II today! 2020-03-28T11:35:03.774Z · score: 16 (3 votes)
What are the most plausible "AI Safety warning shot" scenarios? 2020-03-26T20:59:58.491Z · score: 35 (12 votes)
Could we use current AI methods to understand dolphins? 2020-03-22T14:45:29.795Z · score: 9 (3 votes)
Blog Post Day II 2020-03-21T16:39:04.280Z · score: 38 (10 votes)
What "Saving throws" does the world have against coronavirus? (And how plausible are they?) 2020-03-04T18:04:18.662Z · score: 25 (12 votes)
Blog Post Day Retrospective 2020-03-01T11:32:00.601Z · score: 26 (6 votes)
Cortés, Pizarro, and Afonso as Precedents for Takeover 2020-03-01T03:49:44.573Z · score: 101 (39 votes)
Reminder: Blog Post Day (Unofficial) 2020-02-29T15:10:17.264Z · score: 29 (5 votes)
Response to Oren Etzioni's "How to know if artificial intelligence is about to destroy civilization" 2020-02-27T18:10:11.129Z · score: 28 (8 votes)
What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world? 2020-02-26T14:19:27.197Z · score: 37 (22 votes)
Blog Post Day (Unofficial) 2020-02-18T19:05:47.140Z · score: 46 (16 votes)
Simulation of technological progress (work in progress) 2020-02-10T20:39:34.620Z · score: 20 (11 votes)
A dilemma for prosaic AI alignment 2019-12-17T22:11:02.316Z · score: 43 (12 votes)
A parable in the style of Invisible Cities 2019-12-16T15:55:06.072Z · score: 28 (12 votes)
Why aren't assurance contracts widely used? 2019-12-01T00:20:21.610Z · score: 33 (11 votes)
How common is it for one entity to have a 3+ year technological lead on its nearest competitor? 2019-11-17T15:23:36.913Z · score: 53 (15 votes)
Daniel Kokotajlo's Shortform 2019-10-08T18:53:22.087Z · score: 5 (2 votes)
Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann 2019-10-07T19:52:19.266Z · score: 49 (14 votes)
Soft takeoff can still lead to decisive strategic advantage 2019-08-23T16:39:31.317Z · score: 114 (50 votes)
The "Commitment Races" problem 2019-08-23T01:58:19.669Z · score: 67 (29 votes)
The Main Sources of AI Risk? 2019-03-21T18:28:33.068Z · score: 78 (33 votes)


Comment by daniel-kokotajlo on Cortés, Pizarro, and Afonso as Precedents for Takeover · 2020-04-06T18:56:15.131Z · score: 8 (4 votes) · LW · GW

Update: According to the wiki article on lateen sails, they existed for several hundred years in the Mediterranean before spreading to the Atlantic, and the Nile, and then finally they arrived in the Indian ocean with the Portuguese, at which point the locals quickly adopted it on their vessels also. (Within 20 years!)

What the hell? Why did it take so long? If it was so good that it gave a huge advantage, such that everyone copied the design within two decades of the Portuguese arrival... why did no one notice this for almost a thousand years? Surely there were travelers who sailed on both the Med. and Red seas, for example. Surely the Ottomans and Mamelukes, who maintained fleets in both the Med. and the Indian Ocean, should have been able to realize that the lateen sail was a thing and would be useful? (Especially since being able to sail against the wind seems super useful precisely when the wind doesn't change direction very much, e.g. in monsoon-regions like the Indian Ocean) Also apparently the pacific islanders independently invented the lateen sail, yet it didn't spread from there to the Indian Ocean either. I am very confused.

Comment by daniel-kokotajlo on Conflict vs. mistake in non-zero-sum games · 2020-04-06T15:38:34.215Z · score: 5 (3 votes) · LW · GW

I like this theory. It seems to roughly map to how the distinction works in practice, too. However: Is it true that mistake theorists feel like they'll be in a better negotiating position later, and conflict theorists don't?

Take, for example, a rich venture capitalist and a poor cashier. If we all cooperate to boost the economy 10x, such that the existing distribution is maintained but everyone is 10x richer in real terms... yeah, I guess that would put the cashier in a worse negotiating position relative to the venture capitalist, because they'd have more stuff and hence less to complain about, and their complaining would be seen more as envy and less as righteous protest.

What about two people, one from an oppressor group and one from an oppressed group, in the social justice sense? E.g. man and woman? If everyone gets 10x richer, then arguably that would put the man in a worse negotiating position relative to the woman, because the standard rationales for e.g. gender roles, discrimination, etc. would seem less reasonable: So what if men's sports make more money and thus pouring money into male salaries is a good investment whereas pouring it into female salaries is a money sink? We are all super rich anyway, you can afford to miss out on some profits. (Contrast this with e.g. a farmer in 1932 arguing that his female workers are less strong than the men, and thus do less work, and thus he's gonna pay them less, so he can keep his farm solvent. When starvation or other economic hardships are close at hand, this argument is more appealing.)

More abstractly, it seems to me that the richer we all are, the more "positional" goods matter, intuitively. When we are all starving, things like discrimination and hate speech seem less pressing, compared to when we all have plenty.

Interesting. Those are the first two examples I thought of, and the first one seems to support your theory and the second one seems to contradict it. Not sure what to make of this. My intuitions might be totally wrong of course.

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-04-04T19:15:54.327Z · score: 2 (1 votes) · LW · GW

I had been thinking it is sometimes nice to talk about competitiveness of AI designs more generally, not just alignment schemes. e.g. neuromorphic AI is more date-competitive, cost-competitive, and performance-competitive than uploads, probably. (It might be less date-competitive though).

Comment by daniel-kokotajlo on Implications of the Doomsday Argument for x-risk reduction · 2020-04-03T20:57:51.543Z · score: 5 (3 votes) · LW · GW

Agreed. If you want to talk more about these ideas sometime, I'd be happy to video chat!

Comment by daniel-kokotajlo on Implications of the Doomsday Argument for x-risk reduction · 2020-04-03T15:32:16.110Z · score: 5 (2 votes) · LW · GW

re: inverse proportinality: Good point, I'll have to think about that more. Maybe it does neatly cancel out, or even worse, since my utility function isn't linear in happy lives lived, maybe it more than cancels out.

I for one have seriously investigated all those weird philosophical ideas you mentioned. ;) And I think our community has been pretty good about taking these ideas seriously, especially compared to, well, literally every other community, including academic philosophy. Our overton window definitely includes all these ideas, I'd say.

But I agree with your general point that there is a tension we should explore. Even if we are OK seriously discussing these ideas, we often don't actually live by them. Our overton window includes them, but our median opinion doesn't. Why not?

I think there is a good answer, and it has to do with humility/caution. Philosophy is weird. If you follow every argument where it leads you, you very quickly find that your beliefs don't add up to normality, or anything close. Faith that beliefs will (approximately) add up to normality seems to be important for staying sane and productive, and moreover, seems to have been vindicated often in the past: crazy-sounding arguments turn out to have flaws in them, or maybe they work but there is an additional argument we hadn't considered that combines with it to add up to normality.

Comment by daniel-kokotajlo on Implications of the Doomsday Argument for x-risk reduction · 2020-04-03T12:38:58.313Z · score: 3 (2 votes) · LW · GW

Yeah. It depends on how you define extinction. I agree that most simulations don't last very long. (You don't even need the doomsday argument to get that conclusion, I think)

Comment by daniel-kokotajlo on Implications of the Doomsday Argument for x-risk reduction · 2020-04-03T01:39:35.843Z · score: 4 (3 votes) · LW · GW

I'd like to see someone explore the apparent contradiction in more detail. Even if I were convinced that we will almost certainly fail, I might still prioritize x-risk reduction, since the stakes are so high.

Anyhow, my guess is that most people think the doomsday argument probably doesn't work. I am not sure myself. If it does work though, its conclusion is not that we will all go extinct soon, but rather that ancestor simulations are one of the main uses of cosmic resources.

Comment by daniel-kokotajlo on What achievements have people claimed will be warning signs for AGI? · 2020-04-01T14:29:50.182Z · score: 9 (5 votes) · LW · GW

AI Impacts has a list of reasons people give for why current methods won't lead to human-level AI. With sources. It's not exactly what you are looking for, but it's close, because most of these could be inverted and used as warning signs for AGI, e.g. "Current methods can't build good, explanatory causal models" becomes "When we have AI which can build good, explanatory causal models, that's a warning sign."

Comment by daniel-kokotajlo on Call for volunteers: assessing Kurzweil, 2019 · 2020-04-01T13:28:02.156Z · score: 4 (2 votes) · LW · GW

I'd be happy to volunteer a bit. I don't have much time, but this sounds fun, so maybe I could do a few.

Comment by daniel-kokotajlo on Blog Post Day II Retrospective · 2020-04-01T10:50:19.022Z · score: 4 (3 votes) · LW · GW

OK, so you + Gyrodiot are making me think maybe I should do another one soon. But to be honest I need to focus less on blogging and more on working for a bit, so I personally won't be ready for at least a few weeks I think.

Whenever it happens, I should schedule it far in advance I think. That way people have more of a chance to find out about it.

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T20:01:57.619Z · score: 2 (1 votes) · LW · GW

Oh right, how could I forget! This makes me very happy. :D

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T17:42:13.744Z · score: 3 (2 votes) · LW · GW

Good point about inner alignment problems being a blocker to date-competitiveness for IDA... but aren't they also a blocker to date-competitiveness for every other alignment scheme too pretty much? What alignment schemes don't suffer from this problem?

I'm thinking "Do anything useful that a human with a lot of time can do" is going to be substantially less capable than full-blown superintelligent AGI. However, that's OK, because we can use IDA as a stepping-stone to that. IDA gets us an aligned system substantially more capable than a human, and we use that system to solve the alignment problem and build something even better.

It's interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness. I think it's fine to just talk about "competitiveness" full stop, and only bother to specify what we mean more precisely when needed. Sometimes we'll mean one of the three, sometimes two of the three, sometimes all three.

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T17:34:29.489Z · score: 2 (1 votes) · LW · GW

Yes. An upgrade to an AI that makes it run faster, with no side-effects, would be an improvement to both performance and cost-competitiveness.

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T17:33:26.052Z · score: 2 (1 votes) · LW · GW

I knew that the goal was to get IDA to be cost-competitive, but I thought current versions of it weren't. But that was just my rough impression; glad to be wrong, since it makes IDA seem even more promising. :) Of all the proposals I've heard of, IDA seems to have the best combination of cost, date, and performance-competitiveness.

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T17:29:34.548Z · score: 2 (1 votes) · LW · GW

I agree this may be true in most cases, but the chance of it not being true for AI is large enough to motivate the distinction. Besides, not all cases in which performance and cost can be traded off are the same; in some scenarios the "price" of performance is very high whereas in other scenarios it is low. (e.g. in Gradual Economic Takeover, let's say, a system being twice as qualitatively intelligent could be equivalent to being a quarter the price. Whereas in Final Conflict, a system twice as qualitatively intelligent would be equivalent to being one percent the price.) So if we are thinking of a system as "competitive with X% overhead," well, X% is going to vary tremendously depending on which scenario is realized. Seems worth saying e.g. "costs Y% more compute, but is Z% more capable."

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T11:34:51.548Z · score: 2 (1 votes) · LW · GW

Mmm, nice. Thanks! I like your distinction also. I think yours is sufficiently different that we shouldn't see the two sets of distinctions as competing.* A system which has an objective which would be capable on paper but isn't capable in practice due to inner-alignment failures would be performance-uncompetitive but objective-competitive. For this reason I think we shouldn't equate objective and performance competitiveness.

If operating an AI system turns out to be an important part of the cost, then cost+date competitiveness would turn out to be different from training competitiveness, because cost competitiveness includes whatever the relevant costs are. However I expect operating costs will be much less relevant to controlling the future than costs incurred during the creation of the system (all that training, data-gathering, infrastructure building, etc.) so I think the mapping between cost+date competitiveness and training competitiveness basically works.

*Insofar as they are competing, I still prefer mine; as you say, it applies to more than just prosaic AI alignment proposals. Moreover, it makes it easier for us to talk about competitions as well, e.g. "In the FOOM scenario we need to win a date competition; cost-competitiveness still matters but not as much." Moreover cost, performance, and date are fairly self-explanatory terms, whereas as you point out "objective" is more opaque. Moreover I think it's worth distinguishing between cost and date competitiveness; in some scenarios one will be much more important than the other, and of course the two kinds of competitiveness vary independently in AI safety schemes (indeed maybe they are mildly anti-correlated? Some schemes are fairly well-defined and codified already, but would require tons of compute, whereas other schemes are more vague and thus would require tons of tweaking and cautious testing to get right, but don't take that much compute. I do like how your version maps more onto the inner vs. outer alignment distinction.

Comment by daniel-kokotajlo on Three Kinds of Competitiveness · 2020-03-31T01:14:39.543Z · score: 3 (2 votes) · LW · GW

Some thoughts that came to me after I wrote this post:

--I'm not sure I should define date-competitive the way I do. Maybe instead of "can be built" it should be "is built." If we go the latter route, the FOOM scenario is an extremely intense date competition. If we go the former route, the FOOM scenario is not necessarily an intense date competition; it depends on what other factors are at play. For example, maybe there are only a few major AI projects and all of them are pretty socially responsible, so a design is more likely to win if it can be built sooner, but it won't necessarily win; maybe cooler heads will prevail and build a safer design instead.

--Why is date-competitiveness worth calling a kind of competitiveness at all? Why not just say: "We want our AI safety scheme/design to be cost- and performance-competitive, and also we need to be able to build it fairly quickly compared to the other stuff that gets built." Well, 1. Even that is clunky and awkward compared to the elegant "...and also date-competitive." 2. It really does have the comparative flavor of competition to it; what matters is not how long it takes us to complete our safety scheme, but how long it takes relative to unaligned schemes, and it's not as simple as just "we need to be first," rather it's that sooner is better but doing it later isn't necessarily game over... 3. It seems to be useful for describing date competitions, which are important to distinguish from situations which are not date competitions or less so. (Aside: A classic criticism of the "Let's build uploads first, and upload people we trust" strategy is that neuromorphic AI will probably come before uploads. In other words, this strategy is not date-competitive.)

--I'm toying with the idea of adding "alignment-competitiveness" (meaning, as aligned or more aligned than competing systems) and "alignment competition" to the set of definitions. This sounds silly, but it would be conceptually neat, because then we can say: We hope for scenarios in which control of the future is a very intense alignment competition, and we are working hard to make it that way. "

Comment by daniel-kokotajlo on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2020-03-29T11:33:38.614Z · score: 8 (5 votes) · LW · GW

Just wanna say, I intend to get around to writing rebuttals someday. I definitely have several counterarguments in mind; the forceful takedowns you mention weren't very convincing to me, though they did make me update away from fast takeoff.

Comment by daniel-kokotajlo on Benito's Shortform Feed · 2020-03-28T23:30:03.874Z · score: 2 (1 votes) · LW · GW

Well, that wasn't the scenario I had in mind. The scenario I had in mind was: People in the year 2030 pass a law requiring future governments to make ancestor simulations with happy afterlives, because that way it's probable that they themselves will be in such a simulation. (It's like cryonics, but cheaper!) Then, hundreds or billions of years later, the future government carries out the plan, as required by law.

Not saying this is what we should do, just saying it's a decision I could sympathize with, and I imagine it's a decision some fraction of people would make, if they thought it was an option.

Comment by daniel-kokotajlo on When to assume neural networks can solve a problem · 2020-03-28T17:48:56.084Z · score: 5 (3 votes) · LW · GW

Interesting. Well, I imagine you don't have the time right now, but I just want to register that I'd love to hear more about this. What questionable assumptions does Superintelligence make, that aren't made by Human Compatible? (This request for info goes out to everyone, not just Rohin)

Comment by daniel-kokotajlo on When to assume neural networks can solve a problem · 2020-03-28T11:08:50.970Z · score: 1 (2 votes) · LW · GW
It’s hard to summarize without apparently straw-man arguments, e.g. “AIX + Moore’s law means that all powerful superhuman intelligence is dangerous, inevitable and close.” That’s partly because I’ve never seen a consistent top-to-bottom reasoning for it. Its proponents always seem to start by assuming things which I wouldn’t hold as given about the ease of data collection, the cost of computing power, the usefulness of intelligence.

I object to pretty much everything in this quote. I think the straw-man argument you give is pretty obviously worse than many other summaries you could give, e.g. Stuart Russell's "Look, humans have a suite of mental abilities that gives them dominance over all other life forms on this planet. The goal of much AI research is to produce something which is better in those mental abilities than humans. What if we succeed? We'd better figure out how to prevent history from repeating itself, and we'd better do it before it's too late."

Also no one in the AI safety sphere thinks that all powerful superhuman intelligence is dangerous; otherwise what would be the point of AI alignment research?

Also if you read almost anything on the subject, people will be constantly saying how they don't think superhuman intelligence is inevitable or close. Have you even read Superintelligence?

What do you mean, you've never seen a consistent top-to-bottom reasoning for it? This is not a rhetorical question, I am just not sure what you mean here. If you are accusing e.g. Bostrom of inconsistency, I am pretty sure you are wrong about that. If you are just saying he hasn't got an argument in premise-conclusion form, well, that seems true but not very relevant or important. I could make one for you if you like.

I don't know what assumptions you think the case for AI safety depends on -- ease of data collection? Cost of computing power? Usefulness of intelligence? -- but all three of these things seem like things that people have argued about at length, not assumed. Also the case for AI safety doesn't depend on these things being probable, only on them being not extremely unlikely.

Comment by daniel-kokotajlo on Benito's Shortform Feed · 2020-03-28T10:54:55.209Z · score: 2 (1 votes) · LW · GW

I'm not sure it makes sense either, but I don't think it is accurately described as "cause yourself to believe false things." I think whether or not it makes sense comes down to decision theory. If you use evidential decision theory, it makes sense; if you use causal decision theory, it doesn't. If you use functional decision theory, or updateless decision theory, I'm not sure, I'd have to think more about it. (My guess is that updateless decision theory would do it insofar as you care more about yourself than others, and functional decision theory wouldn't do it even then.)

Comment by daniel-kokotajlo on When to assume neural networks can solve a problem · 2020-03-27T22:39:05.678Z · score: 4 (3 votes) · LW · GW

Thanks for writing this and posting it here. I for one am a big fan of the "Bostromian position" as you call it, and moreover I think Stuart Russell is too. ("Human Compatible" is making basically the same points as "Superintelligence," only in a dumbed-down and streamlined manner, with lots of present-day examples to illustrate.) So I don't think your dismissal of positions 1 and 2 is fair. But I'm glad to see dialogue happening between the likes of me and the likes of you.

Moreover, I think you are actually right about Scott Alexander's GPT-2 chess thingy. As you've explained, we knew neural nets could do this sort of thing already, and so we shouldn't be too surprised if GPT-2 can do it too with a little retraining.

I suppose, in Scott's defense, perhaps he wasn't surprised but rather just interested, and using the fact that GPT-2 can play chess to argue for some further claim about how useful neural nets are in general and how soon AGI will appear. But I currently prefer your take.

Comment by daniel-kokotajlo on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T19:54:47.978Z · score: 5 (4 votes) · LW · GW

Thanks for this reply. Yes, I was talking about intent alignment warning shots. I agree it would be good to consider smaller warning shots that convince, say, 10% of currently-skeptical people. (I think it is too early to say whether COVID-19 is a 50%-warning shot for existential risk from pandemics. If it does end up killing millions, the societal incompetence necessary to get us to that point will be apparent to most people, I think, and thus most people will be on board with more funding for pandemic preparedness even if before they would have been "meh" about it.) If we are looking at 10%-warning shots, smaller-scale things like you are talking about will be more viable.

(Whereas if we are looking at 50%-warning shots, it seems like at least attempting to take over the world is almost necessary, because otherwise skeptics will say "OK yeah so one bad apple embezzled some funds, that's a far cry from taking over the world. Most AIs behave exactly as intended, and no small group of AIs has the ability to take over the world even if it wanted to.")

I'm not imagining that they all want to take over the world. I was just imagining that minor failures wouldn't be sufficiently convincing to count as 50%-warning shots, and it seems you agree with me on that.

Yes, I think it's true of humans: Almost all humans are incapable of getting even close to taking over the world. There may be a few humans who have a decent shot at it and also the motivation and incaution to try it, but they are a very small fraction. And if they were even more competent than they already are, their shot at it would be more than decent. I think the crux of our whole disagreement here was just the thing you identified right away about 50% vs. 10% warning shots. Obviously there are plenty of humans capable and willing to do evil things, and if doing evil things is enough to count as a warning shot, then yeah it's not true of humans, and neither would it be true of AI.

I think you've also pointed out an unfairness in my definition, which was about single events. A series of separate minor events gradually convincing most skeptics is just as good, and now that you mention it, much more likely. I'll focus on these sorts of things from now on, when I think of warning shots.

Comment by daniel-kokotajlo on AGI in a vulnerable world · 2020-03-27T13:18:53.509Z · score: 3 (2 votes) · LW · GW

I agree about the strong commercial incentives, but I don't think we will be in a context where people will follow their incentives. After all, there are incredibly strong incentives not to make AGI at all until you can be very confident it is perfectly safe -- strong enough that it's probably not a good idea to pursue AI research at all until AI safety research is much more well-established than it is today -- and yet here we are.

Basically, people won't recognize their incentives, because people won't realize how much danger they are in.

Comment by daniel-kokotajlo on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T12:14:48.336Z · score: 2 (1 votes) · LW · GW

I'm not ready to give up on prediction yet, but yeah I agree with your basic point. Nice phrase about hands and fingers. My overall point is that this doesn't seem like a plausible warning shot; we are basically hoping that something we haven't accounted for will come in and save us.

Comment by daniel-kokotajlo on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T11:37:56.400Z · score: 2 (1 votes) · LW · GW

Mmm, OK, but if it takes long enough for the copy to damage the original, the original won't care. So it just needs to create a copy with a time-delay.

Comment by daniel-kokotajlo on Benito's Shortform Feed · 2020-03-27T11:12:26.635Z · score: 2 (1 votes) · LW · GW

One big reason why it makes sense is that the simulation is designed for the purpose of accurately representing reality.

Another big reason why (a version of it) makes sense is that the simulation is designed for the purpose of inducing anthropic uncertainty in someone at some later time in the simulation. e.g. if the point of the simulation is to make our AGI worry that it is in a simulation, and manipulate it via probable environment hacking, then the simulation will be accurate and lawful (i.e. un-tampered-with) until AGI is created.

I think "polluting the lake" by increasing the general likelihood of you (and anyone else) being in a simulation is indeed something that some agents might not want to do, but (a) it's a collective action problem, and (b) plenty of agents won't mind it that much, and (c) there are good reasons to do it even if it has costs. I admit I am a bit confused about this though, so thank you for bringing it up, I will think about it more in the coming months.

Comment by daniel-kokotajlo on Benito's Shortform Feed · 2020-03-27T11:07:04.271Z · score: 5 (3 votes) · LW · GW

Your first point sounds like it is saying we are probably in a simulation, but not the sort that should influence our decisions, because it is lawful. I think this is pretty much exactly what Bostrom's Simulation Hypothesis is, so I think your first point is not an argument for the second disjunct of the simulation argument but rather for the third.

As for the second point, well, there are many ways for a simulation to be unlawful, and only some of them are undesirable--for example, a civilization might actually want to induce anthropic uncertainty in itself, if it is uncertainty about whether or not it is in a simulation that contains a pleasant afterlife for everyone who dies.

Comment by daniel-kokotajlo on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T00:22:22.054Z · score: 3 (2 votes) · LW · GW

Currently, the answer that seems most plausible to me is an AGI that is (a) within the narrow range I described, and (b) willing to take a gamble for some reason -- perhaps, given its values and situation, it has little to gain from biding its time. So it makes an escape-and-conquest attempt even though it knows it only has a 1% chance of success; it gets part of the way to victory and then fails. I think I'd assign, like, 5% credence to something like this happening.

Comment by daniel-kokotajlo on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T00:17:22.443Z · score: 4 (2 votes) · LW · GW

Thanks. Hmm, I guess these still don't seem that plausible to me. What is your credence that something in the category you describe will happen, and count as a warning shot?

(It's possible that an AI might shoot itself in the foot, but before it does anything super scary, such that it doesn't have the warning shot effect.)

Note my edit to the original question about the meaning of "substantial."

Comment by daniel-kokotajlo on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T00:12:48.491Z · score: 4 (2 votes) · LW · GW

Thank you. See, this sort of thing illustrates why I wanted to ask the question -- the examples you gave don't seem plausible to me, (that is, it seems <1% likely that something like that will happen). Probably AI will understand chaos theory before it does a lot of damage; ditto for pascal's mugging, etc. Probably a myopic AI won't actually be able to hack nukes while also being unable to create non-myopic copies of itself. Etc.

As for really well-written books... We've already had a few great books, and they moved the needle, but by "substantial fraction" I meant something more than that. If I had to put a number on it, I'd say something that convinces more than half of the people who are (at the time) skeptical or dismissive of AI risk to change their minds. I doubt a book will ever achieve this.

Comment by daniel-kokotajlo on AGI in a vulnerable world · 2020-03-26T20:53:37.944Z · score: 4 (2 votes) · LW · GW

But I am skeptical that we'll get sufficiently severe warning shots. I think that by the time AGI gets smart enough to cause serious damage, it'll also be smart enough to guess that humans would punish it for doing so, and that it would be better off biding its time.

Comment by daniel-kokotajlo on AGI in a vulnerable world · 2020-03-26T20:46:06.812Z · score: 9 (3 votes) · LW · GW

I think AGIs which are copies of each other -- even AGIs which are built using the same training method -- are likely to coordinate very well with each other even if they are not given information about each other's existence. Basically, they'll act like one agent, as far as deception and treacherous turns and decisive strategic advantage are concerned.

EDIT: Also, I suspect this coordination might extend further, to AGIs with different architectures also. Thus even the third-tier $10K AGIs might effectively act as co-conspirators with the latest model, and/or vice versa.

Comment by daniel-kokotajlo on AGI in a vulnerable world · 2020-03-26T20:44:27.983Z · score: 4 (2 votes) · LW · GW

I guess you are more optimistic than me about humanity. :) I hope you are right!

Good point about the warning shots leading to common knowledge thing. I am pessimistic that mere argumentation and awareness-raising will be able to achieve an effect that large, but combined with a warning shot it might.

Comment by daniel-kokotajlo on AGI in a vulnerable world · 2020-03-26T13:03:44.500Z · score: 4 (3 votes) · LW · GW

I think many-people-can-build-AGI scenarios are unlikely because before they happen, we'll be in a situation where a-couple-people-can-build-AGI, and probably someone will at that point. And once there is at least one AGI running around, things will either get a lot worse or a lot better very quickly.

I think many-people-can-build-AGI scenarios are still likely enough to be worth thinking about, though, because they could happen if there is a huge amount of hardware overhang (and insufficient secrecy about AGI-building techniques) or if there is a successful-for-some-time policy effort to ban or restrict AGI research.

I think the second scenario you bring up is also interesting. It's sorta a rejection of my "things will either get a lot worse or a lot better very quickly" claim above. I think it is also plausible enough to think more about.

Comment by daniel-kokotajlo on Could we use current AI methods to understand dolphins? · 2020-03-25T20:28:53.909Z · score: 3 (2 votes) · LW · GW

Thanks, I found that explanation very helpful.

Comment by daniel-kokotajlo on Blog Post Day II · 2020-03-23T22:21:58.471Z · score: 5 (3 votes) · LW · GW

Sounds fine to me!

Comment by daniel-kokotajlo on What are examples of Rationalist posters or Rationalist poster ideas? · 2020-03-22T17:38:42.513Z · score: 3 (2 votes) · LW · GW

I made this meme a while back

Comment by daniel-kokotajlo on Could we use current AI methods to understand dolphins? · 2020-03-22T17:30:22.207Z · score: 4 (2 votes) · LW · GW

So the blocker I mentioned. OK, thanks. Well, maybe we could make a translator between whales and dolphins then.

Or we could make a translator between a corpus of scuba diver conversations and dolphins.

We might be able to parse dolphin signals into separate words using ordinary unsupervised learning, no?

Why does the relative size of the vocabularies matter? I'd guess it would be irrelevant, the main factor would be how much overlap the two languages have. Maybe the absolute (as opposed to relative) sizes would matter.

Comment by daniel-kokotajlo on Alignment as Translation · 2020-03-20T13:36:45.783Z · score: 4 (2 votes) · LW · GW

Holy shit, that's awesome. I wonder if it would work to figure out what dolphins, whales, etc. are saying.

Comment by daniel-kokotajlo on Alignment as Translation · 2020-03-19T23:21:23.617Z · score: 4 (2 votes) · LW · GW

"What if we verify the translation by having one group translate English-to-Korean, another group translate back, and reward both when the result matches the original?"

This is a fun idea. Does it work in practice for machine translation?

In the AI safety context, perhaps it would look like: A human gives an AI in a game world some instructions. The AI then goes and does stuff in the game world, and another AI looks at it and reports back to the human. The human then decides whether the report is sufficiently similar to the instructions that both AIs deserve reward.

I feel like eventually this would reach a bad equilibria where the acting-AI just writes out the instructions somewhere and the reporting-AI just reports what they see written.

Comment by daniel-kokotajlo on Positive Feedback -> Optimization? · 2020-03-18T00:59:00.508Z · score: 2 (1 votes) · LW · GW

Mmm, good point. My hasty generalization was perhaps too hasty. Perhaps we need some sort of robust-to-different-initial-conditions sort of criterion.

Comment by daniel-kokotajlo on Positive Feedback -> Optimization? · 2020-03-17T18:41:26.837Z · score: 4 (2 votes) · LW · GW

+1. The multiple feedback loops have to be competing in some important sense; it's just not true that "whenever there’s a dynamical system containing multiple instabilities (i.e. positive feedback loops) ... there should be a canonical way to interpret that system as multiple competing subsystems..."

In the OP's case study, the molecules are competing for scarce resources. More abstractly, perhaps we can say that there are multiple feedback loops such that when the system has travelled far enough in the direction pushed by one feedback loop, it destroys or otherwise seriously inhibits movement in the directions pushed by the other feedback loops.

Comment by daniel-kokotajlo on Interfaces as a Scarce Resource · 2020-03-15T12:25:03.539Z · score: 2 (1 votes) · LW · GW

But it's not novel, though? Like, I feel like everyone already knows it's important to think about companies in terms of what they do, i.e. in terms of what they take as inputs and then what they produce as outputs.

Comment by daniel-kokotajlo on Interfaces as a Scarce Resource · 2020-03-15T00:16:45.357Z · score: 3 (2 votes) · LW · GW

I feel like if we count Amazon as an interface company, then we're going to have to count pretty much everything as an interface company, and the concept becomes trivial. If Amazon is an interface between factories and consumers, then factories are an interface between raw materials and Amazon, and Rio Tinto is an interface between Mother Earth and factories.

Comment by daniel-kokotajlo on Cortés, Pizarro, and Afonso as Precedents for Takeover · 2020-03-13T13:19:26.085Z · score: 3 (2 votes) · LW · GW

Update: I do think it would be good to look at the Black Death in Europe and see whether there were similar political "upsets" where a small group of outsiders took over a large region in the turmoil. I predict that there mostly weren't; if it turns out this did happen a fair amount, then I agree that is good evidence that disease was really important.

Comment by daniel-kokotajlo on Cortés, Pizarro, and Afonso as Precedents for Takeover · 2020-03-13T13:10:01.042Z · score: 3 (2 votes) · LW · GW

I accept that these points are evidence in your favor. Here are some more of my own:

--Smallpox didn't hit the Aztecs until Cortes had already killed the Emperor and allied with the Tlaxcalans, if I'm reading these summaries correctly. (I really should go read the actual books...) So it seems that Cortes did get really far on the path towards victory without the help of disease. More importantly, there doesn't seem to be any important difference in how people treated Cortes before or after the disease. They took him very seriously, underestimated him, put too much trust in him, allied with him, etc. before the disease was a factor.

--When Pizarro arrived in Inca lands, the disease had already swept through, if I'm reading these stories right. So the period of most chaos and uncertainty was over; people were rebuilding and re-organizing.

--Also, it wasn't actually a 90% reduction in population. It was more like a 50% reduction at the time, if I am remembering right. (Later epidemics would cause further damage, so collectively they were worse than any other plague in history.) This is comparable to e.g. the Black Death in Europe, no? But the Black Death didn't result in the collapse of most civilizations who went through it, nor did it result in random small groups of adventurers taking over governments, I predict. (I haven't actually read up on the history of it)

Comment by daniel-kokotajlo on Anthropic effects imply that we are more likely to live in the universe with interstellar panspermia · 2020-03-11T21:20:20.227Z · score: 4 (2 votes) · LW · GW

Ahhh, OK now I think I agree with you. Thanks.

Not sure about the proof by contrary, I'll need to think about it more.

Comment by daniel-kokotajlo on Anthropic effects imply that we are more likely to live in the universe with interstellar panspermia · 2020-03-11T16:35:04.469Z · score: 2 (1 votes) · LW · GW

I disagree that there are (many) more civilizations and simulations in panspermia universes. Panspermia spreads life at a much slower rate than colonization shockwaves do, and so the colonization shockwave will catch up to and surpass the fruits of panspermia. The pre-colonization, post-panspermia civilizations will be a drop in the bucket compared to the simulations and whatnot created post-colonization. So yeah there will be more, but only a tiny drop more.