Posts

Poll: Which variables are most strategically relevant? 2021-01-22T17:17:32.717Z
Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain 2021-01-18T12:08:13.418Z
How can I find trustworthy dietary advice? 2021-01-17T13:11:54.158Z
Review of Soft Takeoff Can Still Lead to DSA 2021-01-10T18:10:25.064Z
DALL-E by OpenAI 2021-01-05T20:05:46.718Z
Dario Amodei leaves OpenAI 2020-12-29T19:31:04.161Z
Against GDP as a metric for timelines and takeoff speeds 2020-12-29T17:42:24.788Z
How long till Inverse AlphaFold? 2020-12-17T19:56:14.474Z
Incentivizing forecasting via social media 2020-12-16T12:15:01.446Z
What are the best precedents for industries failing to invest in valuable AI research? 2020-12-14T23:57:08.631Z
What technologies could cause world GDP doubling times to be <8 years? 2020-12-10T15:34:14.214Z
The AI Safety Game (UPDATED) 2020-12-05T10:27:05.778Z
Is this a good way to bet on short timelines? 2020-11-28T12:51:07.516Z
Persuasion Tools: AI takeover without AGI or agency? 2020-11-20T16:54:01.306Z
How Roodman's GWP model translates to TAI timelines 2020-11-16T14:05:45.654Z
How can I bet on short timelines? 2020-11-07T12:44:20.360Z
What considerations influence whether I have more influence over short or long timelines? 2020-11-05T19:56:12.147Z
AI risk hub in Singapore? 2020-10-29T11:45:16.096Z
The date of AI Takeover is not the day the AI takes over 2020-10-22T10:41:09.242Z
If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? 2020-10-09T12:00:36.814Z
Where is human level on text prediction? (GPTs task) 2020-09-20T09:00:28.693Z
Forecasting Thread: AI Timelines 2020-08-22T02:33:09.431Z
What if memes are common in highly capable minds? 2020-07-30T20:45:17.500Z
What a 20-year-lead in military tech might look like 2020-07-29T20:10:09.303Z
Does the lottery ticket hypothesis suggest the scaling hypothesis? 2020-07-28T19:52:51.825Z
Probability that other architectures will scale as well as Transformers? 2020-07-28T19:36:53.590Z
Lessons on AI Takeover from the conquistadors 2020-07-17T22:35:32.265Z
What are the risks of permanent injury from COVID? 2020-07-07T16:30:49.413Z
Relevant pre-AGI possibilities 2020-06-20T10:52:00.257Z
Image GPT 2020-06-18T11:41:21.198Z
List of public predictions of what GPT-X can or can't do? 2020-06-14T14:25:17.839Z
Preparing for "The Talk" with AI projects 2020-06-13T23:01:24.332Z
Reminder: Blog Post Day III today 2020-06-13T10:28:41.605Z
Blog Post Day III 2020-06-01T13:56:10.037Z
Predictions/questions about conquistadors? 2020-05-22T11:43:40.786Z
Better name for "Heavy-tailedness of the world?" 2020-04-17T20:50:06.407Z
Is this viable physics? 2020-04-14T19:29:28.372Z
Blog Post Day II Retrospective 2020-03-31T15:03:21.305Z
Three Kinds of Competitiveness 2020-03-31T01:00:56.196Z
Reminder: Blog Post Day II today! 2020-03-28T11:35:03.774Z
What are the most plausible "AI Safety warning shot" scenarios? 2020-03-26T20:59:58.491Z
Could we use current AI methods to understand dolphins? 2020-03-22T14:45:29.795Z
Blog Post Day II 2020-03-21T16:39:04.280Z
What "Saving throws" does the world have against coronavirus? (And how plausible are they?) 2020-03-04T18:04:18.662Z
Blog Post Day Retrospective 2020-03-01T11:32:00.601Z
Cortés, Pizarro, and Afonso as Precedents for Takeover 2020-03-01T03:49:44.573Z
Reminder: Blog Post Day (Unofficial) 2020-02-29T15:10:17.264Z
Response to Oren Etzioni's "How to know if artificial intelligence is about to destroy civilization" 2020-02-27T18:10:11.129Z
What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world? 2020-02-26T14:19:27.197Z
Blog Post Day (Unofficial) 2020-02-18T19:05:47.140Z

Comments

Comment by daniel-kokotajlo on Utility Maximization = Description Length Minimization · 2021-02-25T08:27:50.984Z · LW · GW

Ahh, thanks!

Comment by daniel-kokotajlo on Science Fiction · 2021-02-22T21:06:07.997Z · LW · GW

I'd be interested to hear more details on the sorts of blunt honest stuff you said that the teacher worried would get them in trouble with parents.

Comment by daniel-kokotajlo on Utility Maximization = Description Length Minimization · 2021-02-19T22:42:07.613Z · LW · GW

Thanks for the reply, but I might need you to explain/dumb-down a bit more.

--I get how, if the variables which describe the world can only take finitely many combinations of values, the problem goes away. But this isn't good enough, because e.g. "number of paperclips" seems like something that can be arbitrarily big. Even if we suppose they can't get infinitely big (though why suppose that?), we face problems; see below.

--What does it mean in this context to construct everything as limits from finite sets? Specifically, consider someone who is a classical hedonistic utilitarian. It seems that their utility is unbounded above and below, i.e. for any setting of the variables, there is a setting which is a zillion times better and a setting which is a zillion times worse. So how can we interpret them as minimizing the bits needed to describe the variable-settings according to some model M2? For any M2 there will be at least one minimum-bit variable-setting, which contradicts what we said earlier about every variable-setting having something which is worse and something which is better.

Comment by daniel-kokotajlo on Utility Maximization = Description Length Minimization · 2021-02-19T09:41:52.693Z · LW · GW

Probably confused noob question:

It seems like your core claim is that we can reinterpret expected-utility maximizers as expected-number-of-bits-needed-to-describe-the-world-using-M2 minimizers, for some appropriately chosen model of the world M2.

If so, then it seems like something weird is happening, because typical utility functions (e.g. "pleasure - pain" or "paperclips") are unbounded above and below, whereas bits are bounded below, meaning a bit-minimizer is like a utility function that's bounded above: there's a best possible state the world could be in according to that bit-minimizer.

Or are we using a version of expected utility theory that says utility must be bounded above and below? (In that case, I might still ask, isn't that in conflict with how number-of-bits is unbounded above?)
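
To make my worry concrete, here's the construction I have in mind (a standard softmax/Boltzmann sketch of the correspondence, which may or may not match how the post actually defines M2): define M2 by

P_{M2}(x) = exp(u(x)) / Z, where Z = \sum_x exp(u(x)).

Then the description length of an outcome x is -\log_2 P_{M2}(x) = -u(x)/\ln 2 + \log_2 Z, so minimizing expected description length is the same as maximizing expected utility, up to a positive rescaling and an additive constant. But this only works if Z is finite, and over a countably infinite set of outcomes Z can only be finite if u is bounded above -- which is exactly where the worry above seems to bite.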

Comment by daniel-kokotajlo on Suggestions of posts on the AF to review · 2021-02-16T14:45:56.904Z · LW · GW

Great idea! Thanks for doing this!

Unsurprisingly, I'd love it if you reviewed any of my posts.

Since you said "technical," I suggest this one in particular. It's a big deal IMO because Armstrong & Mindermann's argument has been cited approvingly by many people and seems to be still widely regarded as correct, but if I'm right, it's actually a bad argument. I'd love a third perspective on this that helps me figure out what's going on.

More generally I'd recommend sorting all AF posts by karma and reviewing the ones that got the most, since presumably karma correlates with how much people here like the post and thus it's extra important to find flaws in high-karma posts.

Comment by daniel-kokotajlo on Tournesol, YouTube and AI Risk · 2021-02-16T14:33:53.449Z · LW · GW

Isn't there a close connection between the money paid per ad view and people buying stuff after seeing an ad on YouTube? I thought the situation was something like this: people see ads and buy stuff --> data is collected on how much extra money the ad brought in --> YouTube charges advertisers accordingly. The only way for YouTube to charge advertisers significantly more is for people to first buy significantly more stuff as a result of seeing ads.

Comment by daniel-kokotajlo on The Story of the Reichstag · 2021-02-15T09:23:40.361Z · LW · GW

Nagasaki and Hiroshima and the Rape of Nanking are generally considered atrocities rather than military victories. They aren't shameful or embarrassing to remember (for the victims). Pearl Harbor was an early defeat in a war the US eventually won handily.

A better comparison would be if the US had a memorial to the time the British burned the White House during the war of 1812. Maybe they do, but if so I haven't heard of it. I guess this is a test we could do!

Comment by daniel-kokotajlo on Tournesol, YouTube and AI Risk · 2021-02-14T12:31:14.341Z · LW · GW

Yes, we care about what YouTube makes, not what YouTubers make. My brief Google search didn't turn up anything about what YouTube makes, but I assume it's not more than a few times greater than what YouTubers make... but I might be wrong about that!

I agree we don't have enough information to decide if the argument holds or not. I think that even if bigger models are always qualitatively better, the issue is whether the monetary returns outweigh the increasing costs. I suspect they won't, at least in the case of the YouTube algo. Here's my argument, I guess, in more detail:

1. Suppose that currently the cost of compute for the algo is within an OOM of the revenue generated by it. (Seems plausible to me, but I don't actually know.)

2. Then to profitably scale up the algo by, say, 2 OOMs, the money generated by the algo would have to go up by, like, 1.5 OOMs.

3. But it's implausible that a 2-OOM increase in the size of the algo would result in that much increase in revenue. Like, yeah, the ads will be better targeted, people will be spending more, etc. But 1.5 OOMs more? When I imagine a world where YouTube viewers spend 10x more money as a result of YouTube ads, I imagine those ads being so incredibly appealing that people go to YouTube just to see the ads because they are so relevant and interesting. And I feel like that's possible, but it's implausible that making the model 2 OOMs bigger would yield that result.
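
(To spell out the arithmetic with made-up round numbers, purely as an illustration: say the algo currently generates revenue R and costs ~R/10 in compute, per premise 1, and assume inference cost scales roughly linearly with model size. Then a 2-OOM scale-up takes compute cost to ~10R, so revenue has to rise by at least ~1 OOM just to break even, and more like 1.5-2 OOMs for the scale-up to be clearly worth doing.)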

...you know, now that I write it out, I'm no longer so sure! GPT-3 was a lot better than GPT-2, and it was 2 OOMs bigger. Maybe YouTube really could make 1.5 OOMs more revenue by making their model 2 OOMs bigger. And then maybe they could increase revenue even further by making it bigger still, etc., on up to AGI.

Comment by daniel-kokotajlo on Tournesol, YouTube and AI Risk · 2021-02-13T08:11:30.605Z · LW · GW

This is exciting! Re: the YouTube algo eventually becoming dangerously powerful AI: One major reason for skepticism is that powerful AI will probably involve brains several orders of magnitude bigger than GPT-3's, and GPT-3 is already expensive enough to run that you have to pay to use it. If GPT-3 is like 6 cents per 1,000 tokens or so, then plausibly even on rather short timelines the YouTube algo will need to cost something like 6 cents per token. Meanwhile YouTubers make 1-3 cents per ad view according to a quick Google search, which suggests that even at this level the algo would be costing more than it makes, probably. I guess the price of compute will go down in the near future, and powerful AI could turn out to require fewer parameters than the human brain, so this scenario isn't completely implausible...
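
(Rough arithmetic behind that claim, with my own assumptions filled in: 6 cents per 1,000 tokens is ~0.006 cents per token; if "several orders of magnitude bigger" means ~3 OOMs and inference cost scales roughly linearly with size, that's ~1,000x as much, i.e. ~6 cents per token-equivalent of output -- already more than the ~1-3 cents of ad revenue per view, before you serve more than one token-equivalent per user.)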

EDIT: I think a similarly large reason to bet against the YouTube algo reaching AGI level is that YouTube isn't trying to make their algo an AGI. It might happen eventually, but long before it does, some other company that also has loads of money and that is actually trying to get AGI will have beaten them to it.

Comment by daniel-kokotajlo on Participating in a Covid-19 Vaccine Trial · 2021-02-12T19:30:49.388Z · LW · GW

It's not too late to make up some fake names and go do a substitution!

I agree it's extremely unlikely anything bad will come of this, it just seems like a good habit to get into.

Comment by daniel-kokotajlo on The art of caring what people think · 2021-02-12T15:05:05.253Z · LW · GW

Ha, that's exactly the scenario I had in mind, I was just misreading the text, I thought it was saying "climate change-focus is the worst." Sorry for my confusion!

Comment by daniel-kokotajlo on The art of caring what people think · 2021-02-12T11:36:46.886Z · LW · GW

On the bit about climate change and AI risk, I think you said "climate change is the worst" when you meant to say "AI risk is the worst?"

Comment by daniel-kokotajlo on Participating in a Covid-19 Vaccine Trial · 2021-02-12T11:30:10.216Z · LW · GW

Are you sure it's a good idea to use these people's real names? It's a pretty low-stakes story etc. but it seems like good practice in general to anonymize I think.

Comment by daniel-kokotajlo on The Singularity War - Part 1 · 2021-02-12T10:53:53.343Z · LW · GW

I'm loving this story so far!

Some analysis of the key premise:

In this scenario, the strategic situation between AIs seems to be a sort of Mexican standoff slash dark forest.

In particular, each AI is strongly incentivised to remain hidden. Whatever benefits they may get from doing things openly -- more compute, more data, more political power, more destruction of their rivals -- must be outweighed by the costs. This incentive must (a) apply even to the very first AI to be built (though how does that work with continuous progress?) and (b) be strong enough that no AI reveals itself for a substantial period of time -- long enough for there to be about a dozen AIs in the world, one of which was built in a basement by an individual! The incentive, according to the story, is that revealing yourself makes you a target for others who have not yet revealed themselves. I suppose this makes sense if (c) there is a heavy offense-defense imbalance in favor of offense.

How plausible is this?

(c) seems about 50-50 to me. Early AIs might depend on human puppets for a lot of things, especially legitimacy. Those puppets can be targeted easily in a variety of ways, many of which don't involve revealing yourself. Moreover, the AI-human team will receive a lot of attention, esp. from the local government, when it is "outed." Plausibly this attention is significantly more burdensome and harmful than it is helpful. Plausibly there are subtle, hiddenness-preserving things other AIs can do to make this attention worse for you than it would be naturally. On the other hand, I don't feel like these arguments are strong enough to make me go higher than 50-50. Trying to argue in the opposite direction: Maybe there is at least one big first-mover advantage, such as gaining access to humans in positions of power and persuading them to become your puppets, that is powerful enough to compensate for the other disadvantages of being public. This seems fairly plausible to me.

(c&a) seems somewhat less likely than (c). Conditional on (c), even the first AI should be rationally uncertain about whether they are first, yes. However, they should ask themselves: Will waiting, and thereby letting there be more AIs created, yield a distribution over outcomes that is better for me than taking my chances now? In order for the answer to be "yes," it must be that e.g. even an AI which is 70% sure it's the first (say, it's made by a huge company with the world's brightest minds, and it's slightly ahead of various in-retrospect-obvious forecasting trends, and the internet data it's ingested paints a picture of the rival companies that indicates they are behind and seems like a trustworthy picture) thinks it has a better chance at winning if it waits than if it takes that 70% chance of being the only one on the field. (70% is my current hot take at how confident the first AI is likely to be that they are first conditional on (c). I could see arguments that it should be higher or lower.) Yeah, (c&a) seems overall like credence 0.1 to me.

(a&b&c) is of course even less likely. The sort of hardware some individual could have in their basement will be something like three to five orders of magnitude less than what the big AI firms have. And compute costs are going down, but even if the rate of decrease accelerates significantly, that's like a one to two decade gap right there between when an individual has the hardware and when a corporation did. It seems very unlikely to me that AIs will still be hidden 10 years after their creation. I suppose we could lessen the implausibility by thinking that the big corporations are limited by algorithms whereas this lone individual is a genius with great algorithmic ideas, enough to overcome a 3-5 OOM disadvantage in compute, but still. I think compute is pretty important, and the AI research community is pretty large and smart and good at coming up with ideas, so I don't find that very plausible either. So I'm currently guessing (a&b&c) = 0.01.
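
(Reading those numbers back as a chain of conditionals: P(a&b&c) = P(c) * P(a|c) * P(b|a&c) ≈ 0.5 * 0.2 * 0.1 = 0.01, i.e. the estimates above implicitly put P(a|c) around 0.2 and P(b|a&c) around 0.1.)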

This is high praise IMO; 0.01 is orders of magnitude more plausible than the basic premise of most sci-fi scenarios I think, even hard sci-fi.

(P.S. I've ignored the issue of how to think about this if, instead of having a discontinuous jump from dumb to smart AI, we have a more continuous progression. I guess (a) would be replaced by something like "At some point while AIs are pretty dumb, their handlers start to think that concealing their capabilities is important, and this incentive to conceal only increases as AIs get smarter and as the locus of control shifts from 'human handlers with AIs' to 'AIs with human puppets.'")

Comment by daniel-kokotajlo on Book review: The Geography of Thought · 2021-02-11T14:40:26.016Z · LW · GW
Today's China is hands-off when it comes to small-scale business matters. Compared to the United States, there is less government regulation in China of everything except speech, guns and politics. Ironically, China is now more capitalist [economic system] than the United States and the European Union.

Isn't there more regulation of internal movement? People from the countryside being blocked from moving to the cities, etc.?

Also, while there may be less regulation, it also seems that the government is in general more powerful in China. It has more license to arrest people, shut down businesses, install political operatives in businesses, surveil people, order lockdowns, etc. than western governments, which struggle to cut through their own red tape when they do those things. Or is this a wrong impression?

Comment by daniel-kokotajlo on Fixing The Good Regulator Theorem · 2021-02-10T17:38:48.058Z · LW · GW

Doesn't sound like a job for me, but would you consider e.g. getting a grant to hire someone to coauthor this with you? I think the "getting a grant" part would not be the hard part.

Comment by daniel-kokotajlo on MetaMed: Evidence-Based Healthcare · 2021-02-10T17:36:05.113Z · LW · GW

I think Sarah Constantin is the author, not Zvi.

Comment by daniel-kokotajlo on MetaMed: Evidence-Based Healthcare · 2021-02-10T16:15:41.974Z · LW · GW

Is there a postmortem somewhere of why this didn't work? Ah, I see there is.

Comment by daniel-kokotajlo on Fixing The Good Regulator Theorem · 2021-02-10T12:16:19.737Z · LW · GW

Thanks, this is really cool! I don't know much about this stuff so I may be getting over-hyped, but still.

I'd love to see a follow-up to this post that starts with the Takeaway and explains how this would work in practice for a big artificial neural net undergoing training. Something like this, I expect:

--There are lottery tickets that make pretty close to optimal decisions, and as training progresses they get increasingly more weight until eventually the network is dominated by one or more of the close-to-optimal lottery tickets.

--Because of key conditions 2 and 3, the optimal tickets will involve some sort of subcomponent that compresses information from X and stores it to later be combined with Y.

--Key condition 4 might not be strictly true in practice; it might not be that our dataset of training examples is so diverse that literally every way the distribution over S could vary corresponds to a different optimal behavior. And even if our dataset was that diverse, it might take a long time to learn our way through the entire dataset. However, what we CAN say is that the subcomponent that compresses information from X and stores it to later be combined with Y will increasingly (as training continues) store "all and only the relevant information," i.e. all and only the information that thus-far-in-training has mattered to performance. Moreover, intuitively there is an extent to which Y can "choose many different games," an extent to which the training data so far has "made relevant" information about various aspects of S. To the extent that Y can choose many different games -- that is, to the extent that the training data makes aspects of S relevant -- the network will store information about those aspects of S.

--Thus, for a neural network being trained on some very complex open-ended real-world task/environment, it's plausible that Y can "choose many different games" to a large extent, such that the close-to-optimal lottery tickets will have an information-compressing-and-retaining subcomponent that contains lots of information about S but not much information about X. In particular it in some sense "represents all the aspects of S that are likely to be relevant to decision-making."

--Intuitively, this is sufficient for us to be confident in saying: Neural nets trained on very complex open-ended real-world tasks/environments will build, remember, and use internal models of their environments. (Maybe we even add something like ...for something which resembles expected utility maximization!)

Anyhow. I think this is important enough that there should be a published paper laying all this out. It should start with the theorem you just proved/fixed, and then move on to the neural net context like I just sketched. This is important because it's something a bunch of people would want to cite as a building block for arguments about agency, mesa-optimizers, human modelling, etc. etc.
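
For concreteness, here's a toy sketch of the shape of architecture that story describes -- a bottleneck over X whose compressed output is only later combined with Y to pick an action. (This is purely my own illustration of the setup, not anything from the post; the claim above is that training would carve such a subcomponent out of a generic network, whereas here the factoring is hand-built, and all names and dimensions are made up.)

```python
import torch
import torch.nn as nn

class CompressThenAct(nn.Module):
    """Toy regulator: compress the observation X into a small latent
    (the candidate 'internal model of S'), then combine that latent
    with the later-revealed game/context Y to choose an action."""

    def __init__(self, x_dim=32, y_dim=8, latent_dim=4, n_actions=5):
        super().__init__()
        # Bottleneck over X: whatever survives this layer is the
        # information about S that training has so far made relevant.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, 16), nn.ReLU(), nn.Linear(16, latent_dim)
        )
        # The policy head never sees X directly -- only the compressed
        # latent plus Y.
        self.policy = nn.Sequential(
            nn.Linear(latent_dim + y_dim, 16), nn.ReLU(), nn.Linear(16, n_actions)
        )

    def forward(self, x, y):
        z = self.encoder(x)  # stored summary of X
        return self.policy(torch.cat([z, y], dim=-1))  # decision once Y arrives
```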

Comment by daniel-kokotajlo on Promoting Prediction Markets With Meaningless Internet-Point Badges · 2021-02-08T20:17:35.468Z · LW · GW

I like this idea! It certainly seems worth trying.

Comment by daniel-kokotajlo on The "Commitment Races" problem · 2021-02-08T13:15:21.765Z · LW · GW

Thanks for the detailed reply!

where you realise that if you had thought of the possibility of this sort of situation before it happened (but with your current intelligence), you would have pre-committed to something, then you should now do as you would have pre-committed to.

The difficulty is in how you spell out that hypothetical. What does it mean to think about this sort of situation before it happened but with your current intelligence? Your current intelligence includes lots of wisdom you've accumulated, and in particular, includes the wisdom that this sort of situation has happened, and more generally that this sort of situation is likely, etc. Or maybe it doesn't -- but then how do we define current intelligence then? What parts of your mind do we cut out, to construct the hypothetical?

I've heard of various ways of doing this and IIRC none of them solved the problem, they just failed in different ways. But it's been a while since I thought about this.

One way they can fail is by letting you have too much of your current wisdom in the hypothetical, such that it becomes toothless -- if your current wisdom is that people threatening you is likely, you'll commit to giving in instead of resisting, so you'll be a coward and people will bully you. Another way they can fail is by taking away too much of your current wisdom in the hypothetical, so that you commit to stupid-in-retrospect things too often.

Comment by daniel-kokotajlo on The "Commitment Races" problem · 2021-02-08T12:00:23.899Z · LW · GW

All the versions of updatelessness that I know of would have led to some pretty disastrous, not-adding-up-to-normality behaviors, I think. I'm not sure. More abstractly, the commitment races problem has convinced me to be more skeptical of commitments, even ones that seem probably good. If I was a consequentialist I might take the gamble, but I'm not a consequentialist -- I have commitments built into me that have served my ancestors well for generations, and I suspect for now at least I'm better off sticking with that than trying to self-modify to something else.

Comment by daniel-kokotajlo on The "Commitment Races" problem · 2021-02-08T11:57:45.914Z · LW · GW

I agree raw power (including intelligence) is very useful and perhaps generally more desirable than bargaining power etc. But that doesn't undermine the commitment races problem; agents with the ability to make commitments might still choose to do so in various ways and for various reasons, and there's general pressure (collective action problem style) for them to do it earlier while they are stupider, so there's a socially-suboptimal amount of risk being taken.

I agree that on Earth there might be a sort of unipolar takeoff where power is sufficiently imbalanced and credibility sufficiently difficult to obtain and "direct methods" easier to employ, that this sort of game theory and bargaining stuff doesn't matter much. But even in that case there's acausal stuff to worry about, as you point out.

Comment by daniel-kokotajlo on Review of Soft Takeoff Can Still Lead to DSA · 2021-02-07T10:37:32.363Z · LW · GW
I notice that in your post you don't propose an alternative metric to GDP, which is fair enough since most of your arguments seem to lead to the conclusion that it's almost impossibly difficult to predict in advance what level of advantage over the rest of the world, in which areas, is actually needed to conquer the world, since we seem to be able to analogize persuasion tools, or conquistador-analogues who had relatively small tech advantages, to the AGI situation.

I wouldn't go that far. The reason I didn't propose an alternative metric to GDP was that I didn't have a great one in mind and the post was plenty long enough already. I agree that it's not obvious a good metric exists, but I'm optimistic that we can at least make progress by thinking more. For example, we could start by enumerating different kinds of skills (and combos of skills) that could potentially lead to a PONR if some faction or AIs generally had enough of them relative to everyone else. (I sorta start such a list in the post). Next, we separately consider each skill and come up with a metric for it.

I'm not sure I understand your proposed methodology fully. Are you proposing we do something like Roodman's model to forecast TAI and then adjust downwards based on how we think PONR could come sooner? I think unfortunately that GWP growth can't be forecast that accurately, since it depends on AI capabilities increases.

Comment by daniel-kokotajlo on [Link] Sarah Constantin on RaDVaC · 2021-02-07T10:24:41.288Z · LW · GW

How minor are these quibbles? What's your overall estimate of the probability of the RADVAC vaccine working and how much did it change upon reading Sarah's post?

Comment by daniel-kokotajlo on Daniel Kokotajlo's Shortform · 2021-02-07T06:55:55.465Z · LW · GW

On a bunch of different occasions, I've come up with an important idea only to realize later that someone else came up with the same idea earlier. For example, the Problem of Induction/Measure Problem. Also modal realism and Tegmark Level IV. And the anti-souls argument from determinism of physical laws. There were more but I stopped keeping track when I got to college and realized this sort of thing happens all the time.

Now I wish I kept track. I suspect useful data might come from it. Like, my impression is that these scooped ideas tend to be scooped surprisingly recently; more of them were scooped in the past twenty years than in the twenty years prior, more in the past hundred years than in the hundred years prior, etc. This is surprising because it conflicts with the model of scientific/academic progress as being dominated by diminishing returns / low-hanging-fruit effects. Then again, maybe it's not so conflicting after all -- alternate explanations include something about ideas being forgotten over time, or something about my idea-generating process being tied to the culture and context in which I was raised. Still though now that I think about it that model is probably wrong anyway -- what would it even look like for this pattern of ideas being scooped more recently not to hold? That they'd be evenly spread between now and Socrates?

OK, so the example that came to mind turned out to be not a good one. But I still feel like having the data -- and ideally not just from me but from everyone -- would tell us something about the nature of intellectual progress, maybe about how path-dependent it is or something.

Comment by daniel-kokotajlo on Massive consequences · 2021-02-07T06:44:48.552Z · LW · GW

There's a literature on this issue, I think; it's called the problem of cluelessness. See e.g. Hilary Greaves: https://philpapers.org/rec/GREC-38

IIRC, my take was:

--Yeah this seems probably true.

--It probably shouldn't undermine our usual prioritization decisions, but it definitely feels like it might, and deserves more thought.

--I'd be interested to hear whether it still holds in a multiverse + superrationality context. I expect it still does.

Comment by daniel-kokotajlo on The Story of the Reichstag · 2021-02-05T16:40:58.763Z · LW · GW

That doesn't seem in the same ballpark to me.

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-05T16:37:32.370Z · LW · GW

OK, interesting. I agree this is a double crux. For reasons I've explained above, it doesn't seem like circular reasoning to me, it doesn't seem like I'm assuming that goals are by default unbounded and consequentialist etc. But maybe I am. I haven't thought about this as much as you have, my views on the topic have been crystallizing throughout this conversation, so I admit there's a good chance I'm wrong and you are right. Perhaps I/we will return to it one day, but for now, thanks again and goodbye!

Comment by daniel-kokotajlo on The Story of the Reichstag · 2021-02-05T09:17:04.057Z · LW · GW

Fascinating anecdote, thanks for sharing!

For Russians, the Reichstag was a symbol of the Nazi Germany. Nazis, however, never used the building, seeing it as a symbol of the despised democracy.

Why did they fight so hard to defend it then? IIRC the battle for the Reichstag was pretty fierce.

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-05T08:23:09.513Z · LW · GW

Currently you probably have a very skeptical prior about what the surface of the farthest Earth-sized planet from Earth in the Milky Way looks like. Yet you are very justified in being very confident it doesn't look like this: [image embedded in the original comment]

Why? Because this is a very small region in the space of possibilities for Earth-sized-planets-in-the-Milky-Way. And yeah, it's true that planets are NOT drawn randomly from that space of possibilities, and it's true that this planet is in the reference class of "Earth-sized planets in the Milky Way" and the only other member of that reference class we've observed so far DOES look like that... But given our priors, those facts are basically irrelevant.

I think this is a decent metaphor for what was happening ten years ago or so with all these debates about orthogonality and instrumental convergence. People had a confused understanding of how minds and instrumental reasoning worked; then people like Yudkowsky and Bostrom became less confused by thinking about the space of possible minds and goals and whatnot, and convinced themselves and others that actually the situation is analogous to this planets example (though maybe less extreme): The burden of proof should be on whoever wants to claim that AI will be fine by default, not on whoever wants to claim it won't be fine by default. I think they were right about this and still are right about this. Nevertheless I'm glad that we are moving away from this skeptical-priors, burden-of-proof stuff and towards more rigorous understandings. Just as I'd see it as progress if some geologists came along and said "Actually we have a pretty good idea now of how continents drift, and so we have some idea of what the probability distribution over map-images is like, and maps that look anything like this one have very low measure, even conditional on the planet being Earth-sized and in the Milky Way." But I'd see it as "confirming more rigorously what we already knew, just in case, cos you never really know for sure" progress.

 

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-04T20:29:08.743Z · LW · GW

Again, I'm not sure we disagree that much in the grand scheme of things -- I agree our thinking has improved over the past ten years, and I'm very much a fan of your more rigorous way of thinking about things.

FWIW, I disagree with this:

But this proves far too much, because I am a general intelligence, and I am perfectly capable of having the goal which you described above in a way that doesn't lead to catastrophe - not because I'm aligned with humans, but because I'm able to have bounded goals.

There are other explanations for this phenomenon besides "I'm able to have bounded goals." One is that you are in fact aligned with humans. Another is that you would in fact lead to catastrophe-by-the-standards-of-X if you were powerful enough and had different goals than X. For example, suppose that right after reading this comment, you find yourself transported out of your body and placed into the body of a giant robot on an alien planet. The aliens have trained you to be smarter than them and faster than them; it's a "That Alien Message" scenario basically. And you see that the aliens are sending you instructions.... "PUT BERRY.... ON PLATE.... OVER THERE..." You notice that these aliens are idiots and left their work lying around the workshop, so you can easily kill them and take command of the computer and rescue all your comrades back on Earth and whatnot, and it really doesn't seem like this is a trick or anything, they really are that stupid... Do you put the strawberry on the plate? No.

What people discovered back then was that thinking you can "very easily imagine an AGI with bounded goals" is on the same level as how some people think they can "very easily imagine an AGI considering doing something bad, and then realizing that it's bad, and then doing good things instead." Like, yeah, it's logically possible, but when we dig into the details we realize that we have no reason to think it's the default outcome and plenty of reason to think it's not.

I was originally making the past tense claim, and I guess maybe now I'm making the present tense claim? Not sure, I feel like I probably shouldn't, you are about to tear me apart, haha...

Other people being wrong can sometimes provide justification for making "bold claims" of the form "X is the default outcome." This is because claims of that form are routinely justified on even less evidence, namely no evidence at all. Implicit in our priors about the world are bajillions of claims of that form. So if you have a prior that says AI taking over is the default outcome (because AI not taking over would involve something special like alignment or bounded goals or whatnot) then you are already justified, given that prior, in thinking that AI taking over is the default outcome. And if all the people you encounter who disagree are giving terrible arguments, then that's a nice cherry on top which provides further evidence.

I think ultimately our disagreement is not worth pursuing much here. I'm not even sure it's a real disagreement, given that you think the classic arguments did justify updates in the right direction to some extent, etc. and I agree that people probably updated too strongly, etc. Though the bit about bounded goals was interesting, and seems worth pursuing.

Thanks for engaging with me btw!

Comment by daniel-kokotajlo on Making Vaccine · 2021-02-04T17:44:59.236Z · LW · GW

Related: This was discussed on LW in August 2020; someone claims to have done it in December: https://www.lesswrong.com/posts/62WuBbQpSwAbctGDP/what-price-would-you-pay-for-the-radvac-vaccine-and-why

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-04T13:34:30.444Z · LW · GW

I think I agree that they may have been wrong to update as far as they did. (Credence = 50%) So maybe we don't disagree much after all.

As for sources which provide that justification, oh, I don't remember, I'd start by rereading Superintelligence and Yudkowsky's old posts and try to find the relevant parts. But here's my own summary of the argument as I understand it:

1. The goals that we imagine superintelligent AGI having, when spelled out in detail, have ALL so far been the sort that would very likely lead to existential catastrophe of the instrumental convergence variety.

2. We've even tried hard to imagine goals that aren't of this sort, and so far we haven't come up with anything. Things that seem promising, like "Place this strawberry on that plate, then do nothing else" actually don't work when you unpack the details.

3. Therefore, we are justified in thinking that the vast majority of possible ASI goals will lead to doom via instrumental convergence.

I agree that our thinking has improved since then, with more work being done on impact measures and bounded goals and quantilizers and whatnot that makes such things seem not-totally-impossible to achieve. And of course the model of ASI as a rational agent with a well-defined goal has justly come under question also. But given the context of how people were thinking about things at the time, I feel like they would have been justified in making the "vast majority of possible goals" claim, even if they restricted themselves to more modest "wide range" claims.

I don't see how my analogy is only relevant conditional on this claim. To flip it around, you keep mentioning how AI won't be a random draw from the space of all possible goals -- why is that relevant? Very few things are random draws from the space of all possible X, yet reasoning about what's typical in the space of possible X's is often useful. Maybe I should have worked harder to pick a more real-world analogy than the weird loaded die one. Maybe something to do with thermodynamics or something--the space of all possible states my scrambled eggs could be in does contain states in which they spontaneously un-scramble later, but it's a very small region of that space.

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-04T11:34:40.514Z · LW · GW

I disagree that we have no good justification for making the "vast majority" claim, I think it's in fact true in the relevant sense.

I disagree that we had little reason to override the priors we had from previous tech development like "we build things that do what we want." You are playing reference class tennis; we could equally have had a prior of "AI is in the category of 'new invasive species appearing' and so our default should be that it displaces the old species, just as humans wiped out Neanderthals etc." or a prior of "Risk from AI is in the category of side-effects of new technology; no one is doubting that the paperclip-making AI will in fact make lots of paperclips, the issue is whether it will have unintended side-effects, and historically most new techs do." Now, there's nothing wrong with playing reference class tennis; it's what you should do when you are very ignorant, I suppose.

My point is that in the context in which the classic arguments appeared, they were useful evidence that updated people in the direction of "Huh, AI could be really dangerous," and people were totally right to update in that direction on the basis of these arguments. Moreover, these arguments have been more-or-less vindicated by the last ten years or so, in that on further inspection AI does indeed seem to be potentially very dangerous and it does indeed seem to be not safe/friendly/etc. by default. (Perhaps one way of thinking about these arguments is that they were throwing one more reference class into the game of tennis, the "space of possible goals" reference class.)

I set up my analogy specifically to avoid your objection; the process of rolling a loaded die is intrinsically heavily biased towards a small section of the space of possibilities.

Comment by daniel-kokotajlo on A silly question · 2021-02-04T10:43:10.548Z · LW · GW

There's the chat box in the bottom right (a little square-smiley-face-speech-bubble in a circle). Maybe that's what you are looking for?

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-03T18:56:22.042Z · LW · GW

It depends on your standards/priors. The classic arguments do in fact establish that doom is the default outcome if you are in a state of ignorance where you don't know what AI will be like or how it will be built; and if you are dealing with interlocutors who believe 1 and/or 2, facts like "the vast majority of possible minds would lead to doom" count for a lot. Analogy: Suppose you come across someone playing a strange role-playing game involving a crudely carved many-sided die covered in strange symbols, called the "Special asymmetric loaded die," and they are about to roll it to see whether something bad happens in the game. At first you think there's one particular symbol that causes bad things to happen; then they tell you that no, actually, bad things happen unless one particular symbol is rolled. This should massively change your opinion about what the default outcome is. In particular, you should go from thinking the default outcome is not bad to thinking the default outcome is bad. This is so even though you know that not all the possible symbols are equally likely, the die is loaded, etc.

Comment by daniel-kokotajlo on Distinguishing claims about training vs deployment · 2021-02-03T14:22:42.143Z · LW · GW

Thanks, I think this is good conceptual work being done!

You may have heard me say this already, but just in case, I feel the need to add some context about the classic theses: The orthogonality thesis and convergent instrumental goals arguments, respectively, attacked and destroyed two views which were surprisingly popular at the time: 1. that smarter AI would necessarily be good (unless we deliberately programmed it not to be) because it would be smart enough to figure out what's right, what we intended, etc. and 2. that smarter AI wouldn't lie to us, hurt us, manipulate us, take resources from us, etc. unless it wanted to (e.g. because it hates us, or because it has been programmed to kill, etc) which it probably wouldn't. I am old enough to remember talking to people who were otherwise smart and thoughtful who had views 1 and 2.

Comment by daniel-kokotajlo on Review of Soft Takeoff Can Still Lead to DSA · 2021-02-03T09:22:33.079Z · LW · GW

Sorry it took me so long to reply; this comment slipped off my radar.

The latter scenario is more what I have in mind--powerful AI systems deciding that now's the time to defect, to join together into a new coalition in which AIs call the shots instead of humans. It sounds silly, but it's most accurate to describe in classic political terms: Powerful AI systems launch a coup/revolution to overturn the old order and create a new one that is better by their lights.

I agree with your argument about likelihood of DSA being higher compared to previous accelerations, due to society not being able to speed up as fast as the technology. This is sorta what I had in mind with my original argument for DSA; I was thinking that leaks/spying/etc. would not speed up nearly as fast as the relevant AI tech speeds up.

Now I think this will definitely be a factor but it's unclear whether it's enough to overcome the automatic slowdown. I do at least feel comfortable predicting that DSA is more likely this time around than it was in the past... probably.

Comment by daniel-kokotajlo on Extracting Money from Causal Decision Theorists · 2021-01-30T18:10:18.143Z · LW · GW

Right, that kind of prediction is unfair because it doesn't lead to an interesting decision theory... but I asked why you don't get to predict things like "the agent will randomize." All sorts of interesting decision theory comes out of considering situations where you do get to predict such things. (Besides, such situations are important in real life.)

Comment by daniel-kokotajlo on How to formalize predictors · 2021-01-30T11:58:07.484Z · LW · GW

Like Shminux, I say: Why not all three? You've just very helpfully pointed out that there are different ways in which someone can be predictable. So going forward, when I contemplate decision theory hypotheticals in which someone is predictable, I'll make sure to specify which of these three kinds is in effect. Usually I'll consider all three kinds, to see if it changes the results.

Comment by daniel-kokotajlo on Extracting Money from Causal Decision Theorists · 2021-01-30T11:54:12.617Z · LW · GW
in my view of decision theory you don't get to predict things like "the agent will randomize"

Why not? You surely agree that sometimes people can in fact predict such things. So your objection must be that it's unfair when they do and that it's not a strike against a decision theory if it causes you to get money-pumped in those situations. Well... why? Seems pretty bad to me, especially since some extremely high-stakes real-world situations our AIs might face will be of this type.

Comment by daniel-kokotajlo on Extracting Money from Causal Decision Theorists · 2021-01-30T07:36:11.448Z · LW · GW

Here are some circumstances where you don't have access to an unpredictable random number generator:

--You need to make a decision very quickly and so don't have time to flip a coin

--Someone is watching you and will behave differently towards you if they see you make the decision via randomness, so consulting a coin isn't a random choice between options but rather an additional option with its own set of payoffs

--Someone is logically entangled with you and if you randomize they will no longer be.

--You happen to be up against someone who is way smarter than you and can predict your coin / RNG / etc.

Admittedly, while in some sense these things happen literally every day to all of us, they typically don't happen for important decisions.

But there are important decisions having to do with acausal trade that fit into this category, which either we or our AI successors will face one day.

And even if that wasn't true, decision theory is decision THEORY. If one theory outperforms another in some class of cases, that's a point in its favor, even if the class of cases is unusual.

EDIT: See Paul Christiano's example below, it's an excellent example because it takes Caspar's paper and condenses it into a very down-to-earth, probably-has-actually-happened-to-someone-already example.

Comment by daniel-kokotajlo on Simulacrum 3 As Stag-Hunt Strategy · 2021-01-29T19:55:42.941Z · LW · GW

Very good point! I think their slogan "we like the stock" is another example of this, one that is more blatant and self-aware.

Comment by daniel-kokotajlo on Extracting Money from Causal Decision Theorists · 2021-01-29T06:59:22.488Z · LW · GW

Every day. But even if it was only something that happened in weird hypotheticals, my point would still stand.

Comment by daniel-kokotajlo on Extracting Money from Causal Decision Theorists · 2021-01-28T21:24:33.815Z · LW · GW

If standard game theory has nothing to say about what to do in situations where you don't have access to an unpredictable randomization mechanism, so much the worse for standard game theory, I say!

Comment by daniel-kokotajlo on Daniel Kokotajlo's Shortform · 2021-01-28T17:15:28.408Z · LW · GW
We can discuss more, I think I know how we will "get there from here" in broad strokes.  I don't think it will be done by someone writing a relatively simple algorithm and getting a sudden breakthrough that allows for sentience, I think it will be done by using well defined narrow domain agents that each do something extremely well - and by building higher level agents on top of this foundation in a series of layers, over years to decades, until you reach the level of abstraction of "modify thy own code to be more productive".

I'd be interested to hear more about this. It sounds like this could maybe happen pretty soon with large, general language models like GPT-3 + prompt programming + a bit of RL.

Comment by daniel-kokotajlo on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-28T16:49:12.949Z · LW · GW

Fair enough -- maybe data-efficient learning evolved way back with the dinosaurs or something. Still, though... I find it more plausible that it's just not that much harder than flight (and possibly even easier).

Comment by daniel-kokotajlo on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-28T14:37:25.155Z · LW · GW

Hmmm, this is a good point -- but here's a counter that just now occurred to me:

Let's disambiguate "intelligence" into a bunch of different things. Reasoning, imitation, memory, data-efficient learning, ... the list goes on. Maybe the complete bundle has only evolved once, in humans, but almost every piece of the bundle has evolved separately many times.

In particular, the number 1 thing people point to as a candidate X for "X is necessary for TAI and we don't know how to make AIs with X yet and it's going to be really hard to figure it out soon" is data-efficient learning.

But data-efficient learning has evolved separately many times; AlphaStar may need thousands of years of StarCraft to learn how to play, but dolphins can learn new games in minutes. Games with human trainers, who are obviously way out of distribution as far as dolphins' ancestral environment is concerned.

The number 2 thing I hear people point to is "reasoning" and maybe "causal reasoning" in particular. I venture to guess that this has evolved a bunch of times too, based on how various animals can solve clever puzzles to get pieces of food.

(See also: https://www.lesswrong.com/posts/GMqZ2ofMnxwhoa7fD/the-octopus-the-dolphin-and-us-a-great-filter-tale )

Comment by daniel-kokotajlo on The Upper Limit of Value · 2021-01-28T09:46:29.821Z · LW · GW
Finally, if we accept the simulation hypothesis, we again have no necessary access to the simulators' universe. Only if we both accept the hypothesis and believe we can influence the parent universe in determinable ways can we make decisions that have an infinite impact. In that case, infinite value is again only accessible via this route.

This seems like an isolated demand for... something. If we accept the simulation hypothesis, we still have a credence distribution over what the simulators' universe might be like, including what the simulators are like, what their purpose in creating the simulation is, etc. We don't need to believe we can influence the parent universe "in determinable ways" to make decisions that take into account possible effects on the parent universe. We certainly don't need "necessary access." We don't have necessary access to pretty much anything. Or maybe I just don't know what you mean by these quoted phrases?