Posts

What can we learn from insecure domains? 2024-11-01T23:53:30.066Z
Why is there Nothing rather than Something? 2024-10-26T12:37:50.204Z
Most arguments for AI Doom are either bad or weak 2024-10-12T11:57:50.840Z
COT Scaling implies slower takeoff speeds 2024-09-28T16:20:00.320Z
If I wanted to spend WAY more on AI, what would I spend it on? 2024-09-15T21:24:46.742Z
How do we know dreams aren't real? 2024-08-22T12:41:57.380Z
an effective ai safety initiative 2024-05-06T07:53:34.205Z
Anti MMAcevedo Protocol 2024-04-16T22:32:28.629Z
Is there a "critical threshold" for LLM scaling laws? 2024-03-30T12:23:11.938Z
Are AIs conscious? It might depend 2024-03-15T23:09:26.621Z
Theoretically, could we balance the budget painlessly? 2024-01-03T14:46:04.753Z
Various AI doom pathways (and how likely they are) 2023-12-14T00:54:09.424Z
Normative Ethics vs Utilitarianism 2023-11-30T15:36:00.602Z
AI Alignment [progress] this Week (11/19/2023) 2023-11-21T16:09:40.996Z
AI Alignment [progress] this Week (11/12/2023) 2023-11-14T22:21:06.205Z
AI Alignment [Progress] this Week (11/05/2023) 2023-11-07T13:26:21.995Z
AI Alignment [progress] this Week (10/29/2023) 2023-10-30T15:02:26.265Z
ELI5 Why isn't alignment *easier* as models get stronger? 2023-10-28T14:34:37.588Z
AI Alignment [Incremental Progress Units] this Week (10/22/23) 2023-10-23T20:32:37.998Z
A NotKillEveryoneIsm Argument for Accelerating Deep Learning Research 2023-10-19T16:28:52.218Z
AI Alignment [Incremental Progress Units] this week (10/08/23) 2023-10-16T01:46:56.193Z
AI Alignment Breakthroughs this week (10/08/23) 2023-10-08T23:30:54.924Z
AI Alignment Breakthroughs this Week [new substack] 2023-10-01T22:13:48.589Z
Instrumental Convergence Bounty 2023-09-14T14:02:32.989Z
An embedding decoder model, trained with a different objective on a different dataset, can decode another model's embeddings surprisingly accurately 2023-09-03T11:34:20.226Z
Towards Non-Panopticon AI Alignment 2023-07-06T15:29:39.705Z
Re: The Crux List 2023-06-01T04:48:24.320Z
Malthusian Competition (not as bad as it seems) 2023-05-25T15:30:18.534Z
Corrigibility, Much more detail than anyone wants to Read 2023-05-07T01:02:35.442Z
Where is all this evidence of UFOs? 2023-05-01T12:13:33.706Z
What if we Align the AI and nobody cares? 2023-04-19T20:40:30.251Z
A List of things I might do with a Proof Oracle 2023-02-05T18:14:27.701Z
2+2=π√2+n 2023-02-03T22:27:22.247Z
A post-quantum theory of classical gravity? 2023-01-23T20:39:03.564Z
2022 was the year AGI arrived (Just don't call it that) 2023-01-04T15:19:55.009Z
Natural Categories Update 2022-10-10T15:19:11.107Z
What is the "Less Wrong" approved acronym for 1984-risk? 2022-09-10T14:38:39.006Z
A Deceptively Simple Argument in favor of Problem Factorization 2022-08-06T17:32:24.251Z
Bureaucracy of AIs 2022-06-09T23:03:06.608Z
An Agent Based Consciousness Model (unfortunately it's not computable) 2022-05-21T23:00:47.417Z
The Last Paperclip 2022-05-12T19:25:50.891Z
Various Alignment Strategies (and how likely they are to work) 2022-05-03T16:54:17.173Z
How confident are we that there are no Extremely Obvious Aliens? 2022-05-01T10:59:41.956Z
Does the Structure of an algorithm matter for AI Risk and/or consciousness? 2021-12-03T18:31:40.185Z
AGI is at least as far away as Nuclear Fusion. 2021-11-11T21:33:58.381Z
How much should you be willing to pay for an AGI? 2021-09-20T11:51:33.710Z
The Walking Dead 2021-07-22T16:19:48.355Z
Against Against Boredom 2021-05-16T18:19:59.909Z
TAI? 2021-03-30T12:41:29.790Z
(Pseudo) Mathematical Realism Bad? 2020-11-22T18:21:30.831Z

Comments

Comment by Logan Zoellner (logan-zoellner) on A shortcoming of concrete demonstrations as AGI risk advocacy · 2024-12-13T19:38:58.419Z · LW · GW

If I think AGI x-risk is >>10%, and you think AGI x-risk is 1-in-a-gazillion, then it seems self-evident to me that we should be hashing out that giant disagreement first; and discussing what if any government regulations would be appropriate in light of AGI x-risk second

 

I do not think arguing about p(doom) in the abstract is a useful exercise.  I would prefer the Overton Window for p(doom) to look like 2-20%; Zvi thinks it should be 20-80%.  But my real disagreement with Zvi is not that his p(doom) is too high; it is that he supports policies that would make things worse.

As for the outlier cases (1-in-a-gazillion or 99.5%), I simply doubt those people are amenable to rational argumentation.  So I suspect the best thing to do is to simply wait for reality to catch up to them. I doubt that when there are hundreds of millions of humanoid robots out on the streets, people will still be asking "but how will the AI kill us?"

(If it makes you feel any better, I have always been mildly opposed to the six month pause plan.)

That does make me feel better.

Comment by Logan Zoellner (logan-zoellner) on A shortcoming of concrete demonstrations as AGI risk advocacy · 2024-12-13T18:37:05.973Z · LW · GW

It's hard for me to know what's crux-y without a specific proposal. 

I tend to take a dim view of proposals that have specific numbers in them (without equally specific justifications). Examples include the six-month pause and SB 1047.

Again, you can give me an infinite number of demonstrations of "here's people being dumb" and it won't cause me to agree with "therefore we should also make dumb laws".

If you have an evidence-based proposal to reduce specific harms associated with "models follow goals" and "people are dumb", then we can talk price.

Comment by Logan Zoellner (logan-zoellner) on A shortcoming of concrete demonstrations as AGI risk advocacy · 2024-12-13T15:39:10.821Z · LW · GW

“OK then! So you’re telling me: Nothing bad happened, and nothing surprising happened. So why should I change my attitude?”

 

I consider this an acceptable straw-man of my position.

To be clear, there are some demos that would cause me to update.

For example, I consider "The Solomonoff Prior is Malign" to be basically a failure to do counting correctly.  So if someone demonstrated a natural example of this, I would be forced to update.

Similarly, I think the chance of an EY-style utility-maximizing agent arising from next-token prediction is (with caveats) basically 0%.  So if someone demonstrated this, it would update my priors. I am especially unconvinced by the version of this where the next-token predictor simulates a malign agent and the malign agent then hacks out of the simulation.

But no matter how many times I am shown "we told the AI to optimize a goal and it optimized the goal... we're all doomed", I will continue to not change my attitude.

Comment by Logan Zoellner (logan-zoellner) on Why Isn't Tesla Level 3? · 2024-12-11T19:57:45.472Z · LW · GW

Tesla fans will often claim that Tesla could easily do this

 

Tesla fan here. 

Yes, Tesla can easily handle the situation you've described (stop-and-go traffic on a highway in good weather with no construction), with higher reliability than human drivers.

I suspect the reason Tesla is not pursuing this particular certification is that, given the current rate of progress, it would be out of date by the time it was authorized.  There have been several significant leaps in capability in the last two years (11->12, 12->12.6, and I've been told 12->13).  Most likely Elon (who has undeniably been over-optimistic) is waiting to get FSD certified until it is at least Level 4.

It's worth noting that Tesla has significantly relaxed the requirements for FSD (from "hands on wheel" to "eyes on road") and has done so for all circumstances, not just optimal ones.

Comment by Logan Zoellner (logan-zoellner) on How should TurnTrout handle his DeepMind equity situation? · 2024-12-03T13:02:35.475Z · LW · GW

Seems like he could just fake this by writing a note to his best friend that says "during the next approved stock trading window I will sell X shares of GOOG to you for Y dollars".  

Admittedly:
1. technically this is a derivative (maybe illegal?)
2. principal-agent risk (he might not follow through on the note)
3. his best friend might encourage him to work harder for GOOG to succeed

But I have a hard time believing any of those would be a problem in the real world, assuming TurnTrout and his friend are reasonably virtuous about actually not wanting TurnTrout to make a profit off of GOOG.

You could come up with more complicated versions of the same thing.  For example, instead of his best friend, TurnTrout could gift the profit to a for-charity LLC that had AI Alignment as its mandate.  This would (assuming it was set up correctly) eliminate 1 and 3.

Comment by Logan Zoellner (logan-zoellner) on How should TurnTrout handle his DeepMind equity situation? · 2024-12-02T15:45:03.765Z · LW · GW

Isn't there just literally a financial product for this?  TurnTrout could sell Puts for GOOG exactly equal to his vesting amounts/times.

Comment by Logan Zoellner (logan-zoellner) on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T21:52:19.635Z · LW · GW

Einstein didn't write a half-assed NYT op-ed about how vague 'advances in science' might soon lead to new weapons of war and the USA should do something about that; he wrote a secret letter hand-delivered & pitched to President Roosevelt by a trusted advisor.

Strongly agree.

What other issues might there be with this new ad hoced strategy...?

I am not a China Hawk.  I do not speak for the China Hawks.  I 100% concede your argument that these conversations should be taking place in a room that neither you nor I are in right now.

Comment by Logan Zoellner (logan-zoellner) on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T19:48:21.385Z · LW · GW

I would like to see them state things a little more clearly than commentators having to guess 'well probably it's supposed to work sorta like this idk?'

Meh.  I want the national security establishment to act like a national security establishment.  I admit it is frustratingly opaque from the outside, but that does not mean I want more transparency at the cost of it being worse.  Tactical Surprise and Strategic Ambiguity are real things with real benefits.

A great example, thank you for reminding me of it as an illustration of the futility of these weak measures which are the available strategies to execute.

I think both can be true: Stuxnet did not stop the Iranian nuclear program, and if there were a "destroy all Chinese long-range weapons and High Performance Computing clusters" button, NATSEC would pound that button.

Is your argument that a 1-year head start on AGI is not enough to build such a button, or do you really think it wouldn't be pressed?

It is a major, overt act of war and utter alarming shameful humiliating existential loss of national sovereignty which crosses red lines so red that no one has even had to state them - an invasion that no major power would accept lying down and would likely trigger a major backlash

The game-theory implications of China waking up to find all of its long-range military assets and GPUs destroyed are not what you are suggesting.  A telling current example is the Iranian non-response to Israel's actions against Hamas/Hezbollah.

Nukes were a hyper-exponential curve too.

While this is a clever play on words, it is not a good argument.  There are good reasons to expect AGI to affect the offense-defense balance in ways that are fundamentally different from nuclear weapons. 

Comment by Logan Zoellner (logan-zoellner) on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T16:07:08.976Z · LW · GW

Because the USA has always looked at the cost of using that 'robust military superiority', which would entail the destruction of Seoul and possibly millions of deaths and the provoking of major geopolitical powers - such as a certain CCP - and decided it was not worth the candle, and blinked, and kicked the can down the road, and after about three decades of can-kicking, ran out of road.

 

I can't explicitly speak for the China Hawks (not being one myself), but I believe one of the working assumptions is that AGI will allow the "league of free nations" to disarm China without the messiness of millions of deaths.  Probably this is supposed to work like EY's "nanobot swarm that melts all of the GPUs".  

I agree that the details are a bit fuzzy, but from an external perspective "we don't publicly discuss capabilities" and "there are no adults in the room" are indistinguishable.  OpenAI openly admits the plan is "we'll ask the AGI what to do".  I suspect NATSEC's position is more like "amateurs discuss tactics, experts discuss logistics" (i.e. securing decisive advantage is more important than planning out exactly how to melt the GPUs).

To believe that the same group that pulled off Stuxnet would lack the imagination or the will to use AGI-enabled weapons strikes me as naive, however.

 

The USA, for example, has always had 'robust military superiority' over many countries it desired to not get nukes, and yet, which did get nukes.

It's also worth noting that AGI is not a zero-to-one event but rather a hyper-exponential curve.  Theoretically it may be possible to always stay far enough ahead to maintain a decisive advantage (unlike nukes, where even a handful is enough to establish MAD).

Comment by Logan Zoellner (logan-zoellner) on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T10:57:09.682Z · LW · GW

Okay, this at least helps me better understand your position.  Maybe you should have opened with "China Hawks won't do the thing they've explicitly and repeatedly said they are going to do".

Comment by Logan Zoellner (logan-zoellner) on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T00:54:33.266Z · LW · GW

What does winning look like? What do you do next?

 

This question is a perfect mirror of the brain-dead "how is AGI going to kill us?" question.  I could easily make a list of 100 things you might do if you had AGI supremacy and wanted to suppress the development of AGI in China.  But the whole point of AGI is that it will be smarter than me, so anything I put on the list would be redundant.

Comment by Logan Zoellner (logan-zoellner) on AI #92: Behind the Curve · 2024-11-29T13:18:06.356Z · LW · GW

Playing the AIs definitely seems like the most challenging role

 

Seems like a missed opportunity not having the AIs be played by AIs.

Comment by Logan Zoellner (logan-zoellner) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-26T11:33:18.255Z · LW · GW

yes

Comment by Logan Zoellner (logan-zoellner) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-25T15:30:19.386Z · LW · GW

This is a bad argument, and to understand why it is bad, you should consider why you don't routinely have the thought "I am probably in a simulation, and since value is fragile, the people running the simulation probably have values wildly different from human values, so I should do something insane right now".

Comment by Logan Zoellner (logan-zoellner) on China Hawks are Manufacturing an AI Arms Race · 2024-11-21T12:17:01.653Z · LW · GW

Chinese companies explicitly have a rule not to release things that are ahead of SOTA (I've seen comments of the form "trying to convince my boss this isn't SOTA so we can release it" on GitHub repos).  So "publicly released Chinese models are always slightly behind American ones" doesn't prove much.

Comment by Logan Zoellner (logan-zoellner) on Could we use current AI methods to understand dolphins? · 2024-11-10T12:25:02.795Z · LW · GW

Current AI methods are basically just fancy correlations, so unless the thing you are looking for is in the dataset (or is a simple combination of things in the dataset) you won't be able to find it.

This means "can we use AI to translate between humans and dolphins" is mostly a question of "how much data do you have?"

Suppose, for example, that we had 1 billion hours of audio/video of humans/dolphins doing things.  In this case, AI could almost certainly find correlations like: when dolphins pick up a seashell, they make the <<dolphin word for seashell>> sound, and when humans pick up a seashell, they make the <<human word for seashell>> sound.  You could then do something like CLIP to find a mapping between <<human word for seashell>> and <<dolphin word for seashell>>.  The magic step here is that, because we use the same embedding model for video in both cases, <<seashell>> is located at the same position in both our dolphin and human CLIP models.
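
To make that CLIP-style step concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `embed_human_audio` and `embed_dolphin_audio` stand in for encoders that would each have been trained contrastively against the same video encoder (not shown), which is the assumption that puts <<seashell>> at the same point in both spaces.

```python
# Minimal sketch of the shared-embedding lookup described above.  Both audio
# encoders are hypothetical placeholders standing in for models trained
# contrastively against the SAME video encoder, so that human audio and
# dolphin audio land in one shared embedding space.
import numpy as np

DIM = 512                       # embedding dimensionality (arbitrary here)
rng = np.random.default_rng(0)

def embed_human_audio(sound):
    """Placeholder: trained to match the video encoder on paired human footage."""
    return rng.normal(size=DIM)

def embed_dolphin_audio(sound):
    """Placeholder: trained to match the video encoder on paired dolphin footage."""
    return rng.normal(size=DIM)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def translate(dolphin_sound, human_vocabulary):
    """Map a dolphin vocalization to the human word whose embedding sits
    nearest in the shared space (the <<seashell>> -> <<seashell>> step)."""
    query = embed_dolphin_audio(dolphin_sound)
    return max(human_vocabulary, key=lambda w: cosine(query, human_vocabulary[w]))

# Usage: build a small human "dictionary" of word embeddings, then look up
# an unknown dolphin sound against it.
human_vocabulary = {w: embed_human_audio(w) for w in ["seashell", "fish", "boat"]}
print(translate("<<unknown dolphin sound>>", human_vocabulary))
```

With real encoders, the hard part is the contrastive training that makes the shared space meaningful; the nearest-neighbor lookup at the end is trivial, which is why the billion hours of footage is doing all the work.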

But notice that I am already simplifying here.  There is no such thing as <<human word for seashell>>.  Instead, humans have many different languages.  For example, Papua New Guinea has over 800 languages in a land area of a mere 400k square kilometers.  Because dolphins live in what is essentially a hunter-gatherer existence, none of the pressures (trade, empire-building) that cause human languages to span widespread areas exist.  Most likely each pod of dolphins has, at a minimum, its own dialect.  (One pastime I noticed when visiting the UK was that people there liked to compare how towns only a few miles apart had different words for the same things.)

Dolphin lives are also much simpler than human lives, so their language is presumably also much simpler.  Maybe, like Eskimos having 100 words for snow, dolphins have 100 words for water.  But it's much more likely that, without the need to coordinate resources for complex tasks like tool-making, dolphins simply don't have as complex a grammar as humans do.  Less complex grammar means fewer patterns, which means less for the machine learning to pick up on (machine learning loves patterns).

So perhaps the correct analogy is: if we had a billion hours of audio/video of a particular tribe of humans and a billion hours of a particular pod of dolphins, we could feed it into a model like CLIP and find sounds with similar embeddings in both languages.  As pointed out in other comments, it would help if the humans and dolphins were doing similar things, so for the humans you might want to pick a group that focuses on underwater activities.

In reality (assuming AGI doesn't get there first, which seems quite likely), the fastest path to human-dolphin translation will take a hybrid approach.  AI will be used to identify correlations in dolphin language; for example, this study claims to have identified vowels in whale speech.  Once we have a basic mapping (dolphin sounds -> symbols humans can read), some very intelligent and very persistent human being will stare at those symbols, make guesses about what they mean, and then do experiments to verify those guesses.  For example, humans might try replaying the sounds they think represent words/sentences to dolphins and seeing how they respond.  This closely matches how new human languages are translated: a human being lives in contact with the speakers of the language for an extended period of time until they figure out what various words mean.

What would it take for an AI-only approach to replicate the path I just talked about (AI generates a dictionary of symbols that a human then uses to craft a clever experiment that uses the least amount of data possible)?  Well, it would mean overcoming the data inefficiency of current machine learning algorithms.  Comparing how many "input tokens" it takes to train a human child vs GPT-3, we can estimate that humans are ~1000x more data-efficient than modern AI techniques.
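
For what it's worth, a figure like that comes from an order-of-magnitude division; the inputs below (words heard per day, years of childhood, GPT-3's corpus size) are my own rough assumptions, not measurements.

```python
# Back-of-the-envelope for the ~1000x data-efficiency gap.  All inputs are
# rough assumptions: ~20k words heard per day over ~20 years of growing up,
# and a GPT-3 training corpus of roughly 300 billion tokens.
child_tokens = 20_000 * 365 * 20   # ~1.5e8 words heard while growing up
gpt3_tokens = 300e9                # ~3e11 training tokens
print(f"ratio ~ {gpt3_tokens / child_tokens:,.0f}x")  # ~2,000x, i.e. order 10^3
```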

Overcoming this barrier will likely require inference+search techniques where the AI uses a statistical model to "guess" at an answer and then checks that answer against a source of truth.  One important metric to watch is the ARC prize, which intentionally has far less data than traditional machine learning techniques require.  If ARC is solved, it likely means that AI-only dolphin-to-human translation is on its way (but it also likely means that AGI is imminent).
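
The inference+search pattern itself is simple to state; a minimal sketch is below, with hypothetical stand-ins: `propose` is any statistical model that samples candidate answers, and `verify` is whatever source of truth is available (a playback experiment with live dolphins, an ARC grid checker, and so on).

```python
# Minimal sketch of the "guess, then check against a source of truth" loop.
# `propose` and `verify` are hypothetical stand-ins supplied by the caller.
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def guess_and_check(propose: Callable[[], Iterable[T]],
                    verify: Callable[[T], bool],
                    budget: int = 100) -> Optional[T]:
    """Draw up to `budget` candidates from the statistical model and return
    the first one the source of truth accepts; None if the budget runs out."""
    for i, candidate in enumerate(propose()):
        if i >= budget:
            break
        if verify(candidate):
            return candidate
    return None

# Toy usage: "propose" guesses integers in order, "verify" accepts only 7.
print(guess_and_check(propose=lambda: iter(range(100)), verify=lambda x: x == 7))
```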

So, to answer your original question: "Could we use current AI methods to understand dolphins?"  Yes, but doing so would require an unrealistically large amount of data and most likely other techniques will get there sooner.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T23:40:01.272Z · LW · GW

Plausible something between 5 and 100 stories will taxonomize all the usable methods and you will develop a theory through this sort of investigation.

 

That sounds like something we should work on, I guess.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T21:49:03.541Z · LW · GW

plus you are usually able to error-correct such that a first mistake isn't fatal.

 

This implies the answer is "trial and error", but I really don't think the whole answer is trial and error.  Each of the domains I mentioned has the problem that you don't get to redo things.  If you send crypto to the wrong address it's gone.  People routinely type their credit card information into a website they've never visited before and get what they wanted.  Global thermonuclear war didn't happen.  I strongly predict that when LLM agents come out, most people will successfully manage to use them without first falling for a string of prompt-injection attacks and learning from trial-and-error what prompts are/aren't safe.

Humans are doing more than just trial and error, and figuring out what it is seems important.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T21:10:50.705Z · LW · GW

and then trying to calibrate to how much to be scared of "dangerous" stuff doesn't work.

 

Maybe I was unclear in my original post, because you seem confused here.  I'm not claiming the thing we should learn is "dangerous things aren't dangerous".  I'm claiming: here are a bunch of domains that have problems of adverse selection and inability to learn from failure, and yet humans successfully negotiate these domains. We should figure out what strategies humans are using and how far they generalize, because this is going to be extremely important in the near future.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T11:54:34.052Z · LW · GW

That was a lot of words to say "I don't think anything can be learned here".

Personally, I think something can be learned here.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T11:41:15.393Z · LW · GW

MAD is obviously governed by completely different principles than crypto is

 

Maybe this is obvious to you.  It is not obvious to me. I am genuinely confused what is going on here.  I see what seems to be a pattern: dangerous domain -> basically okay.  And I want to know what's going on.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T11:16:25.271Z · LW · GW

It's easy to write "just so" stories for each of these domains: only degens use crypto, credit card fraud detection makes the internet safe, MAD happens to be a stable equilibrium for nuclear weapons.

These stories are good and interesting, but my broader point is that this just keeps happening.  Humans invent a new domain that common sense tells you should be extremely adversarial and then successfully use it without anything too bad happening.

I want to know what is the general law that makes this the case.

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T11:10:48.458Z · LW · GW

The insecure domains mainly work because people have charted known paths, and shown that if you follow those paths your loss probability is non-null but small.

 

I think this is a big part of it: humans have some kind of knack for working in dangerous domains successfully.  I feel like an important question is: how far does this generalize?  We can estimate the IQ gap between the dumbest person who successfully uses the internet (probably in the 80s) and the smartest malware author (got to be at least 150+).  Is that the limit somehow, or does this knack extend across even more orders of magnitude?

If we imagine a world where 100 IQ humans are using an internet that contains malware written by 1000 IQ AGI, do humans just "avoid the bad parts"?  What goes wrong exactly, and where?

Comment by Logan Zoellner (logan-zoellner) on What can we learn from insecure domains? · 2024-11-02T10:59:45.232Z · LW · GW

Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Imagine your typical computer user (I remember being mortified when running an anti-spyware tool on my middle-aged parents' computer for them).  They aren't keeping things patched and up to date. What I find curious is how it can be the case that their computer is both filthy with malware and the place where they routinely input sensitive credit-card/tax/etc. information.

but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

My prediction is that, despite glaring "security flaws" (prompt injection, etc.), people will nonetheless use LLM agents for tons of stuff that common sense tells you shouldn't be done in an insecure system.

I fully expect to live in a world where it's BOTH true that: Pliny the Liberator can PWN any LLM agent in minutes AND people are using LLM agents to order 500 chocolate cupcakes on a daily basis.

I want to know WHAT IT IS that makes it so that things can be both deeply flawed and basically fine simultaneously.

Comment by Logan Zoellner (logan-zoellner) on Why is there Nothing rather than Something? · 2024-10-28T04:48:37.943Z · LW · GW

I can just meh my way out of thinking more than 30s on what the revelation might be, the same way Tralith does

 

I'm glad you found one of the characters sympathetic.  Personally I feel strongly both ways, which is why I wrote the story the way that I did.

Comment by Logan Zoellner (logan-zoellner) on Overview of strong human intelligence amplification methods · 2024-10-17T23:11:06.337Z · LW · GW

No, I think you can keep the data clean enough to avoid tells.

 

What data?  Why not just train it on literally 0 data (MuZero-style)? You think it's going to derive the existence of the physical world from the Peano Axioms?

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-14T16:34:57.266Z · LW · GW

If you think without contact with reality, your wrongness is just going to become more self-consistent.

 

Please! I'm begging you! Give me some of this contact with reality!  What is the evidence you have seen and I have not? Where?

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-14T14:26:01.925Z · LW · GW

I came and asked "the expert consensus seems to be that AGI doom is unlikely.  This is the best argument I am aware of and it doesn't seem very strong.  Are there any other arguments?"

 

Responses I have gotten are:

  • I don't trust the experts, I trust my friends
  • You need to read the sequences
  • You should rephrase the argument in a way that I like

And 1 actual attempt at giving an answer (which unfortunately includes multiple assumptions I consider false or at least highly improbable)

If I seem contrarian, it's because I believe that the truth is best uncovered by stating one's beliefs and then critically examining the arguments.  If you have arguments or disagree with me, fine; but saying "you're not allowed to think about this, you just have to trust me and my friends" is not a satisfying answer.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-14T14:03:44.435Z · LW · GW

"Can you explain in a few words why you believe what you believe"

 

"Please read this 500 pages of unrelated content before I will answer your question"

 

No.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-14T14:02:01.793Z · LW · GW

This is self-evidently true, but you (and many others) disagree

 

A fact cannot be self-evidently true if many people disagree with it.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T16:47:04.448Z · LW · GW

If your answer depends on me reading 500 pages of EY fan-fiction, it's not a good answer.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T16:46:13.104Z · LW · GW

Making a point-by-point refutation misses the broader fact that any long sequence of argument like this adds up to very little evidence.

Even if you somehow convinced me that each of your (10) arguments was 75% likely to be true, they would still add up to almost nothing, because the probabilities multiply: 0.75^10 ≈ 6%.

Unless you can summarize your argument in at most 2 sentences (with evidence), it's completely ignorable.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T16:42:22.289Z · LW · GW

Metaculus did a study where they compared prediction markets with a small number of participants to those with a large number and found that you get most of the benefit at relatively small numbers (10 or so).  So if you randomly sample 10 AI experts and survey their opinions, you're doing almost as well as a full prediction market.  The fact that multiple AI markets (Metaculus, Manifold) and surveys all agree on the same 5-10% suggests that none of these methodologies is wildly flawed.
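
That diminishing-returns effect is easy to illustrate with a toy simulation (my own sketch, not the Metaculus methodology): give each simulated forecaster the true probability plus independent noise and watch how the error of the averaged forecast shrinks as the pool grows.

```python
# Toy illustration of diminishing returns from adding forecasters.  The setup
# is an assumption: each forecaster reports the true probability plus
# independent Gaussian noise, and the aggregate is a simple average.
import numpy as np

rng = np.random.default_rng(0)
true_p, noise_sd, trials = 0.07, 0.10, 20_000

for pool_size in (1, 10, 100, 1000):
    forecasts = true_p + rng.normal(0, noise_sd, size=(trials, pool_size))
    errors = np.abs(forecasts.mean(axis=1) - true_p)
    print(f"{pool_size:>4} forecasters: mean abs error ~ {errors.mean():.4f}")
# Going from 1 -> 10 forecasters removes most of the absolute error;
# 10 -> 100 -> 1000 adds comparatively little.
```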

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T12:51:37.977Z · LW · GW

No one.  I trust prediction markets far more than any single human being.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T12:50:36.225Z · LW · GW

I realize I should probably add a 3rd category of argument: arguments which assume a specific (unlikely) path for AGI development and then argue this particular path is bad.

This is an improvement over "bad" arguments (in the sense that it's at least a logical sequence of argumentation rather than a list of claims), but unlikely to move the needle for me, since the specific sequence involved is unlikely to be true.

Ideally, what one would like to do is "average over all possible paths for AGI development".  But I don't know of a better way to do that average than to just use an expert-survey/prediction market.

 

Let's talk in detail about why this particular path is improbable, by trying to write it as a sequence of logical steps:

  1. "Right now, every powerful intelligence (e.g. nation-states) is built out of humans, so the only way for such organizations to thrive is to make sure the constituent humans thrive"
    1. This is empirically false. Genocide and slavery have been the norm across human history. We are currently in the process of modifying our atmosphere in a way that is deadly to humans, and almost did so recently in the past.
  2. "AI is going to loosen up this default pull."
    1. This assumes a specific model for AI: humans use the AI to do highly adversarial search and then blindly implement the results. Suppose instead humans only implement the results after verifying them, or require the AI to provide a mathematical proof that "this action won't kill all humans".
  3. "There's lots of places where we'd expect adversarial searches to be incentivized"
    1. None of these are unique to AGI. We have the same problem with nuclear weapons, biological weapons, and any number of other technologies. AGI is uniquely friendly in the sense that at first it's merely software: it has no impact on the real world unless we choose to let it.
  4. "The current situation for war/national security is already super precarious due to nukes, and I tend to reason by an assumption that if a nuke is used again then that's going to be the end of society. "
    1. How is this an argument for AGI risk?
  5. "and it's unclear how to generalize this to other case. For instance, outlawing propaganda would seem to interfere with free speech"
    1. Something being unclear is not an argument for doom. At best it's a restatement of my original weak argument: AGI will be powerful, therefore it might be bad.
  6. "So a plausible model seems to me to be, people are gradually developing ways of integrating computers with the physical world, by giving them deeper knowledge of how the world works and more effective routines for handling small tasks. "
    1. Even if this is a plausible model, it is by no means the only model or the default path.
  7. "but as it gets more and more robust and well-understood, it becomes more and more feasible to run searches over it to find more powerful activities."
    1. It is equally plausible (in my opinion more so) that there is a limit to how far ahead intelligence can predict, and that science is fundamentally rate-limited by the speed of physical experimentation.
  8. "thus can just "do the thing" you're asking them to, but in adversarial circumstances, the adversaries will exploit your weakness "
    1. Why are we assuming the adversaries will exploit your weakness? Why not assume we build corrigible AI that tries to help you instead?
  9. "similar to a dangerous utility-maximizer."
    1. A utility-maximizer is a specific design of AGI, and moreover totally different from the next-token-prediction AIs that currently exist. Why should I assume that this particular design will suddenly become popular (despite the clear disadvantages that you have already stated)?

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T12:33:23.339Z · LW · GW

I mostly try to look around to who's saying what and why and find that the people I consider most thoughtful tend to be more concerned and take "the weak argument" or variations thereof very seriously

 

We apparently have different tastes in "people I consider thoughtful".  "Here are some people I like and their opinions" is an argument unlikely to convince me (a stranger).

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T12:17:53.804Z · LW · GW

My apologies. That is in a totally different thread, which I will respond to.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T12:05:46.688Z · LW · GW

narrower categories like AGI which individually have high probabilities of being destructive.

 

If AGI has a "high probability of being destructive", show me the evidence. What amazingly compelling argument has led you to have beliefs that are wildly different from the expert consensus?

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T11:58:49.994Z · LW · GW

My claim is not that the tail risks of AGI are important, my claim is that AGI is a tail risk of technology. 

 

Okay, I'm not really sure why we're talking about this, then.

Consider this post a call to action of the form "please provide reasons why I should update away from the expert consensus that AGI is probably going to turn out okay".

I agree that talking about how we could handle technological changes as a broader framework is a meaningful and useful thing to do.  I just don't think it's related to this post.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T11:34:40.858Z · LW · GW

but it just shows the percentage of years with wars without taking the severity of the wars into account.

 

If you look at the probability of dying by violence, it shows a similar trend.

This stuff is long-tailed, so past average is no indicator of future averages.

I agree that tail risks are important.  What I disagree with is that only tail risks from AGI are important.  If you wish to convince me that tail risks from AGI are somehow worse than (nuclear war, killer drone swarms, biological weapons, global warming, etc.), you will need evidence.  Otherwise, you have simply recreated the weak argument (which I already agree with): "AGI will be different, therefore it could be bad".

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T11:04:32.090Z · LW · GW

but that also means the market itself tells you much less than a "true" prediction market would

 

This doesn't exempt you from the fact that if your prediction is wildly different from what experts predict, you should be able to explain your beliefs in a few words.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T11:01:14.752Z · LW · GW

Has it? I'm under the impression technology has lead to much more genocide and war.

 

Your impression is wrong.  Technology is (on average) a civilizing force.

Which political/religious beliefs?

 

I'm not going into details about which people want to murder me and why for the obvious reason.  You can probably easily imagine any number of groups whose existence is tolerated in America but not elsewhere.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T10:57:15.496Z · LW · GW

So losing cosmic wealth is sufficient to qualify an outcome as doom

 

My utility function roughly looks like:

  • my survival
  • the survival of the people I know and care about
  • the distant future is populated by beings that are in some way "descended" from humanity and share at least some of the values (love, joy, curiosity, creativity) that I currently hold

Basically, if I sat down with a human from 10,000 years ago, I think there's a lot we would disagree about, but at the end of the day I think they would get the feeling that I'm an "okay person".  I would like to imagine the same sort of thing holding for whatever follows us.

I don't find hair-splitting arguments like "what if the AGI takes over the universe but leaves Earth intact" particularly interesting, except insofar as they allow for all 3 of the above.  I also don't think most people place a huge fraction of P(~doom) on such weird technicalities.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T10:45:15.585Z · LW · GW

If it's a guess, the base rate is key.

 

If your base rate is strongly different from the expert consensus, there should be some explainable reason for the difference.

If the reason for the difference is "I thought a lot about it, but I can't explain the details to you", I will happily add yours to the list of "bad arguments".

A good argument should be:

  • simple
  • backed up by facts that are either self-evidently true or empirically observable

If you give me a list of "100 things that make me nervous", I can just as easily give you a list of "100 things that make me optimistic".

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T10:40:28.142Z · LW · GW

50% of the humans currently on Earth want to kill me because of my political/religious beliefs.  My survival depends on the existence of a nice game-theory equilibrium, not on the benevolence of other humans.  I agree (note the 1 bit) that the new game-theory equilibrium after AGI could be different.  However, historically, increasing the level of technology/economic growth has led to less genocide/war/etc., not more.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T10:37:04.573Z · LW · GW

it can only ever resolve to one side of the issue, so absent other considerations you should assume that it is heavily skewed to that side.

 

Prediction markets don't give a noticeably different answer from expert surveys, so I doubt the bias is that bad.  Manifold isn't a "real money" market anyway, so I suspect most people are answering in good faith.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-13T01:10:57.950Z · LW · GW

I don't think it's an improvement to say the same thing with more words.  It gives the aura of sophistication without actually improving on the reasoning.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-12T22:45:05.154Z · LW · GW

so, do you nonetheless expect humans to still control the world? 

 

I personally don't control the world now.  I (on average) expect to be treated about as well by our new AGI overlords as I am treated by the current batch of rulers.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-12T22:37:11.765Z · LW · GW

Worth distinguishing doom in the sense of extinction and doom in the sense of existential risk short of extinction, getting most of the cosmic wealth taken away. I have very high doom expectations in the sense of loss of cosmic wealth, but only 20-40% for extinction. 

 

By doom I mean the universe gets populated by AI with no moral worth (e.g. paperclippers).  I expect humans to look pretty different in a century or two even if AGI was somehow impossible, so I don't really care about preserving status-quo humanity.

Comment by Logan Zoellner (logan-zoellner) on Most arguments for AI Doom are either bad or weak · 2024-10-12T22:28:43.163Z · LW · GW

My 90/10 timeframe for when AGI gets built is 3-15 years, and most of my probability mass for p(doom) is on the shorter end of that.  If we have the current near-human-ish level AI around for another decade, I assume we'll figure out how to control it.

My p(doom | AGI after 2040) is <1%.