Comments sorted by top scores.

comment by TAG · 2023-05-17T18:03:32.589Z · LW(p) · GW(p)

Who do you think is proposing the strong form?

Replies from: Adrien Sicart
comment by Adrien Sicart · 2023-05-19T17:11:33.843Z · LW(p) · GW(p)

Arbital is where I found this specific wording for the strong form.

In the two weeks since I wrote this, I have been working on addressing some weaker forms, as presented in section 4.5 of Stuart Armstrong’s [LW · GW] article.

Replies from: TAG
comment by TAG · 2023-07-08T21:32:52.199Z · LW(p) · GW(p)

Arbital says:

The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in creating an intelligent agent to pursue a goal, above and beyond the computational tractability of that goal

You say:

“To whatever extent you (or a superintelligent version of you) could figure out how to get a high-U outcome if aliens offered to pay you huge amount of resources to do it, the corresponding agent that terminally prefers high-U outcomes can be at least that good at achieving U.”

I don't see the connection.

Replies from: Adrien Sicart
comment by Adrien Sicart · 2023-07-19T07:49:27.762Z · LW(p) · GW(p)

This is actually a quote from Arbital. Their article explains the connection.

comment by Anon User (anon-user) · 2023-05-17T16:30:41.302Z · LW(p) · GW(p)

Wait, if Clip-maniac finds itself in a scenario where Clippy would achieve higher U than itself, the rational thing for it to do would be to self-modify into Clippy, and the Strong Form would still hold, wouldn't it?
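A toy sketch of that argument, assuming the agent can estimate the U it would achieve under each candidate objective (the objective names, the scenario label, and the payoff numbers below are invented purely for illustration):

```python
# Toy sketch: an agent that is allowed to self-modify simply adopts whichever
# objective lets it score highest on U in the scenario at hand.
# The candidate objectives and payoff estimates are made-up illustrations.

def achievable_U(objective: str, scenario: str) -> float:
    """Hypothetical estimate of the U-score an agent pursuing `objective`
    would reach in `scenario` (placeholder values for illustration)."""
    table = {
        ("clip-maniac", "adversarial-scenario"): 10.0,
        ("clippy", "adversarial-scenario"): 100.0,
    }
    return table[(objective, scenario)]

class SelfModifyingAgent:
    def __init__(self, objective: str):
        self.objective = objective

    def act(self, scenario: str) -> float:
        # If some other objective would yield higher U here, self-modify into
        # an agent pursuing that objective, then pursue it.
        candidates = ["clip-maniac", "clippy"]
        best = max(candidates, key=lambda o: achievable_U(o, scenario))
        self.objective = best  # the self-modification step
        return achievable_U(best, scenario)

agent = SelfModifyingAgent("clip-maniac")
print(agent.act("adversarial-scenario"))  # 100.0 -- it became Clippy
```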

Replies from: Adrien Sicart
comment by Adrien Sicart · 2023-05-19T16:59:34.333Z · LW(p) · GW(p)

Then we can consider that the « Stronger Strong Form », about « Eternally Terminal » Agents which CANNOT change, does not hold :-)

Replies from: anon-user
comment by Anon User (anon-user) · 2023-05-27T19:05:58.457Z · LW(p) · GW(p)

Well, yeah, if you specifically choose a crippled version of the high-U agent that is somehow unable to pursue the winning strategy, it will lose - but IMHO that's not what the discussion here should be about.

Replies from: Adrien Sicart
comment by Adrien Sicart · 2023-06-04T13:07:04.284Z · LW(p) · GW(p)

The discussion here is about the strong form. Proving that a « terminal » agent is crippled is exactly what is needed to prove the strong form does not hold.

Replies from: anon-user
comment by Anon User (anon-user) · 2023-07-08T21:22:17.146Z · LW(p) · GW(p)

Maybe there is a better way to put it - SFOT holds for objective functions/environments that only depend on the agent's I/O behavior. Once the agent itself is embodied, then yes, you can use all kinds of diagonal tricks to get weird counterexamples. Implications for alignment - yes, if your agent is fully explainable and you can transparently examine its workings, chances are that alignment is easier. But that is kind of obvious without having to use SFOT to reason about it.

Edited to add: "diagonal tricks" above refers to things in the conceptual neighborhood of https://en.m.wikipedia.org/wiki/Diagonal_lemma
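A toy sketch of such a diagonal trick, assuming the environment can inspect the embodied agent's objective function directly (the agent names, objective labels, and payoff numbers below are invented purely for illustration):

```python
# Toy "diagonal" environment: the payoff depends on the agent itself (its
# hard-wired terminal objective), not only on its I/O behavior. An agent whose
# fixed terminal goal is "maximize U" is penalized precisely for having that
# goal, while an agent with some other terminal goal is left free to score
# high U. All names and payoff numbers are made-up illustrations.

from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    terminal_objective: str  # the goal the agent is hard-wired to pursue

def diagonal_payoff(agent: Agent) -> float:
    """U-score assigned by an environment that inspects the agent's internals."""
    if agent.terminal_objective == "maximize-U":
        return 0.0   # built to frustrate exactly the agent that terminally wants U
    return 100.0     # any other agent can rack up a high U

print(diagonal_payoff(Agent("A", "maximize-U")))      # 0.0
print(diagonal_payoff(Agent("B", "something-else")))  # 100.0
```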

Replies from: Adrien Sicart
comment by Adrien Sicart · 2023-07-18T20:48:49.605Z · LW(p) · GW(p)

My point is that SFOT likely never works in any environment relevant to AI Alignment, where such diagonal methods show that any Agent with a fixed Objective Function is crippled by an adequate counter.

Therefore SFOT should not be used when exploring AI alignment.

Can SFOT hold in ad-hoc limited situations that do not represent the real world? Maybe, but that was not my point.

Finding one counterexample that shows SFOT does not hold in a specific setting (Clippy in my scenario) proves that it does not hold in general, which was my goal.