harfe's Shortform

post by harfe · 2022-09-01T22:02:25.267Z · LW · GW · 1 comments



comment by harfe · 2022-09-01T21:57:05.488Z · LW(p) · GW(p)

PreDCA might not lead to CEV.

My summarized understanding of PreDCA: PreDCA maintains a set of hypotheses about what the universe might be like. For each hypothesis, it detects which computations are running in the universe, then figures out which of these computations is the "user", then figures out likely utility functions of the user. Then it takes actions to increase a combination of these utility functions (possibly using something like maximal lotteries, rather than averaging utility functions). There are also steps to ignore certain hypotheses which might be malign, but I will set this issue aside here.
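To make the shape of that pipeline concrete, here is a purely illustrative Python sketch. Every function name and data layout below is an invented placeholder, not part of any actual PreDCA specification, and each stub stands in for a hard open problem.

```python
# Illustrative-only sketch of the pipeline summarized above.

def detect_computations(hypothesis):
    """Which computations are running in this hypothesized universe? (stub)"""
    return hypothesis["computations"]

def identify_user(computations):
    """Which of these computations is the 'user'? (stub: take the first)"""
    return computations[0]

def infer_utilities(user):
    """Likely utility functions of the user (stub: read them off directly)."""
    return user["candidate_utilities"]

def predca_candidates(hypotheses):
    """Collect candidate user utility functions across hypotheses; the real
    protocol would then combine them (e.g. via maximal lotteries) and act.
    Filtering out potentially malign hypotheses is ignored here, as in the text."""
    candidates = []
    for h in hypotheses:
        user = identify_user(detect_computations(h))
        candidates.extend(infer_utilities(user))
    return candidates
```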

Let's add more detail to the "figuring out the utility function of the user" step. The probability that an agent $A$ has utility function $U$ is proportional to $2^{-K(U)} / q(U, A)$, where $K(U)$ is the complexity of the utility function $U$, and $q(U, A)$ is the probability that a random policy (according to a distribution over possible policies) is better than the policy of the agent $A$.

So, a utility function $U$ is more likely if it is less complex, and if the agent's policy is better at satisfying $U$ than a random policy.
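A minimal numerical sketch of this scoring rule, assuming the $2^{-K(U)} / q(U, A)$ form above; the specific numbers are made up purely for illustration:

```python
def utility_score(K_U: float, q_U: float) -> float:
    """Unnormalized weight of a candidate utility function U:
    proportional to 2^(-K(U)) / q(U), where K(U) is the description
    complexity of U in bits, and q(U) is the probability that a random
    policy does better on U than the agent's actual policy."""
    return 2.0 ** (-K_U) / q_U

# Made-up numbers: with equal q, the simpler utility function gets far more weight.
print(utility_score(K_U=10, q_U=0.01))   # ~9.8e-2
print(utility_score(K_U=20, q_U=0.01))   # ~9.5e-5
```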

How would a human's utility function according to PreDCA compare with their CEV?

My intuition is that PreDCA falls short on the "extrapolated" part of "coherent extrapolated volition". PreDCA would extract a utility function from the flawed algorithm implemented by a human brain. This utility function would be coherent, but it might not be extrapolated: the extrapolated utility function (i.e. what humans would value if they were much smarter) is probably more complicated to write down than the un-extrapolated utility function.

For example, the policy implemented by an average human brain probably contributes more to total human happiness than most other policies. Let's say $U_1$ is a utility function that values human happiness as measured by certain chemical states in the brain, and $U_2$ is "extrapolated happiness" (where "putting all human brains in vats to make them feel happy" would not count as good according to $U_2$). Then it is plausible that $K(U_1) < K(U_2)$. But the policy implemented by an average human brain would do approximately equally well on both utility functions, so $q(U_1, A) \approx q(U_2, A)$. Thus $2^{-K(U_1)} / q(U_1, A) > 2^{-K(U_2)} / q(U_2, A)$, and PreDCA would consider $U_1$ more likely than $U_2$.
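The same point with made-up numbers, again assuming the scoring rule above: the human policy satisfies both utility functions about equally well (same $q$), but the un-extrapolated $U_1$ is simpler to describe (smaller $K$), so it gets the larger attribution weight. The complexities and probability below are hypothetical.

```python
K_U1, K_U2 = 50, 200      # hypothetical description complexities, in bits
q = 1e-3                  # Pr[random policy beats the human policy], for either U

score_U1 = 2.0 ** (-K_U1) / q
score_U2 = 2.0 ** (-K_U2) / q
print(score_U1 > score_U2)  # True: the un-extrapolated U1 is favored
```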

These concerns might also apply to a similar proposal [LW · GW].