Posts

Sufficiently many Godzillas as an alignment strategy 2022-08-28T00:08:02.666Z
What if we solve AI Safety but no one cares 2022-08-22T05:38:02.894Z

Comments

Comment by 142857 on All AGI Safety questions welcome (especially basic ones) [May 2023] · 2023-05-09T06:35:50.753Z · LW · GW

Given an aligned AGI, to what extent are people ok with letting the AGI modify us? Examples of such modifications include (feel free to add to the list):

  • Curing aging/illnesses
  • Significantly altering our biological form
  • Converting us to digital life forms
  • Reducing/Removing the capacity to suffer
  • Giving everyone instant jhanas/stream entry/etc.
  • Altering our desires to make them easier to satisfy
  • Increasing our intelligence (although this might be an alignment risk?)
  • Decreasing our intelligence
  • Refactoring our brains entirely

What exact parts of being "human" do we want to preserve?

Comment by 142857 on My Assessment of the Chinese AI Safety Community · 2023-04-26T00:15:30.695Z · LW · GW

A "moonshot idea" I saw brought up is getting Yudkowsky's Harry Potter fanfiction translated into Chinese (please never ever do this).

This has already been done, and the translation has pretty good reviews and some discussion.

I've looked through the EA/Rationalist/AI Safety forums in China

If these are public, could you post the links to them?

there is only one group doing technical alignment work in China

Do you know the name of the group, and what kinds of approaches they are taking toward technical alignment?

Comment by 142857 on All AGI Safety questions welcome (especially basic ones) [~monthly thread] · 2023-03-16T02:37:47.133Z · LW · GW

I'll use this comment to collect things I find.

LOVE (Learning Other's Values or Empowerment).

Comment by 142857 on All AGI Safety questions welcome (especially basic ones) [~monthly thread] · 2023-03-14T06:32:30.716Z · LW · GW

Are there any alignment approaches that try to replicate how children end up loving their parents (or vice versa), except with AI and humans? Alternatively, approaches that look like getting an AI to do Buddhist lovingkindness?

Comment by 142857 on Is there an Ultimate text editor? · 2022-09-12T22:47:25.386Z · LW · GW

For automatically de-rendering (and re-rendering) LaTeX fragments in Emacs, see https://github.com/io12/org-fragtog.

For drawing images inline, you could try https://github.com/misohena/el-easydraw.
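
If it helps, a minimal sketch of enabling org-fragtog (assuming the package is already installed, e.g. from MELPA):

```elisp
;; Toggle LaTeX fragment previews automatically in org-mode:
;; the preview is removed while the cursor is inside a fragment
;; and restored when the cursor leaves it.
(add-hook 'org-mode-hook #'org-fragtog-mode)
```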

Comment by 142857 on phone.spinning's Shortform · 2022-09-12T05:10:26.419Z · LW · GW

I like this idea and think it is worth exploring. It is not just about training new models: an AGI has to worry about misalignment with every self-modification and every interaction with the environment that changes it.

Perhaps there are even ways to deter an AGI from self-improvement, by making misalignment more likely.

Some caveats are:

  • An AGI may not take alignment seriously. We already have plenty of examples of general intelligences who don't.
  • An AGI can still increase its capabilities without training new models, e.g. by acquiring more compute.
  • If an AGI decides to solve alignment before significant self-improvement, it will very likely be overtaken by humans or other AGIs who don't care as much about alignment.

Comment by 142857 on phone.spinning's Shortform · 2022-09-12T04:58:23.077Z · LW · GW

Comment by 142857 on Interspecies diplomacy as a potentially productive lens on AGI alignment · 2022-08-24T21:22:35.472Z · LW · GW

Escape. Invest in space travel and escape the solar system before they arrive.
If your AI timelines are long, this may be a viable strategy for preserving the human species in the event of unaligned AGI.
If your AI timelines are short, a budget solution is to just send human brains into space and hope they will be found and revived by other powerful species (hopefully at least one of them is "benevolent").