AI Safety without Alignment: How humans can WIN against AI

post by vicchain (vic-cheng) · 2023-06-29T17:53:03.194Z · LW · GW · 1 comments

Contents

  Overall idea:
  Assumptions:
  Example Idea 1 (only require Assumption 1) "Escape":
  Example Idea 2 (require all 3 assumptions) "Lock":
  Pros of these ideas:
  Cons:
  Note: 

Here are some ideas on how we can restrict Super-human AI (SAI) without built-in Alignment:

Overall idea:

We need to use unbreakable first principles, especially the laws of physics, to restrict or destroy SAI. We can leverage asymmetries in physics such as the speed-of-light limit, time causality, entropy, and the uncertainty principle.

Assumptions:


1. Super-Intelligence cannot break the laws of physics.
2. SAI is silicon-based.
3. Anything with lower intelligence than the current Collective Human Intelligence (CHI) cannot destroy all humans.

Example Idea 1 (requires only Assumption 1) "Escape":


We send human biological information (or any information that we want to protect from SAI) at or near the speed of light toward the cosmic event horizon. After some time, the accelerated expansion of the universe makes that information impossible to reach for anything launched from Earth, protecting it from our SAI.

Of course, we could never reach this information either, but at least we would know that humanity persisted and "lives on" in the universe. A last resort.
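For a sense of scale, here is a minimal back-of-the-envelope sketch (assuming standard flat ΛCDM parameters, H0 ≈ 67.7 km/s/Mpc, Ωm ≈ 0.31, ΩΛ ≈ 0.69) of the distance to the cosmic event horizon. A signal leaving Earth today can never reach anything currently farther away than roughly this distance, which is the asymmetry "Escape" relies on:

```python
import numpy as np
from scipy.integrate import quad

C_KM_S = 299_792.458            # speed of light, km/s
H0 = 67.7                       # Hubble constant, km/s/Mpc (assumed value)
OMEGA_M, OMEGA_L = 0.31, 0.69   # matter / dark-energy fractions (assumed)
MPC_PER_GLY = 306.6             # megaparsecs per billion light-years

def hubble(a: float) -> float:
    """Hubble rate H(a) in km/s/Mpc for a flat LCDM universe."""
    return H0 * np.sqrt(OMEGA_M / a**3 + OMEGA_L)

# Comoving event horizon: chi = c * integral from a=1 (today) to infinity
# of da / (a^2 * H(a)); beyond this, a signal sent today never arrives.
chi_mpc, _ = quad(lambda a: C_KM_S / (a**2 * hubble(a)), 1.0, np.inf)

print(f"Cosmic event horizon: about {chi_mpc / MPC_PER_GLY:.1f} billion light-years")
# With these parameters this prints roughly 16-17 billion light-years.
```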

Example Idea 2 (requires all 3 assumptions) "Lock":


We figure out the lower bound of computational speed (or precision) needed to reach our current CHI (not trivial), then lock global silicon-based computation at that bound using some sort of interference device (exploiting uncertainty principles). The intelligence of AI would then never surpass our current CHI. Ideally, this device could be deployed and kept running easily, so that we would not need political tools such as a moratorium.
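For intuition on how uncertainty-type principles bound computation, here is a minimal sketch of Bremermann's limit (mc²/h bits per second for mass m), the natural ceiling on computation rate per kilogram of matter that follows from the energy-time uncertainty relation. The "Lock" device would need to impose an artificial cap far below this ceiling, at whatever level corresponds to our current CHI (a number this sketch does not attempt to estimate):

```python
C = 299_792_458.0            # speed of light, m/s
H_PLANCK = 6.626_070_15e-34  # Planck constant, J*s

def bremermann_limit(mass_kg: float) -> float:
    """Maximum computation rate (bits per second) for a given mass,
    per Bremermann's limit: m * c^2 / h."""
    return mass_kg * C**2 / H_PLANCK

if __name__ == "__main__":
    # Example: the physical ceiling for a hypothetical 1 kg silicon computer.
    print(f"{bremermann_limit(1.0):.2e} bits/s per kg")  # ~1.36e+50
```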

Pros of these ideas:

Cons:

These are probably some of the worst ideas in this line of thinking, so it would be wonderful if you all could give some feedback on the framework and assumptions, or build on top of these ideas.

Note: 

1 comment

Comments sorted by top scores.

comment by Raemon · 2023-06-29T17:57:56.777Z · LW(p) · GW(p)

You know, I started reading this expecting it to be nonsense, but, I think maybe I'm moderately sold on "beam out high fidelity info about humanity at lightspeed and hope someone picks it up" being something that's... someone's comparative advantage. Like, this seems like the sort of thing the Long Now Foundation would be into and they weren't really strategically focused on x-risk anyway.