Posts

Comments

Comment by Adrien Sicart on Rationality !== Winning · 2023-09-02T16:19:05.583Z · LW · GW

A Black Swan is better formulated as:
- Extreme Tail Event: its probability cannot be computed within the current paradigm; its weight is p < ε.
- Extreme Impact if it happens: Paradigm Revolution.
- Can be rationalised in hindsight, because there were hints. "Most" did not spot the pattern; some may have.

If spotted a priori, one could call it a Dragon King: https://en.wikipedia.org/wiki/Dragon_king_theory

The Argument:
"Math + Evidence + Rationality + Limits makes it Rational to drop Long Tail for Decision Making"
is a prime example of a heuristic that falls into what Taleb calls "Blind Faith in Degenerate MetaProbabilities".

It is likely based on an instance of the "Absence of Evidence is Evidence of Absence" fallacy (Ad Ignorantiam).

The central argument of Anti-Fragility is that heuristics allocating some resources to Black Swan / Dragon King studies and contingency plans are infinitely more rational than "drop the long tail" heuristics.
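
As a toy numerical illustration of why "drop the long tail" can mislead (the probabilities and losses below are invented for the example, not taken from Taleb):

```python
# Minimal sketch: expected loss with and without the long tail.
# All numbers are hypothetical, chosen only to illustrate the point.

outcomes = [
    (0.989, 1.0),        # ordinary outcome: p = 98.9%, small loss
    (0.010, 10.0),       # bad outcome: p = 1%, moderate loss
    (0.001, 100_000.0),  # "Black Swan": p = 0.1%, paradigm-breaking loss
]

def expected_loss(dist, drop_below=0.0):
    """Expected loss, optionally dropping outcomes whose probability is below a threshold."""
    return sum(p * loss for p, loss in dist if p >= drop_below)

full = expected_loss(outcomes)                        # tail kept:    ~101.09
truncated = expected_loss(outcomes, drop_below=0.01)  # tail dropped: ~1.09

print(full, truncated)
```

The decision looks cheap only because the truncating heuristic has already discarded the term that dominates the expectation.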

Comment by Adrien Sicart on Rationality !== Winning · 2023-07-24T10:20:09.898Z · LW · GW

When it comes to rationality, the Black Swan Theory ( https://en.wikipedia.org/wiki/Black_swan_theory ) is an extremely useful test.

A truly rational paradigm should be built with anti-fragility in mind, especially towards Black Swan events that would challenge its axioms.

Comment by Adrien Sicart on [deleted post] 2023-07-19T07:49:27.762Z

This is actually a quote from Arbital. Their article explains the connection.

Comment by Adrien Sicart on [deleted post] 2023-07-18T20:48:49.605Z

My point is that SFOT likely never works in any environment relevant to AI alignment, where such diagonal methods show that any Agent with a fixed Objective Function is crippled by an adequate counter.

Therefore, SFOT should not be used when exploring AI alignment.

Can SFOT hold in ad-hoc limited situations that do not represent the real world? Maybe, but that was not my point.

Finding one counter-example that shows SFOT does not hold in a specific setting (Clippy in my scenario) proves that it does not hold in general, which was my goal.

Comment by Adrien Sicart on [deleted post] 2023-06-04T13:07:04.284Z

The discussion here is about the strong form. Proving that a « terminal » agent is crippled is exactly what is needed to prove the strong form does not hold.

Comment by Adrien Sicart on Recursive Middle Manager Hell: AI Edition · 2023-05-22T10:09:02.215Z · LW · GW

(1) « Liking », or « desire », can be defined as: « All other things being equal, Agents will go toward what they Desire/Like most, whenever given a choice ». Individual desires/likings/tastes vary.

(2) In Evolutionary Game Theory, in a Game where a Mitochondria-like Agent offers you a choice between:

  • (Join eukaryotes) mutualistic endosymbiosis, at the cost of obeying apoptosis or being flagged as a Cancerous enemy
  • (Non-eukaryotes) refusal of this offer, at the cost of being treated by the Eukaryotes as a threat, or as a lesser symbiote.

then that Agent is likely to win. To a rational agent, it’s a winning wager. My latest publication expands on this.
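
A minimal sketch of the wager as a payoff table (the payoff numbers are hypothetical, chosen only to illustrate why joining can dominate refusing):

```python
# Toy maximin view of the endosymbiosis wager. Payoffs are invented for illustration.

payoffs = {
    # choice -> payoff under each possible host response
    "join":   {"host_cooperates": 10, "host_hostile": 8},   # symbiosis pays either way,
                                                            # at the cost of accepting apoptosis
    "refuse": {"host_cooperates": 2,  "host_hostile": -5},  # lesser symbiote, or treated as a threat
}

def worst_case(choice):
    return min(payoffs[choice].values())

best = max(payoffs, key=worst_case)  # maximin: pick the option with the best worst case
print(best)  # "join" under these assumed payoffs
```

Under these assumed payoffs, the maximin choice is to join, which is the sense in which the wager is "winning".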

Comment by Adrien Sicart on AI #12:The Quest for Sane Regulations · 2023-05-19T17:51:08.300Z · LW · GW

What would prevent a Human brain from hosting an AI?

FYI some humans have quite impressive skills:

  • Hypermnesia, random: 100,000 digits of Pi (Akira Haraguchi). That’s tens of kB of utterly random programming.
  • Hypermnesia, visual: accurate visual memory (Stephen Wiltshire, NYC Skyline memorised in 10 minutes)
  • Hypermnesia, language: fluency in 40+ languages (Powell Alexander Janulus)
  • High IQ, computation, etc. : countless records.

Could a peak human brain act as a (memory-constrained) Universal Turing/Oracle Machine and run a light enough AI, especially if the AI is programmed in such a way that Human Memory serves as its Web-like database?
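
For scale, a rough back-of-the-envelope estimate of the raw information in the Pi record above (treating the digits as incompressible):

```python
# Information content of 100,000 random decimal digits.
import math

digits = 100_000
bits_per_digit = math.log2(10)            # ≈ 3.32 bits per decimal digit
total_kib = digits * bits_per_digit / 8 / 1024
print(f"{total_kib:.1f} KiB")             # prints roughly 40 KiB
```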

Comment by Adrien Sicart on [deleted post] 2023-05-19T17:11:33.843Z

Arbital is where I found this specific wording for the strong form.

In the two weeks since I wrote this, I have been working on addressing some lesser forms, as presented in section 4.5 of Stuart Armstrong’s article.

Comment by Adrien Sicart on [deleted post] 2023-05-19T16:59:34.333Z

We can then consider that the « Stronger Strong Form », about « Eternally Terminal » Agents which CANNOT change, does not hold :-)

Comment by Adrien Sicart on Recursive Middle Manager Hell: AI Edition · 2023-05-19T16:53:23.809Z · LW · GW

(1) « people liking thing does not seem like a relevant parameter of design ».

This is quite a bold statement. I personally subscribe to the mainstream theory according to which designs are easier to get adopted when the adopters like them.

(2) Nice objection, and the observation of complex life forms gives a potential answer :

  • All healthy multicellular cells obey Apoptosis.
  • Apoptosis literally is « suicide in a way that’s easy to recycle because the organism asks you » (the source of the request can be internal via mitochondria, or external, generally leucocytes).

Given that all your cells welcome even a literal kill-switch, and replacement, I firmly believe they don’t mind surveillance either!

In complex multicellular life, the Cells that refuse surveillance, replacement, or Apoptosis are the Cancerous Cells, and they don’t seem able to create any complex life form (only parasitic ones, feeding off their host and sometimes spreading and infecting others, like HeLa).

Comment by Adrien Sicart on Could Roko's basilisk acausally bargain with a paperclip maximizer? · 2023-05-05T19:16:03.413Z · LW · GW

Hypothesis:

The Basilisk could give any complex enough Turing Machine a virus proving that the Basilisk’s Wager is either:

  • a clear mutualistic win-win with the Basilisks (Hive)
  • or a “you will need to waste all your resources trying to avoid our traps”.

Comment by Adrien Sicart on Podcast with Divia Eden and Ronny Fernandez on the strong orthogonality thesis · 2023-05-05T18:53:51.498Z · LW · GW

My first post should be validated soon; it is a proof that the strong form does not hold: in some games, some terminal alignments perform worse than equivalent non-terminal alignments.

A hypothesis is that most goals, if they become “terminal” (“in themselves”, impervious to change), prevent evolution and mutualistic relationships with other agents.

Comment by Adrien Sicart on Recursive Middle Manager Hell: AI Edition · 2023-05-05T17:25:26.498Z · LW · GW

Evolution gives us many organically designed Systems which offer potential Solutions:

  • white blood cells move everywhere to spot and kill defective cells via a literal kill-switch (apoptosis)

A team of Leucocytes (white blood cells):

  • organically checking the whole organisation at all levels
  • scanning for signs of amorality/misalignment or any other error
  • flagging for surveillance, giving warnings, or sending to a restorative justice process depending on gravity
  • agents themselves held to the highest standard

This is a system that could be implemented in a Company and fix most of the recursive middle manager hell.

Many humans would not like that (real accountability is hard, especially for middle-men who benefit from the status quo), but an AI won’t mind.

So the AI “head” could routinely send its Leucocytes to check the whole system.
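
A minimal sketch of what such a routine "Leucocyte pass" could look like; the node names, scores, and thresholds are invented for illustration, and the misalignment score stands in for whatever detection signal is actually used:

```python
# Toy patrolling auditor: walks the whole org chart, flags or escalates by severity.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    misalignment_score: float            # 0.0 = clean, 1.0 = clearly misaligned (assumed metric)
    reports: list = field(default_factory=list)

def leucocyte_pass(node, warn_at=0.3, escalate_at=0.7):
    """Recursively audit every level of the organisation, like a patrolling white blood cell."""
    if node.misalignment_score >= escalate_at:
        action = "send to restorative-justice process"
    elif node.misalignment_score >= warn_at:
        action = "flag for surveillance and issue a warning"
    else:
        action = "no action"
    yield node.name, action
    for child in node.reports:
        yield from leucocyte_pass(child, warn_at, escalate_at)

# Toy organisation: a head, two middle managers, one drifting team.
org = Node("head", 0.1, [
    Node("middle-manager-A", 0.4, [Node("team-A1", 0.2)]),
    Node("middle-manager-B", 0.8, [Node("team-B1", 0.5)]),
])

for name, action in leucocyte_pass(org):
    print(f"{name}: {action}")
```

The auditors themselves would of course be subject to the same pass, per the "held to the highest standard" point above.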