Reactive devaluation: Bias in Evaluating AGI X-Risks
post by Remmelt (remmelt-ellen), flandry19 · 2022-12-30T09:02:58.450Z · LW · GW · 9 comments
Devaluing proposals only because they purportedly originated with an adversary.
In any technical discussion, many well-intentioned but not especially well-informed people participate.
For individuals and groups alike, this has the effect of creating protective layers of isolation and degrees of separation – between the qualified experts and everyone else.
While this is a natural tendency that can have a beneficial effect, creating too narrow or too strong an 'in-crowd' can result in mono-culture effects.
The very poor signal to noise ratio from messages received from people outside the established professional group means that good proposals from anyone regarded as an outsider are especially likely to be discarded.
In terms of natural social process, there does not seem to be any factor available to counteract ever-increasing brittleness in the form of a dwindling influx of new ideas (i.e., 'echo chambers').
- link Wikipedia: Reactive devaluation
- an item on Forrest Landry [LW · GW]'s compiled list of biases in evaluating extinction risks.
9 comments
Comments sorted by top scores.
comment by the gears to ascension (lahwran) · 2022-12-31T10:52:21.279Z · LW(p) · GW(p)
If the proof is to be formal, it needs to plug into the mathematical formalizations one would use to do the social science form of this. And if you want to deal with certain kinds of social systems, it's more complicated than that. I'm also an experienced generalist nerd with a lot to say; unfortunately, even if we're real good at babbling, they're right to look at people funny even if they have the systems programming experience or what have you. It seems like you're doing stuff that doesn't plug into reality very well. Can you provide a demonstration of it as a multi-agent simulation, for example? There are various levels of shareable multi-agent sim tools to demonstrate whatever effect you'd like (see the toy sketch after the list below); then the only proof you need to provide circumstantially is the applicability of the simulation, which means that if we find a way the metaphor breaks, we can go back to the drawing board for the simulated example. In software and in life generally, tests only give us coverage of the example we try, but at least we can assert that the test passes.
- https://hash.ai/platform/engine - https://hash.ai/models?sort=popularity (looks really sleek and modern, can do complicated cellular automata or smooth movement stuff)
- https://insightmaker.com/explore (designed for sharing and has existing simulations of ... varying quality)
- https://simlin.com/
- https://www.collimator.ai/ <- fancy
- https://ncase.me/loopy/ <- cheesy demo one
- https://github.com/evoplex/evoplex <- cellular automata focused
- https://neuralmmo.github.io/ <- neural network training focused, designed for "love in a simbox" style usage, from openai. there are a few more from openai and deepmind I skipped mentioning, browse their githubs if curious.
- honorable mention, looks kinda stuffy: https://helipad.dev/
- this one has built in causal inference tools, which is cool I guess? https://github.com/zykls/whynot
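To make the ask concrete, here is a minimal sketch of the kind of shareable demo meant above, in plain Python rather than any specific tool from the list. The agents, parameters, and resource-claiming rule are all hypothetical; the point is only that a claimed dynamic – here, that more expansion-prone agents come to dominate a shared resource pool – can be packaged as a runnable, checkable artifact.

```python
# Toy multi-agent simulation sketch (hypothetical, for illustration only).
import random

random.seed(0)

class Agent:
    def __init__(self, growth_rate):
        self.growth_rate = growth_rate  # how aggressively this agent expands
        self.resources = 1.0

def step(agents, total_supply=100.0):
    """One round: each agent claims new resources in proportion to its
    current resources times its growth rate."""
    claims = [a.resources * a.growth_rate for a in agents]
    total_claim = sum(claims)
    for agent, claim in zip(agents, claims):
        agent.resources += total_supply * (claim / total_claim)

def run(n_agents=10, n_steps=50):
    agents = [Agent(growth_rate=random.uniform(0.5, 1.5)) for _ in range(n_agents)]
    for _ in range(n_steps):
        step(agents)
    # Report the resource share held by the most expansion-prone agent.
    top = max(agents, key=lambda a: a.resources)
    share = top.resources / sum(a.resources for a in agents)
    print(f"most expansion-prone agent (growth_rate={top.growth_rate:.2f}) "
          f"holds {share:.0%} of all resources")

if __name__ == "__main__":
    run()
```

If the metaphor breaks (say, because real-world resource acquisition is not proportional growth), the sim can be revised and re-run; as noted above, the test only covers the example tried, but at least it passes or fails unambiguously.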
↑ comment by Remmelt (remmelt-ellen) · 2022-12-31T11:29:52.329Z · LW(p) · GW(p)
it needs to plug into the mathematical formalizations one would use to do the social science form of this.
Could you clarify what you mean by a "social science form" of a mathematical formalisation?
I'm not familiar with this.
they're right to look at people funny even if they have the systems programming experience or what have you.
It was expected and understandable that people would look funny at the writings of a multi-skilled researcher with new ideas that they were not yet familiar with.
Let's move on from first impressions.
simulation
If by 'simulation' we can also refer to a model that is computed to estimate a factor on which further logical deduction steps are based, then that would connect up with Forrest's work (though it is not really about multi-agent simulation).
Based on what I learned from Forrest, we need to distinguish the 'estimation' factors from the 'logical entailment' factors. The notion of "proof" applies only to that which can be logically entailed. Everything else is about assessment. In each case, we need to be sure we are doing the modelling correctly.
For example, it could be argued that step 'b' below is about logical entailment, though according to Forrest most would argue that it is an assessment. Given that it depends on both physics and logic (via comp-sci modelling), it depends on how one regards the notion of 'observation', and whether that observation is empirical or analytic.
- b; If AGI/APS is permitted to continue to exist, then it will inevitably, inexorably, implement and manifest certain convergent behaviors.
- c; that among these inherent convergent behaviors will be at least all of:
  - 1; to/towards self existence continuance promotion.
  - 2; to/towards capability building capability, an increase-seeking capability, a capability of seeking increase, capability/power/influence increase, etc.
  - 3; to/towards shifting ambient environmental conditions/context to/towards favoring the production of (variants of, increases of) its artificial substrate matrix.
Note again: the above is not formal reasoning. It is a super-short description of what two formal reasoning steps would cover.
↑ comment by the gears to ascension (lahwran) · 2023-01-01T10:18:35.150Z · LW(p) · GW(p)
But if we can take the type signature from a simulation, then we can attempt to do formal reasoning about its possibility space given the concrete example. If we don't have precise types, we can't reason through these systems. Claim b seems to me to be a falsifiable claim that cannot be determined true or false by pure rational computation; it requires active investigation. We have evidence of it, but that evidence needs to be cited.
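As a hypothetical illustration (not part of the original comment) of what "taking the type signature from a simulation" could look like in practice: give the simulation's step function an explicit type, then state properties against that type. All names below are made up.

```python
# Hypothetical sketch: a simulation step with an explicit type signature,
# so that claims about the system can at least be stated against that type.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class WorldState:
    resources: tuple[float, ...]  # resource level held by each agent

# The "type signature of the simulation": one world state in, one world state out.
StepFn = Callable[[WorldState], WorldState]

def total_resources(state: WorldState) -> float:
    return sum(state.resources)

def conserves_resources(step: StepFn, state: WorldState, tol: float = 1e-9) -> bool:
    """A property phrased against the type signature: this step function does
    not create or destroy resources. A test checks it for one concrete state;
    establishing it for all states requires formal reasoning about `step` itself."""
    return abs(total_resources(step(state)) - total_resources(state)) <= tol
```

A concrete test (e.g. asserting `conserves_resources(my_step, WorldState((1.0, 2.0)))`) gives coverage only of the example tried; the universally quantified claim is what would need the formal argument.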
How does your approach compare with https://www.metaethical.ai/?
comment by Remmelt (remmelt-ellen) · 2022-12-30T10:30:34.400Z · LW(p) · GW(p)
The very poor signal to noise ratio from messages received from people outside the established professional group means that good proposals from anyone regarded as an outsider are especially likely to be discarded.
This insight feels relevant to a comment exchange I was in yesterday [LW(p) · GW(p)]. An AI Safety insider (Christiano) lightly read an overview of work by an outsider (Landry). The insider then judged the work to be "crankery", in effect acting as a protective barrier against other insiders having to consider the new ideas.
The sticking point was the claim "It is 100% possible to know that X is 100% impossible", where X is a perpetual motion machine or a 'perpetual general benefit machine' (i.e. long-term safe and beneficial AGI).
The insider believed this was an exaggerated claim, which meant we first needed to clarify epistemics and social heuristics rather than the substantive argument form. The reactions by the busy "expert" insider, who had elected to judge the formal reasoning, led us to lose trust that they would proceed in a patient and discerning manner.
There was simply not enough common background and shared conceptual language for the insider to accurately interpret the outsider's writings ("very poor signal to noise ratio from messages received").
Add to that:
- Bandwagon effect
- "the tendency to believe that [long-term safe AGI is possible] because many other people do"
- Naive realism
- "that the facts are plain for all to see; that rational people will agree with us [that long-term safe AGI is possible]; and that those who do not are either uninformed, lazy, irrational, or biased."
- Belief bias
- "Where the evaluation of the logical strength of an argument is biased by the believability of the conclusion [that long-term safe AGI is impossible]... The difficulty is that we want to apply our intuition too often, particularly because it is generally much faster/easier than actually doing/implementing analytic work.)... Arguments which produce results contrary to one's own intuition about what "should" or "is expected" be the case are also implicitly viewed as somewhat disabling and invalidating of one's own expertise, particularly if there also is some self-identification as an 'expert'. No one wants to give up cherished notions regarding themselves. The net effect is that arguments perceived as 'challenging' will be challenged (criticized) somewhat more fully and aggressively than rationality and the methods of science would have already called for."
- Conservatism bias
- "People do not want to be seen as having strong or 'extreme opinions', as this in itself becomes a signal from that person to the group that they are very likely to become 'not a member' due to their willingness to prefer the holding of an idea as a higher value than they would prefer being regarded as a member in good standing in the group. Extreme opinions [such as that it is 100% possible to know that long-term safe AGI is 100% impossible] are therefore to be regarded as a marker of 'possible fanaticism' and therefore of that person being in the 'out crowd'."
- Status quo bias; System justification
- "The tendency to like things to stay relatively the same. The tendency to defend and bolster the status quo [such as resolving to build long-term safe AGI, believing that it is a hard but solvable problem]. Existing social, economic, and political arrangements tend to be preferred, and alternatives disparaged sometimes even at the expense of individual and collective self-interest."
- Reactance
- "The degree to which these various bias effects occur is generally in proportion to a motivating force, typically whenever there is significant money, power, or prestige involved. Naturally, doing what someone 'tells you to do' [like accepting the advice to not cut to the chase and instead spend the time to dig into and clarify the arguments with us, given the inferential distance] is a signal of 'low status' and is therefore to be avoided whenever possible, even if it is a good idea."
I mean, someone recognised as an expert in AI Safety could consciously mean well in trying to judge an outsider's work accurately – in the time they have. But that's a lot of biases to counteract.
Forrest actually clarified the claim further to me by message:
Re "100%" or "fully knowable":
By this, I usually mean that the analytic part of an argument is fully finite and discrete, and that all parts (statements) are there, the transforms are enumerated, known to be correct etc (ie, is valid).
In regards to the soundness aspect, that there is some sort of "finality" or "completeness" in the definitions, such that I do not expect that they would ever need to be revised (ie, is at once addressing all necessary aspects, sufficiently, and comprehensively), and that the observations are fully structured by the definitions, etc. Usually this only works for fairly low level concepts, things that track fairly closely to the theory of epistemology itself -- ie, matters of physics that involve symmetry or continuity directly (comparison) or are expressed purely in terms of causation, etc.
One good way to test the overall notion is that something is "fully 100% knowable" if one can convert it to a computer program, and the program compiles and works correctly. The deterministic logic of computers cannot be fooled, as people sometimes can, as there is no bias. This may be regarded by some as a somewhat high standard, but it makes sense to me as it is of the appropriate type: ie, a discrete finite result being tested in a purely discrete finite environment. Hence, nothing missing can hide.
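As a purely illustrative sketch of that test (my example, not Forrest's): a claim over a finite, discrete domain can be checked exhaustively by a program, so that nothing missing can hide. The specific claim below is arbitrary.

```python
# Illustrative only: an arbitrary finite, discrete claim checked exhaustively.
def claim_holds(n: int) -> bool:
    # The claim: every integer square is congruent to 0 or 1 modulo 4.
    return (n * n) % 4 in (0, 1)

def check_all(limit: int = 1000) -> bool:
    """Test the claim over the entire finite domain 1..limit, leaving no case unchecked."""
    return all(claim_holds(n) for n in range(1, limit + 1))

if __name__ == "__main__":
    assert check_all(), "counterexample found"
    print("claim verified over the whole finite domain")
```

This only works for claims that reduce to a finite, discrete check, which is exactly the restriction the quoted message names: a discrete finite result tested in a purely discrete finite environment.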
But the point is – few readers will seriously consider this message.
That's my experience, sadly.
The common reaction I noticed too from talking with others in AI Safety is that they immediately devalued that extreme-sounding conclusion based on the research of an outsider – a conclusion that goes against their prior beliefs, and against their role in the community.
↑ comment by [deleted] · 2022-12-30T11:14:04.936Z · LW(p) · GW(p)
↑ comment by Remmelt (remmelt-ellen) · 2022-12-31T03:51:46.809Z · LW(p) · GW(p)
You can't complain about people engaging with things other than your idea if the only thing they can even engage with is your idea.
The tricky thing here is that a few people are reacting by misinterpreting the basic form of the formal reasoning at the outset, and judging the merit of the work by their subjective social heuristics.
That does not give me (or Forrest) confidence that those people would do a careful job of checking the term definitions and reasoning steps – particularly if these are written in precise analytic language unlike the mathematical notation they're used to.
The filter goes both ways.
Instead you have decided to make this post and trigger more crank alarms.
Actually, this post was written in 2015 and I planned last week to reformat it and post it. Rereading it, I’m just surprised how well it appears to line up with the reactions.
↑ comment by [deleted] · 2022-12-31T06:30:21.913Z · LW(p) · GW(p)
↑ comment by Remmelt (remmelt-ellen) · 2022-12-31T08:54:02.919Z · LW(p) · GW(p)
Really appreciate you sharing your honest thoughts here, Rekrul.
From my side, I’d value actually discussing the reasoning forms and steps we already started to outline on the forum. For example, the relevance of intrinsic vs extrinsic selection and correction, or the relevance of the organic vs. artificial substrate distinction. These distinctions are something I would love to openly chat about with you (not the formal reasoning – I’m the bridge-builder, Forrest is the theorist).
That might feel unsatisfactory – in the sense of “why don’t you just give us the proof now?”
As far as I can tell (Forrest can correct me later), there are at least two key reasons:
- There is a tendency amongst AI Safety researchers to want to cut to the chase and judge the believability of the conclusion itself. For example, notice that I tried to clarify several argument parts in comment exchanges with Paul, with little or no response. People tend to believe that this would be the same as judging a maths proof over idealised deterministic and countable spaces. Yet formal reasoning here would have to reference and build up premises from physical theory in indeterministic settings. So we actually need to clarify how a different form of formal reasoning is required here, one that does not look like what would be required for P=NP. Patience is needed on the side of our interlocutors.
- While Forrest does have most of the argument parts formalised, his precise analytical language and premises are not going to be immediately clear to you. Mathematicians are not the only people who use formal language and reasoning steps to prove impossibilities by contradiction. Some analytical philosophers do too (as do formal verification researchers in industrial software engineering, using different notation for logic transformation, etc.). No amount of "just give the proof to us and leave it to us to judge" lends us confidence that the judging would track the reasoning steps – not if those people already failed to track the correspondences of some first basic argument parts described in the explanatory writings by Forrest or me that their comments referred to. Even if they are an accomplished mathematician, they are not going to grasp the argumentation if they skim through the text, judging it based on their preconception of what language the terms should be described in or how the formal reasoning should be structured.
I get that people are busy, but this is how it is. We are actually putting a lot of effort and time into communication (and are very happy to get your feedback on that!). And to make this work, they (or others) will need to put in commensurate effort on their end. It is up to them to show that they are not making inconsistent jumps in reasoning there, or talking in terms of their intuitive probability predictions about the believability of the end result, where we should be talking about binary logic transformations.
And actually, such nitty-gritty conversations would be really helpful for us too! Here is what I wrote before in response to another person’s question whether a public proof is available:
The main bottleneck is (re)writing it in a language that AI(S) researchers will understand without having to do a lot of reading/digging into the definitions of terms and descriptions of axioms/premises. A safety impossibility theorem can be constructed from various forms that are either isomorphic with others or use separate arguments (e.g. different theoretical limits covering different scopes of AGI interaction) to arrive at what seems to be an overdetermined conclusion (that long-term AGI safety is not possible).
We don't want to write it out so long that most/all readers drop out before they get to parse through the key reasoning steps. But we also do not want to make it so brief and dense that researchers are confused about at what level of generality we're talking about, have to read through other referenced literature to understand definitions, etc.
Also, one person (a grant investigator) has warned us that AI safety researchers would be too motivated against the conclusion (see 'belief bias') that few would actually attempt to read through a formal safety impossibility theorem. That's indeed likely based on my exchanges so far with AIS researchers (many of them past organisers or participants of AISC). So that is basically why we are first writing a condensed summary (for the Alignment Forum and beyond) that orders the main arguments for long-term AGI safety impossibility without precisely describing all axioms and definitions of terms used, covering all the reasoning gaps to ensure logical consistency, etc.
Note: Forrest has a background in analytical philosophy; he does not write in mathematical notation. Another grant investigator we spoke with had the expectation that the formal reasoning is necessarily written out in mathematical notation (here is a rough post-call write-up consolidating our impressions and responses to that conversation: https://mflb.com/ai_alignment_1/math_expectations_psr.html).
Also note that Forrest's formal reasoning work was funded by a $170K grant from the Survival and Flourishing Fund. So some grant investigators were willing to bet on this work with money.
One thing Paul talks about constantly is how useful it would be if he had some hard evidence a current approach is doomed, as it would allow the community to pivot. A proof of alignment impossibility would probably make him ecstatic if it was correct (even if it puts us in quite a scary position).
I respect this take then by Paul a lot. This is how I also started to think about it a year ago.
↑ comment by Remmelt (remmelt-ellen) · 2022-12-31T03:59:32.265Z · LW(p) · GW(p)
BTW, I prefer you being blunt, so glad you’re doing that.
A little more effort to try to understand where we could be coming from would be appreciated. Particularly given what’s at stake here – a full extinction event.
Neither Forrest nor I have any motivation to post unsubstantiated claims. Forrest because frankly, he does not care one bit about being recognised by this community – he just wants to find individuals who actually care enough to consider the arguments rigorously. Me because all I’d be doing is putting my career at risk.