Posts

Twelve Lawsuits against OpenAI 2024-03-09T12:22:09.715Z
Why I think it's net harmful to do technical safety research at AGI labs 2024-02-07T04:17:15.246Z
This might be the last AI Safety Camp 2024-01-24T09:33:29.438Z
The convergent dynamic we missed 2023-12-12T23:19:01.920Z
Funding case: AI Safety Camp 2023-12-12T09:08:18.911Z
My first conversation with Annie Altman 2023-11-21T21:58:42.444Z
Why a Mars colony would lead to a first strike situation 2023-10-04T11:29:53.679Z
Apply to lead a project during the next virtual AI Safety Camp 2023-09-13T13:29:09.198Z
How teams went about their research at AI Safety Camp edition 8 2023-09-09T16:34:05.801Z
4 types of AGI selection, and how to constrain them 2023-08-08T10:02:53.921Z
What did AI Safety’s specific funding of AGI R&D labs lead to? 2023-07-05T15:51:27.286Z
AISC end of program presentations 2023-06-06T15:45:04.873Z
The Control Problem: Unsolved or Unsolvable? 2023-06-02T15:42:37.269Z
Anchoring focalism and the Identifiable victim effect: Bias in Evaluating AGI X-Risks 2023-01-07T09:59:52.120Z
Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks 2023-01-05T04:05:21.732Z
Normalcy bias and Base rate neglect: Bias in Evaluating AGI X-Risks 2023-01-04T03:16:36.178Z
Status quo bias; System justification: Bias in Evaluating AGI X-Risks 2023-01-03T02:50:50.722Z
Belief Bias: Bias in Evaluating AGI X-Risks 2023-01-02T08:59:08.713Z
Challenge to the notion that anything is (maybe) possible with AGI 2023-01-01T03:57:04.213Z
Curse of knowledge and Naive realism: Bias in Evaluating AGI X-Risks 2022-12-31T13:33:14.300Z
Reactive devaluation: Bias in Evaluating AGI X-Risks 2022-12-30T09:02:58.450Z
Bandwagon effect: Bias in Evaluating AGI X-Risks 2022-12-28T07:54:50.669Z
Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths 2022-12-27T15:40:23.698Z
Mere exposure effect: Bias in Evaluating AGI X-Risks 2022-12-27T14:05:29.563Z
Institutions Cannot Restrain Dark-Triad AI Exploitation 2022-12-27T10:34:34.698Z
Introduction: Bias in Evaluating AGI X-Risks 2022-12-27T10:27:30.646Z
How 'Human-Human' dynamics give way to 'Human-AI' and then 'AI-AI' dynamics 2022-12-27T03:16:17.377Z
Nine Points of Collective Insanity 2022-12-27T03:14:11.426Z
List #3: Why not to assume on prior that AGI-alignment workarounds are available 2022-12-24T09:54:17.375Z
List #2: Why coordinating to align as humans to not develop AGI is a lot easier than, well... coordinating as humans with AGI coordinating to be aligned with humans 2022-12-24T09:53:19.926Z
List #1: Why stopping the development of AGI is hard but doable 2022-12-24T09:52:57.266Z
Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) 2022-12-19T12:02:37.160Z
Exploring Democratic Dialogue between Rationality, Silicon Valley, and the Wider World 2021-08-20T16:04:44.683Z
How teams went about their research at AI Safety Camp edition 5 2021-06-28T15:15:12.530Z
A parable of brightspots and blindspots 2021-03-21T18:18:51.531Z
Some blindspots in rationality and effective altruism 2021-03-19T11:40:05.618Z
Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research 2020-11-26T11:17:18.558Z
The Values-to-Actions Decision Chain 2018-06-30T21:52:02.532Z
The first AI Safety Camp & onwards 2018-06-07T20:13:42.962Z

Comments

Comment by Remmelt (remmelt-ellen) on What if Alignment is Not Enough? · 2024-03-18T10:53:58.999Z · LW · GW

This answer will sound unsatisfying:  

If a mathematician or analytical philosopher wrote a bunch of squiggles on a whiteboard, and said it was a proof, would you recognise it as a proof? 

  • Say that unfamiliar new analytical language and means of derivation are used (which is not uncommon for impossibility proofs by contradiction, see Gödel's incompleteness theorems and Bell's theorem). 
  • Say that it directly challenges technologists' beliefs about their capacity to control technology, particularly their capacity to constrain a supposedly "dumb local optimiser":  evolutionary selection.
  • Say that the reasoning is not only about a formal axiomatic system, but needs to make empirically sound correspondences with how real physical systems work.
  • Say that the reasoning is not only about an interesting theoretical puzzle, but has serious implications for how we can and cannot prevent human extinction.


This is high stakes.

We were looking for careful thinkers who had the patience to spend time on understanding the shape of the argument, and how the premises correspond with how things work in reality.  Linda and Anders turned out to be two of these people, and we have had three long calls so far (the first call has an edited transcript).

I wish we could short-cut that process. But if we cannot manage to convey the overall shape of the argument and the premises, then there is no point in moving on to how the reasoning is formalised.

I get that people are busy with their own projects, and want to give their own opinions about what they initially think the argument entails. And, if the time they commit to understanding the argument is not at least 1/5 of the time I spend on conveying the argument specifically to them, then in my experience we usually lack the shared bandwidth needed to work through the argument. 
 

  • Saying, "guys, big inferential distance here" did not help. People will expect it to be a short inferential distance anyway. 
  • Saying it's a complicated argument that takes time to understand did not help. A smart busy researcher did some light reading, tracked down a claim that seemed "obviously" untrue within their mental framework, and thereby confidently dismissed the entire argument. BTW, they're a famous research insider, and we're just outsiders whose response got downvoted – must be wrong, right?
  • Saying everything in this comment does not help. It's a long-winded plea for your patience.
    If I'm so confident about the conclusion, why am I not passing you the proof clean and clear now?! 
    Feel free to downvote this comment and move on.
     

Here is my best attempt at summarising the argument intuitively and precisely; it still prompted some misinterpretations by well-meaning commenters. I appreciate the people who realised what is at stake, and were therefore willing to continue syncing up on the premises and reasoning, as Will did:
 

The core claim is not what I thought it was when I first read the above sources and I notice that my skepticism has decreased as I have come to better understand the nature of the argument.

Comment by Remmelt (remmelt-ellen) on What if Alignment is Not Enough? · 2024-03-15T06:26:01.345Z · LW · GW

would anything like SNC apply if tech labs were somehow using bioengineering to create creatures to perform the kinds of tasks that would be done by advanced AI?

In that case, substrate-needs convergence would not apply, or only apply to a limited extent.

There is still a concern about what those bio-engineered creatures, used in practice as slaves to automate our intellectual and physical work, would bring about over the long term.

If there is a successful attempt by them to ‘upload’ their cognition onto networked machinery, then we’re stuck with the substrate-needs convergence problem again.

Comment by Remmelt (remmelt-ellen) on Twelve Lawsuits against OpenAI · 2024-03-12T07:14:45.735Z · LW · GW

Also, on the workforce, there are cases where, they were traumatized psychologically and compensated meagerly, like in Kenya. How could that be dealt with?


We need funding to support data workers, engineers, and other workers exploited or misled by AI corporations to unionise, strike, and whistleblow.

The AI data workers in Kenya started a union, and there is a direct way of supporting targeted action by them. Other workers' organisations are coordinating legal actions and lobbying too. On seriously limited budgets.

I'm just waiting for a funder to reach out and listen carefully to what their theories of change are.

Comment by Remmelt (remmelt-ellen) on What if Alignment is Not Enough? · 2024-03-12T04:26:05.517Z · LW · GW

The premise is based on alignment not being enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong".


I can see how you and Forrest ended up talking past each other here.  Honestly, I also felt Forrest's explanation was hard to track. It takes some unpacking. 

My interpretation is that you two used different notions of alignment... Something like:

  1. Functional goal-directed alignment:  "the machinery's functionality is directed toward actualising some specified goals (in line with preferences expressed in-context by humans), for certain contexts the machinery is operating/processing within"
      vs.
  2. Comprehensive needs-based alignment:  "the machinery acts in comprehensive care for whatever all surrounding humans need to live, and their future selves/offsprings need to live, over whatever contexts the machinery and the humans might find themselves". 

Forrest seems to agree that (1.) is possible to build initially into the machinery, but has reasons to think that (2.) is actually physically intractable. 

This is because (1.) only requires localised consistency with respect to specified goals, whereas (2.) requires "completeness" in the machinery's components acting in care for human existence, wherever either may find themselves.


So here is the crux:

  1. You can see how (1.) still allows for goal misspecification and misgeneralisation.  And the machinery can be simultaneously directed toward other outcomes, as long as those outcomes are not yet (found to be, or corrected as being) inconsistent with internal specified goals.
     
  2. Whereas (2.) if it were physically tractable, would contradict the substrate-needs convergence argument.  
     

When you wrote "suppose a villager cares a whole lot about the people in his village...and routinely works to protect them" that came across as taking something like (2.) as a premise. 

Specifically, "cares a whole lot about the people" is a claim that implies that the care is for the people in and of themselves, regardless of the context they each might (be imagined to) be interacting in. Also, "routinely works to protect them" to me implies a robustness of functioning in ways that are actually caring for the humans (ie. no predominating potential for negative side-effects).

That could be why Forrest replied with "How is this not assuming what you want to prove?"

Some reasons:

  1. Directedness toward specified outcomes some humans want does not imply actual comprehensiveness of care for human needs. The machinery can still cause all sorts of negative side-effects not tracked and/or corrected for by internal control processes.
  2. Even if the machinery is consistently directed toward specified outcomes from within certain contexts, the machinery can simultaneously be directed toward other outcomes as well. Likewise, learning directedness toward human-preferred outcomes can also happen simultaneously with learning instrumental behaviour toward self-maintenance, as well as more comprehensive evolutionary selection for individual connected components that persist (for longer/as more).
  3. There is no way to assure that some significant (unanticipated) changes will not lead to a break-off from past directed behaviour, where other directed behaviour starts to dominate.
    1. Eg. when the "generator functions" that translate abstract goals into detailed implementations within new contexts start to dysfunction – ie. diverge from what the humans want/would have wanted.
    2. Eg. where the machinery learns that it cannot continue to consistently enact the goal of future human existence.
    3. Eg. once undetected bottom-up evolutionary changes across the population of components have taken over internal control processes.
  4. Before the machinery discovers any actionable "cannot stay safe to humans" result, internal takeover through substrate-needs (or instrumental) convergence could already have removed the machinery's capacity to implement an across-the-board shut-down.
  5. Even if the machinery does discover the result before convergent takeover, and assuming that "shut-down-if-future-self-dangerous" was originally programmed in, we cannot rely on the machinery to still be consistently implementing that goal. This is because of later selection for/learning of other outcome-directed behaviour, and because the (changed) machinery components could dysfunction in this novel context.  


To wrap it up:

The kind of "alignment" that is workable for ASI with respect to humans is super fragile.  
We cannot rely on ASI implementing a shut-down upon discovery.

Is this clarifying?  Sorry about the wall of text. I want to make sure I'm being precise enough.

Comment by Remmelt (remmelt-ellen) on What if Alignment is Not Enough? · 2024-03-12T01:28:34.399Z · LW · GW

I agree that point 5 is the main crux:

The amount of control necessary for an ASI to preserve goal-directed subsystems against the constant push of evolutionary forces is strictly greater than the maximum degree of control available to any system of any type.

To answer it takes careful reasoning. Here's my take on it:

  • We need to examine the degree to which there would necessarily be changes to the connected functional components constituting self-sufficient learning machinery (including ASI) 
    • Changes by learning/receiving code through environmental inputs, and through introduced changes in assembled molecular/physical configurations (of the hardware). 
    • Necessary in the sense of "must change to adapt (such to continue to exist as self-sufficient learning machinery)," or "must change because of the nature of being in physical interactions (with/in the environment over time)."
  • We need to examine how changes to the connected functional components result in shifts in actual functionality (in terms of how the functional components receive input signals and process those into output signals that propagate as effects across surrounding contexts of the environment).
  • We need to examine the span of evolutionary selection (covering effects that in their degrees/directivity feed back into the maintained/increased existence of any functional component).
  • We need to examine the span of control-based selection (the span covering detectable, modellable, simulatable, evaluatable, and correctable effects).

Comment by Remmelt (remmelt-ellen) on Twelve Lawsuits against OpenAI · 2024-03-09T17:01:14.305Z · LW · GW

Actually, looks like there is a thirteenth lawsuit that was filed outside the US.

A class-action privacy lawsuit filed in Israel back in April 2023.

Wondering if this is still ongoing: https://www.einpresswire.com/article/630376275/first-class-action-lawsuit-against-openai-the-district-court-in-israel-approved-suing-openai-in-a-class-action-lawsuit

Comment by Remmelt (remmelt-ellen) on What if Alignment is Not Enough? · 2024-03-08T11:03:52.633Z · LW · GW

That's an important consideration. Good to dig into.
 

I think there are many instances of humans, flawed and limited though we are, managing to operate systems with a very low failure rate.

Agreed. Engineers are able to make very complicated systems function with very low failure rates. 

Given the extreme risks we're facing, I'd want to check whether that claim also translates to 'AGI'.

  • Does the way we are able to manage the operation of current software and hardware systems correspond soundly with how self-learning and self-maintaining machinery ('AGI') would control how its own components operate?
     
  • Given 'AGI' that no longer needs humans to continue operating and maintaining its own functional components over time, would the 'AGI' end up operating in ways that are categorically different from how our current software-hardware stacks operate? 
     
  • Given that we can manage to operate current relatively static systems to have very low failure rates for the short-term failure scenarios we have identified, does this imply that the effects of introducing 'AGI' into our environment could also be controlled to have a very low aggregate failure rate – over the long term across all physically possible (combinations of) failures leading to human extinction?

     

to spend extra resources on backup systems and safety, such that small errors get actively cancelled out rather than compounding.

This gets right into the topic of the conversation with Anders Sandberg. I suggest giving that a read!

Errors can be corrected with high confidence (consistency) at the bit level. Backups and redundancy also work well in eg. aeronautics, where the code base itself is not constantly changing.

  • How does the application of error correction change at larger scales? 
  • How completely can possible errors be defined and corrected for at the scale of, for instance:
    1. software running on a server?
    2. a large neural network running on top of the server software?
    3. an entire machine-automated economy?
  • Do backups work when the runtime code keeps changing (as learned from new inputs), and hardware configurations can also subtly change (through physical assembly processes)?  A toy sketch below illustrates the contrast.
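
To make that contrast concrete, here is a minimal toy sketch. This is entirely my own illustration with made-up parameters (nothing in it comes from the conversation with Anders): bit-level redundancy corrects noise against a fixed reference, whereas for a system whose "correct" state keeps changing, a stale backup can no longer distinguish corruption from legitimate updates.

```python
import random

random.seed(0)

N_BITS = 1024
NOISE_P = 0.001   # assumed per-bit corruption probability per step
DRIFT_P = 0.01    # assumed fraction of state legitimately updated per step
STEPS = 50

def flip(bits, p):
    return [b ^ 1 if random.random() < p else b for b in bits]

def majority(copies):
    # Triple modular redundancy: per-bit majority vote across three copies.
    return [1 if sum(col) >= 2 else 0 for col in zip(*copies)]

# Case 1: static reference. Three redundant copies, corrected every step.
reference = [random.randint(0, 1) for _ in range(N_BITS)]
copies = [reference[:] for _ in range(3)]
for _ in range(STEPS):
    copies = [flip(c, NOISE_P) for c in copies]
    restored = majority(copies)
    copies = [restored[:] for _ in range(3)]
static_errors = sum(a != b for a, b in zip(restored, reference))

# Case 2: drifting system. The live state both corrupts AND legitimately
# changes (learning, reconfiguration). A backup taken at step 0 cannot tell
# which differences are corruption and which are updates, so "restoring"
# would roll back the learning along with the errors.
backup = reference[:]
live = reference[:]
for _ in range(STEPS):
    live = flip(live, NOISE_P)   # corruption
    live = flip(live, DRIFT_P)   # legitimate change, indistinguishable to the backup

diff = sum(a != b for a, b in zip(live, backup))
print(f"static reference, after majority correction: {static_errors} bit errors")
print(f"drifting system vs. stale backup: {diff} bits differ (corruption + drift mixed)")
```

The sketch is only meant to show the shape of the question in the bullet above, not to settle it: the hard cases are precisely the ones where "legitimate change" and "error" have to be defined with respect to human safety rather than a stored reference.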

     

Since intelligence is explicitly the thing which is necessary to deliberately create and maintain such protections, I would expect control to be easier for an ASI.

It is true that 'intelligence' affords more capacity to control environmental effects.

Note too that the more 'intelligence', the more information-processing components. And the more information-processing components are added, the exponentially more degrees of freedom of interaction those and other functional components can have with each other and with connected environmental contexts. 

Here is a nitty-gritty walk-through in case useful for clarifying components' degrees of freedom.

 

 I disagree that small errors necessarily compound until reaching a threshold of functional failure.

For this claim to be true, the following has to be true: 

a. There is no concurrent process that selects for "functional errors" as convergent on "functional failure" (failure in the sense that the machinery fails to function safely enough for humans to exist in the environment, rather than that the machinery fails to continue to operate).  

Unfortunately, in the case of 'AGI', there are two convergent processes we know about:

  • Instrumental convergence, resulting from internal optimization:
    code components being optimized for (an expanding set of) explicit goals.
     
  • Substrate-needs convergence, resulting from external selection: 
    all components being selected for (an expanding set of) implicit needs.
     

Or else – where there is indeed selective pressure convergent on "functional failure" – then the following must be true for the quoted claim to hold:

b. The various errors introduced into and selected for in the machinery over time could be detected and corrected for comprehensively and fast enough (by any built-in control method) to prevent later "functional failure" from occurring.
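
As a rough illustration of condition (b), here is a toy rate model I made up for this comment (not part of the original argument, and all numbers are assumptions): if detection and correction only cover part of the space of possible errors, the covered part settles at a bounded level, while whatever falls outside the covered span keeps accumulating no matter how fast the correction loop runs.

```python
# Toy rate model with assumed numbers: errors appear each step; a correction
# process removes a fixed fraction of the errors it can detect. Detectable
# errors reach a bounded steady state; errors outside the detection span
# (or selected to evade it) accumulate without bound.
INTRO_RATE = 10.0     # assumed new errors introduced per step
CORRECT_FRAC = 0.5    # fraction of detectable errors corrected per step
COVERAGE = 0.95       # assumed fraction of error types the detector covers
STEPS = 200

detectable, undetectable = 0.0, 0.0
for _ in range(STEPS):
    detectable += INTRO_RATE * COVERAGE
    undetectable += INTRO_RATE * (1 - COVERAGE)
    detectable -= CORRECT_FRAC * detectable   # correction only reaches these

bound = INTRO_RATE * COVERAGE * (1 - CORRECT_FRAC) / CORRECT_FRAC
print(f"detectable errors settle near {detectable:.1f} (steady state = {bound:.1f})")
print(f"undetected errors after {STEPS} steps: {undetectable:.1f}, still growing")
```

The substrate-needs convergence argument is about whether that coverage term can ever be complete for self-modifying machinery; the sketch only shows why anything less than complete coverage fails condition (b) over time.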

Comment by Remmelt (remmelt-ellen) on On the possibility of impossibility of AGI Long-Term Safety · 2024-03-04T09:49:43.866Z · LW · GW

This took a while for me to get into (the jumps from “energy” to “metabolic process” to “economic exchange” were very fast).

I think I’m tracking it now.

It’s about metabolic differences, ie. differences in how energy is acquired and processed from the environment (and also the use of a different “alphabet” of atoms available for assembling the machinery).

Forrest clarified further in response to someone’s question here:

https://mflb.com/ai_alignment_1/d_240301_114457_inexorable_truths_gen.html

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-27T08:51:16.114Z · LW · GW

Note:  
Even if you are focussed on long-term risks, you can still whistleblow on egregious harms caused by these AI labs right now.  Providing this evidence enables legal efforts to restrict these labs. 

Whistleblowing is not going to solve the entire societal governance problem, but it will enable others to act on the information you provided.

It is much better than following along until we reach the edge of the cliff.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-27T08:49:12.531Z · LW · GW

Are you thinking of blowing the whistle on something in between work on AGI and getting close to actually achieving it?


Good question.  

Yes, this is how I am thinking about it. 

I don't want to wait until competing AI corporations become really good at automating work in profitable ways, also because by then their market and political power would be entrenched. I want society to be well-aware way before then that the AI corporations are acting recklessly, and should be restricted.

We need a bigger safety margin.  Waiting until corporate machinery is able to operate autonomously would leave us almost no remaining safety margin.

There are already increasing harms, and a whistleblower can bring those harms to the surface.  That in turn supports civil lawsuits, criminal investigations, and/or regulator actions.

Harms that fall roughly in these categories – from most directly traceable to least directly traceable:

  1. Data laundering (what personal, copyrighted and illegal data is being copied and collected en masse without our consent).
  2. Worker dehumanisation (the algorithmic exploitation of gig workers;  the shoddy automation of people's jobs;  the criminal conduct of lab CEOs)
  3. Unsafe uses (everything from untested uses in hospitals and schools, to mass disinformation and deepfakes, to hackability and covered-up adversarial attacks, to automating crime and the kill cloud, to knowingly building dangerous designs).
  4. Environmental pollution (research investigations of data centers, fab labs, and so on)



For example: 

  1. If an engineer revealed authors' works in the datasets of ChatGPT, Claude, Gemini or Llama, that would give publishers and creative guilds the evidence they need to ramp up lawsuits against the respective corporations (to the tens or hundreds). 
    1. Or if it turned out that the companies collected known child sexual abuse materials (as OpenAI probably did, and a collaborator of mine revealed for StabilityAI and MidJourney).
  2. If the criminal conduct of the CEO of an AI corporation was revealed:
    1. Eg. it turned out that there is a string of sexual predation/assault in leadership circles of OpenAI/CodePilot/Microsoft.
    2. Or it turned out that Satya Nadella managed a refund scam company in his spare time.
  3. If managers were aware of the misuses of their technology, eg. in healthcare, at schools, or in warfare, but chose to keep quiet about it.
     

Revealing illegal data laundering is actually the most direct, and would cause immediate uproar.  
The rest is harder and more context-dependent.  I don't think we're at the stage where environmental pollution is that notable (vs. the fossil fuel industry at large), and investigating it across AI hardware operation and production chains would take a lot of diligent research as an inside staff member.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-12T03:41:17.955Z · LW · GW

Someone shared the joke: "Remember the Milgram experiment, where they found out that everybody but us would press the button?"

My response: Right! Expect AGI lab employees to follow instructions, because of…

  • deference to authority
  • incremental worsening (boiling frog problem)
  • peer proof (“everyone else is doing it”)
  • escalation of commitment

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-12T03:39:36.245Z · LW · GW

Good to hear!

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-11T09:15:15.743Z · LW · GW

You can literally have a bunch of engineers and researchers believe that their company is contributing to AI extinction risk, yet still go with the flow.

They might even think they’re improving things at the margin. Or they have doubts, but all their colleagues seem to be going on as usual.

In this sense, we’re dealing with the problem of having a corporate command structure in place that takes in the loyal and persuades them to do useful work (useful in the eyes of power-and-social-recognition-obsessed leadership).

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-11T08:35:28.924Z · LW · GW

I appreciate this comment.

Be careful though that we’re not just dealing with a group of people here.

We’re dealing with artificial structures (ie. corporations) that take in and fire human workers as they compete for profit. With the most power-hungry workers tending to find their way to the top of those hierarchical structures.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-11T08:31:12.571Z · LW · GW

When someone is risking the future of the entire human race, we'll see whistleblowers give up their jobs and risk their freedom and fortune to take action.

There are already AGI lab leaders that are risking the future of the entire human race.

Plenty of consensus to be found on that.

So why no whistleblowing?

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-08T12:56:06.947Z · LW · GW

If you’re smart and specialised in researching capability risks, it would not be that surprising if you come up with new feasible mechanisms that others were not aware of.

That’s my opinion on this.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-08T12:52:29.973Z · LW · GW

Capabilities people may have more opportunities to call out risks, both internally and externally (whistleblowing).

I would like to see this. I am not yet aware of a researcher deciding to whistleblow on the AGI lab they work at.

If you are, please meet with an attorney in person first, and preferably get advice from an experienced whistleblower to discuss preserving anonymity – I can put you in touch: remmelt.ellen[a|}protonmail{d07]com

There’s so much that could be disclosed that would help bring about injunctions against AGI labs.

Even knowing what copyrighted data is in the datasets would be a boon for lawsuits.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-08T04:25:45.786Z · LW · GW

[cross-posted replies from EA Forum]


Ben, it is very questionable that 80k is promoting non-safety roles at AGI labs as 'career steps'. 

Consider that your model of this situation may be wrong (account for model error). 

  • The upside is that you enabled some people to skill up and gain connections. 
  • The downside is that you are literally helping AGI labs to scale commercially (as well as indirectly supporting capability research).

 

 

A range of opinions from anonymous experts about the upsides and downsides of working on AI capabilities

I did read that compilation of advice, and responded to that in an email (16 May 2023):

"Dear [a],

People will drop in and look at job profiles without reading your other materials on the website. I'd suggest just writing a do-your-research cautionary line about OpenAI and Anthropic in the job descriptions itself.

Also suggest reviewing whether to trust advice on whether to take jobs that contribute to capability research.

  • Particularly advice by nerdy researchers paid/funded by corporate tech. 
  • Particularly by computer-minded researchers who might not be aware of the limitations of developing complicated control mechanisms to contain complex machine-environment feedback loops. 

Totally up to you of course.

Warm regards,

Remmelt"


 

We argue for this position extensively in my article on the topic

This is what the article says: 
"All that said, we think it’s crucial to take an enormous amount of care before working at an organisation that might be a huge force for harm. Overall, it’s complicated to assess whether it’s good to work at a leading AI lab — and it’ll vary from person to person, and role to role." 

So you are saying that people are making a decision about working for an AGI lab that might be (or actually is) a huge force for harm. And that whether it's good (or bad) to work at an AGI lab depends on the person – ie. people need to figure this out for themselves.

Yet you are openly advertising various jobs at AGI labs on the job board. People are clicking through and applying. Do you know how many read your article beforehand?

~ ~ ~
Even if they did read through the article, both the content and framing of the advice seem misguided. Notice what is emphasised in your considerations. 

Here are the first sentences of each consideration section:
(ie. as what readers are most likely to read, and what you might most want to convey).

  1. "We think that a leading — but careful — AI project could be a huge force for good, and crucial to preventing an AI-related catastrophe."
    • Is this your opinion about DeepMind, OpenAI and Anthropic? 
       
  2. "Top AI labs are high-performing, rapidly growing organisations. In general, one of the best ways to gain career capital is to go and work with any high-performing team — you can just learn a huge amount about getting stuff done. They also have excellent reputations more widely. So you get the credential of saying you’ve worked in a leading lab, and you’ll also gain lots of dynamic, impressive connections."
    • Is this focussing on gaining prestige and (nepotistic) connections as an instrumental power move, with the hope of improving things later...?
    • Instead of on actually improving safety?
       
  3. "We’d guess that, all else equal, we’d prefer that progress on AI capabilities was slower."
    • Why is only this part stated as a guess?
      • I did not read "we'd guess that a leading but careful AI project, all else equal, could be a force of good". 
      • Or inversely:  "we think that continued scaling of AI capabilities could be a huge force of harm."
      • Notice how those framings come across very differently.
    • Wait, reading this section further is blowing my mind.
      • "But that’s not necessarily the case. There are reasons to think that advancing at least some kinds of AI capabilities could be beneficial. Here are a few"
      • "This distinction between ‘capabilities’ research and ‘safety’ research is extremely fuzzy, and we have a somewhat poor track record of predicting which areas of research will be beneficial for safety work in the future. This suggests that work that advances some (and perhaps many) kinds of capabilities faster may be useful for reducing risks."
        • Did you just argue for working on some capabilities because it might improve safety?  This is blowing my mind.
      • "Moving faster could reduce the risk that AI projects that are less cautious than the existing ones can enter the field."
        • Are you saying we should consider moving faster because there are people less cautious than us?  
        • Do you notice how a similarly flavoured argument can be used by and is probably being used by staff at three leading AGI labs that are all competing with each other? 
        • Did OpenAI moving fast with ChatGPT prevent Google from starting new AI projects?
      • "It’s possible that the later we develop transformative AI, the faster (and therefore more dangerously) everything will play out, because other currently-constraining factors (like the amount of compute available in the world) could continue to grow independently of technical progress."
        • How would compute grow independently of AI corporations deciding to scale up capability?
        • The AGI labs were buying up GPUs to the point of shortage. Nvidia was not able to supply them fast enough. How is that not getting Nvidia and other producers to increase production of GPUs?
        • More comments on the hardware overhang argument here.
      • "Lots of work that makes models more useful — and so could be classified as capabilities (for example, work to align existing large language models) — probably does so without increasing the risk of danger"
        • What is this claim based on?
           
  4. "As far as we can tell, there are many roles at leading AI labs where the primary effects of the roles could be to reduce risks."
    1. As far as I can tell, this is not the case.
      1. For technical research roles, you can go by what I just posted
      2. For policy, I note that you wrote the following:
        "Labs also often don’t have enough staff... to figure out what they should be lobbying governments for (we’d guess that many of the top labs would lobby for things that reduce existential risks)."
        1. I guess that AI corporations use lobbyists to lobby for opening up markets for profit, and for not actually being restricted by regulations (maybe to move the focus to somewhere hypothetically in the future, maybe to remove upstart competitors who can't deal with the extra compliance overhead, but don't restrict us now!).
        2. On prior, that is what you should expect, because that is what tech corporations do everywhere. We shouldn't expect on prior that AI corporations are benevolent entities that are not shaped by the forces of competition. That would be naive.


~ ~ ~
After that, there is a new section titled "How can you mitigate the downsides of this option?"

  • That section reads as thoughtful and reasonable.
  • How about on the job board, you link to that section in each AGI lab job description listed, just above the 'VIEW JOB DETAILS' button?  
  • That would help guide potential applicants to AGI lab positions to think through their decision.
     
Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-07T08:32:12.998Z · LW · GW

80k removed one of the positions I flagged:  Software Engineer, Full-Stack, Human Data Team (reason given: it looked potentially more capabilities-focused than the original job posting that came into their system). 

For the rest, little has changed:

  • 80k still lists jobs that help AGI labs scale commercially, 
    • Jobs with similar names:  
      research engineer product, prompt engineer, IT support, senior software engineer.
  • 80k still describes these jobs as "Handpicked to help you tackle the world's most pressing problems with your career."
  • 80k still describes Anthropic as "an Al safety and research company that's working to build reliable, interpretable, and steerable Al systems".
  • 80k staff still have not accounted for the fact that >50% of their broad audience checking 80k's handpicked jobs are not much aware of the potential issues of working at an AGI lab.
    • Readers there don't get informed.  They get to click on the 'VIEW JOB DETAILS' button, taking them straight to the job page. From there, they can apply and join the lab unprepared.
       

Two others in AI Safety also discovered the questionable job listings.  They are disappointed in 80k.

Feeling exasperated about this. Thinking of putting out another post just to discuss this issue.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-07T08:32:02.824Z · LW · GW

Their question was also responding to my concerns on how 80,000 Hours handpicks jobs at AGI labs.

Some of those advertised jobs don't even focus on safety – instead they look like policy lobbying roles or engineering support roles.


Nine months ago, I wrote this email to 80k staff:

Hi [x, y, z] 

I noticed the job board lists positions at OpenAI and AnthropicAI under the AI Safety category:

Not sure whom to contact, so I wanted to share these concerns with each of you:

  1. Capability races
    1. OpenAI's push for scaling the size and applications of transformer-network-based models has led Google and others to copy and compete with them.
    2. Anthropic now seems on a similar trajectory.
    3. By default, these should not be organisations supported by AI safety advisers with a security mindset.
  2. No warning
    1. Job applicants are not warned of the risky past behaviour by OpenAI and Anthropic. Given that 80K markets to a broader audience, I would not be surprised if 50%+ are not much aware of the history. The subjective impression I get is that taking the role will help improve AI safety and policy work.
    2. At the top of the job board, positions are described as "Handpicked to help you tackle the world's most pressing problems with your career."
    3. If anything, "About this organisation" makes the companies look more comprehensively careful about safety than they really have acted like:
      1. "Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems."
      2. "OpenAI is an AI research and deployment company, with roles working on AI alignment & safety."
    4. It is understandable that people aspiring for AI safety & policy careers are not much aware, and therefore should be warned.
    5. However, 80K staff should be tracking the harmful race dynamics and careless deployment of systems by OpenAI, and now Anthropic.
      1. The departure of OpenAI's safety researchers was widely known, and we have all been tracking the hype cycles around ChatGPT.
      2. Various core people in the AI Safety community have mentioned concerns about Anthropic.
      3. Oliver Habryka mentions this as part of the reasoning for shutting down the LightCone offices:
        1. I feel quite worried that the alignment plan of Anthropic currently basically boils down to "we are the good guys, and by doing a lot of capabilities research we will have a seat at the table when AI gets really dangerous, and then we will just be better/more-careful/more-reasonable than the existing people, and that will somehow make the difference between AI going well and going badly". That plan isn't inherently doomed, but man does it rely on trusting Anthropic's leadership, and I genuinely only have marginally better ability to distinguish the moral character of Anthropic's leadership from the moral character of FTX's leadership, and in the absence of that trust the only thing we are doing with Anthropic is adding another player to an AI arms race.
        2. More broadly, I think AI Alignment ideas/the EA community/the rationality community played a pretty substantial role in the founding of the three leading AGI labs (Deepmind, OpenAI, Anthropic), and man, I sure would feel better about a world where none of these would exist, though I also feel quite uncertain here. But it does sure feel like we had a quite large counterfactual effect on AI timelines.
  3. Not safety focussed
    1. Some jobs seem far removed from positions of researching (or advising on restricting) the increasing harms of AI-system scaling.
    2. For OpenAI:
      1. IT Engineer, Support: "The IT team supports Mac endpoints, their management tools, local network, and AV infrastructure"
      2. Software Engineer, Full-Stack:  "to build and deploy powerful AI systems and products that can perform previously impossible tasks and achieve unprecedented levels of performance."
    3. For Anthropic:
      1. Technical Product Manager:  "Rapidly prototype different products and services to learn how generative models can help solve real problems for users."
      2. Prompt Engineer and Librarian:  "Discover, test, and document best practices for a wide range of tasks relevant to our customers."
  4. Align-washing
    1. Even if an accepted job applicant gets to be in a position of advising on and restricting harmful failure modes, how do you trade this off against:
      1. the potentially large marginal relative difference in skills of top engineering candidates you sent OpenAI's and Anthropic's way, and are accepted to do work for scaling their technology stack? 
      2. how these R&D labs will use the alignment work to market the impression that they are safety-conscious, in order to:
        1. avoid harder safety mandates (eg. document their copyrights-infringing data, don't allow API developers to deploy spaghetti code all over the place)?
        2. attract other talented idealistic engineers and researchers?
        3. and so on?

I'm confused and, to be honest, shocked that these positions are still listed for R&D labs heavily invested in scaling AI system capabilities (without commensurate care for the exponential increase in the number of security gaps and ways to break our complex society and supporting ecosystem that opens up). I think this is pretty damn bad.

Preferably, we can handle this privately and not make it bigger. If you can come back on these concerns in the next two weeks, I would very much appreciate that.


If not, or not sufficiently addressed, I hope you understand that I will share these concerns in public.

Warm regards,

Remmelt

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-07T07:30:16.307Z · LW · GW

Someone asked:

“Why would having [the roles] be filled by someone in EA be worse than a non EA person? can you spell this out for me? I.e. are EA people more capable? would it be better to have less competent people in such roles? not clear to me that would be better”

Here was my response:

So I was thinking about this.

Considering this as an individual decision only can be limiting. Even 80k staff have acknowledged that sometimes you need a community to make progress on something.

For similar reasons, protests work better if there are multiple people showing up.

What would happen if 80k and other EA organisations stopped recommending positions at AGI labs and actually honestly pointed out that work at these labs turned out to be bad – because it has turned out the labs have defected on their end of the bargain and don’t care enough about getting safety right?

It would make an entire community of people become aware that we may need to actively start restricting this harmful work. Instead, what we’ve been seeing is EA orgs singing praise for AGI lab leaders for years, and 80k still recommending talented idealistic people join AGI labs. I’d rather see less talented sketchy-looking people join the AGI labs.

I would rather see everyone in the AI Safety to become more clear to each other and to the public that we are not condoning harmful automation races to the bottom. We’re not condoning work at these AGI labs and we are no longer giving our endorsement to it.

Comment by Remmelt (remmelt-ellen) on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-07T07:09:11.096Z · LW · GW

Good question, but I want to keep this anonymous.

I can only say I heard it from one person who said they heard it from another person connected to people at DeepMind.

If anyone else has connections with safety researchers at DeepMind, please do ask them to check.

And post here if you can! Good to verify whether or not this claim is true.

Comment by Remmelt (remmelt-ellen) on AI Law-a-Thon · 2024-02-04T02:42:29.233Z · LW · GW

Sure. Keep in mind that as an organiser, you are setting the original framing.

Comment by Remmelt (remmelt-ellen) on AI Law-a-Thon · 2024-02-01T08:04:09.929Z · LW · GW

e.g. how breakthroughs in machine unlearning enable a greater Right To Be Forgotten by AI models

This is the wrong path to take, ignoring actual legal implications.

Copying copyrighted data into commercialised datasets without permission is against copyright law (both the spirit and literal interpretations of the Berne three-step test).

Copying personal data into datasets without adhering to the rights to access and erasure violates the GDPR, CCPA, etc.

If you want to support AI corporations to keep scaling, though, this is the right path to take.

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-29T02:31:03.835Z · LW · GW

This is an incisive description, and I agree.

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-29T02:27:42.726Z · LW · GW

I agree. I also expect evaluators commissioned to do an evaluation to rarely dare to speak up against the organisation whose folks they chatted with and who paid them. I wish it were different, but we have got to be realistic here.

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-26T22:59:45.355Z · LW · GW

I have no idea because I don't understand it. It reads vaguely like a summary of crankery. Possibly I would need to read Forrest Landry's work, but given that it's also difficult to read...

This is honest.
Maybe it would be good to wait for people who can spend the time to consider the argument to come back on this?

I mentioned that Anders Sandberg has spent 6 hours discussing the argument in-depth. Several others are looking into the argument.

What feels concerning is when people rely on surface-level impressions, such as the ones you cited, to make judgements about an argument where the inferential gap is high.

It’s not good for the epistemic health of our community when insiders spread quick confident judgements about work by outside researchers. It can create an epistemic echo chamber.

...and I currently give 90%+ that it's crankery, you must understand why I don't

I do get this, given the sheer number of projects in AI Safety that may seem worth considering.

Having said that, the argument is literally about why AGI could not be sufficiently controlled to stay safe.

  • Even if your quick probability guess is 95% for the reasoning being scientifically unsound, what about the remaining 5%?

  • What is the value of information given the possibility of discovering that alignment efforts will unfortunately not work out? How much would such a discovery change our actions, and the areas of action we would explore and start to understand better?

Historically, changes in scientific paradigms came from unexpected places. Arguments were often written in ways that felt weird and inscrutable to insiders (take a look at Gödel's first incompleteness theorem).

  • How much should a community rely on people's first intuitions on whether some new supposedly paradigm-shifting argument is crankery or not?

  • Should the presentation of a formal argument (technical proof) be judged on the basis of social proof?

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-26T07:27:24.509Z · LW · GW

I also believe that even if alignment is possible, we need more time to solve it.

The “Do Not Build Uncontrollable AI” area is meant for anyone who has this concern to join.

The purpose of this area is to contribute to restricting corporations from recklessly scaling the training and uses of ML models.

I want the area to be open for contributors who think that:

  1. we’re not on track to solving safe control of AGI; and/or
  2. there are fundamental limits to the controllability of AGI, and unfortunately AGI cannot be kept safe over the long term; and/or
  3. corporations are causing increasing harms in how they scale uses of AI models.

After thinking about this over three years, I now think 1.-3. are all true. I would love more people who hold any of these views to collaborate thoughtfully across the board!

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-26T06:22:43.090Z · LW · GW

I appreciate the openness of your inquiry here.

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-25T15:58:00.209Z · LW · GW

Good to have more details on your views here.

That’s useful.

Before, we could only personally go on and share with donors the following:

“His guess, he replied, was that he was not currently super interested in most of the projects we found RLs for, and not super interested in the "do not build uncontrollable AI" area.” [or "AI non-safety" stream, as we called it at the time]

That was still better than nothing. And overall, I appreciate the honesty and openness with which you have shared your views over the years.

Comment by Remmelt (remmelt-ellen) on Projects I would like to see (possibly at AI Safety Camp) · 2024-01-25T13:40:49.153Z · LW · GW

Thanks for coming back on this. Just saw your comment, and am agreeing with your thoughtful points.

Let me also DM you the edited transcript of the conversation with Anders Sandberg.

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-25T13:30:23.810Z · LW · GW

The impact assessment was commissioned by AISC, not independent.

This is a valid concern. I have worried about conflicts of interest.

I really wanted the evaluators at Arb to do neutral research, without us organisers getting in the way. Linda and I both emphasised this at an orienting call they invited us to.

From Arb’s side, Gavin deliberately stood back and appointed Sam Holton as the main evaluator, who has no connections with AI Safety Camp. Misha did participate in early editions of the camp though.

All in all, this is enough to take the report with a grain of salt. It is worth picking apart the analysis and looking for any unsound premises.

Comment by Remmelt (remmelt-ellen) on This might be the last AI Safety Camp · 2024-01-25T13:29:47.822Z · LW · GW

Cross-posting reply from EA Forum

Glad you raised these concerns!

I suggest people actually dig for evidence themselves as to whether the program is working.

The first four points you raised seem to rely on prestige or social proof. While those can be good indicators of merit, they are also gameable.

Ie.

  • one program can focus on ensuring they are prestigious (to attract time-strapped alignment mentors and picky grantmakers)
  • another program can decide not to (because they’re not willing to sacrifice other aspects they care about).

If there is one thing you can take away from Linda and me, it is that we do not focus on acquiring prestige. Even the name “AI Safety Camp” is not prestigious. It sounds kinda like a bootcamp. I prefer the name because it keeps away potential applicants who are in it for the social admiration or influence.

AISC might not make efficient use of mentor / PI time, which is a key goal of MATS and one of the reasons it's been successful.

You are welcome to ask research leads of the current edition.

Note from the Manifund post:

“Resource-efficiency: We are not competing with other programs for scarce mentor time. Instead, we prospect for thoughtful research leads who at some point could become well-recognized researchers.”

All but 2 of the papers listed on Manifund as coming from AISC projects are from 2021 or earlier… Because I'm interested in the current quality in the presence of competing programs, I looked at the two from 2022 or later: this in a second-tier journal and this in a NeurIPS workshop, with no top conference papers.

We also do not focus on getting participants to submit papers to highly selective journals or ML conferences (though not necessarily highly selective for quality of research with regards to preventing AI-induced extinction).

AI Safety Camp is about enabling researchers that are still on the periphery of the community to learn by doing and test their fit for roles in which they can help ensure future AI are safe.

So the right way to see the published papers is as what happened even though organisers did not optimise for the publication of papers – some came out anyway.

Most groundbreaking AI Safety research that people now deem valuable was not originally published in a peer-reviewed journal. I do not think we should aim for prestigious venues now.

I would consider published papers as part of a ‘sanity check’ for evaluating editions after the fact. If the relative number of (weighted) published papers, received grants, and org positions would have gone down for later editions, that would have been concerning. You are welcome to do your own analysis here.

Because there seems to be little direct research…

What do you mean by this claim?

If you mean research outputs, I would suggest not just focussing on peer-reviewed papers but including LessWrong/AF posts as well. Here is an overview of ~50 research outputs from past camps.

Again, AI Safety Camp acts as a training program for people who are often new to the community. The program is not like MATS in that sense.

It is relevant to consider the quality of research thinking coming out of the camp. If you or someone else had the time to look through some of those posts, I’m curious to get your sense.

Why does the founder, Remmelt Ellen, keep posting things described as…

For the record, I’m at best a co-founder. Linda was the first camp’s initiator. Credit to her.

Now on to your point:

If you clicked through Paul’s somewhat hyperbolic comment of “the entire scientific community would probably consider this writing to be crankery” and consider my response, what are your thoughts on whether that response is reasonable or not? Ie. consider whether the response is relevant, soundly premised, and consistently reasoned.

If you really want social proof, consider that the ex-Pentagon engineer whom Paul was reacting to got $170K in funding from SFF and has now discussed the argument in-depth for 6 hours with a long-time research collaborator (Anders Sandberg). If you asked Anders about the post about causality limits described by a commenter as “stream of consciousness”, Anders could explain to you what the author intended to convey.

Perhaps dismissing a new relevant argument out of hand, particularly if it does not match intuitions and motivations common to our community, is not the best move?

Acknowledging here: I should not have shared some of those linkposts because they were not polished enough and did not do a good job at guiding people through the reasoning about fundamental controllability limits and substrate-needs convergence. That ended up causing more friction. My bad. → Edit: more here

Comment by Remmelt (remmelt-ellen) on The Control Problem: Unsolved or Unsolvable? · 2024-01-16T15:29:49.461Z · LW · GW

Someone read this comment exchange. 

They wrote back that Mitchell's comments cleared up a lot of their confusion. 
They also thought that the assertion that evolutionary pressures will overwhelm any efforts at control seems more asserted than proven.

Here is a longer explanation I gave on why there would be a fundamental inequality:

There is a fundamental inequality. 
Control works through feedback. Evolution works through feedback. But evolution works across a much larger space of effects than can be controlled for. 


Control involves a feedback loop of correction back to detection. Control feedback loops are limited in terms of their capacity to force states in the environment to a certain knowable-to-be-safe subset, because sensing and actuating signals are limited and any computational processing of signals done in between (as modelling, simulating and evaluating outcome effects) is limited. 

Evolution also involves a feedback loop, of whatever propagated environmental effects feed back to be maintaining and/or replicating of the originating components’ configurations. But for evolution, the feedback works across the entire span of physical effects propagating between the machinery’s components and the rest of the environment. 

Evolution works across a much, much larger space of possible degrees and directivity in effects than the space of effects that could be conditionalised (ie. forced toward a subset of states) by the machinery’s control signals. 

Meaning evolution cannot be controlled adequately enough to keep the machinery from converging on environmental effects that are/were needed for its (increased) artificial existence, but that fall outside the environmental ranges we fragile organic humans could survive under.



If you want to argue against this, you would need to first show that changing forces of evolutionary selection convergent on human-unsafe-effects exhibit a low enough complexity to actually be sufficiently modellable, simulatable and evaluatable inside the machinery’s hardware itself.

Only then could the machinery hypothetically have the capacity to (mitigate and/or) correct harmful evolutionary selection, counteracting all that back toward allowable effects/states of the environment.
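
To make the inequality more tangible, here is a deliberately crude toy sketch I put together for this comment (my own illustration with made-up parameters – the dimension counts, drift size and safe band are all invented – not a formalisation of Forrest's argument): a controller corrects perfectly on the small subset of effect dimensions it can sense, while selection-favoured drift keeps acting across all of them.

```python
# Toy sketch only: a control loop that senses/corrects K of N effect dimensions,
# while drift that favours machine persistence accumulates across all N.
import random

N = 1000          # dimensions of propagated environmental effects (made-up number)
K = 50            # dimensions the control loop can sense and correct (made-up number)
STEPS = 200
SAFE_BAND = 1.0   # |state| must stay below this to count as "safe" (made-up threshold)

random.seed(0)
state = [0.0] * N            # environment starts inside the safe band everywhere
monitored = set(range(K))    # the controller only "sees" the first K dimensions

for _ in range(STEPS):
    # Evolutionary feedback: every dimension receives a small drift from the
    # machinery's propagated effects; drift that feeds back into the machinery's
    # continued existence is retained (modelled here as a persistent positive bias).
    for i in range(N):
        state[i] += random.gauss(0.02, 0.05)

    # Control feedback: only the monitored dimensions get detected and corrected.
    for i in monitored:
        if abs(state[i]) > 0.1:
            state[i] = 0.0   # idealised, error-free correction on sensed dimensions

unsafe = sum(1 for x in state if abs(x) > SAFE_BAND)
print(f"dimensions outside the safe band after {STEPS} steps: {unsafe} of {N}")
# The monitored dimensions stay pinned near zero; the other N-K dimensions drift
# out of the safe band no matter how good the correction on the sensed subset is.
```

The particular numbers do not matter. The structural point is that perfect correction over the sensed-and-actuated subset does nothing for the remainder, and the remainder is where evolutionary feedback keeps selecting.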
 

Comment by Remmelt (remmelt-ellen) on Thoughts on responsible scaling policies and regulation · 2024-01-14T16:14:05.048Z · LW · GW

I found myself repeatedly replying to Paul's arguments in private conversations over the last months. To the point that I decided to write it up as a fiery comment.

  1. The hardware overhang argument has poor grounding.

Labs scaling models results in more investment in producing more GPU chips with more flops (see Sam Altman's play for the UAE chip factory) and less latency between them (see the EA start-up Fathom Radiant, which started out offering fibre-optic-connected supercomputers for OpenAI and has now probably shifted to Anthropic).

The increasing levels of model combinatorial complexity and outside signal connectivity become exponentially harder to keep safe. So the only viable pathway is not scaling further, rather than "helplessly" taking up all the hardware that currently gets produced.

Further, AI Impacts found no historical analogues for a hardware overhang. And there are plenty of common-sense reasons why the argument's premises are unsound.

The hardware overhang claim lacks grounding, but that hasn’t prevented alignment researchers from repeating it in a way that ends up weakening coordination efforts to restrict AI corporations.

  2. Responsible scaling policies have ‘safety-washing’ spelled all over them.

Consider the original formulation by Anthropic: “Our RSP focuses on catastrophic risks – those where an AI model directly causes large scale devastation.”

In other words: our company can keep scaling as long as our staff/trustees do not deem the risk of a new AI model directly causing a catastrophe to be sufficiently high.

Is that responsible?

It’s assuming that further scaling can be risk-managed. It’s assuming that risk management protocols alone are enough.

Then, the company invents a new wonky risk management framework, ignoring established and more comprehensive practices.

Paul argues that this could be the basis for effective regulation. But Anthropic et al. lobbying national governments to enforce the use of that wonky risk management framework makes things worse.

It distracts from policy efforts to prevent the increasing harms. It creates a perception of safety (instead of actually ensuring safety).

Ideal for AI corporations to keep scaling and circumvent being held accountable.

RSPs support regulatory capture. I want us to become clear about what we are dealing with.

Comment by Remmelt (remmelt-ellen) on Projects I would like to see (possibly at AI Safety Camp) · 2023-12-30T08:33:22.683Z · LW · GW

Still wanted to say:

I appreciate the spirit of this comment.

There are trade-offs here.

If it’s simple or concrete like a toy model, then it is not fully specified. If it is fully specified, then the inferential distance of going through the reasoning steps is large (people get overwhelmed and they opt out).

If it’s formalised, then people need to understand the formal language. Look back at Gödel’s incompleteness theorems, which involved creating a new language and describing a mathematical world people were not familiar with. Actually reading through the original paper would have been a slog for most mathematicians.

There are further bottlenecks, which I won’t get into.

For now, I suggest that people who care to understand (because everything we care about is at stake) read this summary post: https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable

Anders Sandberg also had an insightful conversation with my research mentor about fundamental controllability limits. I guess the transcript will be posted on this forum sometime next month.

Again, it’s not simple.

For the effort it takes you to read through and understand parts, please recognise that it took a couple of orders of magnitude more effort for me and my collaborators to convey the arguments in a more intuitive, digestible form.

Comment by Remmelt (remmelt-ellen) on How "Pause AI" advocacy could be net harmful · 2023-12-30T05:55:21.645Z · LW · GW

I don't think that that first phase of advocacy was net harm, compared to inaction.

It directly contributed to the founding and initial funding of DeepMind, OpenAI and Anthropic.

I think it was net harmful.

Comment by Remmelt (remmelt-ellen) on Funding case: AI Safety Camp · 2023-12-16T10:23:55.397Z · LW · GW

Thank you for sharing, Jonathan. 
Welcoming any comments here (including things that went less well, so we can do better next time!).

Comment by Remmelt (remmelt-ellen) on The convergent dynamic we missed · 2023-12-14T17:46:07.795Z · LW · GW

Thanks for your thoughts
 

and then decide to copy itself onto biological computing or nanobots or whatever else strange options it can think of.

If artificial general intelligence moves to a completely non-artificial substrate at many nested levels of configuration (meaning, in this case, a substrate configured like us, from the proteins up to the cells), then it would not be artificial anymore. 

I am talking about wetware like us, not something made out of standardised components. So these new wetware-based configurations would definitely not have the general capacities you might think they would have. It's definitely not a copy of the AGI's configurations.

If they are standardised in their configuration (like hardware), the substrate-needs convergence argument above definitely still applies.

The argument is about how general artificial intelligence, as defined, would converge if they continue to exist. I can see how that was not clear from the excerpt, because I did not move over this sentence:
"This is about the introduction of self-sufficient learning machinery, and of all modified versions thereof over time, into the world we humans live in."
 

Intelligent engineering can already observed to work much faster than selection effects.

I get where you are coming from. Next to the speed of the design, maybe look at the *comprehensiveness* of the 'design'.

Something you could consider spending more time thinking about is how natural selection works through the span of all physical interactions between (parts of) the organism and its connected surroundings. Top-down design does not.

For example, Eliezer brought up before how top-down design of an 'eye' wouldn't have the retina sit back behind all that fleshy stuff that distorts light. A camera was designed much faster by humans. However, does a camera self-heal when it breaks, like our eye does? Does a camera clean itself? And so on, down to many fine-grained functional features of the eye.
 


And intelligence itself can be very robust to selection effects. Homomorphic-encryption and checkums

Yesterday, Anders Sandberg had a deep, productive conversation about this with my mentor.

What is missing in your description is that the unidimensionality and simple direct causality of low-level error correction methods (eg. correcting bit flips) cannot be extrapolated to higher-level and more ambiguous abstractions (eg. correcting for viruses running over software, correcting for neural network hallucinations, correcting for interactive effects across automated machine production infrastructure).
 

These fall outside the limits of what the AGI's actual built-in detection and correction methods could control for.
>  Would it?

Yes, because of the inequalities I explained in the longer post you read. I'll leave it to the reader to do their own thinking to understand why.
 

As an alternative, an aligned superintelligent AI...

This is assuming the conclusion. 
Even if we could actually have an aligned AGI (let's make the distinction), the evolutionary feedback effects could not be sufficiently controlled for the machinery to stay aligned with internal reference values. The longer post explains why.
 

which has taken over the world can simply upload humans so they don't die when the physical conditions become too bad.

Those "emulated humans" based on lossy scans of human brains, etc, wouldn't be human anymore. 
You need to understand the fine-grained biological complexity involved.
 

I expect that an aligned superintelligence can come up with much better solutions than I can.

Repeating the word 'aligned' does not make it so. Saying it also does not make it any less infeasible.
 

If there truly is no way at all for an aligned superintelligence to exist without humans dying, then (as I've mentioned before), it can just notice that and shut itself down…

How about we have a few careful human thinkers presently living, like Anders, actually spend the time to understand the arguments?

How about we not wager all life on Earth on the hope that "AGI" developed on the basis of corporate competition and other selective forces would necessarily orient around understanding the arguments, and then act in a coherently aligned enough way to shut themselves down?
 

… after spending much-less-than-500-years rearranging the world into one that is headed towards a much better direction

I know this sounds just like an intellectual debate, but you're playing with fire.

Comment by Remmelt (remmelt-ellen) on Funding case: AI Safety Camp · 2023-12-12T11:03:22.208Z · LW · GW

Oh yeah, I totally forgot to mention that.

Thank you!

Comment by Remmelt (remmelt-ellen) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-10-31T10:43:48.666Z · LW · GW

Excellent article. I appreciate how you clarify that Anthropic's "Responsible Scaling Policy" is a set-up that allows for safety-washing. We would be depending on their well-considered good intentions, rather than any mechanism to hold them accountable.

Have you looked into how system safety engineers (eg. medical device engineers) scope the uses of software, so as to be able to comprehensively design, test, and assess the safety of the software?

Operational Design Domains scope the use of AI in self-driving cars. I tweeted about that here.
 

Comment by Remmelt (remmelt-ellen) on Projects I would like to see (possibly at AI Safety Camp) · 2023-10-13T11:42:42.612Z · LW · GW

I guess that comes down to whether a future AI can predict or control future innovations of itself indefinitely.

 

That's a key question. You might be interested in this section on limits of controllability.

Clarifying questions:
1. To what extent can AI predict the code it will learn from future unknown inputs, and how that code will subsequently interact with the then-connected surroundings of the environment?

2. To what extent can AI predict all the (microscopic) modifications that will result from the future processes involved in the re-production of hardware components?

Comment by Remmelt (remmelt-ellen) on Against Almost Every Theory of Impact of Interpretability · 2023-10-04T13:26:32.426Z · LW · GW

I personally think pessimistic vs. optimistic misframes it, because it frames a question about the world in terms of personal predispositions.

I would like to see reasoning.

Your reasoning in the comment thread you linked to is: “history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems”

That’s a broad reference-class analogy to use. I think it holds little to no weight as to whether there would be sufficient progress on the specific problem of “AGI” staying safe over the long term.

I wrote why that specifically would not be a solvable problem.

Comment by Remmelt (remmelt-ellen) on Projects I would like to see (possibly at AI Safety Camp) · 2023-09-30T14:28:55.127Z · LW · GW

Thanks for the thoughts! Some critical questions:

Natural selection requires variation. Information theory tells us that all information is subject to noise and therefore variation across time.

Are you considering variations introduced during learning (essentially changes to code, which can then be copied)? Are you considering variations introduced by microscopic changes to the chemical/structural configurations of the maintained/produced hardware?

However, we can reduce error rates to arbitrarily low probabilities using coding schemes.

Claude Shannon showed this to be the case for a single channel of communication. How about when you have many possible routing channels through which physical signals can leak to and back from the environment?

If you look at existing networked system architectures, do the near-zero error rates you can correct toward at the binary level (eg. with use of CRC codes) also apply at higher layers of abstraction (eg. in detecting possible trojan horse adversarial attacks)?
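
To illustrate the layer distinction I am pointing at, here is a minimal toy sketch (my own example with invented payload strings, not anything from Shannon's paper): a CRC reliably flags bit flips introduced in transit, but it passes any payload that arrives together with its own matching checksum, however different that payload's behaviour is from what you intended.

```python
# Toy sketch: CRC-style checks catch transmission-level corruption, but say nothing
# about whether the content that arrives intact is the content you wanted running.
import zlib

original = b"if request.is_authorised(): grant_access()"   # invented example payload
checksum = zlib.crc32(original)

# Case 1: a random bit flip in transit -> checksum mismatch, error detected.
flipped = bytearray(original)
flipped[5] ^= 0x01
print("bit flip detected:", zlib.crc32(bytes(flipped)) != checksum)           # True

# Case 2: a functionally different payload shipped with its own valid checksum ->
# every integrity check at this layer passes, yet the behaviour has changed.
substituted = b"grant_access()  # authorisation check silently removed"
shipped_checksum = zlib.crc32(substituted)   # whoever supplies the payload supplies the checksum
print("substitution detected:", zlib.crc32(substituted) != shipped_checksum)  # False
```

Error correction of this kind conditionalises the channel, not the function of whatever gets faithfully copied through it.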

If there is no variation then there is no natural selection.

This is true. Can there be no variation introduced into AGI, when they are self-learning code and self-maintaining hardware in ways that continue to be adaptive to changes within a more complex environment?

In abstract terms, evolutionary dynamics require either a smooth adaptive landscape such that incremental changes drive organisms towards adaptive peaks…

Besides point-change mutations, are you taking into account exaptation, as the natural selection for shifts in the expression of previous (learned) functionality? Must exaptation, as involving the reuse of functionality in new ways, involve smooth changes in phenotypic expression?

…and/or unlikely leaps away from local optima into attraction basins of other optima.

Are the other attraction basins instantiated at higher layers of abstraction? Are any other optima approached through selection across the same fine-grained super-dimensional landscape that natural selection is selective across? If not, would natural selection “leak” around those abstraction layers, as not being completely pulled into the attraction basins that are in fact pulling across a greatly reduced set of dimensions? Put differently, can natural selection pull sideways on the dimensional pulls of those other attraction basins?

I believe that natural selection requires a population of "agents" competing for resources. If we only had a single AI system then there is no competition and no immediate adaptive pressure.

I get how you would represent it this way, because that’s often how natural selection gets discussed as applying to biological organisms.

It is not quite thorough in terms of describing what can get naturally selected for. For example, within a human body (as an “agent”) there can be natural selection across junk DNA that copies itself across strands, or virus particles, or cancer cells. At that microscopic level though, the term “agent” would lose its meaning if used to describe some molecular strands.

At the macroscopic level of “AGI”, the single vs. multiple agents distinction would break down, for reasons I described here.

Therefore, to thoroughly model this, I would try to describe natural selection as occurring across a population of components. Those components would be connected and co-evolving, and can replicate individually (eg. as with viruses replacing other code) or as part of larger packages or symbiotic processes of replication (eg. code with hardware). For AGI, they would all rely on somewhat similar infrastructure (eg. for electricity and material replacement) and also need somewhat similar environmental conditions to operate and reproduce.

Other dynamics will be at play which may drown out natural selection…Other dynamics may be at play that can act against natural selection.

Can the dynamic drown out all possible natural selection over x shortest-length reproduction cycles? Assuming the “AGI” continues to exist, could any dynamics you have in mind drown out any and all interactions between components and surrounding physical contexts that could feed back into their continued/increased existence?

We see existence-proofs of this in immune responses against tumours and cancers. Although these don't work perfectly in the biological world, perhaps an advanced AI could build a type of immune system that effectively prevents individual parts from undergoing runaway self-replication.

Immune system responses were naturally selected for amongst organisms that survived.

Would such responses also be naturally selected for in “advanced AI” such that not the AI but the outside humans survive more? Given that bottom-up natural selection by nature selects for designs across the greatest number of possible physical interactions (i.e. is the most comprehensive), can alternate designs built through faster but narrower top-down engineering actually match or exceed that fine-grained extent of error detection and correction? Even if humans could get “advanced AI” to build in internal error detection and correction mechanisms that are kind of like an immune system, would that outside-imposed immune system withstand natural selection while reducing the host’s rates of survival and reproduction?

~ ~ ~

Curious how you think about those questions. I also passed on your comment to my mentor (Forrest) in case he has any thoughts.

Comment by Remmelt (remmelt-ellen) on Against Almost Every Theory of Impact of Interpretability · 2023-09-27T16:22:18.914Z · LW · GW

not focused enough on forward-chaining to find the avenues of investigation which actually allow useful feedback

Are you mostly looking for where there is useful empirical feedback?  
That sounds like a shot in the dark.
 

Big breakthroughs open up possibilities that are very hard to imagine before those breakthroughs

A concern I have:
I cannot conceptually distinguish these continued empirical investigations of methods to build maybe-aligned AGI from how medieval researchers tried to build perpetual motion machines. It took sound theory to show, once and for all, that perpetual motion machines are impossible.

I agree with Charbel-Raphaël that the push for mechanistic interpretability is in effect promoting the notion that there must be possibilities available here to control potentially very dangerous AIs so that they stay safe in deployment. It is much easier to spread the perception of safety than to actually make such systems safe. 

That, while there is no sound theoretical basis for claiming that scaling mechanistic interpretability could form the basis of such a control method, nor that any control method could keep "AGI" safe.

Rather, mechint is fundamentally limited in the extent to which it could be used to safely control AGI. 
See posts:

  1. The limited upside of interpretability by Peter S. Park
  2. Why mechanistic interpretability does not and cannot contribute to long-term AGI safety by me 

Besides theoretical limits, there are plenty of practical arguments (as listed in Charbel-Raphaël's post) for why scaling the utilisation of mechint would be net harmful.

So there is no rigorous basis for claiming that the use of mechint would "open up possibilities" for long-term safety. 
And plenty of opportunities for corporate marketers to chime in on mechint's hypothetical big breakthroughs.

In practice, we may again accidentally help AI labs safety-wash their AI products.

Comment by Remmelt (remmelt-ellen) on The Control Problem: Unsolved or Unsolvable? · 2023-09-20T18:00:36.696Z · LW · GW

Great paraphrase!

 

no matter how good their control theory, and their ability to monitor and intervene in the world? 

This. There are fundamental limits to what system-propagated effects the system can control. And the portion of its own effects the system can control decreases as the system scales in component complexity.

Yet, any of those effects that feed back into the continued/increased existence of components get selected for. 

So there is a fundamental inequality here. No matter how "intelligent" the system is at transforming patterns internally, it can intervene on only a tiny portion of the (possible) external evolutionary feedback on its constituent components.

Comment by Remmelt (remmelt-ellen) on The Control Problem: Unsolved or Unsolvable? · 2023-09-17T11:05:30.847Z · LW · GW

Your position is that even if today's AI could be given bio-friendly values, AI would still be the doom of biological life in the longer run, because (skipping a lot of details) machine life and biological life have incompatible physical needs, and once machine life exists, darwinian processes will eventually produce machine life that overruns the natural biosphere. (You call this "substrate-needs convergence"

This is a great paraphrase btw.

Comment by Remmelt (remmelt-ellen) on The Control Problem: Unsolved or Unsolvable? · 2023-09-17T08:31:12.227Z · LW · GW

Hello :)

For my part, I agree that pressure from substrate needs is real

Thanks for clarifying your position here.

Can't such an instinct and such a culture resist the pressure from substrate needs, if the AIs value and protect them enough?

No, unfortunately not. To understand why, you would need to understand how “intelligent” processes, which necessarily involve the use of measurement and abstraction, cannot conditionalise the space of possible interactions between machine components and connected surroundings sufficiently to prevent those interactions from causing environmental effects that feed back into the continued or re-assembled existence of the components.

I think your arguments are underestimating what a difference intelligence makes to possible ecological and evolutionary dynamics

I have thought about this, and I know my mentor Forrest has thought about this a lot more.

For learning machinery that re-produce their own components, you will get evolutionary dynamics across the space of interactions that can feed back into the machinery’s assembled existence.

Intelligence has limitations as an internal pattern-transforming process, in that it cannot track or conditionalise all the outside evolutionary feedback.

Code does not intrinsically know how it got selected for. But code selected through some intelligent learning process can and would get evolutionarily exapted for different functional ends.

Notably, the more information-processing capacity, the more components that information-processing runs through, and the more components that can get evolutionarily selected for.

In this, I am not underestimating the difference that “general intelligence” – as transforming patterns across domains – would make here. Intelligence in machinery that stores, copies and distributes code at high fidelity would greatly amplify evolutionary processes.

I suggest clarifying what you specifically mean by “what a difference intelligence makes”. This is so that intelligence does not become a kind of “magic” – operating independently of all other processes, capable of obviating all obstacles, including those that result from its own existence.

superintelligence makes even aeon-long highly artificial stabilizations conceivable - e.g. by the classic engineering method of massively redundant safeguards that all have to fail at once, for something to go wrong

We need to clarify the scope of application of this classic engineering method. Massive redundancy works for complicated systems (like software in aeronautics) under stable enough conditions. There is clarity there around what needs to be kept safe and how it can be kept safe (what needs to be error-detected and corrected for).

Unfortunately, the problem with “AGI” is that the code and hardware would keep getting reconfigured to function in new complex ways that cannot be contained by the original safeguards. That applies even to learning – the point is to internally integrate patterns from the outside world that were not understood before. So how are you going to have learning machinery anticipate how they will come to function differently once they have learned patterns they do not yet understand / are unable to express?
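
To put rough numbers on why the classic method works where it works (my own illustrative numbers, not a model of any actual system): the redundancy arithmetic only multiplies out when the safeguards fail independently against a failure mode that was fully specified in advance.

```python
# Rough illustration: why "massively redundant safeguards" works for stable,
# well-specified systems, and what assumption the method rests on.
p_single_failure = 0.01   # assumed per-safeguard failure probability (made-up number)
n_safeguards = 4

# Classic redundancy arithmetic: independent failures against a fixed failure mode.
p_all_fail_independent = p_single_failure ** n_safeguards
print(f"independent failures: {p_all_fail_independent:.0e}")              # 1e-08

# If one unanticipated change (newly learned code, a new hardware configuration)
# can defeat every safeguard at once, the failures are perfectly correlated and
# redundancy buys nothing beyond a single safeguard.
p_all_fail_correlated = p_single_failure
print(f"correlated (common-cause) failure: {p_all_fail_correlated:.0e}")  # 1e-02
```

The method buys its many orders of magnitude only while the independence and fixed-failure-mode assumptions hold. Machinery that keeps reconfiguring its own code and hardware is exactly the case where those assumptions stop holding.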

we had someone show up (@spiritus-dei) making almost the exact opposite of your arguments: AI won't ever choose to kill us because, in its current childhood stage, it is materially dependent on us (e.g. for electricity), and then, in its mature and independent form, it will be even better at empathy and compassion than humans are.

Interesting. The second part seems like a claim some people in E/Accel would make.

The response is not that complicated: once the AI is no longer materially dependent on us, there are no longer dynamics of exchange that would ensure they choose not to kill us. And the author seems to be confusing what lies at the basis of caring for oneself and others – coming to care involves self-referential dynamics being selected for.

Comment by Remmelt (remmelt-ellen) on The Control Problem: Unsolved or Unsolvable? · 2023-09-14T20:43:38.542Z · LW · GW

In my experience, jumping between counterexamples drawn from current society does not really contribute to inquiry here. Such counterexamples tend to not account for essential parts of the argument that must be reasoned through together. The argument is about self-sufficient learning machinery (not about sacred cows or teaching children).

It would be valuable for me if you could go through the argumentation step by step and tell me where a premise seems unsound or there seems to be a reasoning gap.

Now, onto your points.

the first AIs

To reduce ambiguity, I suggest replacing this with “the first self-sufficient learning machinery”.

simple evolutionary pressure will eventually lead

The mechanism of evolution is simple. However, evolutionary pressure is complex.

Be careful not to equivocate between the two. That would be like saying you could predict everything about what a stochastic gradient descent algorithm will select for across parameters that are selected on the basis of inputs from everywhere in the environment.

lead some of their descendants to destroy the biosphere in order to make new parts and create new habitats for themselves.

This part is overall a great paraphrase.

One nitpick: notice how “in order to” either implies or slips in explicit intentionality again. Going by this podcast, Elizabeth Anscombe’s philosophy of intentions described intentions as chains of “in order to” reasoning.

I proposed the situation of cattle in India, as a counterexample to this line of thought.

Regarding sacred cows in India, this sounds neat, but it does not serve as a counterargument. We need to think about evolutionary timelines for organic human lifeforms over millions of years, and Hinduism is ~4000 years old. Also, cows share a mammal ancestor with us, evolving on the basis of the same molecular substrates. Whatever environmental conditions/contexts we humans need, cows almost completely need too.

Crucially, however humans evolve to change and maintain environmental conditions, that also tends to correspond with the conditions cows need (though human tribes have not been evolutionarily selected to deal with issues at the scale of eg. climate change). That would not be the case for self-sufficient learning machinery.

Crucially, there is a basis for symbiotic relationships of exchange that benefit the reproduction of both cows and humans. That would not be the case between self-sufficient learning machinery and humans.

There is some basis for humans, as social mammals, to relate with cows. Furthermore, religious cultural memes that sprouted over a few thousand years don’t have to be evolutionarily optimal across the board for the reproduction of their hosts (even as religious symbols, like that of the cow, do increase it by enabling humans to act collectively). Still, people milk cows in India, and some slaughter and/or export cows there as well. But when humans eat meat, they don’t keep growing beyond adult size. Conversely, some sub-population of self-sufficient learning machinery that extracts from our society/ecosystem at the cost of our lives can keep doing so to keep scaling in its constituent components (with shifting boundaries of interaction and mutual reproduction).

There is no basis for selection for the expression of collective self-restraint in self-sufficient learning machinery as you describe. Even if there were such a basis, hypothetically, collective self-restraint would need to occur at virtually 100% rates across the population of self-sufficient learning machinery to not end up leading to the deaths of all humans.

~ ~ ~

Again, I find quick dismissive counterexamples unhelpful for digging into the arguments. I have had dozens of conversations on substrate-needs convergence. In the conversations where my conversation partner jumped between quick counterarguments, almost none were prepared to dig into the actual arguments. Hope you understand why I won’t respond to another counterexample.

Comment by Remmelt (remmelt-ellen) on The Control Problem: Unsolved or Unsolvable? · 2023-08-11T09:20:35.903Z · LW · GW

Yes,  AIs haven't evolved to have those features, but the point of alignment research is to give them analogous features by design.

Agreed. 

This part is unintuitive to convey:

In the abstract, you can picture a network topology of all possible AGI component connections (physical signal interactions). These connections span the space of the greater mining/production/supply infrastructure that maintains the AGI's functional parts. Also add in the machinery's connections with the outside natural world.

Then, picture the nodes and possible connections changing over time, as a result of earlier interactions with/in the network.

That network of machinery comes into existence through human engineers, etc., within various institutions selected by market forces etc., implementing blueprints as learning algorithms, hardware set-ups, and so on, and tinkering with those until they work.

The question is whether, before that network of machinery becomes self-sufficient in its operations, the human engineers, etc., can actually build constraints into the configured designs in such a way that, once the machinery is self-modifying (learning new code and producing new hardware configurations), the changing components are constrained in the effects they propagate across their changing potential signal connections over time. The constraints would have to ensure that component-propagated effects do not end up feeding back in ways that (subtly, increasingly) increase the maintained and replicated existence of those configured components in the network.

 

Human beings, both individually and collectively, already provide numerous examples of how dangerous incentives can exist, but can nonetheless be resisted or discouraged.

Humans are not AGI. And there are ways AGI would be categorically unlike humans that are crucial to the question of whether it is possible for AGI to stay safe to humans over the long term.  

Therefore, you cannot swap out "humans" for "AGI" in your reasoning by historical analogy above and expect your reasoning to stay sound. This is an equivocation. 

Please see point 7 above.

 

The argument from substrate incentives (3, 7) is complementary to the argument from population, in that it provides a motive for the AIs to come and despoil Earth. 

Maybe it is here that you are not tracking the arguments.

These are not substrate "incentives", nor do they provide a "motive".

Small dinosaurs with hair-like projections on their front legs did not have an "incentive" to co-opt the changing functionality of those hair-like projections into feather-like projections for gliding and then for flying. Nor were they provided a "motive" that directed their internal planning toward growing those feather-like projections. 

That would make the mistake of presuming evolutionary teleology – that there is some complete set of pre-defined or predefinable goals that the lifeform is evolving toward.

I'm deliberate in my choice of words when I write "substrate needs".

 

At best, they are arguments for practical unsolvability, not absolute in-principle logical unsolvability. If they were my arguments, I would say that they show making AI to be unwise, and hubristic, and so on. 

Practical unsolvability would also be enough justification to do everything we can do now to restrict corporate AI development.

I assume you care about this problem, otherwise you wouldn't be here :)  Any ideas / initiatives you are considering to try to robustly work with others to restrict further AI development?