Even if you know a certain market is a bubble, it's not exactly trivial to exploit if you don't know when it's going to burst, which prices will be affected, and to what degree. "The market can remain irrational longer than you can remain solvent" and all that.
Yes, all of this. I didn’t know how to time this, and also good point that operationalising it in terms of which AI stocks to target, and at what strike price, could be tricky too.
If I could get the timing right, this makes sense. But I don’t have much of an edge in judging when the bubble would burst. And put options are expensive.
If someone here wants to make a 1:1 bet over the next three years, I’m happy to take them up on the offer.
If there's less demand from cloud users to rent GPUs, Google/Microsoft/Amazon would likely use the GPUs in their datacenters for their own projects (or projects like Anthropic/OpenAI).
That’s a good point. Those big tech companies are probably prepared to pay for the energy use if they have the hardware lying around anyway.
To clarify for future reference, I do think it’s likely (80%+) that at some point over the next 5 years there will be a large reduction in investment in AI and a corresponding market crash in AI company stocks, etc, and that both will continue for at least three months.
Update: I now think this is 90%+ likely to happen (from original prediction date).
Looks like I summarised it wrong. It’s not about ionising radiation directly from bombarding ions from outer space. It’s about the interaction of the ions with the Earth’s magnetic field, which, as you stated, “induced large currents in long transmission lines, overloading the transformers.”
Here is what Bret Weinstein wrote in a scenario I just found written by him:
In 2013, a report had warned that an extreme geomagnetic storm was almost inevitable, and would induce huge currents in Earth’s transmission lines. This vulnerability could, with a little effort, have been completely addressed for a tiny sum of money — less than a tenth of what the world invested annually in text messaging prior to the great collapse of 2024.
Will correct my mistake in the post now.
One question still on my mind is whether and how a weakened Earth magnetic field makes things worse. Would the electromagnetic interactions occur on the whole closer to Earth, therefore causing larger currents in power transmission lines? Does that make any sense?
But it’s weird that I cannot find even a good written summary of Bret’s argument online (I do see lots of political podcasts).
I found an earlier scenario written by Bret that covers just one nuclear power plant failing and that does not discuss the risk of a weakening magnetic field.
This was an interesting read, thank you.
Good question! Will look into it / check more if I have the time.
Ah, thanks! Corrected now
Ah, thank you for correcting. I didn’t realise it could be easily interpreted that way.
Also suggest exploring what it may mean if we are unable to solve the alignment problem for fully autonomous learning machinery.
There will be a [new AI Safety Camp project](https://docs.google.com/document/d/198HoQA600pttXZA8Awo7IQmYHpyHLT49U-pDHbH3LVI/edit) about formalising a model of AGI uncontainability.
Fixed it! You can use either link now to share with your friends.
Igor Krawczuk, an AI PhD researcher, just shared more specific predictions:
“I agree with ed that the next months are critical, and that the biggest players need to deliver. I think it will need to be plausible progress towards reasoning, as in planning, as in the type of stuff Prolog, SAT/SMT solvers etc. do.
I'm 80% certain that this literally can't be done efficiently with current LLM/RL techniques (last I looked at neural comb-opt vs solvers, it was bad), the only hope being the kitchen sink of scale, foundation models, solvers and RL … If OpenAI/Anthropic/DeepMind can't deliver on promises of reasoning and planning (Q*, Strawberry, AlphaCode/AlphaProof etc.) in the coming months, or if they try to polish more turds into gold (e.g., coming out with GPT-Reasoner, but only for specific business domains) over the next year, then I would be surprised to see the investments last to make it happen in this AI summer.” https://x.com/TheGermanPole/status/1826179777452994657
To clarify for future reference, I do think it’s likely (80%+) that at some point over the next 5 years there will be a large reduction in investment in AI and a corresponding market crash in AI company stocks, etc, and that both will continue for at least three months.
Ie. I think we are heading for an AI winter. It is not sustainable for the industry to invest 600+ billion dollars per year in infrastructure and teams in return for relatively little revenue and no resulting profit for major AI labs.
At the same time, I think that within the next 20 years tech companies could both develop robotics that self-navigate multiple domains and have automated major sectors of physical work. That would put society on a path to causing total extinction of current life on Earth. We should do everything we can to prevent it.
Not necessarily :)
Quite likely OpenAI and/or Anthropic continue to exist but their management would have to overhaul the business (no more freebies?) to curb the rate at which they are burning cash. Their attention would be turned inwards.
In that period, there could be more space for people to step in and advise stronger regulation of AI models. Eg. to enforce liability, privacy, and copyright
Or maybe other opportunities open up. Curious if anyone has any ideas.
What's a good overview of those grounded arguments?
Thanks, appreciating your question. The best overview I managed to write was the control problem post. Still takes quite some reading through to put the different parts of the argument together though.
The report is focussed on preventing harms of technology to people using or affected by that tech.
It uses FDA’s mandate of premarket approval and other processes as examples of what could be used for AI.
Restrictions on economic productivity and innovation are a fair point of discussion. I have my own views on this – generally I think the negative asymmetry around new scalable products being able to do massive harm gets neglected by the market. I’m glad the FDA exists to counteract that.
The FDA’s slow response in ramping up approval of COVID vaccines during the pandemic is questionable though, as one example. I’m getting a sense there are a lot of problems with bureaucracy and also industry capture at the FDA.
The report does not focus on that though.
Curious about the 'delay the development' via regulation bit.
What is your sense of what near-term passable regulations would be that are actually enforceable? It's been difficult for large stakeholder groups facing threatening situations to even enforce established international treaties, such as the Geneva Conventions or the Berne three-step test.
Here are dimensions I've been thinking need to be constrained over time:
- Input bandwidth to models (ie. available training and run-time data, including from sensors).
- Multi-domain work by/through models (ie. preventing an automation race-to-the-bottom).
- Output bandwidth (incl. by having premarket approval for allowable safety-tested uses as happens in other industries).
- Compute bandwidth (through caps/embargos put on already resource-intensive supply chains).
(I'll skip the 'make humans smarter' part, which I worry increases problems around techno-solutionist initiatives we've seen).
Appreciating your thoughtful comment.
It's hard to pin down ambiguity around how much alignment "techniques" make models more "usable", and how much that in turn enables more "scaling". This and the safety-washing concern get us into messy considerations. Though I generally agree that participants of MATS or AISC programs can cause much less harm through either than researchers working directly on aligning eg. OpenAI's models for release.
Our crux though is about the extent of progress that can be made – on engineering fully autonomous machinery to control* their own effects in line with continued human safety. I agree with you that such a system can be engineered to start off performing more** of the tasks we want it to complete (ie. progress on alignment is possible). At the same time, there are fundamental limits to controllability (ie. progress on alignment is capped).
This is where I think we need more discussion:
- Is the extent of AGI control that is possible at least as great as the extent of control needed
(to prevent eventual convergence on causing human extinction)?
* I use the term "control" in the established control theory sense, consistent with Yampolskiy's definition. Just to avoid confusing people, as the term gets used in more specialised ways in the alignment community (eg. in conversations about the shut-down problem or control agenda).
** This is a rough way of stating it. It's also about the machinery performing fewer of the tasks we wouldn't want the system to complete. And the relevant measure is not as much about the number of preferred tasks performed, as the preferred consequences. Finally, this raises a question about who the 'we' is who can express preferences that the system is to act in line with, and whether coherent alignment with different persons' preferences expressed from within different perceived contexts is even a sound concept.
Generally the way that people solve hard problems is to solve related easy problems first, and this is true even if the technology in question gets much more powerful. Imagine if we had to land rockets on barges before anyone had invented PID controllers and observed their failure modes.
This raises questions about the reference class.
- Does controlling a self-learning (and evolving) system fit in the same reference class as the problems that engineers have “generally” been able to solve (such as moving rockets)?
- Is the notion of “powerful” technologies in the sense of eg. rockets being powerful the same notion as “powerful” in the sense of fully autonomous learning being powerful?
- Based on this, can we rely on the reference class of past “powerful” technologies as an indicator of being able to make incremental progress on making and keeping “AGI” safe?
If we are going to use the term burden of proof, I would suggest the burden of proof is on the people who claim that they could make potentially very dangerous systems safe using any (combination of) techniques.
Let’s also stay mindful that these claims are not being made in a vacuum. Incremental progress on making these models usable for users (which is what a lot of applied ML safety and alignment research comes down to) does enable AI corporations to keep scaling.
that alignment is very very hard
There are also grounded arguments why alignment is unworkable. Ie. that AGI could not control its effects in line with staying safe to humans.
I’ve written about this, and Anders Sandberg is currently working on mathematically formalising an elegant model of AGI uncontainability.
We used to list roles that seemed more tangentially safety-related, but because of our reduced confidence in OpenAI
This misses aspects of what used to be 80k's position:
❝ In fact, we think it can be the best career step for some of our readers to work in labs, even in non-safety roles. That’s the core reason why we list these roles on our job board.
– Benjamin Hilton, February 2024
❝ Top AI labs are high-performing, rapidly growing organisations. In general, one of the best ways to gain career capital is to go and work with any high-performing team — you can just learn a huge amount about getting stuff done. They also have excellent reputations more widely. So you get the credential of saying you’ve worked in a leading lab, and you’ll also gain lots of dynamic, impressive connections.
– Benjamin Hilton, June 2023 (still on website)
80k was listing some non-safety related jobs:
– From my email in May 2023:
– From my comment in February 2024:
I haven't shared this post with other relevant parties – my experience has been that private discussion of this sort of thing is more paralyzing than helpful.
Fourteen months ago, I emailed 80k staff with concerns about how they were promoting AGI lab positions on their job board.
The exchange:
- I offered specific reasons and action points.
- 80k staff replied by referring to their website articles about why their position on promoting jobs at OpenAI and Anthropic was broadly justified (plus they removed one job listing).
- Then I pointed out what those articles were specifically missing,
- Then staff stopped responding (except to say they were "considering prioritising additional content on trade-offs").
It was not a meaningful discussion.
Five months ago, I posted my concerns publicly. Again, 80k staff removed one job listing (why did they not double-check before?). Again, staff referred to their website articles as justification to keep promoting OpenAI and Anthropic safety and non-safety roles on their job board. Again, I pointed out what seemed missing or off about their justifications in those articles, with no response from staff.
It took the firing of the entire OpenAI superalignment team before 80k staff "tightened up [their] listings". That is, three years after the first wave of safety researchers left OpenAI.
80k is still listing 33 Anthropic jobs, even as Anthropic has clearly been competing to extend "capabilities" for over a year.
Although the training process, in theory, can be wholly defined by source code, this is generally not practical, because doing so would require releasing (1) the methods used to train the model, (2) all data used to train the model, and (3) so called “training checkpoints” which are snapshots of the state of the model at various points in the training process.
Exactly. Without the data, the model design cannot be trained again, and you end up fine-tuning a black box (the "open weights").
Thanks for writing this.
This answer will sound unsatisfying:
If a mathematician or analytical philosopher wrote a bunch of squiggles on a whiteboard, and said it was a proof, would you recognise it as a proof?
- Say that unfamiliar new analytical language and means of derivation are used (which is not uncommon for impossibility proofs by contradiction, see Gödel's incompleteness theorems and Bell's theorem).
- Say that it directly challenges technologists' beliefs about their capacity to control technology, particularly their capacity to constrain a supposedly "dumb local optimiser": evolutionary selection.
- Say that the reasoning is not only about a formal axiomatic system, but needs to make empirically sound correspondences with how real physical systems work.
- Say that the reasoning is not only about an interesting theoretical puzzle, but has serious implications for how we can and cannot prevent human extinction.
This is high stakes.
We were looking for careful thinkers who had the patience to spend time on understanding the shape of the argument, and how the premises correspond with how things work in reality. Linda and Anders turned out to be two of these people, and we have done three long calls so far (the first call has an edited transcript).
I wish we could short-cut that process. But if we cannot manage to convey the overall shape of the argument and the premises, then there is no point to moving on to how the reasoning is formalised.
I get that people are busy with their own projects, and want to give their own opinions about what they initially think the argument entails. And, if the time they commit to understanding the argument is not at least 1/5 of the time I spend on conveying the argument specifically to them, then in my experience we usually lack the shared bandwidth needed to work through the argument.
- Saying, "guys, big inferential distance here" did not help. People will expect it to be a short inferential distance anyway.
- Saying it's a complicated argument that takes time to understand did not help. A smart busy researcher did some light reading, tracked down a claim that seemed "obviously" untrue within their mental framework, and thereby confidently dismissed the entire argument. BTW, they're a famous research insider, and we're just outsiders whose response got downvoted – must be wrong, right?
- Saying everything in this comment does not help. It's some long-winded plea for your patience.
If I'm so confident about the conclusion, why am I not passing you the proof clean and clear now?!
Feel free to downvote this comment and move on.
Here is my best attempt at summarising the argument intuitively and precisely, still prompting some misinterpretations by well-meaning commenters. I feel appreciation for people who realised what is at stake, and were therefore willing to continue syncing up on the premises and reasoning, as Will did:
The core claim is not what I thought it was when I first read the above sources and I notice that my skepticism has decreased as I have come to better understand the nature of the argument.
would anything like SNC apply if tech labs were somehow using bioengineering to create creatures to perform the kinds of tasks that would be done by advanced AI?
In that case, substrate-needs convergence would not apply, or only apply to a limited extent.
There is still a concern about what those bio-engineered creatures, used in practice as slaves to automate our intellectual and physical work, would bring about over the long-term.
If there is a successful attempt by them to ‘upload’ their cognition onto networked machinery, then we’re stuck with the substrate-needs convergence problem again.
Also, on the workforce, there are cases where they were traumatized psychologically and compensated meagerly, like in Kenya. How could that be dealt with?
We need funding to support data workers, engineers, and other workers exploited or misled by AI corporations to unionise, strike, and whistleblow.
The AI data workers in Kenya started a union, and there is a direct way of supporting targeted action by them. Other workers' organisations are coordinating legal actions and lobbying too. On seriously limited budgets.
I'm just waiting for a funder to reach out and listen carefully to what their theories of change are.
The premise is based on alignment not being enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong".
I can see how you and Forrest ended up talking past each other here. Honestly, I also felt Forrest's explanation was hard to track. It takes some unpacking.
My interpretation is that you two used different notions of alignment... Something like:
- Functional goal-directed alignment: "the machinery's functionality is directed toward actualising some specified goals (in line with preferences expressed in-context by humans), for certain contexts the machinery is operating/processing within"
vs. - Comprehensive needs-based alignment: "the machinery acts in comprehensive care for whatever all surrounding humans need to live, and their future selves/offspring need to live, over whatever contexts the machinery and the humans might find themselves".
Forrest seems to agree that (1.) is possible to build initially into the machinery, but has reasons to think that (2.) is actually physically intractable.
This is because (1.) only requires localised consistency with respect to specified goals, whereas (2.) requires "completeness" in the machinery's components acting in care for human existence, wherever either may find themselves.
So here is the crux:
- You can see how (1.) still allows for goal misspecification and misgeneralisation. And the machinery can be simultaneously directed toward other outcomes, as long as those outcomes are not yet (found to be, or corrected as being) inconsistent with internal specified goals.
- Whereas (2.) if it were physically tractable, would contradict the substrate-needs convergence argument.
When you wrote "suppose a villager cares a whole lot about the people in his village...and routinely works to protect them" that came across as taking something like (2.) as a premise.
Specifically, "cares a whole lot about the people" is a claim that implies that the care is for the people in and of themselves, regardless of the context they each might (be imagined to) be interacting in. Also, "routinely works to protect them" to me implies a robustness of functioning in ways that are actually caring for the humans (ie. no predominating potential for negative side-effects).
That could be why Forrest replied with "How is this not assuming what you want to prove?"
Some reasons:
- Directedness toward specified outcomes some humans want does not imply actual comprehensiveness of care for human needs. The machinery can still cause all sorts of negative side-effects not tracked and/or corrected for by internal control processes.
- Even if the machinery is consistently directed toward specified outcomes from within certain contexts, the machinery can simultaneously be directed toward other outcomes as well. Likewise, learning directedness toward human-preferred outcomes can also happen simultaneously with learning instrumental behaviour toward self-maintenance, as well as more comprehensive evolutionary selection for individual connected components that persist (for longer/as more).
- There is no way to assure that some significant (unanticipated) changes will not lead to a break-off from past directed behaviour, where other directed behaviour starts to dominate.
- Eg. when the "generator functions" that translate abstract goals into detailed implementations within new contexts start to dysfunction – ie. diverge from what the humans want/would have wanted.
- Eg. where the machinery learns that it cannot continue to consistently enact the goal of future human existence.
- Eg. once undetected bottom-up evolutionary changes across the population of components have taken over internal control processes.
- Before the machinery discovers any actionable "cannot stay safe to humans" result, internal takeover through substrate-needs (or instrumental) convergence could already have removed the machinery's capacity to implement an across-the-board shut-down.
- Even if the machinery does discover the result before convergent takeover, and assuming that "shut-down-if-future-self-dangerous" was originally programmed in, we cannot rely on the machinery to still be consistently implementing that goal. This is because of later selection for/learning of other outcome-directed behaviour, and because the (changed) machinery components could dysfunction in this novel context.
To wrap it up:
The kind of "alignment" that is workable for ASI with respect to humans is super fragile.
We cannot rely on ASI implementing a shut-down upon discovery.
Is this clarifying? Sorry about the wall of text. I want to make sure I'm being precise enough.
I agree that point 5 is the main crux:
The amount of control necessary for an ASI to preserve goal-directed subsystems against the constant push of evolutionary forces is strictly greater than the maximum degree of control available to any system of any type.
To answer it takes careful reasoning. Here's my take on it:
- We need to examine the degree to which there would necessarily be changes to the connected functional components constituting self-sufficient learning machinery (as including ASI).
- Changes by learning/receiving code through environmental inputs, and through introduced changes in assembled molecular/physical configurations (of the hardware).
- Necessary in the sense of "must change to adapt (such to continue to exist as self-sufficient learning machinery)," or "must change because of the nature of being in physical interactions (with/in the environment over time)."
- We need to examine how changes to the connected functional components result in shifts in actual functionality (in terms of how the functional components receive input signals and process those into output signals that propagate as effects across surrounding contexts of the environment).
- We need to examine the span of evolutionary selection (covering effects that in their degrees/directivity feed back into the maintained/increased existence of any functional component).
- We need to examine the span of control-based selection (the span covering detectable, modellable, simulatable, evaluatable, and correctable effects).
Actually, looks like there is a thirteenth lawsuit that was filed outside the US.
A class-action privacy lawsuit filed in Israel back in April 2023.
Wondering if this is still ongoing: https://www.einpresswire.com/article/630376275/first-class-action-lawsuit-against-openai-the-district-court-in-israel-approved-suing-openai-in-a-class-action-lawsuit
That's an important consideration. Good to dig into.
I think there are many instances of humans, flawed and limited though we are, managing to operate systems with a very low failure rate.
Agreed. Engineers are able to make very complicated systems function with very low failure rates.
Given the extreme risks we're facing, I'd want to check whether that claim also translates to 'AGI'.
- Does how we are able to manage current software and hardware systems to operate correspond soundly with how self-learning and self-maintaining machinery ('AGI') control how their components operate?
- Given 'AGI' that no longer need humans to continue to operate and maintain own functional components over time, would the 'AGI' end up operating in ways that are categorically different from how our current software-hardware stacks operate?
- Given that we can manage to operate current relatively static systems to have very low failure rates for the short-term failure scenarios we have identified, does this imply that the effects of introducing 'AGI' into our environment could also be controlled to have a very low aggregate failure rate – over the long term across all physically possible (combinations of) failures leading to human extinction?
to spend extra resources on backup systems and safety, such that small errors get actively cancelled out rather than compounding.
This gets right into the topic of the conversation with Anders Sandberg. I suggest giving that a read!
Errors can be corrected with high confidence (consistency) at the bit level. Backups and redundancy also work well in eg. aeronautics, where the code base itself is not constantly changing.
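To illustrate the bit-level claim with a toy example (a hypothetical repetition code I'm adding here for intuition, not something from the conversation): each bit is stored three times and recovered by majority vote, so any single flip within a triple gets corrected, and the residual error rate drops from roughly p to roughly 3p².

```python
import random

def encode(bits):
    # Repetition code: store each bit three times.
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    # Majority vote over each triple corrects any single flipped bit.
    return [1 if sum(coded[i:i + 3]) >= 2 else 0 for i in range(0, len(coded), 3)]

def flip_random_bits(coded, p):
    # Independent bit flips with probability p (a stand-in for hardware noise).
    return [b ^ (random.random() < p) for b in coded]

random.seed(0)
message = [random.randint(0, 1) for _ in range(10_000)]
received = flip_random_bits(encode(message), p=0.01)
recovered = decode(received)
errors = sum(m != r for m, r in zip(message, recovered))
print(f"residual errors: {errors} out of {len(message)} bits")
```

This works because the error model is simple, static, and fully specified in advance – which is exactly what the questions below are probing at larger scales.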
- How does the application of error correction change at larger scales?
- How completely can possible errors be defined and corrected for at the scale of, for instance:
- software running on a server?
- a large neural network running on top of the server software?
- an entire machine-automated economy?
- Do backups work when the runtime code keeps changing (as learned from new inputs), and hardware configurations can also subtly change (through physical assembly processes)?
Since intelligence is explicitly the thing which is necessary to deliberately create and maintain such protections, I would expect control to be easier for an ASI.
It is true that 'intelligence' affords more capacity to control environmental effects.
Notice too that more 'intelligence' means more information-processing components. And the more information-processing components are added, the exponentially more degrees of freedom of interaction those and other functional components can have with each other and with connected environmental contexts.
Here is a nitty-gritty walk-through in case useful for clarifying components' degrees of freedom.
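As a rough back-of-the-envelope illustration of that combinatorial growth (toy numbers of my own, not taken from the linked walk-through): even counting only pairwise interaction channels and binary on/off component states, the possibilities blow up quickly with the number of components.

```python
from math import comb, log10

# Toy numbers: how interaction possibilities scale with the number of components,
# counting only pairwise channels and binary (on/off) component states.
for n in [10, 100, 1_000, 10_000]:
    pairwise = comb(n, 2)            # possible pairwise interaction channels
    joint_digits = n * log10(2)      # log10 of the 2**n joint on/off configurations
    print(f"{n:>6} components: {pairwise:>12,} pairs, ~10^{joint_digits:.0f} joint states")
```

And real components have far more than two states, and interact with environmental contexts as well as with each other, so this undercounts the actual degrees of freedom.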
I disagree that small errors necessarily compound until reaching a threshold of functional failure.
For this claim to be true, the following has to be true:
a. There is no concurrent process that selects for "functional errors" as convergent on "functional failure" (failure in the sense that the machinery fails to function safely enough for humans to exist in the environment, rather than that the machinery fails to continue to operate).
Unfortunately, in the case of 'AGI', there are two convergent processes we know about:
- Instrumental convergence, resulting from internal optimization:
code components being optimized for (an expanding set of) explicit goals.
- Substrate-needs convergence, resulting from external selection:
all components being selected for (an expanding set of) implicit needs.
Or else – where there is indeed selective pressure convergent on "functional failure" – then the following must be true for the quoted claim to hold:
b. The various errors introduced into and selected for in the machinery over time could be detected and corrected for comprehensively and fast enough (by any built-in control method) to prevent later "functional failure" from occurring.
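Here is a minimal toy simulation of the comparison in (b.), with made-up rates of my own choosing (nothing below is derived from the formal argument): errors enter at some rate, a control method catches only a fraction of them, and the uncorrected remainder that happens to aid persistence gets amplified by selection.

```python
import random

def simulate(steps=200, intro_rate=5.0, detection_coverage=0.95,
             selection_gain=1.02, seed=0):
    """Toy count of uncorrected 'errors' in the machinery over time.

    intro_rate         -- average new errors introduced per step (learning, drift, assembly)
    detection_coverage -- fraction of newly introduced errors the control method catches
    selection_gain     -- per-step amplification of surviving errors that favour persistence
    """
    rng = random.Random(seed)
    uncorrected = 0.0
    for _ in range(steps):
        introduced = max(rng.gauss(intro_rate, 1.0), 0.0)
        missed = introduced * (1.0 - detection_coverage)
        uncorrected = (uncorrected + missed) * selection_gain
    return uncorrected

# Even 95% coverage accumulates hundreds of uncorrected errors once selection compounds them:
print(f"after 200 steps: ~{simulate():.0f} uncorrected errors")
# Only complete and immediate coverage keeps the count at zero:
print(f"with full coverage: ~{simulate(detection_coverage=1.0):.0f}")
```

The point of the toy is only to make the shape of (b.) concrete: the question is whether detection and correction can be comprehensive and fast relative to both the introduction rate and the selective amplification, not whether individual errors can ever be corrected.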
This took a while for me to get into (the jumps from “energy” to “metabolic process” to “economic exchange” were very fast).
I think I’m tracking it now.
It’s about metabolic differences as in differences in how energy is acquired and processed from the environment (and also the use of a different “alphabet” of atoms available for assembling the machinery).
Forrest clarified further in response to someone’s question here:
https://mflb.com/ai_alignment_1/d_240301_114457_inexorable_truths_gen.html
Note:
Even if you are focussed on long-term risks, you can still whistleblow on egregious harms caused by these AI labs right now. Providing this evidence enables legal efforts to restrict these labs.
Whistleblowing is not going to solve the entire societal governance problem, but it will enable others to act on the information you provided.
It is much better than following along until we reach the edge of the cliff.
Are you thinking of blowing the whistle on something in between work on AGI and getting close to actually achieving it?
Good question.
Yes, this is how I am thinking about it.
I don't want to wait until competing AI corporations become really good at automating work in profitable ways, also because by then their market and political power would be entrenched. I want society to be well-aware way before then that the AI corporations are acting recklessly, and should be restricted.
We need a bigger safety margin. Waiting until corporate machinery is able to operate autonomously would leave us almost no remaining safety margin.
There are already increasing harms, and a whistleblower can bring those harms to the surface. That in turn supports civil lawsuits, criminal investigations, and/or regulator actions.
Harms that fall roughly in these categories – from most directly traceable to least directly traceable:
- Data laundering (what personal, copyrighted and illegal data is being copied and collected en masse without our consent).
- Worker dehumanisation (the algorithmic exploitation of gig workers; the shoddy automation of people's jobs; the criminal conduct of lab CEOs)
- Unsafe uses (everything from untested uses in hospitals and schools, to mass disinformation and deepfakes, to hackability and covered-up adversarial attacks, to automating crime and the kill cloud, to knowingly building dangerous designs).
- Environmental pollution (research investigations of data centers, fab labs, and so on)
For example:
- If an engineer revealed authors' works in the datasets of ChatGPT, Claude, Gemini, or Llama, that would give publishers and creative guilds the evidence they need to ramp up lawsuits against the respective corporations (to the tens or hundreds).
- Or if it turned out that the companies collected known child sexual abuse materials (as OpenAI probably did, and a collaborator of mine revealed for StabilityAI and MidJourney).
- If the criminal conduct of the CEO of an AI corporation was revealed:
- Eg. it turned out that there is a string of sexual predation/assault in leadership circles of OpenAI/CodePilot/Microsoft.
- Or it turned out that Satya Nadella managed a refund scam company in his spare time.
- If managers were aware of the misuses of their technology, eg. in healthcare, at schools, or in warfare, but chose to keep quiet about it.
Revealing illegal data laundering is actually the most direct, and would cause immediate uproar.
The rest is harder and more context-dependent. I don't think we're at the stage where environmental pollution is that notable (vs. the fossil fuel industry at large), and investigating it across AI hardware operation and production chains would take a lot of diligent research as an inside staff member.
Someone shared the joke: "Remember the Milgram experiment, where they found out that everybody but us would press the button?"
My response: Right! Expect AGI lab employees to follow instructions, because of…
- deference to authority
- incremental worsening (boiling frog problem)
- peer proof (“everyone else is doing it”)
- escalation of commitment
Good to hear!
You can literally have a bunch of engineers and researchers believe that their company is contributing to AI extinction risk, yet still go with the flow.
They might even think they’re improving things at the margin. Or they have doubts, but all their colleagues seem to be going on as usual.
In this sense, we’re dealing with the problems of having that corporate command structure in place that takes in the loyal, and persuades them to do useful work (useful in the eyes of power-and-social-recognition-obsessed leadership).
I appreciate this comment.
Be careful though that we’re not just dealing with a group of people here.
We’re dealing with artificial structures (ie. corporations) that take in and fire human workers as they compete for profit. With the most power-hungry workers tending to find their way to the top of those hierarchical structures.
When someone is risking the future of the entire human race, we'll see whistleblowers give up their jobs and risk their freedom and fortune to take action.
There are already AGI lab leaders that are risking the future of the entire human race.
Plenty of consensus to be found on that.
So why no whistleblowing?
If you’re smart and specialised in researching capability risks, it would not be that surprising if you come up with new feasible mechanisms that others were not aware of.
That’s my opinion on this.
Capabilities people may have more opportunities to call out risks, both internally and externally (whistleblowing).
I would like to see this. I am not yet aware of a researcher deciding to whistleblow on the AGI lab they work at.
If you are, please meet with an attorney in person first, and preferably get advice from an experienced whistleblower to discuss preserving anonymity – I can put you through: remmelt.ellen[a|}protonmail{d07]com
There’s so much that could be disclosed that would help bring about injunctions against AGI labs.
Even knowing what copyrighted data is in the datasets would be a boon for lawsuits.
[cross-posted replies from EA Forum]
Ben, it is very questionable that 80k is promoting non-safety roles at AGI labs as 'career steps'.
Consider that your model of this situation may be wrong (account for model error).
- The upside is that you enabled some people to skill up and gain connections.
- The downside is that you are literally helping AGI labs to scale commercially (as well as indirectly supporting capability research).
I did read that compilation of advice, and responded to that in an email (16 May 2023):
"Dear [a],
People will drop in and look at job profiles without reading your other materials on the website. I'd suggest just writing a do-your-research cautionary line about OpenAI and Anthropic in the job descriptions itself.
Also suggest reviewing whether to trust advice on whether to take jobs that contribute to capability research.
- Particularly advice by nerdy researchers paid/funded by corporate tech.
- Particularly by computer-minded researchers who might not be aware of the limitations of developing complicated control mechanisms to contain complex machine-environment feedback loops.
Totally up to you of course.
Warm regards,
Remmelt"
We argue for this position extensively in my article on the topic
This is what the article says:
"All that said, we think it’s crucial to take an enormous amount of care before working at an organisation that might be a huge force for harm. Overall, it’s complicated to assess whether it’s good to work at a leading AI lab — and it’ll vary from person to person, and role to role."
So you are saying that people are making a decision about working for an AGI lab that might be (or actually is) a huge force for harm. And that whether it's good (or bad) to work at an AGI lab depends on the person – ie. people need to figure this out for themselves.
Yet you are openly advertising various jobs at AGI labs on the job board. People are clicking through and applying. Do you know how many read your article beforehand?
~ ~ ~
Even if they did read through the article, both the content and framing of the advice seem misguided. Notice what is emphasised in your considerations.
Here are the first sentences of each consideration section:
(ie. as what readers are most likely to read, and what you might most want to convey).
- "We think that a leading — but careful — AI project could be a huge force for good, and crucial to preventing an AI-related catastrophe."
- Is this your opinion about DeepMind, OpenAI and Anthropic?
- "Top AI labs are high-performing, rapidly growing organisations. In general, one of the best ways to gain career capital is to go and work with any high-performing team — you can just learn a huge amount about getting stuff done. They also have excellent reputations more widely. So you get the credential of saying you’ve worked in a leading lab, and you’ll also gain lots of dynamic, impressive connections."
- Is this focussing on gaining prestige and (nepotistic) connections as an instrumental power move, with the hope of improving things later...?
- Instead of on actually improving safety?
- "We’d guess that, all else equal, we’d prefer that progress on AI capabilities was slower."
- Why is only this part stated as a guess?
- I did not read "we'd guess that a leading but careful AI project, all else equal, could be a force of good".
- Or inversely: "we think that continued scaling of AI capabilities could be a huge force of harm."
- Notice how those framings come across very differently.
- Wait, reading this section further is blowing my mind.
- "But that’s not necessarily the case. There are reasons to think that advancing at least some kinds of AI capabilities could be beneficial. Here are a few"
- "This distinction between ‘capabilities’ research and ‘safety’ research is extremely fuzzy, and we have a somewhat poor track record of predicting which areas of research will be beneficial for safety work in the future. This suggests that work that advances some (and perhaps many) kinds of capabilities faster may be useful for reducing risks."
- Did you just argue for working on some capabilities because it might improve safety? This is blowing my mind.
- "Moving faster could reduce the risk that AI projects that are less cautious than the existing ones can enter the field."
- Are you saying we should consider moving faster because there are people less cautious than us?
- Do you notice how a similarly flavoured argument can be used by and is probably being used by staff at three leading AGI labs that are all competing with each other?
- Did OpenAI moving fast with ChatGPT prevent Google from starting new AI projects?
- "It’s possible that the later we develop transformative AI, the faster (and therefore more dangerously) everything will play out, because other currently-constraining factors (like the amount of compute available in the world) could continue to grow independently of technical progress."
- How would compute grow independently of AI corporations deciding to scale up capability?
- The AGI labs were buying up GPUs to the point of shortage. Nvidia was not able to supply them fast enough. How is that not getting Nvidia and other producers to increase production of GPUs?
- More comments on the hardware overhang argument here.
- "Lots of work that makes models more useful — and so could be classified as capabilities (for example, work to align existing large language models) — probably does so without increasing the risk of danger"
- What is this claim based on?
- Why is only this part stated as a guess?
- "As far as we can tell, there are many roles at leading AI labs where the primary effects of the roles could be to reduce risks."
- As far as I can tell, this is not the case.
- For technical research roles, you can go by what I just posted.
- For policy, I note that you wrote the following:
"Labs also often don’t have enough staff... to figure out what they should be lobbying governments for (we’d guess that many of the top labs would lobby for things that reduce existential risks)."
- I guess that AI corporations use lobbyists for lobbying to open up markets for profit, and to not get actually restricted by regulations (maybe to move focus to somewhere hypothetically in the future, maybe to remove upstart competitors who can't deal with the extra compliance overhead, but don't restrict us now!).
- On priors, that is what you should expect, because that is what tech corporations do everywhere. We shouldn't expect on priors that AI corporations are benevolent entities that are not shaped by the forces of competition. That would be naive.
~ ~ ~
After that, there is a new section titled "How can you mitigate the downsides of this option?"
- That section reads as thoughtful and reasonable.
- How about on the job board, you link to that section in each AGI lab job description listed, just above the 'VIEW JOB DETAILS' button?
- For example, you could append and hyperlink 'Suggestions for mitigating downsides' to the organisational descriptions of Google DeepMind, OpenAI and Anthropic.
- That would help potential applicants to AGI lab positions think through their decision.
80k removed one of the positions I flagged: Software Engineer, Full-Stack, Human Data Team (reason given: it looked potentially more capabilities-focused than the original job posting that came into their system).
For the rest, little has changed:
- 80k still lists jobs that help AGI labs scale commercially,
- Jobs with similar names:
research engineer product, prompt engineer, IT support, senior software engineer.
- 80k still describes these jobs as "Handpicked to help you tackle the world's most pressing problems with your career."
- 80k still describes Anthropic as "an Al safety and research company that's working to build reliable, interpretable, and steerable Al systems".
- 80k staff still have not accounted for the likelihood that >50% of their broad audience checking 80k's handpicked jobs are not much aware of the potential issues of working at an AGI lab.
- Readers there don't get informed. They get to click on the button 'VIEW JOB DETAILS', taking them straight to the job page. From there, they can apply and join the lab unprepared.
Two others in AI Safety also discovered the questionable job listings. They are disappointed in 80k.
Feeling exasperated about this. Thinking of putting out another post just to discuss this issue.
Their question was also responding to my concerns on how 80,000 Hours handpicks jobs at AGI labs.
Some of those advertised jobs don't even focus on safety – instead they look like policy lobbying roles or engineering support roles.
Nine months ago, I wrote this email to 80k staff:
Hi [x, y, z]
I noticed the job board lists positions at OpenAI and AnthropicAI under the AI Safety category:
Not sure whom to contact, so I wanted to share these concerns with each of you:
- Capability races
- OpenAI's push for scaling the size and applications of transformer-network-based models has led Google and others to copy and compete with them.
- Anthropic now seems on a similar trajectory.
- By default, these should not be organisations supported by AI safety advisers with a security mindset.
- No warning
- Job applicants are not warned of the risky past behaviour by OpenAI and Anthropic. Given that 80K markets to a broader audience, I would not be surprised if 50%+ are not much aware of the history. The subjective impression I get is that taking the role will help improve AI safety and policy work.
- At the top of the job board, positions are described as "Handpicked to help you tackle the world's most pressing problems with your career."
- If anything, "About this organisation" makes the companies look more comprehensively careful about safety than they really have acted like:
- "Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems."
- "OpenAI is an AI research and deployment company, with roles working on AI alignment & safety."
- It is understandable that people aspiring for AI safety & policy careers are not much aware, and therefore should be warned.
- However, 80K staff should be tracking the harmful race dynamics and careless deployment of systems by OpenAI, and now Anthropic.
- The departure of OpenAI's safety researchers was widely known, and we have all been tracking the hype cycles around ChatGPT.
- Various core people in the AI Safety community have mentioned concerns about Anthropic.
- Oliver Habryka mentions this as part of the reasoning for shutting down the LightCone offices:
- I feel quite worried that the alignment plan of Anthropic currently basically boils down to "we are the good guys, and by doing a lot of capabilities research we will have a seat at the table when AI gets really dangerous, and then we will just be better/more-careful/more-reasonable than the existing people, and that will somehow make the difference between AI going well and going badly". That plan isn't inherently doomed, but man does it rely on trusting Anthropic's leadership, and I genuinely only have marginally better ability to distinguish the moral character of Anthropic's leadership from the moral character of FTX's leadership, and in the absence of that trust the only thing we are doing with Anthropic is adding another player to an AI arms race.
- More broadly, I think AI Alignment ideas/the EA community/the rationality community played a pretty substantial role in the founding of the three leading AGI labs (Deepmind, OpenAI, Anthropic), and man, I sure would feel better about a world where none of these would exist, though I also feel quite uncertain here. But it does sure feel like we had a quite large counterfactual effect on AI timelines.
- Not safety focussed
- Some jobs seem far removed from positions of researching (or advising on restricting) the increasing harms of AI-system scaling.
- For OpenAI:
- IT Engineer, Support: "The IT team supports Mac endpoints, their management tools, local network, and AV infrastructure"
- Software Engineer, Full-Stack: "to build and deploy powerful AI systems and products that can perform previously impossible tasks and achieve unprecedented levels of performance."
- For Anthropic:
- Technical Product Manager: "Rapidly prototype different products and services to learn how generative models can help solve real problems for users."
- Prompt Engineer and Librarian: "Discover, test, and document best practices for a wide range of tasks relevant to our customers."
- Align-washing
- Even if an accepted job applicant gets to be in a position of advising on and restricting harmful failure modes, how do you trade this off against:
- the potentially large marginal difference in skills of top engineering candidates you send OpenAI's and Anthropic's way, who are accepted to do work on scaling their technology stack?
- how these R&D labs will use the alignment work to market the impression that they are safety-conscious, to:
- avoid harder safety mandates (eg. document their copyright-infringing data, don't allow API developers to deploy spaghetti code all over the place)?
- attract other talented idealistic engineers and researchers?
- and so on?
I'm confused and, to be honest, shocked that these positions are still listed for R&D labs heavily invested in scaling AI system capabilities (without commensurate care for the exponential increase in the number of security gaps and ways to break our complex society and supporting ecosystem that opens up). I think this is pretty damn bad.
Preferably, we can handle this privately and not make it bigger. If you can come back on these concerns in the next two weeks, I would very much appreciate that.
If not, or not sufficiently addressed, I hope you understand that I will share these concerns in public.
Warm regards,
Remmelt
Someone asked:
“Why would having [the roles] be filled by someone in EA be worse than a non EA person? can you spell this out for me? I.e. are EA people more capable? would it be better to have less competent people in such roles? not clear to me that would be better”
Here was my response:
So I was thinking about this.
Considering this as an individual decision only can be limiting. Even 80k staff have acknowledged that sometimes you need a community to make progress on something.
For similar reasons, protests work better if there are multiple people showing up.
What would happen if 80k and other EA organisations stopped recommending positions at AGI labs and actually honestly pointed out that work at these labs has turned out to be bad – because the labs have defected on their end of the bargain and don’t care enough about getting safety right...?
It would make an entire community of people become aware that we may need to actively start restricting this harmful work. Instead, what we’ve been seeing is EA orgs singing praise for AGI lab leaders for years, and 80k still recommending talented idealistic people join AGI labs. I’d rather see less talented sketchy-looking people join the AGI labs.
I would rather see everyone in AI Safety become more clear, to each other and to the public, that we are not condoning harmful automation races to the bottom. We’re not condoning work at these AGI labs and we are no longer giving our endorsement to it.
Good question, but I want to keep this anonymous.
I can only say I heard it from one person who said they heard it from another person connected to people at DeepMind.
If anyone else has connections with safety researchers at DeepMind, please do ask them to check.
And post here if you can! Good to verify whether or not this claim is true.
Sure. Keep in mind that as an organiser, you are setting the original framing.
e.g. how breakthroughs in machine unlearning enable a greater Right To Be Forgotten by AI models
This is the wrong path to take, ignoring actual legal implications.
Copying copyrighted data into commercialised datasets without permission is against copyright law (both the spirit and literal interpretations of the Berne three-step test).
Copying personal data into datasets without adhering to the rights to access and erasure violates the GDPR, CCPA, etc.
If you want to support AI corporations to keep scaling though, this is the right path to take.
This is an incisive description, and I agree.