Ways I Expect AI Regulation To Increase Extinction Risk
post by 1a3orn · 2023-07-04T17:32:48.047Z · LW · GW · 32 comments
The following are some very short stories about some of the ways that I expect AI regulation to make x-risk worse.
I don't think that each of these stories will happen. Some of them are mutually exclusive, by design. But, conditional upon heavy AI regulations passing, I'd strongly expect the dynamics pointed at in more than one of them to happen.
Pointing out potential costs of different regulations is a necessary part of actual deliberation about regulation; like all policy debates, AI policy debates should not appear one-sided [LW · GW]. If someone proposes an AI policy, and doesn't actively try to consider potential downsides -- or treats people who bring up these downsides as opponents [? · GW] -- they are probably doing something closer to propaganda than actual policy work.
All these stories have excessive detail; they are archetypes meant to illustrate patterns more than they are predictions. Before proposing a fix that would stop one particular problem, consider whether that fix would be subject to similar problems in the same class.
(a) Misdirected Regulations Reduce Effective Safety Effort; Regulations Will Almost Certainly Be Misdirected
"Alice, have you gotten the new AI model past the AMSA [AI and ML Safety Administration] yet?" said Alex, Alice's manager. "We need their approval of the pre-deployment risk-assessment to launch."
"I'm still in correspondence with them," said Alice, wincing. "The model is a little more faceblind than v4 -- particularly in the case of people with partially shaved heads, facial tattoos, or heavy piercings -- so that runs into the equity clauses and so on. They won't let us deploy without fixing this."
"Sheesh," said Alex. "Do we know why the model is like this?"
Alice shrugged.
"Well, getting this model past the AMSA is your number one priority," Alex said. "Check if you can do some model surgery to patch in the facial recognition from v4. It should still be compatible."
"Alex," said Alice. "I'm not really worried about the face-blindness. It's a regression, but it should have minimal impact."
"I mean, I agree," grinned Alex, "but don't put that in an email."
"But," Alice continued, hesitantly, "I'm really concerned about some of the capabilities of the model, which I don't think we've explored completely. I know that it passed the standard safety checks, and I know I've explored some of its capabilities, but I think this is much much more important than -- "
"Alice," interrupted Alex. "We've gone over this before. You're on the Alignment and Safety Team. That means part of your job is to make sure the model passes AMSA standards. That's what the Alignment and Safety team does."
"But the AMSA regulations went through Congress before Ilya's Abductor was even invented!" Alice said. "I know the regs are out of date; you know the regs out of date; everyone knows they're out of date, but they haven't even begun to change yet!--"
"Alice," interrupted Alex again. "Sorry for interrupting. And look, I'm really sorry about the situation in general. But you've spent two whole months on this already. I let you bring in Aaron on the problem, and he spent a whole month. Neither of you have found anything dangerous."
He continued: "I've shielded you from pressure from my boss for a while now. He was understanding the first three weeks, impatient the next three weeks, and has been increasingly irritated since. I wish we had more time for you, but we can't just ignore the AMSA -- it's the law. I need you to work on this and only this, or you're fired."
I expect actual instances of misdirected safety effort from safety laws to be far more universal, and only moderately more hidden, than is indicated in this dialogue.
If you think this is unlikely, consider IRBs, and consider that there are nearly as many theories about what AI safety should look like as there are AI safety theorists.
(b) Regulations Generally Favor The Legible-To-The-State
"Our request for compute has been denied by the AMSA. Again." said Barbara. "No actual explanation for why, of course."
"Fuck," said Brian. "Fuck. Why!? We want to do a zero-risk interpretability experiment. Why on earth are OpenMind and DeepAI -- and Lockheed Martin, and Palantir, and Anduril -- getting their requests for huge GPU runs approved, and not us?"
Barbara looked at him blankly.
"Is that a rhetorical question?" she said.
"No!"
"I mean... probably because the regulators see those entities as national assets, and not us. Probably because the regulations were specifically written to fit the kind of requests for compute that such organizations would tend to make. Like they just fill out a form on the web with most of the fields pre-populated. And probably because these organizations have long-standing relationships with the government," she said.
"But the bureaucracy was brought into existence to help fight x-risk, which means funding interpretability!" said Brian. "But their access to compute was scarcely curtailed!"
Barbara shrugged, her mouth quirking.
"You wouldn't make that kind of mistake about an AI, you know," she said.
"The mistake -- oh, thinking that it will act for a specific goal because it was made to accomplish that goal."
Brian deflated.
Barbara spoke: "The AMSA bureaucracy came into existence because of a widespread political coalition with mutually incompatible goals. Many of the coalition's goals were contradictory, or impossible, or otherwise dumb, which is typical for politics. You shouldn't expect that kind of an entity to have coherent ends that it acts to execute. You shouldn't expect it to even have goals; it has procedures. Which we don't fit into particularly well, and with which we are now stuck."
"But we should be able to change it now," Brian said, without conviction.
"Government's aren't exactly corrigible either," Barabara sighed. "I'll help you with the fourth draft."
Consider that the National Environmental Protection Act as enforced has exemptions to speed oil and gas projects but not for many similar renewable energy projects.
(c) Heavy Regulations Can Simply Disempower the Regulator
Following Christopher Nolan's surprise smash hit, Diamondoid Bacteria -- which ends with a montage of everyone on Earth falling over dead in the same five seconds, after an AI has infected them with a nanotechnological virus -- the US passes unprecedented national anti-AI regulation.
The atmosphere of fear demands intense action; so the regulation is deeply conservative and in every case of ambiguity bans things rather than permits them. Anthropic, OpenAI, and other AI labs are all simply shut down. Interconnect between more than a handful of H100s is banned. TSMC and Nvidia stocks plunge.
Nevertheless, the US cannot outlaw AI research everywhere by simple fiat. It isn't the 90s anymore; it's a multipolar world.
India, China, and a handful of other countries continue to do work with AI -- except the researchers there are far more cautious about how they represent themselves publicly. There's a small diaspora of AI researchers from the US to India and Australia, countries that the US cannot pressure because it needs them on its side in the geopolitical conflict with China. And in fact the US starts to decline -- even more quickly -- relative to the influence of China, India, and others, because it has effectively banned the most transformative technology of the 2020s. New technologies emerge at an increasing rate from China, India, and other countries, increasing their geopolitical power and accelerating the preexisting relative power decline of the US.
In 1 to 4 years, we're in the same situation we were before, except AI research is more spread out and now conducted in the countries that are most hostile to alignment and least concerned for safety. Nearly all of the safety work -- conducted on low-power systems in the US in the interim -- turns out to be useless, given how far pre-AGI the systems were, so it's a strictly worse situation.
I don't think this exact world is particularly likely, but I do think this or something like it could happen, in a world where fear drives policymaking and where the most conservative / worried AI policy people have their way.
(d) Regulations Are Likely To Maximize The Power of Companies Pushing Forward Capabilities the Most
"So. I'm unemployed too now," Darren said, collapsing onto the couch on the porch. He stared at the moon rising over the city.
"Another one bites the dust," said David, puffing cigarette smoke into the night.
He didn't ask why Darren was unemployed. He knew the answer already. In the last year, the three largest corporations in the US had grown at a cancerous rate, swallowing the economic spaces of hundreds of other companies. All of them had been AI companies; you could describe them now more as "everything companies." [LW · GW] They now produced more than 50% of literally all things manufactured in the US by value; in a handful of months it would be more than 60%.
Although, by now, "corporation" was somewhat of a misnomer -- they handed out food allotments, supplied entertainment, enforced the peace, often provided housing, and -- so far as Darren could tell -- dictated policy to the increasingly moribund government of the United States.
If you didn't have an artificial intelligence, you couldn't keep up.
"Once I did -- 'open source AI work'," Darren said, sadly, with air quotes to distance himself from the phrase. He didn't mean it as a claim about previous success; it was an expression of sadness for a lost past.
"Doubt you ever had a chance against the boys with with the billions," said David.
"Fellas with the FLOPs," said Darren.
"Guys with the GPUs," said David. "Don't look at me like that, I know that one doesn't work."
"But look," Darren said irritably, "they didn't always rule the world. There were open source machine-learning models that were useful once."
He continued:
"You could download a general-purpose model, shove it into a robot dog, and it would work in less than an hour. You could run a code-completion model on your very own GPUs. People were doing open-source interpretability work to steer these things, because they could see the weights; it helped you modify them however you wanted. Everyone was building small models that could compete [LW · GW] with big ones on very specific domains, if not general ones. And were hundreds of specific domains that people were training them on."
He drank some more vodka.
"The open source ones tended to be small models. They were far less general than the models of the big guys. But it was still all shut down as too hazardous in `25."
He continued: "They got a law passed, stating that models trained on too many flops were forbidden without a license. Which meant all the useful open-source models were forbidden; anything powerful had to be behind an API, to meet all the requirements. Which meant anything useful had to be behind an API."
He sighed.
"Then everyone had to purchase intelligence from the big guys because they were the only game in town. Their profits fucking doubled. Their return on investment doubled. Why wouldn't it double, we rewarded those fuckers with a monopoly because they promised they'd make it all safe, even though they had said they were trying to make God, and open source was just usually trying to make... some part of a man. AI was already rewarding capital because it was a machine that took capital and printed out labor, and we poured gasoline on that fire."
Darren sighed again.
"Well," said David sarcastically, "you forget the the advantages of maximizing the intelligence differential between men; only a few companies having power made co-ordination easier."
"Christ," said Darren. "Yeah coordination is always easier if only a few people have power. Historically undeniable."
"Definitely aren't any other disadvantages."
"Nope."
In the distance, beyond the decaying human-occupied part of the city, they could see the wall of a factory being lifted by robot-controlled drones. The robots worked 24/7, of course. A time-lapse from here would have shown whole new factories rising in weeks.
"Wouldn't matter if you had a whole farm of GPUs now," said David.
"No, it would not," said Darren. "But once upon a time it, might have."
Open source models have some disadvantages. Policies trying to shut down open source models also have some disadvantages. Thinking about these disadvantages is important for deciding if shutting down open source is an amazing idea, or a really shitty idea. I have tried to gesture at a small handful of these disadvantages here; I think that there are many more.
32 comments
Comments sorted by top scores.
comment by Shankar Sivarajan (shankar-sivarajan) · 2023-07-04T18:17:16.311Z · LW(p) · GW(p)
"You wouldn't make that kind of mistake about an AI" is a good line. Almost no one else thinks of the state as an eldritch horror, or even refers to it as such in jest, like they do about AI systems. They should.
↑ comment by JenniferRM · 2023-07-04T23:36:27.865Z · LW(p) · GW(p)
"L'état c'est nous" though? (The state, it is us.)
I'm pretty sure I am not an eldritch horror and I suspect you aren't either, Shankar! Does the "eldritch horror part" arise from our composition? If so, why and how? Maybe it is an aspect of humans that emerges somehow from a large number of humans [LW · GW]?
"L'état ce ne sont que des bureaucrates humains" is another possibility (the state, it is merely some human bureaucrats) who I fully admit might not be saints, and might not be geniuses, but at least they are humans, operating at human speeds, with human common sense and the moral intuitions that arise in a human limbic system in the loop.
We have a lot of data on what instances from the class of human governments look like! The distribution is not fundamentally alien and unpredictable.
AGI really will be fundamentally different, in a statistical, material, mechanical, temporal, and algorithmic sense.
Scenario C, with an approach where "the regulation is deeply conservative and in every case of ambiguity bans things rather than permits them," seemed the most promising to me, and the OP's central problem there is only limited by "international agreement"... which existing nation states know how to at least sort of do.
↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2023-07-05T03:58:14.728Z · LW(p) · GW(p)
Calling it "eldritch" is mere rhetorical flourish to evoke Lovecraft; of course it's not literally paranormal.
Asking which individual is responsible for the evil of the state is like asking which transistor in the AGI is misaligned. That kind of reductionism obviously won't get you a reasonable answer, and you know it.
The problem is the incorrigibility of the system; it's the same idea as Scott Alexander's Meditations on Moloch: ultimately, it's a coordination problem that giving a name to helps reify in the human mind. In this context, I like quoting 1 Samuel 8:11–18, slightly paraphrased.
Upon consideration, a lighthearted example might be better than Cabaret at showing how such a system can get out of control: Daylight Saving Time. The US government can't get rid of it, and even I can't find a way to blame that on bureaucrats.
As for "the state, it is us," I guess I don't understand what you mean. Sure, in the same sense the serfs are the estate; the conscripts, the army; and the slaves, the plantation, but that's not really what one means when denouncing the state. But if it's a way of transferring moral culpability for the state's actions, I reject that entirely: I do not blame a slave for his submission.
I don't think the moral intuitions of the people involved are really relevant, for the reasons above, but more importantly, you say that looking backwards through human history, the distribution is not fundamentally unpredictable: that's not how predictions work! To people who had never seen a state before, I claim it would indeed have been alien to see, for example, fishermen killing whales for neither meat nor blubber, but to fill quotas.
If AGI is indeed as "fundamentally different" as you worry from anything that came before it, I do not consider it as unlikely to be better than the state as it exists today as you do.
Scenario C seemed the most promising to me too, but for different reasons.
↑ comment by JenniferRM · 2023-07-05T10:16:47.680Z · LW(p) · GW(p)
When you say "I do not blame a slave for his submission" regarding Daylight Savings Time, that totally works in the second frame where "l'état ce ne sont que des bureaucrates humains".
You're identifying me, and you, and all the other "slaves" as "mere transistors in society".
I dunno about you, but I grew up in a small (unincorporated) town that was run by the Chamber of Commerce and Rotary Club. My first check as pay for labor was as a soccer referee when I was 12, reffing seven-year-olds. There was a Deputy of the County Sheriff, but he was not the state. He was very mindful around the communitarian influencers of the town.
Of my town, it made total sense to me to imagine "l'état c'est nous".
Whether you were Democrat, Green, Republican, or Libertarian in your party registration, we were all democrats, not peasants. Yes?
In my town, PTA moms would make sure the Deputy got a bad guy if the gossip network noticed that there was a schizophrenic building a machine gun in his garage (or whatever).
I rode my bike around, and picked wild fruit from overgrown orchards.
A chain tried to open a Kwik-E-Mart in town, and they didn't donate to the PTA's Halloween Carnival, and so I wasn't allowed to shop there, as a child, for years until they started donating, which they eventually did.
I don't think the moral intuitions of the people involved are really relevant, for the reasons above
I can't imagine any polity actually working where the rulers (and in a democratic republic that's the voters) did NOT have moral intuitions engaged with their regulation of themselves and their neighbors.
I can imagine people's moral intuitions not connecting to the successful operation of their own government if the government is just evil, and anarchism is correct... but I don't think the government is evil and therefore I don't think anarchism is correct!
And if I did think that, then I'd expect it to be relatively clear to show this to nearly all "right thinking people" using reasonable person standards of explanation, and then "we" could just notice that, and overthrow the evil government... right?
As I see it, Russia and China are places where the common man is tragically disengaged from governance.
And because they do not have an autonomous citizenry, they lack people who have trust in their own ability to engage in civic reasoning towards civic ends.
There is only one real "citizen" in each of those countries: Putin and Xi respectively... who need not recognize anyone else. (Note: this actually makes securing AI in those countries easier! Convince "the one guy" and... I think then it will basically happen!?! (From their perspective, I bet we seem like the chaos monkeys.))
Now, I am not personally a neo-con right now.
My current posteriors say: maybe not all currently existing cultures are capable of healthy democratic self-rule.
However, when I think of "straightforward" solutions to AGI regulation, my first thought is to set up a Global Parliament of 71 representatives, with proportional representation and quadratic monetary voting as a funding mechanism. I know there are more countries than 71 on the whole planet, but South America only has ~425 million people and ~2 languages.
Make the new thing a "House of Commons" that lets the "United Nations" function as a sort of auxiliary "Senate" or "House of Lords" or whatever (for ceremonial bootstrapping)?
Have that group hold elections over and over until they can find a Prime Minister of Earth who fulfills the Condorcet Criterion.
I bet that PM would be in favor of "AI-not-killing-everyone-ism" because "AI-killing-everyone-ism" seems like a pretty funky platform!
Personally, I would be open to paying every human who can speak (but not every AI and not every orca and not every African gray parrot) a silver dime to participate in voting.
I'd make the first vote cost (in USD?) $1, then $2.71, $7.39, $20.09, $54.60, $148.41 and so on with powers of "e".
People would be able to give the dime back to get more votes... but personally I'd keep my dime as a "globally patriotic" keepsake and just pay for my votes in fiat.
I think America and Japan and Korea and Europe and so on have enough rich people that if there's a couple billion idiots clogging up the voting system... those idiots will be poor enough to lose out in the final vote count? And I honestly think the idiots would wise up pretty fast! <3
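(For concreteness, a tiny sketch of the escalating vote-cost schedule described above -- the dime payout and the powers-of-e pricing are from this comment, while the function names and example vote counts are purely illustrative:)

```python
import math

def vote_cost_usd(k: int) -> float:
    """Cost of the k-th vote cast by a single voter, growing with powers of e."""
    return math.e ** (k - 1)

def total_cost_usd(n_votes: int) -> float:
    """Total a single voter would spend to cast n votes under this schedule."""
    return sum(vote_cost_usd(k) for k in range(1, n_votes + 1))

for n in (1, 3, 6, 10):
    print(f"{n:>2} votes cost ≈ ${total_cost_usd(n):,.2f}")
# 10 votes already cost ≈ $12,818 in total, so buying many extra votes saturates quickly.
```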
If we could finish the "first try" before the end of 2023, that would be amazing! Then do another in 2024. Then do it every year for 10 years, just to find the balances, and learn the skills!
Then my representative in Earth's parliament could ask Earth's Prime Minister, each year, what the fuck was up with AI, and maybe the PM would say (into the cameras in the Chamber) that the global government can't use all that quadratic voting money to come up with ANY way to work with the pre-existing westphalian states to globally prevent humans from getting murdered by a scary AI in any way at all ever...
...and then maybe we deserve to die?
Or (I think to myself) if that's what YOU all end up electing, then maybe YOU ALL deserve to die? I don't wanna die, see? I just want nice normal things, like clean neighborhoods and happy kids n'stuff :-)
So at that imaginary point, where global governance has actually been tried and it didn't work... I might be ready to say that "my right to self defense is more important than the rest of humanity's right to the consent of the governed"...
...and then I'd start looking for someone who has a solution for "the problem of literally everyone on earth maybe getting killed in a small number of years by a bunch of poorly coded machines" who could credibly say "l'etat c'est moi" like in the bad old days?
I don't know basically anything about Napoleon the Eighth, but if he is an adequate focal point who can read a credible plan for dealing with AI off of a teleprompter...
...then maybe I'd give up on democracy, and bend the knee to him, and hope desperately that his "Butlerian" War is relatively peaceful and fast and successful???
I really really really hope it doesn't get to that point, but if it does... well... I'm pretty sure I have a right to self defense???
And maybe that right (and the similar rights of everyone else who wants to be safe from AI) trumps the moral requirement to secure the consent of governed people?
But I would expect any unelected monarch to still take deontics strongly into account, even if granted insane powers by a crazy situation. The people who swear their sword to such a ruler would care about honor, I'm pretty sure...
Maybe you can explain to me, Shankar, how you complete the sentence "l'etat c'est <entity>" such that the <entity> seems to you like an entity that is at least vaguely person-shaped and able to track its own interests, and pursue its real interests somewhat effectively.
(And if there is no such person-shaped thing in charge of anything, then who the fuck are we paying taxes to? If there is no at-least-semi-benevolent person-shaped entity in that "Rulership Slot", doesn't that mean we could (and maybe should?) all just agree, on the internet, to simultaneously all not pay taxes, and then we'd all get away with it, and then we could sing songs about it or something?)
This is all intensely practical from my perspective.
It's the same "mental muscles" that hold a marriage together, or keep seven-year-olds from hurting each other while they play soccer, or let a group house elect a kitchen manager who buys food for the whole house that makes the vegans and the carnists happy, because everyone has a right to eat food they've paid for, and collective buying is a good way to get good deals for low prices!
If you're not modeling government as a best-effort pretty-OK "buyers club for public services" run by people who care about the people they have a formal duty to act on behalf of... why aren't you burning cop cars and shooting tax collectors?
And if you do think the government is anything other than a bunch of evil mindless parasites, please join with me and get that non-evil government to do its fucking job and credibly and reliably and systematically protect humans from robot-caused extinction.
↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2023-07-05T13:33:49.247Z · LW(p) · GW(p)
I read your perspective that you've elaborated on at some considerable length, and it's more than a little frustrating that you get so close to understanding mine, that you describe what seems like a perfectly reasonable model of my views, and then go "surely not," so I shall be a little terse; that should leave less room for well-meaning misinterpretation.
if the government is just evil, and anarchism is correct
Yes.
I'd expect it to be relatively clear to show this to nearly all "right thinking people" using reasonable person standards of explanation, and then "we" could just notice that, and overthrow the evil government... right?
I wouldn't. I didn't believe it until only recently (I used to be a minarchist), so I see just how difficult it is to show this.
Also, "overthrow the evil government" sounds like "just switch off the misaligned AGI."
there is no at-least-semi-benevolent person-shaped entity in that "Rulership Slot"
There isn't.
should all just agree, on the internet, to simultaneously all not pay taxes,
We should.
if you do think the government is anything other than a bunch of evil mindless parasites
I don't. (Well, okay, maybe the county sheriffs of some small towns. But nobody relevant.)
how you complete the sentence "l'etat c'est <entity>"
If you insist that I complete this French sentence you keep riffing on, "l'etat c'est Moloch."
why aren't you burning cop cars and shooting tax collectors?
This is an eminently reasonable question I ask myself everyday: call it cowardice, but I don't think it will accomplish anything, and I'd rather be alive than dead.
As a side note, this fundamental misunderstanding reminds me of how Bioshock became so beloved by libertarians: the intended response to the "no gods or kings, only man" message was "Ah, I see how good and necessary the state is," not "No king, no king, la la la la la la."
↑ comment by JenniferRM · 2023-07-05T20:29:42.186Z · LW(p) · GW(p)
I beg the tolerance of anyone who sees these two very long comments.
I personally found it useful to learn "yet another of my interlocutors who seems to be opposed to AI regulations has turned out to just be basically an anarchist at heart".
Also, Shankar and I have started DMing a bunch, to look for cruxes, because I really want to figure out how Anarchist Souls work, and he's willing to seek common epistemic ground, and so hopefully I'll be able to learn something in private, and me and Shankar can do some "adversarial collaboration" (or whatever), and eventually we might post something in public that lists our agreements and "still unresolved cruxes"... or something? <3
In the meantime, please don't downvote him (or me) too harshly! I don't think we will be "polluting the signal/noise commons of LW" much more until the private conversation resolves? Hopefully? :-)
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-07-05T02:43:48.640Z · LW(p) · GW(p)
This is a great post. Anyone pushing for AI regulation should make sure they are familiar with this post or at least the ideas in it.
However, I don't think you've convinced me that AI regulation will increase x-risk, because you haven't talked much at all about what the world without AI regulation looks like. Seems to me that that world looks like almost-certain doom, for the usual reasons. With such a low bar, I conclude that advocating for regulation is net-positive, because there's a decent chance it'll buy the world a bunch of time.
comment by Raemon · 2023-07-04T18:42:51.116Z · LW(p) · GW(p)
Can you spell out more how x-risk increases in the last section? I get that centralizing power has bad downsides, but a) the vignette doesn't even spell out the usual downsides, just obliquely hints at them, b) while I can imagine vague paths to "somehow the math worked out that there was more xrisk here than in the counterfactual", I don't see a clear indication that I should expect it to be worse on average in this world.
↑ comment by 1a3orn · 2023-07-04T20:45:53.119Z · LW(p) · GW(p)
Yeah, there are several distinct ideas in that one. There's a cluster around "downsides to banning open source" mixed with a cluster of "downsides to centralization" and the vignette doesn't really distinguish them.
I think "downsides to centralization" can have x-risk relevant effects, mostly backchaining from not-immediately x-risk relevant but still extremely bad effects that are obvious from history. But that wasn't as much my focus... so let me instead talk about "downsides to banning open source" even though both are important.
[All of the following are, of course, disputable.]
(In the following, the banning could either be explicit [i.e., all models must be licensed to a particular owner and watermarked] or implicit [i.e., absolute liability for literally any harms caused by a model, which is effectively the same as a ban]).
(a) -- Open source competes against OpenAI / [generic corp] business. If you expect most x-risk to come from well funded entities making frontier runs (big if, of course), then shutting down open source is simply handing money to the organizations which are causing the most x-risk.
(b) -- I expect AI development patterns based on open source to produce more useful understanding of ML / AI than AI development patterns based on using closed source stuff. That is, a world where people can look at the weights of models and alter them is a world where the currently-crude tools like activation vectors [LW · GW] or inference time intervention [LW · GW] or LEACE get fleshed out into fully-fledged and regularly-used analysis and debugging tools (see the sketch after point (c) below). Sort of getting-in-the-habit-of-using-a-debugger, at a civilizational expertise level -- while closed source stuff is getting-in-the-habit-of-tweaking-the-code-till-it-works-in-a-model-free-fashion, civilizationally. I think the influence of this could actually be really huge, cumulatively over time.
(c) -- Right now, I think open source generally focuses on more specific models when compared to closed source. I expect such more specific models to be mostly non-dangerous, and to be useful for making the world less fragile. [People's intuitions differ widely about how much less fragile you could make the world, of course.] If you can make the world less fragile, this has far fewer potential downsides than centralization and so seems really worth pursuing.
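(Purely as an illustrative sketch of the kind of open-weights tooling gestured at in (b) -- crude activation steering on a small open model via a forward hook; the model name, layer index, contrastive prompts, and steering scale are arbitrary placeholder choices, not a recommended recipe:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-weights model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer = 6  # arbitrary block to intervene on

def last_token_resid(text: str) -> torch.Tensor:
    """Residual-stream activation at the last token, after block `layer`."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer + 1][0, -1]

# Crude "steering vector": difference between two contrastive prompts.
steer = last_token_resid("I love this movie") - last_token_resid("I hate this movie")

def add_steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0] + 4.0 * steer  # 4.0 is a made-up scale
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(add_steer)
prompt_ids = tok("The movie was", return_tensors="pt")
print(tok.decode(model.generate(**prompt_ids, max_new_tokens=20, do_sample=False)[0]))
handle.remove()  # the weights are local, so the intervention is trivially reversible
```

This kind of poking-at-the-weights workflow is only possible when you can see and hook the model; behind an API, the equivalent move is prompt-tweaking in a model-free fashion.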
Of course you have to balance these against whatever risks are involved in open source (inability to pause if near unaligned FOOM; multipolarity, if you think singletons are a good idea; etc); and against-against whatever risks and distortions are involved in the actual regulatory process of banning open source (surveillance state? etc etc).
comment by Erich_Grunewald · 2023-07-04T20:08:40.871Z · LW(p) · GW(p)
1. Seems to me like (a), (b) and maybe (d) are true for the airplane manufacturing industry, to some degree.
2. But I'd still guess that flying is safer with substantial regulation than it would be in a counterfactual world without substantial regulation.
That would seem to invalidate your claim that regulation would make AI x-risk worse. Do you disagree with (1), and/or with (2), and/or see some important dissimilarities between AI and flight that make a difference here?
↑ comment by 1a3orn · 2023-07-05T13:16:04.825Z · LW(p) · GW(p)
I do think there are important dissimilarities between AI and flight.
For instance: People disagree massively over what is safe for AI in ways they do not over flight; i.e., are LLMs going to plateau and provide us a harmless and useful platform for exploring interpretability, while maybe robustifying the world somewhat; or are they going to literally kill everyone?
I think pushing for regulations under such circumstances is likely to promote the views of an accidental winner of a political struggle; or to freeze in amber currently-accepted views that everyone agrees were totally wrong two years later; or to result in a Frankenstein-esque mishmash of regulation that serves a miscellaneous set of incumbents and no one else.
My chief purpose here, though, isn't to provide a totally comprehensive "AI regs will be bad" point-of-view.
It's more that, well, everyone has a LLM / babbler [LW · GW] inside themselves that helps them imaginatively project forward into future. It's the thing that makes you autocomplete the world; the implicit world-model driving thought, the actual iceberg of real calculation behind the iceberg's tip of conscious thought.
When you read a story of AI things going wrong, you train the LLM / babbler. If you just read stories of AI stuff going wrong -- and read no stories of AI policy going wrong -- then the babbler becomes weirdly sensitive to the former, and learns to ignore the latter. And there are many, many stories on LW and elsewhere now about how AI stuff goes wrong -- without such stories about AI policy going wrong.
If you want to pass good AI regs -- or like, be in a position to not pass AI regs with bad effects -- the babbler needs to be trained to see all these problems. Without being equally trained to see all these problems, you can have confidence that AI regs will be good, but that confidence will just correspond to a hole in one's world model.
This is... intended to be a small fraction of the training data one would want, to get one's intuition in a place where confidence doesn't just correspond to a hole in one's world model.
comment by Shmi (shminux) · 2023-07-04T18:13:14.740Z · LW(p) · GW(p)
I made a post about it some time ago, not in so many words: https://www.lesswrong.com/posts/g2aeGupbr3XC68tLJ/upcoming-ai-regulations-are-likely-to-make-for-an-unsafer [LW · GW]
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-07-05T02:41:06.801Z · LW(p) · GW(p)
In 1 to 4 years, we're in the same situation we were before, except AI research is more spread out and now conducted in the countries that are most hostile to alignment and least concerned for safety.
Minor comment: I feel like if the western labs shut down due to public outcry + regulation in the West, it would take more than 1-4 years for the rest of the world to catch up, in expectation. I could totally see it happening, but I think it's fairly likely that the rest of the world (which is already at least a year or so behind) would slow down substantially in response.
comment by Bridgett Kay (bridgett-kay) · 2023-07-10T17:40:06.425Z · LW(p) · GW(p)
Lately I've been appreciating, more and more, something I'm starting to call "Meta-Alignment." Like, with everything that touches AI, we have to make sure that thing is aligned just enough to where it won't mess up or "misalign" the alignment project. For example, we need to be careful about the discourse surrounding alignment, because we might give the wrong idea to people who will vote on policy or work on AI/AI adjacent fields themselves. Or policy needs to be carefully aligned, so it doesn't create misaligned incentives that mess up the alignment project; the same goes for policies in companies that work with AI. This is probably a statement of the obvious, but it is really a daunting prospect the more I think about it.
comment by Raemon · 2023-07-17T17:42:13.357Z · LW(p) · GW(p)
FYI the moderation team has thought about curating this since it makes some useful points, but so far decided against because it doesn't really argue for its claims (and in some cases the claims seem kinda weak, and not compared against counterfactual or upsides, etc). But, I'd still be interested in a version of this post that makes a clearer/stronger case.
comment by Rudi C (rudi-c) · 2023-07-04T19:59:18.379Z · LW(p) · GW(p)
Another important problem is that while x-risk is speculative and relatively far off, rent-seeking and exploitation are rampant and ever-present. These regulations will make the current ailing politico-economic system much worse, to the detriment of almost everyone. In our history, giving tribute in exchange for safety has usually been a terrible idea.
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-07-05T14:21:14.571Z · LW(p) · GW(p)
AI x-risk is not far off at all, it's something like 4 years away IMO. As for "speculative..." that's not an argument, that's an epithet.
I was trained in analytic philosophy, and then I got lots of experience thinking about AI risks of various kinds, trying to predict the future in other ways too (e.g. war in Ukraine, future of warfare assuming no AI) and I do acknowledge that it's sometimes valid to add in lots of uncertainty to a topic on the grounds that currently the discussion on that topic is speculative, as opposed to mathematically rigorous or empirically verified etc. But I feel like people are playing this card inappropriately if they think that AGI might happen this decade but that AI x-risk is dismissible on grounds of being speculative. If AGI happens this decade the risks are very much real and valid and should not be dismissed, certainly not for such a flimsy reason.
↑ comment by beren · 2023-07-06T12:00:46.855Z · LW(p) · GW(p)
AI x-risk is not far off at all, it's something like 4 years away IMO
Can I ask where this four years number is coming from? It was also stated prominently in the new 'superalignment' announcement (https://openai.com/blog/introducing-superalignment). Is this some agreed upon median timelines at OAI? Is there an explicit plan to build AGI in four years? Is there strong evidence behind this view -- i.e. that you think you know how to build AGI explicitly and it will just take four years more compute/scaling?
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-07-06T14:43:04.411Z · LW(p) · GW(p)
Sure. First of all, disclaimer: This is my opinion, not that of my employer. (I'm not supposed to say what my employer thinks.) Yes, I think I know how to build AGI. Lots of people do. The difficult innovations are already behind us, now it's mostly a matter of scaling. And there are at least two huge corporate conglomerates in the process of doing so (Microsoft+OpenAI and Alphabet+GoogleDeepMind).
There's a lot to say on the subject of AGI timelines. For miscellaneous writings of mine, see AI Timelines - LessWrong [? · GW]. But for the sake of brevity I'd recommend (1) The "Master Argument" I wrote in 2021, after reading Ajeya Cotra's Bio Anchors Report [LW · GW], which lays out a way to manage one's uncertainty about AI timelines (credit to Ajeya) by breaking it down into uncertainty about the compute ramp-up, uncertainty about how much compute would be needed to build AGI using the ideas of today, and uncertainty about the rate at which new ideas will come along that reduce the compute requirements. You can get soft upper bounds on your probability mass, and hard lower bounds, and then you can argue about where the probability mass should be in between those bounds and then look empirically at the rate of compute ramp-up and new-ideas-coming-along-that-reduce-compute-costs.
And (2) I'd recommend doing the following exercise: Think of what skills a system would need to have in order to constitute AGI. (I'd recommend being even more specific, and asking what skills are necessary to massively accelerate AI R&D, and what skills are necessary to have a good shot at disempowering humanity). Then think about how you'd design a system to have those skills today, if you were in charge of OpenAI and that was what you wanted to do for some reason. What skills are missing from e.g. AutoGPT-4? Can you think of any ways to fill in those gaps? When I do this exercise the conclusion I come to is "Yeah it seems like probably there isn't any fundamental blocker here, we basically just need more scaling in various dimensions." I've specifically gone around interviewing people who have longer timelines and asking them what blockers they think exist--what skills they think are necessary for AI R&D acceleration AND takeover, but will not be achieved by AI systems in the next ten years--and I've not been satisfied with any of the answers.
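(A toy Monte Carlo sketch of the decomposition in (1) -- every distribution and growth rate below is a made-up placeholder rather than Cotra's or Kokotajlo's numbers; the point is only the three-part structure of compute requirements, compute ramp-up, and algorithmic progress:)

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # Monte Carlo samples

# Placeholder: log10 FLOP of training compute needed for AGI with ~2023 ideas.
needed_oom = rng.normal(loc=31.0, scale=3.0, size=N)

years = np.arange(2023, 2061)
# Placeholder: largest training run grows ~0.5 OOM/year from 10^25.5 FLOP.
available_oom = 25.5 + 0.5 * (years - 2023)[:, None]
# Placeholder: new ideas cut the requirement by ~0.3 OOM/year.
effective_needed = needed_oom[None, :] - 0.3 * (years - 2023)[:, None]

crossed = available_oom >= effective_needed            # (year, sample) grid
first_year = np.where(crossed.any(axis=0),
                      years[crossed.argmax(axis=0)], np.inf)
for y in (2027, 2030, 2040):
    print(f"P(enough effective compute by {y}) = {np.mean(first_year <= y):.0%}")
```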
↑ comment by Rudi C (rudi-c) · 2023-07-11T22:37:46.513Z · LW(p) · GW(p)
The prior is that dangerous AI will not happen in this decade. I have read a lot of arguments here for years, and I am not convinced that there is a good chance that the null hypothesis is wrong.
GPT4 can be said to be an AGI already. But it's weak, it's slow, it's expensive, it has little agency, and it has already used up high-quality data and tricks such as ensembling. 4 years later, I expect to see GPT5.5 whose gap with GPT4 will be about the gap between GPT4 and GPT3.5. I absolutely do not expect the context window problem to get solved in this timeframe or even this decade. (https://arxiv.org/abs/2307.03172)
↑ comment by Person (person-1) · 2023-07-05T14:50:00.550Z · LW(p) · GW(p)
If AGI happens this decade the risks are very much real and valid and should not be dismissed, certainly not for such a flimsy reason.
Especially considering that what people consider the near-term risks -- which we can expect to become more and more visible and present -- will likely shift the landscape with regard to taking AI x-risk seriously. I posit x-risk won't remain speculative for long, with roughly the same timeline you gave.
comment by Mikhail Samin (mikhail-samin) · 2023-07-04T18:32:36.227Z · LW(p) · GW(p)
(c) is a meme that points in the wrong direction. It describes an Earth with not enough regulation, not an Earth with too much regulation. It'd be great to make the regulators around the world make the right decisions for the right reasons, but getting anyone anywhere to implement anything remotely sane, even if you get only half-way there and others continue to race ahead, means dying with more dignity (and later).
↑ comment by Rudi C (rudi-c) · 2023-07-04T20:00:03.241Z · LW(p) · GW(p)
Taboo dignity.
↑ comment by Mikhail Samin (mikhail-samin) · 2023-07-05T09:31:52.597Z · LW(p) · GW(p)
Shorthand for having done things that increase the log-odds of having a long-term future, as a goal that makes sense to pursue, doesn’t include unilateral unethical actions, means that the world is closer to a surviving one, etc. See the Death with Dignity post.
comment by David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-08-14T18:42:00.968Z · LW(p) · GW(p)
I found this thought provoking, but I didn't find the arguments very strong.
(a) Misdirected Regulations Reduce Effective Safety Effort; Regulations Will Almost Certainly Be Misdirected [LW · GW]
(b) Regulations Generally Favor The Legible-To-The-State [LW · GW]
(c) Heavy Regulations Can Simply Disempower the Regulator [LW · GW]
(d) Regulations Are Likely To Maximize The Power of Companies Pushing Forward Capabilities the Most [LW · GW]
Briefly responding:
a) The issue in this story seems to be that the company doesn't care about x-safety, not that they are legally obligated to care about face-blindness.
b) If governments don't have bandwidth to effectively vet small AI projects, it seems prudent to err on the side of forbidding projects that might pose x-risk.
c) I do think we need effective international cooperation around regulation. But even buying 1-4 years time seems good in expectation.
d) I don't see the x-risk aspect of this story.
comment by Review Bot · 2024-02-16T03:42:33.134Z · LW(p) · GW(p)
The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
comment by Yellow · 2023-08-06T05:21:10.674Z · LW(p) · GW(p)
A point this reminds me of (which the animal welfare cause area is being rekt on rn) may be the expected pushback. Animal advocacy orgs are smol (so is AI safety) in comparison to animal ag (or AI techies).
Having a small regulatory win (ex. California's Prop 12, which banned caged eggs / crated pork from being sold in CA) has now led to HUGE pushback from big animal ag (the EATS Act - if passed, it will prohibit/undo basically all legislation from states/counties regarding regulations on animal ag products - it currently has a really good shot at passing).
Animal advocacy orgs currently do not have much capacity to face up to big ag toe to toe. If they had waited on passing regulations to build up their community more, maybe the odds would be better (a serious trade-off, as waiting means more low welfare).
Can AI safety orgs withstand the backlash that big tech beneficiaries of unregulated AI (MS, GOOG) might bring to crush their attempts at regulation?
comment by Michael Tontchev (michael-tontchev-1) · 2023-07-11T23:14:40.183Z · LW(p) · GW(p)
I feel like we can spin up stories like this that go any way we want. I'd rather look at trends and some harder analysis.
For example we can tell an equally-entertaining story where any amount of AI progress slowdown in the US pushes researchers to other countries that care less about alignment, so no amount of slowdown is effective. Additionally, any amount of safety work and deployment criteria can push the best capabilities people to the firms with the least safety restrictions.
But do we think these are plausible, and specifically more plausible than alternatives where slowdowns work?
comment by Gordon Seidoh Worley (gworley) · 2023-07-04T21:02:03.981Z · LW(p) · GW(p)
This is great content for LessWrong, but I worry a bit about what would happen if you tried to present these sorts of arguments to policy makers. I think they'd not read the nuance and only see a dial of more/less regulation, and conclude either that more regulation is good or that more regulation is bad (politicians seem to have a hard time accepting just enough regulation except as a temporary measure until they can either increase or decrease the amount of regulation).
I think a nearby but slightly better framing would be: which specific regulations do I expect to be worse than no regulation at all. Maybe this is exactly what you had in mind, but the slightly different wording seems safer to put into the policy debate.
↑ comment by Raemon · 2023-07-04T21:37:03.692Z · LW(p) · GW(p)
I read the post as mostly targeting LessWrongers (in order to get them to not lobby policymakers in counterproductive ways), and the main thing was intuition building for "your vague models of 'slow down AI whatever the cost' are missing downsides that should be more salient".