What I Would Do If I Were Working On AI Governance
post by johnswentworth · 2023-12-08T06:43:42.565Z · LW · GW · 32 comments
I don’t work in AI governance, and am unlikely to do so in the future. But various anecdotes and, especially, Akash’s recent discussion [LW · GW] leave me with the impression that few-if-any people are doing the sort of things which I would consider sensible starting points, and instead most people are mostly doing things which do not seem-to-me to address any important bottleneck to useful AI governance.
So this post lays out the places I would start, if I were working on AI governance, and some of the reasoning behind them.
No doubt I am missing lots of important things! Perhaps this post will nonetheless prove useful to others working in AI governance, perhaps Cunningham’s Law will result in me learning useful things as a result of this post, perhaps both. I expect that the specific suggestions in this post are more likely to be flawed than the style of reasoning behind them, and I therefore recommend paying more attention to the reasoning than the specific suggestions.
This post will be mostly US-focused, because that is what I know best and where all the major AI companies are, but presumably versions of the interventions discussed could also carry over to other polities.
Liability
One major area I’d focus on is making companies which build AI liable for the damages caused by that AI, both de-facto and de-jure.
Why Liability?
The vague goal here is to get companies which build AI to:
- Design from the start for systems which will very robustly not cause problems.
- Invest resources in red-teaming, discovering new failure-modes before they come up in production, etc.
- Actually not deploy systems which raise red flags, even when the company has invested heavily in building those systems.
- In general, act as though the company will take losses from damages caused by their AI, not just capture profits from the benefits caused by their AI.
… and one natural way to do that is to ensure that companies do, in fact, take losses from damages caused by their AI, not just capture profits from the benefits caused by their AI. That’s liability in a nutshell.
Now, realistically, this is not going to extend all the way to e.g. making companies buy extinction insurance [LW · GW]. So why do realistic levels of liability matter for extinction risk? Because they incentivize companies to put in place safety processes with any actual teeth at all.
For instance: right now, lots of people are working on e.g. safety evals. My very strong expectation is that, if and when those evals throw red flags, the major labs will respond by some combination of (1) having some meetings where people talk about safety a bunch, (2) fine-tuning until the red flags are no longer thrown (in a way which will obviously not robustly remove the underlying problems), and then (3) deploying it anyway, under heavy pressure from the CEO of Google/Microsoft/Amazon and/or Sam Altman. (In particular, the strongest prediction here is that the models will somehow end up deployed anyway.)
On the other hand, if an AI company has already been hit with lots of expensive lawsuits for problems caused by their AI, then I expect them to end up with a process which will test new models in various ways, and then actually not deploy them if red flags come up. They will have already done the “fine tune until red light stops flashing” thing a few times, and paid for it when their fine tuning failed to actually remove problems in deployment.
Another way to put it: liability forces a company to handle the sort of organizational problems [LW · GW] which are a central bottleneck to making any sort of AI safety governance basically-real, rather than basically-fake. It forces companies to build the organizational infrastructure/processes needed for safety mechanisms with teeth.
For a great case study of how liability solved a similar problem in another area, check out Jason Crawford’s How Factories Were Made Safe [LW · GW]. That was the piece which originally put this sort of strategy on my radar for AI.
How Liability?
Now on to the sorts of things that I’d work on day-to-day to achieve that vague vision.
Broadly speaking, I see three paths:
- The judicial path: establish legal precedents in which AI companies are held liable for damages
- The regulatory path: get regulatory agencies which maintain liability-relevant rules to make rule clarifications or even new rules under which AI companies will be unambiguously liable for various damages
- The legislative path: get state and/or federal lawmakers to pass laws making AI companies unambiguously liable for various damages
In order to ensure de-facto liability, all of these paths should also be coupled with an org which actively searches out people with claims against AI companies, and provides lawyers to pursue those claims.
Given that all of the paths require such an org anyway, the judicial path is the obvious one to take, since it requires exactly that same infrastructure and little else. The only tweak would be that the org also actively looks for good test-cases and friendly jurisdictions in which to prosecute them.
Depending on the available resources, one could also pursue this sort of strategy by finding an existing law firm that’s good at this sort of thing, and simply subsidizing their cases in exchange for a focus on AI liability.
The sort of cases I’d (low-confidence) expect to pursue relatively early on would be things like:
- Find a broad class of people damaged in some way by hallucinations, and bring a class-action suit against the company which built the large language model.
- Find some celebrity or politician who’s been the subject of a lot of deepfakes, and bring a suit against the company whose model made a bunch of them.
- Find some companies/orgs which have been damaged a lot by employees/contractors using large language models to fake reports, write-ups, etc, and then sue the company whose model produced those reports/write-ups/etc.
The items lower down on this list would (I would guess) establish stronger liability standards for the AI companies, since they involve other parties clearly “misusing” the model (who could plausibly shoulder most of the blame in place of the AI company). If successful, such cases would therefore be useful precedents, incentivizing AI companies to put in place more substantive guardrails against misuse, since the AI company would be less able to shove off liability onto “misusers”. (As per the previous section, this isn’t the sort of thing which would immediately matter for extinction risk, but is rather intended to force companies to put any substantive guardrails in place at all.)
Note that a major challenge here is that one would likely be directly fighting the lawyers of major tech companies. That said, I expect there are law firms whose raison d'être is to fight large companies in court, and plenty of judges who do not like major tech companies.
Regulatory Survey and Small Interventions
There are presumably dozens of federal agencies working on rulemaking for AI right now. One obvious-to-me thing to do is to read through all of the rule proposals and public-comment-periods entering the Federal Register, find any relevant to AI (or GPUs/cloud compute), and simply submit comments on them arguing for whatever versions of the proposed rules would most mitigate AI extinction risk.
Some goals here might be:
- Push for versions of rules around AI which have real teeth, i.e. they’re not just some paperwork.
- Push for versions of rules which target relatively-general AI specifically, and especially new SOTA training runs.
- Make GPUs more scarce in general (this one is both relatively more difficult and relatively more “anticooperative”; I’m unsure whether I’d want to prioritize that sort of thing).
… but mostly I don’t currently know what kinds of rules are in the pipe, so I don’t know what specific goals I would pursue. A big part of the point here is just to orient to what’s going on, get a firehose of data, and then make small pushes in a lot of places.
Note that this is the sort of project which large language models could themselves help with a lot - e.g. in reading through the (very long) daily releases of the Federal Register to identify relevant items.
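To make the firehose a bit more concrete, here's a minimal sketch of the screening step in Python. It assumes the Federal Register's public JSON API at federalregister.gov/api/v1 (the parameter and field names below are my best recollection of that API and should be double-checked), and it stands in for the LLM pass with a crude keyword filter, since the choice of model and prompt is a separate design question:

```python
# Minimal sketch of a daily Federal Register screen for AI-relevant rulemaking.
# Assumes the Federal Register's public JSON API; field and parameter names
# are as I recall them and should be verified against the API docs.
import requests

API_URL = "https://www.federalregister.gov/api/v1/documents.json"
KEYWORDS = ["artificial intelligence", "machine learning", "GPU",
            "semiconductor", "cloud computing", "foundation model"]

def fetch_recent_documents(per_page: int = 100) -> list[dict]:
    """Fetch the most recent Federal Register documents (title + abstract)."""
    params = {
        "per_page": per_page,
        "order": "newest",
        "fields[]": ["title", "abstract", "html_url", "comments_close_on"],
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("results", [])

def looks_relevant(doc: dict) -> bool:
    """Cheap keyword filter; in practice an LLM pass over title + abstract
    would catch items that simple keyword matching misses."""
    text = f"{doc.get('title', '')} {doc.get('abstract') or ''}".lower()
    return any(kw in text for kw in KEYWORDS)

if __name__ == "__main__":
    for doc in fetch_recent_documents():
        if looks_relevant(doc):
            print(doc["title"])
            print(f"  comment period closes: {doc.get('comments_close_on')}")
            print(f"  {doc['html_url']}")
```

The LLM step would replace looks_relevant with a call that reads each title/abstract and asks "is this plausibly relevant to AI, GPUs, or cloud compute?", which is exactly the kind of cheap, high-volume classification current models handle well.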
Insofar as I wanted to engage in typical lobbying, e.g. higher-touch discussion with regulators, this would also be my first step. By getting a broad view of everything going on, I’d be able to find the likely highest-impact new rules to focus on.
In terms of implementation, I would definitely not aim to pitch rulemakers on AI X-risk in general (at least not as part of this strategy). Rather, I’d focus on:
- Deeply understanding the proposed rules and the rulemakers’ own objectives
- “Solving for the equilibrium” of the incentives they create (in particular looking for loopholes which companies are likely to exploit)
- Suggesting implementation details which e.g. close loopholes or otherwise tweak incentives in ways which both advance X-risk reduction goals and fit the rulemakers’ own objectives
Another thing I’d be on the lookout for is hostile activity - i.e. lobbyists for Google, OpenAI/Microsoft, Amazon, Facebook, etc trying to water down rules.
An aside: one thing I noticed during The Great Air Conditioner Debate of 2022 [LW(p) · GW(p)] was that, if the reader actually paid attention and went looking for bullshit from the regulators, it was not at all difficult to tell what was going on. The regulators’ write-up from the comment period said more-or-less directly that a bunch of single-hose air conditioner manufacturers claimed their sales would be killed by the originally-proposed energy efficiency reporting rules. In response:
However, as discussed further in section III.C.2, section III.C.3, and III.H of this final rule, the rating conditions and SACC calculation proposed in the November 2015 SNOPR mitigate De’ Longhi’s concerns. DOE recognizes that the impact of infiltration on portable AC performance is test-condition dependent and, thus, more extreme outdoor test conditions (i.e., elevated temperature and humidity) emphasize any infiltration related performance differences. The rating conditions and weighting factors proposed in the November 2015 SNOPR, and adopted in this final rule (see section III.C.2.a and section III.C.3 of this final rule), represent more moderate conditions than those proposed in the February 2015 NOPR. Therefore, the performance impact of infiltration air heat transfer on all portable AC configurations is less extreme. In consideration of the changes in test conditions and performance calculations since the February 2015 NOPR 31 and the test procedure established in this final rule, DOE expects that single-duct portable AC performance is significantly less impacted by infiltration air.
In other words, the regulators’ write-up itself helpfully highlighted exactly where the bullshit was in their new formulas: some change to the assumed outdoor conditions. I took a look, and that was indeed where the bullshit was: the modified energy efficiency standards used a weighted mix of two test conditions, with 80% of the weight on conditions in which outdoor air is only 3°F/1.6°C hotter than indoor air, making single-hose air conditioners seem far less inefficient (relative to two-hose) than more realistic conditions under which one would use a portable air conditioner.
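To see how much work that weighting does, here's a toy calculation with made-up numbers (illustrative only, not the actual DOE test data):

```python
# Hypothetical net cooling performance (arbitrary units) after infiltration
# losses, under two outdoor test conditions. "mild" = outdoor air only a few
# degrees hotter than indoor; "hot" = the conditions under which people
# actually run a portable AC. Numbers are made up for illustration.
single_hose = {"mild": 0.95, "hot": 0.60}
dual_hose   = {"mild": 1.00, "hot": 0.95}

def weighted_rating(unit, w_mild=0.8, w_hot=0.2):
    """Rating in the style of the final rule: 80% weight on mild conditions."""
    return w_mild * unit["mild"] + w_hot * unit["hot"]

print(weighted_rating(single_hose))  # 0.88
print(weighted_rating(dual_hose))    # 0.99
# On paper the single-hose unit looks only ~11% worse; under the "hot"
# condition alone it is ~37% worse, which is closer to what a buyer
# experiences on the days they actually need the AC.
```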
Getting back to the main thread: the point is, insofar as that case-study is representative, it’s not actually all that hard to find where the bullshit is. The regulators want their beneficiaries to be able to find the bullshit, so the regulators will highlight the bullshit themselves.
But that’s a two-edged sword. One project I could imagine taking on would be to find places where AI companies’ lobbyists successfully lobbied for some bullshit, and then simply write up a public post which highlights the bullshit for laymen. I could easily imagine such a post generating the sort of minor public blowback to which regulators are highly averse, thereby incentivizing less bullshit AI rules from regulators going forward. (Unfortunately it would also incentivize the regulators being more subtle in the future, but we’re not directly talking about aligning a human-level-plus AI here, so that’s plausibly an acceptable trade-off.)
Agencies Which Enforce Secrecy
I’m not going to go into much detail in this section, largely because I don’t know much of the relevant detail. That said, it sure does seem like:
- Intelligence agencies (and intelligence-adjacent agencies, like e.g. the people who make sure nuclear secrets stay secret) are uniquely well-equipped in terms of the operational capabilities needed to prevent dangerous AI from being built.
- Intelligence agencies (and intelligence-adjacent agencies) guard an awful lot of secrets which AI makes a lot easier to unearth.
I’ve heard many times that there’s tons of nominally-classified information on the internet, but it’s not particularly easy to find and integrate. Large language models seem like a pretty ideal tool for that.
So it seems like there’s some well-aligned incentives here, and possibly all that’s needed is for someone to make it very obvious to intelligence agencies that they need to be involved in the security of public-facing AI.
I don’t know how much potential there is here or how best to tap it. If I were carrying out such a project, step 1 would be to learn a lot and talk to people with knowledge of how the relevant institutions work.
Ambitious Legislation
I'd rather not tackle an ambitious legislative project right out of the gate, at least not before a regulatory survey; it is the most complex kind of governance project by a wide margin. But if I were to follow that path, here are the bottlenecks I'd expect and how I'd tackle them.
Public Opinion (And Perception Thereof)
For purposes of legislation in general, I see public opinion as a form of currency. Insofar as one’s legislative agenda directly conflicts with the priorities of tech companies’ lobbyists, or requires lots of Nominally Very Important People to pay attention (e.g. to create whole new agencies), one will need plenty of that currency to spend.
It does seem like the general public is pretty scared of AGI in general. Obviously typical issues like jobs, racism/wokeness, etc., are still salient, but even straight-up X-risk seems like a place where the median voter has views not-too-dissimilar to LessWrongers' once the issue is brought to their attention at all. THIS DOES NOT LEAD TO RISING PROPERTY VALUES IN TOKYO [? · GW] seems to be a pretty common intuition.
One thing to emphasize here is that perception of public opinion matters at least as much as public opinion itself, for purposes of this “currency”. We want policymakers to know that the median voter is pretty scared of AGI in general. So there’s value in things like e.g. surveys, not just in outright public outreach.
Legislation Design
Once there’s enough public-opinion-currency to make an ambitious legislative project viable, the next big step is to draft legislation which would actually do what we want.
(There's an intermediate step of figuring out what vague goals to aim for - like e.g. licensing, a moratorium, etc. - but I do not expect that step to be a major bottleneck, and in any case one should mostly spend time on draft legislation and then update the high-level target if and when one discovers barriers along the way.)
This step is where most of the cognitive work would happen. It’s a tricky design problem of:
- Figuring out who in the federal government needs to do what
- Figuring out what legislative language will cause them to do that
- Thinking through equilibrium behavior of regulators, companies, and researchers under the resulting incentives
- Iterating on all that
This obviously requires pretty detailed knowledge of who does what in existing agencies, how new agencies are founded and operate, who’s responsible for all that, etc. Lots of figuring out who actually does what.
I expect this step is currently the primary bottleneck, and where most of the value is.
Selling It
The next step would be pitching that legislation to people who can make it law. This is the step which I expect gets a LOT easier as public-opinion-currency increases. With enough pressure from the median voter, congressional offices will be actively searching around for proposals, and anyone standing around with actual draft legislation will be in high demand. On the other hand, if there’s relatively little public opinion pressure, then one would need to rely a lot more on “inside game”. If I were following this path, I’d definitely be aiming to rely relatively heavily on public opinion rather than inside game, but I’d contract 2-3 people with inside-game know-how and consult them independently to make sure I wasn’t missing anything crucial.
Concluding Thoughts
As mentioned at the start, there’s probably stuff in here that I’m just completely wrong about. But it hopefully gives a sense of the kind of approach and mental models I’d use. In particular, note:
- A focus on bottlenecks, not just marginal impact.
- A focus on de-facto effects and equilibrium behavior under incentives, not just symbolic rules.
- A focus on figuring out the details of the processes for both creating and enforcing rules, i.e. which specific people do which specific things.
A final note: this post did not talk about which governance projects I would not allocate effort to, or why. If you're curious about particular projects or classes of project which the post ignored, feel free to leave a comment.
32 comments
comment by Chris_Leong · 2023-12-08T09:04:37.845Z · LW(p) · GW(p)
I'm much more skeptical about liability.
My worry is that these lawsuits would become the overriding factor in terms of what decisions are made about safety to the point where other considerations, such as what is actually safe, will be driven out.
↑ comment by johnswentworth · 2023-12-08T09:52:34.468Z · LW(p) · GW(p)
Yeah, I think that's a sensible concern to have. On the other hand, at that point we'd be relatively few bits-of-optimization away from a much better situation than today: adjusting liability laws to better target actual safety, in a world where liability is already a de-facto decision driver at AI companies, is a much easier problem than causing AI companies to de-novo adopt decision-driving-processes which can actually block deployments.
↑ comment by Garrett Baker (D0TheMath) · 2023-12-08T17:15:01.866Z · LW(p) · GW(p)
If liability law moves AI companies that much, I’d expect the AI companies to have already made the easy, and moderately difficult changes you’re able to make to that liability process, so it seems unlikely to be able to be changed much.
↑ comment by johnswentworth · 2023-12-08T17:20:25.385Z · LW(p) · GW(p)
That sentence is not parsing for me, could you please reword?
↑ comment by Garrett Baker (D0TheMath) · 2023-12-08T18:29:34.584Z · LW(p) · GW(p)
I expect that once liability law goes in place, and AI companies learn they can’t ignore it, they too would lobby the government with more resources & skill than safety advocates, likely not destroying the laws, but at least restricting their growth, including toward GCR relevant liabilities.
↑ comment by johnswentworth · 2023-12-08T19:05:13.128Z · LW(p) · GW(p)
Ah, that makes sense. I would have agreed more a year ago. At this point, it's looking like the general public is sufficiently scared of AGI that the AI companies might not be able to win that fight just by throwing resources at the problem. (And in terms of them having more skill than safety advocates... well, they'll hire people with good-looking resumes, but remember that the dysfunctionality of large companies still applies here.)
More generally, I do think that some kind of heads-up fight with AI companies is an unavoidable element of any useful policy plan at this point. The basic problem is to somehow cause an AI company to not deploy a model when deployment would probably make the company a bunch of money, and any plan to achieve that via policy unavoidably means fighting the companies.
comment by Dweomite · 2023-12-09T02:43:50.702Z · LW(p) · GW(p)
The sort of cases I’d (low-confidence) expect to pursue relatively early on would be things like:
- Find a broad class of people damaged in some way by hallucinations, and bring a class-action suit against the company which built the large language model.
- Find some celebrity or politician who’s been the subject of a lot of deepfakes, and bring a suit against the company whose model made a bunch of them.
- Find some companies/orgs which have been damaged a lot by employees/contractors using large language models to fake reports, write-ups, etc, and then sue the company whose model produced those reports/write-ups/etc.
The second two--and possibly the first one, but it's hard to tell because it's kinda vague--feel pretty bad to me. Like, if you ignore the x-risk angle, imagine that current AI is roughly as strong as AI will ever be, and just look at this from a simple product liability angle, then making AI creators liable for those things strikes me as unreasonable and also bad for humanity. Kinda like if you made Adobe liable for stuff like kids using Photoshop to create fake driver's licenses (with the likely result that all legally-available graphics editing software will suck, forever).
Curious if you disagree and think those would be good liability rules even if AI progress was frozen, or if you're viewing this as a sacrifice that you're willing to make to get a weapon against x-risk?
↑ comment by johnswentworth · 2023-12-09T03:00:50.567Z · LW(p) · GW(p)
Curious if you disagree and think those would be good liability rules even if AI progress was frozen, or if you're viewing this as a sacrifice that you're willing to make to get a weapon against x-risk?
I don't think my actual view here quite fits that binary?
Very roughly speaking, these sorts of lawsuits are pushing toward a de-facto rule of "If you want to build AI (without getting sued into oblivion), then you are responsible for solving its alignment problem. If you further want it to be usable by the broad public, then you are responsible for solving the harder version of its alignment problem, in which it must be robust to misuse.".
I can see the view in which that's unreasonable. Like, if it is both true that these products were never particularly dangerous in the first place, and solving the alignment problem for them (including misuse) is Hard, then yeah, we'd potentially be missing out on a lot of value. On the other hand... if we're in that world and the value in fact dramatically outweighs the costs, then the efficient solution is for the AI companies to eat the costs. Like, if they're generating way more value than deepfakes and fake reports and whatnot generate damage, then it is totally fair and reasonable for the companies to make lots of money but also pay damages to the steady stream of people harmed by their products. That's the nice thing about using liability rather than just outlawing things: if the benefits in fact outweigh the costs, then the AI companies can eat the costs and still generate lots of value.
... Ok, having written that out, I think my actual answer is "it's just totally reasonable". The core reason it's reasonable is Coasean thinking: liability is not the same as outlawing things, so if the upside outweighs the downside then AI companies should eat the costs.
(A useful analogy here is worker's comp: yes, it's a pain in the ass, but it does not mean that all companies just stop having employees. It forces the companies to eat costs, and therefore strongly incentivizes them to solve safety problems, and that's what we want here. If the upside is worth the downside, then companies eat the cost, and that's also a fine outcome.)
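(A stylized version of that Coasean point, with symbols I'm making up for this comment rather than anything standard: let $V$ be the value the company captures from deploying a model and $H$ the expected harm the deployment imposes on everyone else. With liability that roughly tracks harm, the company deploys iff
$$V - H > 0,$$
which is the same condition under which deployment is socially worthwhile. An outright ban forces "don't deploy" even when $V - H > 0$; zero liability yields "deploy" whenever $V > 0$, even when $H \gg V$. Liability is the version that gets the margin right in both directions.)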
↑ comment by David Hornbein · 2023-12-09T23:09:33.826Z · LW(p) · GW(p)
Cars are net positive, and also cause lots of harm. Car companies are sometimes held liable for the harm caused by cars, e.g. if they fail to conform to legal safety standards or if they sell cars with defects. More frequently the liability falls on e.g. a negligent driver or is just ascribed to accident. The solution is not just "car companies should pay out for every harm that involves a car", partly because the car companies also don't capture all or even most of the benefits of cars, but mostly because that's an absurd overreach which ignores people's agency in using the products they purchase. Making cars (or ladders or knives or printing presses or...) "robust to misuse", as you put it, is not the manufacturer's job.
Liability for current AI systems could be a good idea, but it'd be much less sweeping than what you're talking about here, and would depend a lot on setting safety standards which properly distinguish cases analogous to "Alice died when the car battery caught fire because of poor quality controls" from cases analogous to "Bob died when he got drunk and slammed into a tree at 70mph".
↑ comment by Dweomite · 2023-12-09T08:54:03.452Z · LW(p) · GW(p)
That seems like a useful framing. When you put it like that, I think I agree in principle that it's reasonable to hold a product maker liable for the harms that wouldn't have occurred without their product, even if those harms are indirect or involve misuse, because that is a genuine externality, and a truly beneficial product should be able to afford it.
However, I anticipate a few problems that I expect will cause any real-life implementation to fall seriously short of that ideal:
- The product can only justly be held liable for the difference in harm, compared to the world without that product. For instance, maybe someone used AI to write a fake report, but without AI they would have written a fake report by hand. This is genuinely hard to measure, because sometimes the person wouldn't have written a fake if they didn't have such a convenient option, but at the same time, fake reports obviously existed before AI, so AI can't possibly be responsible for 100% of this problem.
- If you assign all liability to the product, this will discourage people from taking reasonable precautions. For instance, they might stop making even a cursory attempt to check if reports look fake, knowing that AI is on the hook for the damage. This is (in some cases) far less efficient than the optimal world, where the defender pays for defense as if they were liable for the damage themselves.
In principle you could do a thing where the AI pays for the difference in defense costs plus the difference in harm-assuming-optimal-defense, instead of for actual harm given your actual defense, but calculating "optimal defense" and "harm assuming optimal defense" sounds like it would be fiendishly hard even if all parties' incentives were aligned, which they aren't. (And you'd have to charge AI for defense costs even in situations where no actual attack occurred, and maybe even credit them in situations where the net result is an improvement to avoid overcharging them overall?)
- My model of our legal system--which admittedly is not very strong--predicts that the above two problems are hard to express within our system, that no specific party within our system believes they have the responsibility of solving them, and that therefore our system will not make any organized attempt to solve them.
For instance, if I imagine trying to persuade a judge that they should estimate the damage a hand-written fake report would have generated and bill the AI company only for the difference in harm, I don't have terribly high hopes of the judge actually trying to do that. (I am not a legal expert and am least certain about this point.)
↑ comment by johnswentworth · 2023-12-09T09:03:50.167Z · LW(p) · GW(p)
(I should probably explain this in more detail, but I'm about to get on a plane so leaving a placeholder comment. The short answer is that these are all standard points discussed around the Coase theorem, and I should probably point people to David Friedman's treatment of the topic, but I don't remember which book it was in.)
comment by RogerDearnaley (roger-d-1) · 2023-12-08T10:40:10.991Z · LW(p) · GW(p)
Cunningham's Law: "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."
This suggests an alternative to the "helpful assistant" paradigm and its risk of sycophancy during RL training: come up with a variant of instruct training where, rather than asking the chatbot a question that it will then answer, you instead tell it your opinion, and it corrects you at length, USENET-style. It should be really easy to elicit this behavior from base models.
↑ comment by johnswentworth · 2023-12-08T10:59:28.178Z · LW(p) · GW(p)
That is a surprisingly excellent idea.
↑ comment by RogerDearnaley (roger-d-1) · 2023-12-08T11:09:04.912Z · LW(p) · GW(p)
I'm almost tempted to Cunningham-tune a Mistral 7B base model. We'd only need O(10,000) good examples and O($100). And it would be funny as hell.
↑ comment by RogerDearnaley (roger-d-1) · 2023-12-08T11:24:25.276Z · LW(p) · GW(p)
Trying this prompting approach briefly on GPT-4, if you just venture a clearly-mistaken opinion, it does politely but informatively correct you (distinctly not USENET-style). On some debatable subjects it was rather sycophantic to my viewpoint, though with a bit of on-the-other-hand push-back in later paragraphs. So I'm gradually coming to the opinion this is only about as humorous as Grok. But it still might be a thought-provoking change of pace.
↑ comment by RogerDearnaley (roger-d-1) · 2023-12-08T11:34:55.133Z · LW(p) · GW(p)
IMO the criterion for selecting the positive training examples should be that the chatbot won the argument, under standard debating rules (plus Godwin's Law, of course): it net-shifted a vote of humans towards its position. If the aim is to evoke USENET, I think we should allow the chatbot to use more then one persona holding more than one viewpoint, even ones that also argue with each other.
comment by aphyer · 2023-12-09T03:41:36.313Z · LW(p) · GW(p)
I find your 'liability' section somewhat scary. It sounds really concerningly similar to saying the following:
AI companies haven't actually done any material harm to anyone yet. However, I would like to pretend that they have, and to punish them for these imagined harms, because I think that being randomly punished for no reason will make them improve and be less likely to do actual harms in future.
↑ comment by johnswentworth · 2023-12-09T04:31:47.057Z · LW(p) · GW(p)
Liability is not punishment in-and-of itself. Liability is the ability to be punished, if it is established in court that one did <whatever type of thing one would be liable for>. What I want to create is de-facto machinery which can punish AI companies, insofar as they do harm.
Also, I do not find it at all plausible that no material harm has been done to anyone by AI yet. Surely people have been harmed by hallucinations, deepfakes, etc. And I'm not proposing disproportionate punishments here - just punishments proportionate to the harms. If the benefits of AI greatly exceed the harms (which they clearly do so far), then AI companies would be incentivized by liability to eat that cost short-term and find ways to mitigate the harms long-term.
comment by trevor (TrevorWiesinger) · 2023-12-08T19:32:42.291Z · LW(p) · GW(p)
I think a valuable use of forecasting is predicting which of the top thinkers in technical alignment will successfully pivot towards making big progress/discoveries in AI governance, because thinking of solutions to the current global situation with AI is probably worth a much larger portion of their day than 0-1% (maybe an hour a day or 2 hours per week).
The initial barriers to having good takes on governments are significant, e.g. there are a lot of terrible information sources on global affairs that depict themselves as good while actually being dogshit e.g. news corporations and many documentaries (for example, the military doesn't actually hand over the nuclear arsenal to a new person every 4 or 8 years because they won an election, that's obvious propaganda). I think these barriers to entry are causing our best people to "bounce off" off the most valuable approaches.
Those barriers to entry are worth overcoming, I currently haven't yet thought of a good way to do that without flying people to DC (the bay area has a strong libertarian cultural background, which results in failure modes like assuming that all parts of all intelligence agencies are as incompetent as FDA bureaucrats [LW · GW]).
I'm currently aware of solid potential for John Wentworth, Jan Kulveit, and Andrew Critch. Yudkowsky already resolved as "YES" [LW · GW]. These people should be made ready to make big discoveries ASAP since the world is moving fast.
comment by faul_sname · 2023-12-08T22:52:17.791Z · LW(p) · GW(p)
Would you mind giving a concrete example of what you imagine a typical lawsuit would look like, one that is not possible under today's laws but would be possible under your proposal? In particular:
- Establishing standing: What real and tangible harm occurred to the plaintiff, and how did the defendant's actions cause that harm? Alternatively, what imminent (still real and tangible) harm will be reasonably expected to occur if the court does not intervene? Alternatively alternatively, what specific statute did the defendant violate, such that the violation of that statute provides sufficient standing? (You can invent a new statute if you need to, this is your scenario)
- Proof of claims: For example in a negligence case, how does the plaintiff prove that the defendant failed in their duty to exercise a reasonable standard of care while performing acts that could foreseeably harm others, and that that failure caused the injury to the plaintiff (i.e. the plaintiff would not have been harmed if not for the defendant's foreseeably bad actions), and that the plaintiff in fact came to harm.
- Nature of the relief to be granted: There must be some way for the court to actually provide some sort of remedy to the plaintiff. This can be money, or an order for the defendant to stop doing something, or even a declaration about what rights the involved parties have.
(I'm assuming you're not trying to overhaul the entire legal system, and also that you're looking at civil cases).
Edit: If you find that your typical example doesn't require you to invent new statutes, you might be able to skip the "invent new laws" bit and jump straight to "try to find people who have been tangibly harmed by something an AI tool did, where the harm was foreseeable and the company releasing the AI tool failed to take precautions that a reasonable person would have, and then fund their lawsuits".
You should probably expect this shorter process to take a few years. Lawsuits are slow.
Also, I am not a lawyer. This is not legal advice.
↑ comment by johnswentworth · 2023-12-09T02:21:20.720Z · LW(p) · GW(p)
I haven't put enough effort into this to provide a full concrete example; figuring out which of these sorts of questions I need to ask, and coming up with (initial) answers, is the sort of thing I'd spend the first few days on if I were going to work on such a project full-time. (The main thing I don't have is enough knowledge of tort law to know what the key questions are and what the plausible solution space looks like; presumably a lawyer would have a lot of relative advantage here.)
That said, some pieces which I expect to be part of a full answer:
- It's very plausible to me that new statutes aren't needed and the project could indeed just jump straight to finding cases to take on. I would definitely be on the lookout for that.
- In terms of changes to the law, one major thing I'm pretty sure I'd want to aim for (conditional on aiming for changes to the law at all) would be strict liability. Ideally, I'd prefer plaintiffs not need to establish reasonable foreseeability at all for damages from AI products, or at least to put the burden of proof on the defendant to disprove it.
↑ comment by faul_sname · 2023-12-09T04:13:50.576Z · LW(p) · GW(p)
Sorry if it looked like I was asking for a super high level of detail -- I'm more wondering what your working examples are, assuming you're developing this idea with some of those in mind (maybe you're not thinking of anything concrete at all? I don't think I could do that, but it wouldn't surprise me overly much if some people could.)
So while reading your post, I largely had three candidate examples in mind - specifically AI tools purpose-built for malicious purposes, autonomous AI agents causing problems, and general-purpose tools being misused.
Realistic Examples
Case 1: DeepFakes-R-Us and the Gullible Grandmother
A US company[1], DeepFakes-R-Us, releases a tool that allows for real-time modification of video content to transform the speaker's voice and motions to look like any target for which 30 seconds of video footage can be acquired. DeepFakes-R-Us accepts payment in cryptocurrency, without verifying who is paying for the service. Further, they do not check that the target has consented to having their likeness imitated, nor do they check whether the user is using the service to transform obviously-malicious speech despite being aware that it historically has been abused and abuse-detection tools being widely available.
Someone uses DeepFakes-R-Us to fake a phone call to a wealthy but senile elderly woman, impersonating her grandson, and saying that he has gotten in trouble in a foreign country and needs $100,000 wired to a foreign account to bail him out. The elderly woman wires the money overseas.
In this case, we'd have:
- Standing: The financial loss due to fraud constitutes a concrete and particularized injury. She needs to establish a causal link between that injury and the actions of DeepFakes-R-Us. No direct causation exists, but she could argue that DeepFakes-R-Us facilitated the fraud by providing the tool. This injury is redressable: specifically, she can be made whole by giving her the money she lost.
- Claims: Probably negligence, ChatGPT claims maybe also something called "vicarious liability" and something else called "failure to warn".
- Proof: Documentation that the fraudulent transaction occurred, evidence that the technology provided by DeepFakes-R-Us was directly used in the fraud and that but for their contribution, the fraud would not have happened
- Nature of relief: $$$$.
Case 2: The Hacktastic Stock Trading Bot
An AI stock-trading bot, operated by a financial organization, is programmed to maximize profit in whatever way it can, including an unrestricted internet connection[2]. The bot discovers that if it shorts a company's stock, and then finds vulnerabilities in that company’s systems and exfiltrates and publishes the data, and publicizes the data breach, it can gain a competitive advantage in trading by knowing that the company's stock is likely to decrease in value. In one instance, the bot exploits a vulnerability in a healthcare company's system,[3] leading to a massive data breach. Among the compromised data are sensitive medical records of numerous patients.
One specific individual Jane Doe, was in a highly sensitive occupation (e.g., a covert operative or a public figure in a sensitive role). The data breach exposed her medical records, leading to her immediate dismissal from her position, causing her career damage, severe emotional distress, and financial loss.
In this case, we'd have:
- Standing: My understanding is that Jane Doe has sustained an injury in the form of financial loss and emotional distress caused by the bot, acting in the interests of the company (this might get legally interesting), and that her injury is redressable through monetary compensation or whatever.
- Claims: Probably negligence (the company had a responsibility to ensure that its bot operated within the bounds of the law, and failed to do so), breach of privacy (for the obvious reasons)
- Proof: Evidence that that particular bot caused that particular data breach, that the breach caused her dismissal, and that her dismissal harmed her.
- Nature of relief: $$$$, hopefully an injunction to make the company stop using the bot.
Case 3: Novel OCR Heist
Milliprog[4] is a multi-billion dollar corporation most known for its wide variety of desktop software for productivity on day-to-day office tasks. This year, it released an exciting new digitization tool, which allows users to OCR an entire book's worth of paper records, even handwritten ones, in seconds, simply by flipping through the book in front of a webcam.
K. L. Souling is a bestselling author of the well-known and ongoing "Furry Ceramicist" series of novels[4]. She is famously tight-lipped about the future events that will happen in the series. She tells people "if you want to find out, you'll have to wait and buy the book". She keeps only one copy of her own notes for the story, handwritten and stored in her office.
One night, one of the cleaning staff in the office pulls out Souling's notes and uses the Milliprog OCR software to scan the Furry Ceramicist plot notes, then publishes those notes online.
Souling has clearly sustained an injury, in the form of lost sales of her books as people just look up the plot online, and Milliprog could redress her injury through monetary compensation ($$$$[5])
However, establishing causation is going to be difficult in this case -- I think that would require new laws.
My Thoughts
So in cases 1 and 2, I think hoping for legal liability is sensible, though the cases are likely to be legally interesting.[6]
If you're hoping to establish precedent that Milliprog should face strict liability in case 3, I think that's a pretty hard sell.[7]
I hope that clarifies what I was going for.
Sincerely,
I am not a lawyer and this is not legal advice.
[1] Ok, I may have exaggerated a little bit when I called the examples "realistic".
[2] Always a good plan
[3] I have to take some liberties with the realism, obviously[8] a healthcare provider would never actually use software with exploitable vulnerabilities.
[4] Any similarity to actual persons, living or dead, is purely coincidental. Please don't sue me.
[5] Shocking that that's the form of redress, I know.
[6] This is lawyer speak for "the bill will have more digits than you expect".
[7] Because it's a terrible idea. Please don't do this.
[8] This is sarcasm, in case it wasn't clear.
↑ comment by aphyer · 2023-12-09T18:17:27.819Z · LW(p) · GW(p)
How do you distinguish your Case 1 from 'impose vast liability on Adobe for making Photoshop'?
↑ comment by faul_sname · 2023-12-09T22:17:45.261Z · LW(p) · GW(p)
Short answer is "foreseeability of harm coming from the tool being used as intended". Law is not computer code, so for example intent and reasonableness matter here.
Long answer should probably be a full post.
I think the train of thought here mostly is that people here implicitly have 2 as their main threat model for how things will actually go wrong in practice, but they want legal precedents to be in place before any actual incidents happen, and as such are hoping that increasing the legal risk of companies doing things like 1 and 3 will work for that purpose.
And I think that, while legal liability for cases like 1 is probably good in egregious cases, extending to that to cases where there is no intent of harm and no reasonable expectation of harm (like 3) is a terrible idea, and separately that pushing for 1 won't significantly help with 2.
That's also part of a broader pattern of "let's figure out what outcomes we want from a policy, and then say that we should advocate for policies that cause those outcomes, and then either leave the choice of specific policy as an exercise for the reader (basically fine) or suggest a policy that will not accomplish those goals and also predictably cause a bunch of terrible outcomes (not so fine)". But I think the idea that the important part is to come up with the intended outcomes of your policy and that the rest is just unimportant implementation details is bad, and maybe impactfully so if people are trying to take the Churchill "never let a good crisis go to waste" approach for getting their political agenda implemented (i.e. prepare policy suggestions in advance and then push them really hard once a crisis occurs that plausibly could have been mitigated by your favored policy).
Yeah, after writing that out I really think I need to write a full post here.
↑ comment by Noosphere89 (sharmake-farah) · 2023-12-10T00:21:24.852Z · LW(p) · GW(p)
But I think the idea that the important part is to come up with the intended outcomes of your policy and that the rest is just unimportant implementation details is bad
This is the story of a lot of failed policies, especially policies that goodharted on their goals, and I'm extremely scared if people don't actually understand this and use it.
This is a big flaw of a lot of radical groups, and I see this as a warning sign that your policy proposals aren't net-positive.
comment by Zach Stein-Perlman · 2023-12-08T08:04:23.884Z · LW(p) · GW(p)
I think you use "AI governance" to mean "AI policy," thereby excluding e.g. lab governance (e.g. structured access and RSPs). But possibly you mean to imply that AI governance minus AI policy is not a priority.
↑ comment by johnswentworth · 2023-12-08T08:42:41.205Z · LW(p) · GW(p)
I indeed mean to imply that AI governance minus AI policy is not a priority. Before the recent events at OpenAI, I would have assigned minority-but-not-negligible probability to the possibility that lab governance might have any meaningful effect. After the recent events at OpenAI... the key question is "what exactly is the mechanism by which lab governance will result in a dangerous model not being built/deployed?", and the answer sure seems to be "it won't". (Note that I will likely update back toward minority-but-not-negligible probability if the eventual outcome at OpenAI involves a board which will clearly say "no" sometimes, in ways which meaningfully impact the bottom line, and actually get their way when they do so.)
Things like structured access and RSPs are nice-to-have, but I do not see any plausible trajectory on which those successfully address major bottlenecks to humanity's survival.
↑ comment by tlevin (trevor) · 2023-12-09T18:31:51.436Z · LW(p) · GW(p)
I broadly share your prioritization of public policy over lab policy, but the more I've learned about liability, the more it seems like one or a few labs having solid RSPs/evals commitments/infosec practices/etc would significantly shift how courts make judgments about how much of this kind of work a "reasonable person" would do to mitigate the foreseeable risks. Legal and policy teams in labs will anticipate this and thus really push for compliance with whatever the perceived industry best practice is. (Getting good liability rulings or legislation would multiply this effect.)
↑ comment by Erich_Grunewald · 2023-12-08T16:00:18.796Z · LW(p) · GW(p)
Fwiw, there is also AI governance work that is neither policy nor lab governance, in particular trying to answer broader strategic questions that are relevant to governance, e.g., timelines, whether a pause is desirable [? · GW], which intermediate goals are valuable to aim for, and how much computing power Chinese actors will have access to. I guess this is sometimes called "AI strategy", but often the people/orgs working on AI governance also work on AI strategy, and vice versa, and they kind of bleed into each other.
How do you feel about that sort of work relative to the policy work you highlight above?
↑ comment by johnswentworth · 2023-12-08T16:56:11.569Z · LW(p) · GW(p)
Let's go through those:
- Timelines appear to me [LW · GW] to be at least one and maybe two orders-of-magnitude more salient than they are strategically relevant, in EA/rationalist circles. I think the right level of investment in work on them is basically "sometimes people who are interested write blogposts on them in their spare time", and it is basically not worthwhile for anyone to focus their main work on timelines at current margins. Also, the "trying to be serious" efforts on timelines typically look-to-me to be basically bullshit - i.e. they make basically-implausible assumptions which simplify the problem, and then derive nonsense conclusions from those. (Ajeya's biological anchors report is a good central example here.) (Also, to be clear, that sort of work is great insofar as people use it as a toy model and invest both effort and credence accordingly.)
- The AI pause debate seems to be a "some people had a debate in their spare time" sort of project, and seems like a pretty solid spare-time thing to do. But if making that debate happen was someone's main job for two months, then I'd be pretty unimpressed with their productivity.
- I think there's room for high-value work on figuring out which intermediate goals are valuable to aim for. That work does not look like running a survey on which intermediate goals are valuable to aim for. It looks like some careful thinking backchained from end-goals (like "don't die to AI"), a bunch of red-teaming of ideas and distilling key barriers, combined with "comprehensive info gathering [LW · GW]"-style research to find what options are available (like e.g. the regulatory survey mentioned in the OP). The main goal of such work would be to discover novel high-value intermediate goals, barriers and unknown unknowns. (Note that there is value in running surveys to create common knowledge, but that's a very different use-case from figuring things out.)
- I do not currently see much reason to care about how much computing power Chinese actors will have access to. If the world were e.g. implementing a legal regime around AI which used compute availability as a major lever, then sure, availability of computing power to Chinese actors would be important for determining how much buy-in is needed from the Chinese government to make that legal regime effective. But realistically, the answer to that is probably "the Chinese need to be on-board for a legal strategy to be effective regardless". (Also I don't expect Chinese interest to be rate-limiting anyway; their government's incentives are much more directly lined up with X-risk mitigation interests than those of most other governments.)
In general, there are versions of strategy work which I would consider useful. But in practice, it looks-to-me like people who invest full-time effort in such things do not usually produce more value than people just e.g. writing off-the-cuff posts on the topic or organizing debates as side-projects or spending a few evenings going through some data and then writing up as a moderate-effort post. Most people do not seem to know how to put more effort into such projects in a way which produces more actual value, as opposed to just more professional-looking outputs.
↑ comment by Noosphere89 (sharmake-farah) · 2023-12-08T20:46:24.808Z · LW(p) · GW(p)
Timelines appear to me to be at least one and maybe two orders-of-magnitude more salient than they are strategically relevant, in EA/rationalist circles. I think the right level of investment in work on them is basically "sometimes people who are interested write blogposts on them in their spare time", and it is basically not worthwhile for anyone to focus their main work on timelines at current margins. Also, the "trying to be serious" efforts on timelines typically look-to-me to be basically bullshit - i.e. they make basically-implausible assumptions which simplify the problem, and then derive nonsense conclusions from those. (Ajeya's biological anchors report is a good central example here.) (Also, to be clear, that sort of work is great insofar as people use it as a toy model and invest both effort and credence accordingly.)
Also, there's already a not terrible model in this post, which I'd use as a reference:
https://www.lesswrong.com/posts/3nMpdmt8LrzxQnkGp/ai-timelines-via-cumulative-optimization-power-less-long [LW · GW]
The AI pause debate seems to be a "some people had a debate in their spare time" sort of project, and seems like a pretty solid spare-time thing to do. But if making that debate happen was someone's main job for two months, then I'd be pretty unimpressed with their productivity.
I disagree with this, mostly because of Nora Belrose making some very important points, and unifying and making better arguments that AI is easy to control, so I'm happy with how the debate went.
comment by Review Bot · 2024-02-14T06:49:19.608Z · LW(p) · GW(p)
The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?