Posts

Pay Risk Evaluators in Cash, Not Equity 2024-09-07T02:37:59.659Z
Non-Disparagement Canaries for OpenAI 2024-05-30T19:20:13.022Z
OMMC Announces RIP 2024-04-01T23:20:00.433Z
Safetywashing 2022-07-01T11:56:33.495Z
Matt Botvinick on the spontaneous emergence of learning algorithms 2020-08-12T07:47:13.726Z
At what point should CFAR stop holding workshops due to COVID-19? 2020-02-25T09:59:17.910Z
CFAR: Progress Report & Future Plans 2019-12-19T06:19:58.948Z
Why are the people who could be doing safety research, but aren’t, doing something else? 2019-08-29T08:51:33.219Z
adam_scholl's Shortform 2019-08-12T00:53:37.221Z

Comments

Comment by Adam Scholl (adam_scholl) on COT Scaling implies slower takeoff speeds · 2024-09-29T00:12:43.193Z · LW · GW

I claim the phrasing in your first comment ("significant AI presence") and your second ("AI driven R&D") are pretty different—from my perspective, the former doesn't bear much on this argument, while the latter does. But I think little of the progress so far has resulted from AI-driven R&D?

Comment by Adam Scholl (adam_scholl) on COT Scaling implies slower takeoff speeds · 2024-09-28T23:55:49.500Z · LW · GW

Huh, this doesn't seem clear to me. It's tricky to debate what people used to be imagining, especially on topics where those people were talking past each other this much, but my impression was that the fast/discontinuous argument was that rapid, human-mostly-or-entirely-out-of-the-loop recursive self-improvement seemed plausible—not that earlier, non-self-improving systems wouldn't be useful.

Comment by Adam Scholl (adam_scholl) on COT Scaling implies slower takeoff speeds · 2024-09-28T23:26:08.094Z · LW · GW

Why do you think this? Recursive self-improvement isn't possible yet, so from my perspective it doesn't seem like we've encountered much evidence either way about how fast it might scale.

Comment by Adam Scholl (adam_scholl) on Why I funded PIBBSS · 2024-09-20T00:34:42.607Z · LW · GW

Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm.

I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly dissimilar from/illegible to prevailing scientific and engineering practice.

I have a hard time imagining scientists like e.g. Darwin, Carnot, or Shannon describing their work as depending much on "immediate feedback loops with present day" systems. So I'm curious whether you think PIBBSS would admit researchers like these into your program, were they around and pursuing similar strategies today?

Comment by Adam Scholl (adam_scholl) on The Checklist: What Succeeding at AI Safety Will Involve · 2024-09-11T17:03:11.936Z · LW · GW

For what it's worth, as someone in basically the position you describe—I struggle to imagine automated alignment working, mostly because of Godzilla-ish concerns—demos like these do not strike me as cruxy. I'm not sure what the cruxes are, exactly, but I'm guessing they're more about things like e.g. relative enthusiasm about prosaic alignment, relative likelihood of sharp left turn-type problems, etc., than about whether early automated demos are likely to work on early systems.

Maybe you want to call these concerns unserious too, but regardless I do think it's worth bearing in mind that early results like these might seem like stronger/more relevant evidence to people whose prior is that scaled-up versions of them would be meaningfully helpful for aligning a superintelligence.

Comment by Adam Scholl (adam_scholl) on AI forecasting bots incoming · 2024-09-10T22:31:38.394Z · LW · GW

I sympathize with the annoyance, but I think the response from the broader safety crowd (e.g., your Manifold market, substantive critiques and general ill-reception on LessWrong) has actually been pretty healthy overall; I think it's rare that peer review or other forms of community assessment work as well or as quickly.

Comment by Adam Scholl (adam_scholl) on My Number 1 Epistemology Book Recommendation: Inventing Temperature · 2024-09-08T23:11:49.956Z · LW · GW

It's not a full conceptual history, but fwiw Boole does give a decent account of his own process and frustrations in the preface and first chapter of his book.

Comment by Adam Scholl (adam_scholl) on Pay Risk Evaluators in Cash, Not Equity · 2024-09-08T20:12:48.155Z · LW · GW

I just meant there are many teams racing to build more agentic models. I agree current ones aren't very agentic, though whether that's because they're meaningfully more like "tools" or just still too stupid to do agency well or something else entirely, feels like an open question to me; I think our language here (like our understanding) remains confused and ill-defined.

I do think current systems are very unlike oracles though, in that they have far more opportunity to exert influence than the prototypical imagined oracle design—e.g., most have I/O with ~any browser (or human) anywhere, people are actively experimenting with hooking them up to robotic effectors, etc.

Comment by Adam Scholl (adam_scholl) on My Number 1 Epistemology Book Recommendation: Inventing Temperature · 2024-09-08T17:30:56.121Z · LW · GW

I liked Thermodynamic Weirdness for similar reasons. Of the books I've found, it does the best job of describing case studies of conceptual progress—i.e., what the initial prevailing conceptualizations were, and how/why scientists realized they could be improved.

It's rare that books describe such processes well, I suspect partly because it's so wildly harder to generate scientific ideas than to understand them that they tend to strike people as almost blindingly obvious in retrospect. For example, I think it's often pretty difficult for people familiar with evolution to understand why it would have taken Darwin years to realize that organisms that reproduce more influence descendants more, or why it was so hard for thermodynamicists to realize they should demarcate entropy from heat, etc. Weirdness helped make this more intuitive for me, which I appreciate.

(I tentatively think Energy, Force and Matter will end up being my second-favorite conceptual history, but I haven't finished it yet, so I'm not confident.)

Comment by Adam Scholl (adam_scholl) on ryan_greenblatt's Shortform · 2024-09-08T00:45:45.334Z · LW · GW

This seems like a great activity, thank you for doing/sharing it. I disagree with the claim near the end that this seems better than Stop, and in general felt somewhat alarmed throughout at (what seemed to me like) some conflation/conceptual slippage between arguments that various strategies were tractable, and that they were meaningfully helpful. Even so, I feel happy that the world contains people sharing things like this; props.

Comment by Adam Scholl (adam_scholl) on Pay Risk Evaluators in Cash, Not Equity · 2024-09-07T22:54:55.461Z · LW · GW

I think the latter group is much smaller. I'm not sure who exactly has most influence over risk evaluation, but the most obvious examples are company leadership and safety staff/red-teamers. From what I hear, even those currently receive equity (which seems corroborated by job listings, e.g. Anthropic, DeepMind, OpenAI).

Comment by Adam Scholl (adam_scholl) on Zach Stein-Perlman's Shortform · 2024-08-24T20:52:02.241Z · LW · GW

What seemed psychologizing/unfair to you, Raemon? I think it was probably unnecessarily rude/a mistake to try to summarize Anthropic’s whole RSP in a sentence, given that the inferential distance here is obviously large. But I do think the sentence was fair.

As I understand it, Anthropic’s plan for detecting threats is mostly based on red-teaming (i.e., asking the models to do things to gain evidence about whether they can). But nobody understands the models well enough to check for the actual concerning properties themselves, so red teamers instead check for distant proxies, or properties that seem plausibly like precursors. (E.g., for “ability to search filesystems for passwords” as a partial proxy for “ability to autonomously self-replicate,” since maybe the former is a prerequisite for the latter).

But notice that this activity does not involve directly measuring the concerning behavior. Rather, it instead measures something more like “the amount the model strikes the evaluators as broadly sketchy-seeming/suggestive that it might be capable of doing other bad stuff.” And the RSP’s description of Anthropic’s planned responses to these triggers is so chock full of weasel words and caveats and vague ambiguous language that I think it barely constrains their response at all.

So in practice, I think both Anthropic’s plan for detecting threats, and for deciding how to respond, fundamentally hinge on wildly subjective judgment calls, based on broad, high-level, gestalt-ish impressions of how these systems seem likely to behave. I grant that this process is more involved than the typical thing people describe as a “vibe check,” but I do think it’s basically the same epistemic process, and I expect will generate conclusions around as sound.

Comment by Adam Scholl (adam_scholl) on Zach Stein-Perlman's Shortform · 2024-08-23T02:11:55.698Z · LW · GW

My guess is that most don’t do this much in public or on the internet, because it’s absolutely exhausting, and if you say something misremembered or misinterpreted you’re treated as a liar, it’ll be taken out of context either way, and you probably can’t make corrections.  I keep doing it anyway because I occasionally find useful perspectives or insights this way, and think it’s important to share mine.  That said, there’s a loud minority which makes the AI-safety-adjacent community by far the most hostile and least charitable environment I spend any time in, and I fully understand why many of my colleagues might not want to.

My guess is that this seems so stressful mostly because Anthropic’s plan is in fact so hard to defend, due to making little sense. Anthropic is attempting to build a new mind vastly smarter than any human, and as I understand it, plans to ensure this goes well basically by doing periodic vibe checks to see whether their staff feel sketched out yet. I think a plan this shoddy obviously endangers life on Earth, so it seems unsurprising (and good) that people might sometimes strongly object; if Anthropic had more reassuring things to say, I’m guessing it would feel less stressful to try to reassure them.

Comment by Adam Scholl (adam_scholl) on Fields that I reference when thinking about AI takeover prevention · 2024-08-17T05:04:51.704Z · LW · GW

Open Philanthropy commissioned five case studies of this sort, which ended up being written by Moritz von Knebel; as far as I know they haven't been published, but plausibly someone could convince him to.

Comment by Adam Scholl (adam_scholl) on adam_scholl's Shortform · 2024-08-17T04:07:24.361Z · LW · GW

Those are great examples, thanks; I can totally believe there exist many such problems.

Still, I do really appreciate ~never having to worry that food from grocery stores or restaurants will acutely poison me; and similarly, not having to worry that much that pharmaceuticals are adulterated/contaminated. So overall I think I currently feel net grateful about the FDA’s purity standards, and net hateful just about their efficacy standards?

Comment by Adam Scholl (adam_scholl) on adam_scholl's Shortform · 2024-08-17T00:26:35.875Z · LW · GW

What countries are you imagining? I know some countries have more street food, but from what I anecdotally hear most also have far more food poisoning/contamination issues. I'm not sure what the optimal tradeoff here looks like, and I could easily believe it's closer to the norms in e.g. Southeast Asia than the U.S. But it at least feels much less obvious to me than that drug regulations are overzealous.

(Also note that much regulation of things like food trucks is done by cities/states, not the FDA).

Comment by Adam Scholl (adam_scholl) on adam_scholl's Shortform · 2024-08-16T23:26:49.253Z · LW · GW

Arguments criticizing the FDA often seem to weirdly ignore the "F." For all I know food safety regulations are radically overzealous too, but if so I've never noticed (or heard a case for) this causing notable harm.

Overall, my experience as a food consumer seems decent—food is cheap, and essentially never harms me in ways I expect regulators could feasibly prevent (e.g., by giving me food poisoning, heavy metal poisoning, etc.). I think there may be harmful contaminants in food we haven't discovered yet, but if so I mostly don't blame the FDA for that lack of knowledge, and insofar as I do it seems an argument they're being under-zealous.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-08-06T17:01:50.961Z · LW · GW

I agree it seems good to minimize total risk, even when the best available actions are awful; I think my reservation is mainly that in most such cases, it seems really important to say you're in that position, so others don't mistakenly conclude you have things handled. And I model AGI companies as being quite disincentivized from admitting this already—and humans generally as being unreasonably disinclined to update that weird things are happening—so I feel wary of frames/language that emphasize local relative tradeoffs, thereby making it even easier to conceal the absolute level of danger.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-08-06T06:43:09.639Z · LW · GW
  • *The rushed reasonable developer regime.* The much riskier regimes I expect, where even relatively reasonable AI developers are in a huge rush and so are much less able to implement interventions carefully or to err on the side of caution.

I object to the use of the word "reasonable" here, for similar reasons I object to Anthropic's use of the word "responsible." Like, obviously it could be the case that e.g. it's simply intractable to substantially reduce the risk of disaster, and so the best available move is marginal triage; this isn't my guess, but I don't object to the argument. But it feels important to me to distinguish strategies that aim to be "marginally less disastrous" from those which aim to be "reasonable" in an absolute sense, and I think strategies that involve creating a superintelligence without erring much on the side of caution generally seem more like the former sort.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-08-06T06:06:19.213Z · LW · GW

It sounds like you think it's reasonably likely we'll end up in a world with rogue AI close enough in power to humanity/states to be competitive in war, yet not powerful enough to quickly/decisively win? If so I'm curious why; this seems like a pretty unlikely/unstable equilibrium to me, given how much easier it is to improve AI systems than humans.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-08-06T05:55:55.662Z · LW · GW

I do basically assume this, but it isn't cruxy so I'll edit.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-08-06T03:29:13.558Z · LW · GW

*The existential war regime*. You’re in an existential war with an enemy and you’re indifferent to AI takeover vs the enemy defeating you. This might happen if you’re in a war with a nation you don’t like much, or if you’re at war with AIs.

Does this seem likely to you, or just an interesting edge case or similar? It's hard for me to imagine realistic-seeming scenarios where e.g. the United States ends up in a war where losing would be comparably bad to AI takeover. This is mostly because ~no functional states (certainly no great powers) strike me as so evil that I'd prefer extinction or AI takeover to those states becoming a singleton, and for basically all wars where I can imagine being worried about this—e.g. with North Korea, ISIS, Juergen Schmidhuber—I would expect great powers to be overwhelmingly likely to win. (At least assuming they hadn't already developed decisively-powerful tech, but that's presumably the case if a war is happening).

Comment by Adam Scholl (adam_scholl) on Twitter thread on open-source AI · 2024-07-31T15:12:47.909Z · LW · GW

We should generally have a strong prior favoring technology in general

Should we? I think it's much more obvious that the increase in human welfare so far has mostly been caused by technology, than that most technologies have net helped humans (much less organisms generally).

I'm quite grateful for agriculture now, but unsure I would have been during the Bronze Age; grateful for nuclear weapons, but unsure in how many nearby worlds I'd feel similarly; net bummed about machine guns, etc.

Comment by Adam Scholl (adam_scholl) on tlevin's Shortform · 2024-07-31T12:42:21.585Z · LW · GW

I agree music has this effect, but I think the Fence is mostly because it also hugely influences the mood of the gathering, i.e. the type and correlatedness of people's emotional states.

(Music also has some costs, although I think most of these aren't actually due to the music itself and can be avoided with proper acoustical treatment. E.g. people sometimes perceive music as too loud because the emitted volume is literally too high, but ime people often say this when the noise is actually overwhelming for other reasons, like echo (insofar as walls/floor/ceiling are near/hard/parallel), or bass traps/standing waves, such that the peak amplitude of the perceived wave is above the painfully loud limit even though the average amplitude is fine; in the worst cases, this can result in barely being able to hear the music while simultaneously perceiving it as painfully loud!)

Comment by Adam Scholl (adam_scholl) on Towards more cooperative AI safety strategies · 2024-07-24T00:11:56.061Z · LW · GW

I appreciate you adding the note, though I do think the situation is far more unusual than described. I agree it's widely priced in that companies in general seek power, but I think probably less so that the author of this post personally works for a company which is attempting to acquire drastically more power than any other company ever, and that much of the behavior the post describes as power-seeking amounts to "people trying to stop the author and his colleagues from attempting that."

Comment by Adam Scholl (adam_scholl) on Towards more cooperative AI safety strategies · 2024-07-23T12:40:18.489Z · LW · GW

Yeah, this omission felt pretty glaring to me. OpenAI is explicitly aiming to build "the most powerful technology humanity has yet invented." Obviously that doesn't mean Richard is wrong that the AI safety community is too power-seeking, but I would sure have appreciated him acknowledging/grappling with the fact that the company he works for is seeking to obtain more power than any group of people in history by a gigantic margin.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-26T01:51:11.509Z · LW · GW

I agree we might end up in a world like that, where it proves impossible to make a decent safety case. I just think of the ~whole goal of alignment research as figuring out how to avoid that world, i.e. of figuring out how to mitigate/estimate the risk as much/precisely as needed to make TAI worth building.

Currently, AI risk estimates are mostly just verbal statements like "I don't know man, probably some double digit chance of extinction." This is exceedingly unlike the sort of predictably tolerable risk humanity normally expects from its engineering projects, and which e.g. allows for decent safety cases. So I think it's quite important to notice how far we currently are from being able to make them, since that suggests the scope and nature of the problem.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T21:45:41.947Z · LW · GW

Maybe I'm just confused what you mean by those words, but where is the disanalogy with safety engineering coming from? That normally safety engineering focuses on mitigating risks with complex causes, whereas AI risk is caused by some sort of scaffolding/bureaucracy which is simpler?

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T21:02:29.730Z · LW · GW

I'm still confused what sort of simplicity you're imagining? From my perspective, the type of complexity which determines the size of the fail surface for alignment mostly stems from things like e.g. "degree of goal stability," "relative detectability of ill intent," and other such things that seem far more complicated than airplane parts.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T20:05:33.940Z · LW · GW

What's the sense in which you think they're more simple? Airplanes strike me as having a much simpler fail surface.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T18:22:07.842Z · LW · GW

Right, but then from my perspective it seems like the core problem is that the situations are currently disanalogous, and so it feels reasonable and important to draw the analogy.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T18:03:54.448Z · LW · GW

I agree we don’t currently know how to prevent AI systems from becoming adversarial, and that until we do it seems hard to make strong safety cases for them. But I think this inability is a skill issue, not an inherent property of the domain, and traditionally the core aim of alignment research was to gain this skill.

Plausibly we don’t have enough time to figure out how to gain as much confidence that transformative AI systems are safe as we typically have about e.g. single airplanes, but in my view that’s horrifying, and I think it’s useful to notice how different this situation is from the sort humanity is typically willing to accept.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-06-04T02:00:49.359Z · LW · GW

Thanks, that's helpful context.

I also have a model of how people choose whether or not to make public statements where it’s extremely unsurprising most people would not choose to do so.

I agree it's unsurprising that few rank-and-file employees would make statements, but I am surprised by the silence from those in policy/evals roles. From my perspective, active non-disparagement obligations seem clearly disqualifying for most such roles, so I'd think they'd want to clarify.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-06-03T22:19:21.569Z · LW · GW

I am quite confident the contract has been widely retracted. 

Can you share your reasons for thinking this? Given that people who remain bound can’t say so, I feel hesitant to conclude that people aren’t bound without clear evidence.

I am unaware of any people who signed the agreement after 2019 and did not receive the email, outside cases where the nondisparagement agreement was mutual (which includes Sutskever and likely also Anthropic leadership).

Excepting Jack Clark (who works for Anthropic) and Remco Zwetsloot (who left in 2018), I would think all the policy leadership folks listed above meet these criteria, yet none have reported being released. Would you guess that they have been?

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T22:28:45.804Z · LW · GW

Yeah, the proposal here differs from warrant canaries in that it doesn't ask people to proactively make statements ahead of time—it just relies on the ability of some people who can speak to provide evidence that others can't. So if e.g. Bob and Joe have been released, but Alice hasn't, then Bob and Joe saying they've been released makes Alice's silence more conspicuous.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T22:15:58.446Z · LW · GW

the post appears to wildly misinterpret the meaning of this term as "taking any actions which might make the company less valuable"

I'm not a lawyer, and I may be misinterpreting the non-interference provision—certainly I'm willing to update the post if so! But upon further googling, my current understanding is still that in contracts, "interference" typically means "anything that disrupts, damages or impairs business."

And the provision in the OpenAI offboarding agreement is written so broadly—"Employee agrees not to interfere with OpenAI’s relationship with current or prospective employees, current or previous founders, portfolio companies, suppliers, vendors or investors"—that I assumed it was meant to encompass essentially all business impact, including e.g. the company's valuation.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T21:36:45.503Z · LW · GW

I agree, but I also doubt the contract has even been widely retracted. Why do you think it has, Jacob? Only a few people have reported being released so far.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T21:15:05.433Z · LW · GW

I agree, but I think it still matters whether or not he's bound by the actual agreement. One might imagine that he's carefully pushing the edge of what he thinks he can get away with saying, for example, in which case he may still not be fully free to speak his mind. And since I would much prefer to live in a world where he is, I'm wary of prematurely concluding otherwise without clear evidence.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T03:09:52.769Z · LW · GW

Thanks! Edited to fix.

Comment by Adam Scholl (adam_scholl) on We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming" · 2024-05-10T02:11:49.221Z · LW · GW

Do you expect AI labs would actually run extensive experimental tests in this world? I would be surprised if they did, even if such a window does arise.

(To roughly operationalize: I would be surprised to hear a major lab spent more than 5 FTE-years conducting such tests, or that the tests decreased the p(doom) of the average reasonably-calibrated external observer by more than 10%).

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-26T01:53:49.391Z · LW · GW

This thread isn't seeming very productive to me, so I'm going to bow out after this. But yes, it is a primary concern—at least in the case of Open Philanthropy, it's easy to check what their primary concerns are because they write them up. And accidental release from dual use research is one of them.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-25T03:10:54.825Z · LW · GW

the idea that we should have "BSL-5" is the kind of silly thing that novice EAs propose that doesn't make sense because there literally isn't something significantly more restrictive

I mean, I'm sure something more restrictive is possible. But my issue with BSL levels isn't that they include too few BSL-type restrictions, it's that "lists of restrictions" are a poor way of managing risk when the attack surface is enormous. I'm sure someday we'll figure out how to gain this information in a safer way—e.g., by running simulations of GoF experiments instead of literally building the dangerous thing—but at present, the best available safeguards aren't sufficient.

I also think that "nearly all EA's focused on biorisk think gain of function research should be banned" is obviously underspecified, and wrong because of the details. 

I'm confused why you find this underspecified. I just meant "gain of function" in the standard, common-use sense—e.g., that used in the 2014 ban on federal funding for such research.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-25T00:49:59.702Z · LW · GW

I think we must still be missing each other somehow. To reiterate, I'm aware that there is non-accidental biorisk, for which one can hardly blame the safety measures. But there is also accident risk, since labs often fail to contain pathogens even when they're trying to.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-23T05:15:33.382Z · LW · GW

My guess is more that we were talking past each other than that his intended claim was false/unrepresentative. I do think it's true that EAs mostly talk about people doing gain of function research as the problem, rather than about the insufficiency of the safeguards; I just think the latter is why the former is a problem.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-23T02:43:35.777Z · LW · GW

There have been frequent and severe biosafety accidents for decades, many of which occurred at labs which were attempting to follow BSL protocol.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-23T02:36:27.166Z · LW · GW

The EA cause area around biorisk is mostly happy to rely on those levels

I disagree—I think nearly all EAs focused on biorisk think gain of function research should be banned, since the risk management framework doesn't work well enough to drive the expected risk below the expected benefit. If our framework for preventing lab accidents worked as well as e.g. our framework for preventing plane accidents, I think few EAs would worry much about GoF.

(Obviously there are non-accidental sources of biorisk too, for which we can hardly blame the safety measures; but I do think the measures work sufficiently poorly that even accident risk alone would justify a major EA cause area).

Comment by Adam Scholl (adam_scholl) on Express interest in an "FHI of the West" · 2024-04-19T15:53:25.256Z · LW · GW

Man, I can’t believe there are no straightforwardly excited comments so far!

Personally, I think an institution like this is sorely needed, and I’d be thrilled if Lightcone built one. There are remarkably few people in the world who are trying to think carefully about the future, and fewer still who are trying to solve alignment; institutions like this seem like one of the most obvious ways to help them.

Comment by Adam Scholl (adam_scholl) on Express interest in an "FHI of the West" · 2024-04-19T14:50:05.513Z · LW · GW

Your answer might also be "I, Oliver, will play this role". My gut take would be excited for you to be like one of three people in this role (with strong co-leads, who are maybe complementary in the sense that they're strong at some styles of thinking you don't know exactly how to replicate), and kind of weakly pessimistic about you doing it alone. (It certainly might be that that pessimism is misplaced.)

For what it’s worth, my guess is that your pessimism is misplaced. Oliver certainly isn’t as famous as Bostrom, so I doubt he’d be a similar “beacon.” But I’m not sure a beacon is needed—historically, plenty of successful research institutions (e.g. Bell Labs, IAS, the Royal Society in most eras) weren’t led by their star researchers, and the track record of those that were strikes me as pretty mixed.

Oliver spends most of his time building infrastructure for researchers, and I think he’s become quite good at it. For example, you are reading this comment on (what strikes me as) rather obviously the best-designed forum on the internet; I think the review books LessWrong made are probably the second-best designed books I’ve seen, after those from Stripe Press; and the Lighthaven campus is an exceedingly nice place to work.

Personally, I think Oliver would probably be my literal top choice to head an institution like this.

Comment by Adam Scholl (adam_scholl) on Express interest in an "FHI of the West" · 2024-04-19T13:43:46.487Z · LW · GW

I ask partly because I personally would be more excited of a version of this that wasn't ignoring AGI timelines, but I think a version of this that's not ignoring AGI timelines would probably be quite different from the intellectual spirit/tradition of FHI.

This frame feels a bit off to me, partly because I don’t think FHI was ignoring timelines, and partly because I think their work has proved quite useful already—mostly by substantially improving our concepts for reasoning about existential risk.

But also, the portfolio of alignment research with maximal expected value need not necessarily perform well in the most likely particular world. One might imagine, for example—and indeed this is my own bet—that the most valuable actions we can take will only actually save us in the subset of worlds in which we have enough time to develop a proper science of alignment.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-17T08:08:43.246Z · LW · GW

I agree metrology is cool! But I think units are mostly helpful for engineering insofar as they reflect fundamental laws of nature—see e.g. the metric units—and we don't have those yet for AI. Until we do, I expect attempts to define them will be vague, high-level descriptions more than deep scientific understanding.

(And I think the former approach has a terrible track record, at least when used to define units of risk or controllability—e.g. BSL levels, which have failed so consistently and catastrophically they've induced an EA cause area, and which for some reason AI labs are starting to emulate).