Posts

Non-Disparagement Canaries for OpenAI 2024-05-30T19:20:13.022Z
OMMC Announces RIP 2024-04-01T23:20:00.433Z
Safetywashing 2022-07-01T11:56:33.495Z
Matt Botvinick on the spontaneous emergence of learning algorithms 2020-08-12T07:47:13.726Z
At what point should CFAR stop holding workshops due to COVID-19? 2020-02-25T09:59:17.910Z
CFAR: Progress Report & Future Plans 2019-12-19T06:19:58.948Z
Why are the people who could be doing safety research, but aren’t, doing something else? 2019-08-29T08:51:33.219Z
adam_scholl's Shortform 2019-08-12T00:53:37.221Z

Comments

Comment by Adam Scholl (adam_scholl) on Towards more cooperative AI safety strategies · 2024-07-24T00:11:56.061Z · LW · GW

I appreciate you adding the note, though I do think the situation is far more unusual than described. I agree it's widely priced in that companies in general seek power, but I think probably less so that the author of this post personally works for a company which is attempting to acquire drastically more power than any other company ever, and that much of the behavior the post describes as power-seeking amounts to "people trying to stop the author and his colleagues from attempting that."

Comment by Adam Scholl (adam_scholl) on Towards more cooperative AI safety strategies · 2024-07-23T12:40:18.489Z · LW · GW

Yeah, this omission felt pretty glaring to me. OpenAI is explicitly aiming to build "the most powerful technology humanity has yet invented." Obviously that doesn't mean Richard is wrong that the AI safety community is too power-seeking, but I would sure have appreciated him acknowledging/grappling with the fact that the company he works for is seeking to obtain more power than any group of people in history by a gigantic margin.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-26T01:51:11.509Z · LW · GW

I agree we might end up in a world like that, where it proves impossible to make a decent safety case. I just think of the ~whole goal of alignment research as figuring out how to avoid that world, i.e. of figuring out how to mitigate/estimate the risk as much/precisely as needed to make TAI worth building.

Currently, AI risk estimates are mostly just verbal statements like "I don't know man, probably some double digit chance of extinction." This is exceedingly unlike the sort of predictably tolerable risk humanity normally expects from its engineering projects, and which e.g. allows for decent safety cases. So I think it's quite important to notice how far we currently are from being able to make them, since that suggests the scope and nature of the problem.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T21:45:41.947Z · LW · GW

Maybe I'm just confused what you mean by those words, but where is the disanalogy with safety engineering coming from? That normally safety engineering focuses on mitigating risks with complex causes, whereas AI risk is caused by some sort of scaffolding/bureaucracy which is simpler?

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T21:02:29.730Z · LW · GW

I'm still confused what sort of simplicity you're imagining? From my perspective, the type of complexity which determines the size of the fail surface for alignment mostly stems from things like e.g. "degree of goal stability," "relative detectability of ill intent," and other such things that seem far more complicated than airplane parts.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T20:05:33.940Z · LW · GW

What's the sense in which you think they're more simple? Airplanes strike me as having a much simpler fail surface.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T18:22:07.842Z · LW · GW

Right, but then from my perspective it seems like the core problem is that the situations are currently disanalogous, and so it feels reasonable and important to draw the analogy.

Comment by Adam Scholl (adam_scholl) on Buck's Shortform · 2024-06-25T18:03:54.448Z · LW · GW

I agree we don’t currently know how to prevent AI systems from becoming adversarial, and that until we do it seems hard to make strong safety cases for them. But I think this inability is a skill issue, not an inherent property of the domain, and traditionally the core aim of alignment research was to gain this skill.

Plausibly we don’t have enough time to figure out how to gain as much confidence that transformative AI systems are safe as we typically have about e.g. single airplanes, but in my view that’s horrifying, and I think it’s useful to notice how different this situation is from the sort humanity is typically willing to accept.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-06-04T02:00:49.359Z · LW · GW

Thanks, that's helpful context.

I also have a model of how people choose whether or not to make public statements where it’s extremely unsurprising most people would not choose to do so.

I agree it's unsurprising that few rank-and-file employees would make statements, but I am surprised by the silence from those in policy/evals roles. From my perspective, active non-disparagement obligations seem clearly disqualifying for most such roles, so I'd think they'd want to clarify.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-06-03T22:19:21.569Z · LW · GW

I am quite confident the contract has been widely retracted. 

Can you share your reasons for thinking this? Given that people who remain bound can’t say so, I feel hesitant to conclude that people aren’t without clear evidence.

I am unaware of any people who signed the agreement after 2019 and did not receive the email, outside cases where the nondisparagement agreement was mutual (which includes Sutskever and likely also Anthropic leadership).

Excepting Jack Clark (who works for Anthropic) and Remco Zwetsloot (who left in 2018), I would think all the policy leadership folks listed above meet these criteria, yet none have reported being released. Would you guess that they have been?

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T22:28:45.804Z · LW · GW

Yeah, the proposal here differs from warrant canaries in that it doesn't ask people to proactively make statements ahead of time—it just relies on the ability of some people who can speak, to provide evidence that others can't. So if e.g. Bob and Joe have been released, but Alice hasn't, then Bob and Joe saying they've been released makes Alice's silence more conspicuous.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T22:15:58.446Z · LW · GW

the post appears to wildly misinterpret the meaning of this term as "taking any actions which might make the company less valuable"

I'm not a lawyer, and I may be misinterpreting the non-interference provision—certainly I'm willing to update the post if so! But upon further googling, my current understanding is still that in contracts, "interference" typically means "anything that disrupts, damages or impairs business."

And the provision in the OpenAI offboarding agreement is written so broadly—"Employee agrees not to interfere with OpenAI’s relationship with current or prospective employees, current or previous founders, portfolio companies, suppliers, vendors or investors"—that I assumed it was meant to encompass essentially all business impact, including e.g. the company's valuation.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T21:36:45.503Z · LW · GW

I agree, but I also doubt the contract even has been widely retracted. Why do you think it has, Jacob? Quite few people have reported being released so far.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T21:15:05.433Z · LW · GW

I agree, but I think it still matters whether or not he's bound by the actual agreement. One might imagine that he's carefully pushing the edge of what he thinks he can get away with saying, for example, in which case he may still not be fully free to speak his mind. And since I would much prefer to live in a world where he is, I'm wary of prematurely concluding otherwise without clear evidence.

Comment by Adam Scholl (adam_scholl) on Non-Disparagement Canaries for OpenAI · 2024-05-31T03:09:52.769Z · LW · GW

Thanks! Edited to fix.

Comment by Adam Scholl (adam_scholl) on We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming" · 2024-05-10T02:11:49.221Z · LW · GW

Do you expect AI labs would actually run extensive experimental tests in this world? I would be surprised if they did, even if such a window does arise.

(To roughly operationalize: I would be surprised to hear a major lab spent more than 5 FTE-years conducting such tests, or that the tests decreased the p(doom) of the average reasonably-calibrated external observer by more than 10%).

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-26T01:53:49.391Z · LW · GW

This thread isn't seeming very productive to me, so I'm going to bow out after this. But yes, it is a primary concern—at least in the case of Open Philanthropy, it's easy to check what their primary concerns are because they write them up. And accidental release from dual use research is one of them.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-25T03:10:54.825Z · LW · GW

the idea that we should have "BSL-5" is the kind of silly thing that novice EAs propose that doesn't make sense because there literally isn't something significantly more restrictive

I mean, I'm sure something more restrictive is possible. But my issue with BSL levels isn't that they include too few BSL-type restrictions, it's that "lists of restrictions" are a poor way of managing risk when the attack surface is enormous. I'm sure someday we'll figure out how to gain this information in a safer way—e.g., by running simulations of GoF experiments instead of literally building the dangerous thing—but at present, the best available safeguards aren't sufficient.

I also think that "nearly all EA's focused on biorisk think gain of function research should be banned" is obviously underspecified, and wrong because of the details. 

I'm confused why you find this underspecified. I just meant "gain of function" in the standard, common-use sense—e.g., that used in the 2014 ban on federal funding for such research.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-25T00:49:59.702Z · LW · GW

I think we must still be missing each other somehow. To reiterate, I'm aware that there is non-accidental biorisk, for which one can hardly blame the safety measures. But there is also accident risk, since labs often fail to contain pathogens even when they're trying to.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-23T05:15:33.382Z · LW · GW

My guess is more that we were talking past each other than that his intended claim was false/unrepresentative. I do think it's true that EA's mostly talk about people doing gain of function research as the problem, rather than about the insufficiency of the safeguards; I just think the latter is why the former is a problem.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-23T02:43:35.777Z · LW · GW

There have been frequent and severe biosafety accidents for decades, many of which occurred at labs which were attempting to follow BSL protocol.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-23T02:36:27.166Z · LW · GW

The EA cause area around biorisk is mostly happy to rely on those levels

I disagree—I think nearly all EA's focused on biorisk think gain of function research should be banned, since the risk management framework doesn't work well enough to drive the expected risk below that of the expected benefit. If our framework for preventing lab accidents worked as well as e.g. our framework for preventing plane accidents, I think few EA's would worry much about GoF.

(Obviously there are non-accidental sources of biorisk too, for which we can hardly blame the safety measures; but I do think the measures work sufficiently poorly that even accident risk alone would justify a major EA cause area).

Comment by Adam Scholl (adam_scholl) on Express interest in an "FHI of the West" · 2024-04-19T15:53:25.256Z · LW · GW

Man, I can’t believe there are no straightforwardly excited comments so far!

Personally, I think an institution like this is sorely needed, and I’d be thrilled if Lightcone built one. There are remarkably few people in the world who are trying to think carefully about the future, and fewer still who are trying to solve alignment; institutions like this seem like one of the most obvious ways to help them.

Comment by Adam Scholl (adam_scholl) on Express interest in an "FHI of the West" · 2024-04-19T14:50:05.513Z · LW · GW

Your answer might also be "I, Oliver, will play this role". My gut take would be excited for you to be like one of three people in this role (with strong co-leads, who are maybe complementary in the sense that they're strong at some styles of thinking you don't know exactly how to replicate), and kind of weakly pessimistic about you doing it alone. (It certainly might be that that pessimism is misplaced.)

For what it’s worth, my guess is that your pessimism is misplaced. Oliver certainly isn’t as famous as Bostrom, so I doubt he’d be a similar “beacon.” But I’m not sure a beacon is needed—historically, plenty of successful research institutions (e.g. Bells Labs, IAS, the Royal Society in most eras) weren’t led by their star researchers, and the track record of those that were strikes me as pretty mixed.

Oliver spends most of his time building infrastructure for researchers, and I think he’s become quite good at it. For example, you are reading this comment on (what strikes me as) rather obviously the best-designed forum on the internet; I think the review books LessWrong made are probably the second-best designed books I’ve seen, after those from Stripe Press; and the Lighthaven campus is an exceedingly nice place to work.

Personally, I think Oliver would probably be my literal top choice to head an institution like this.

Comment by Adam Scholl (adam_scholl) on Express interest in an "FHI of the West" · 2024-04-19T13:43:46.487Z · LW · GW

I ask partly because I personally would be more excited of a version of this that wasn't ignoring AGI timelines, but I think a version of this that's not ignoring AGI timelines would probably be quite different from the intellectual spirit/tradition of FHI.

This frame feels a bit off to me. Partly because I don’t think FHI was ignoring timelines, and because I think their work has proved quite useful already—mostly by substantially improving our concepts for reasoning about existential risk.

But also, the portfolio of alignment research with maximal expected value need not necessarily perform well in the most likely particular world. One might imagine, for example—and indeed this is my own bet—that the most valuable actions we can take will only actually save us in the subset of worlds in which we have enough time to develop a proper science of alignment.

Comment by Adam Scholl (adam_scholl) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-17T08:08:43.246Z · LW · GW

I agree metrology is cool! But I think units are mostly helpful for engineering insofar as they reflect fundamental laws of nature—see e.g. the metric units—and we don't have those yet for AI. Until we do, I expect attempts to define them will be vague, high-level descriptions more than deep scientific understanding.

(And I think the former approach has a terrible track record, at least when used to define units of risk or controllability—e.g. BSL levels, which have failed so consistently and catastrophically they've induced an EA cause area, and which for some reason AI labs are starting to emulate).

Comment by Adam Scholl (adam_scholl) on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T23:37:02.018Z · LW · GW

I assumed "anyone" was meant to include OpenAI—do you interpret it as just describing novel entrants? If so I agree that wouldn't be contradictory, but it seems like a strange interpretation to me in the context of a pitch deck asking investors for a billion dollars.

Comment by Adam Scholl (adam_scholl) on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T22:25:12.277Z · LW · GW

I agree it's common for startups to somewhat oversell their products to investors, but I think it goes far beyond "somewhat"—maybe even beyond the bar for criminal fraud, though I'm not sure—to tell investors you're aiming to soon get "too far ahead for anyone to catch up in subsequent cycles," if your actual plan is to avoid getting meaningfully ahead at all.

Comment by Adam Scholl (adam_scholl) on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T21:05:12.822Z · LW · GW

"Diverting money" strikes me as the wrong frame here. Partly because I doubt this actually was the consequence—i.e., I doubt OpenAI etc. had a meaningfully harder time raising capital because of Anthropic's raise—but also because it leaves out the part where this purported desirable consequence was achieved via (what seems to me like) straightforward deception!

If indeed Dario told investors he hoped to obtain an insurmountable lead soon, while telling Dustin and others that he was committed to avoid gaining any meaningful lead, then it sure seems like one of those claims was a lie. And by my ethical lights, this seems like a horribly unethical thing to lie about, regardless of whether it somehow caused OpenAI to have less money.

Comment by Adam Scholl (adam_scholl) on Jimrandomh's Shortform · 2024-03-06T13:55:26.142Z · LW · GW

Huh, I've also noticed a larger effect from indoors/outdoors than seems reflected by CO2 monitors, and that I seem smarter when it's windy, but I never thought of this hypothesis; it's interesting, thanks.

Comment by Adam Scholl (adam_scholl) on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-05T19:06:39.456Z · LW · GW

Yeah, seems plausible; but either way it seems worth noting that Dario left Dustin, Evan and Anthropic's investors with quite different impressions here.

Comment by Adam Scholl (adam_scholl) on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-05T12:41:53.858Z · LW · GW

It seems Dario left Dustin Moskovitz with a different impression—that Anthropic had a policy/commitment to not meaningfully advance the frontier:

Comment by Adam Scholl (adam_scholl) on Safetywashing · 2024-01-14T01:51:33.574Z · LW · GW

Interesting, I checked LW/Google for the keyword before writing and didn't see much, but maybe I missed it; it does seem like a fairly natural riff, e.g. someone wrote a similar post on EA forum a few months later.

Comment by Adam Scholl (adam_scholl) on OpenAI: Facts from a Weekend · 2023-11-23T22:14:14.002Z · LW · GW

I can imagine it being the case that their ability to reveal this information is their main source of leverage (over e.g. who replaces them on the board).

Comment by Adam Scholl (adam_scholl) on My thoughts on the social response to AI risk · 2023-11-02T22:40:28.801Z · LW · GW

I do have substantial credence (~15%?) on AGI being built by hobbyists/small teams. I definitely think it's more likely to be built by huge teams with huge computers, like most recent advances. But my guess is that physics permits vastly more algorithmic efficiency than we've discovered, and it seems pretty plausible to me—especially in worlds with longer timelines—that some small group might discover enough of it in time.

Comment by Adam Scholl (adam_scholl) on My thoughts on the social response to AI risk · 2023-11-02T21:30:41.487Z · LW · GW

Nonetheless, I acknowledge that my disagreement with these proposals often comes down to a more fundamental disagreement about the difficulty of alignment, rather than any beliefs about the social response to AI risk.

My guess is that this disagreement (about the difficulty of alignment) also mostly explains the disagreement about humanity’s relative attentiveness/competence. If the recent regulatory moves seem encouraging to you, I can see how that would seem like counterevidence to the claim that governments are unlikely to help much with AI risk.

But personally it doesn’t seem like much counterevidence, because the recent moves haven’t seemed very encouraging. They’ve struck me as slightly encouraging, insofar as they’ve caused me to update slightly upwards that governments might eventually coordinate to entirely halt AI development. But absent the sort of scientific understanding we typically have when deploying high-risk engineering projects—where e.g., we can answer at least most of the basic questions about how the systems work, and generally understand how to predict in advance what will happen if we do various things, etc.—little short of a Stop is going to reduce my alarm much.

Comment by Adam Scholl (adam_scholl) on AI as a science, and three obstacles to alignment strategies · 2023-10-29T09:01:37.116Z · LW · GW

AI used to be a science. In the old days (back when AI didn't work very well), people were attempting to develop a working theory of cognition.

Those scientists didn’t succeed, and those days are behind us.

I claim many of them did succeed, for example:

  • George Boole invented boolean algebra in order to establish (part of) a working theory of cognition—the book where he introduces it is titled "An Investigation of the Laws of Thought,” and his stated aim was largely to help explain how minds work.[1]
  • Ramón y Cajal discovered neurons in the course of trying to better understand cognition.[2]
  • Turing described his research as aimed at figuring out what intelligence is, what it would mean for something to “think,” etc.[3]
  • Shannon didn’t frame his work this way quite as explicitly, but information theory is useful because it characterizes constraints on the transmission of thoughts/cognition between people, and I think he was clearly generally interested in figuring out what was up with agents/minds—e.g., he spent time trying to design machines to navigate mazes, repair themselves, replicate, etc.
  • Geoffrey Hinton initially became interested in neural networks because he was trying to figure out how brains worked.

Not all of these scientists thought of themselves as working on AI, of course, but I do think many of the key discoveries which make modern AI possible—boolean algebra, neurons, computers, information theory, neural networks—were developed by people trying to develop theories of cognition.

  1. ^

     The opening paragraph of Boole’s book:  "The design of the following treatise is to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolical language of a Calculus, and upon this foundation to establish the science of Logic and construct its method; to make that method itself the basis of a general method for the application of the mathematical doctrine of Probabilities; and, finally, to collect from the various elements of truth brought to view in the course of these inquiries some probable intimations concerning the nature and constitution of the human mind."

  2. ^

     From Cajal’s autobiography:  "... the problem attracted us irresistibly. We saw that an exact knowledge of the structure of the brain was of supreme interest for the building up of a rational psychology. To know the brain, we said, is equivalent to ascertaining the material course of thought and will, to discovering the intimate history of life in its perpetual duel with external forces; a history summarized, and in a way engraved in the defensive neuronal coordinations of the reflex, of instinct, and of the association of ideas" (305).

  3. ^

     The opening paragraph of Turing’s paper, Computing Machinery and Intelligence:  "I propose to consider the question, 'Can machines think?' This should begin with definitions of the meaning of the terms 'machine 'and 'think'. The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words 'machine' and 'think 'are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, 'Can machines think?' is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."

Comment by Adam Scholl (adam_scholl) on Open Thread – Autumn 2023 · 2023-10-24T09:38:19.014Z · LW · GW

But it's not just language any longer either, with image inputs, etc... all else equal I'd prefer a name that emphasized how little we understand how they work ("model" seems to me to connote the opposite), but I don't have any great suggestions.

Comment by Adam Scholl (adam_scholl) on Feedbackloop-first Rationality · 2023-08-11T05:51:09.788Z · LW · GW

I just meant that Faraday's research strikes me as counterevidence for the claim I was making—he had excellent feedback loops, yet also seems to me to have had excellent pre-paradigmatic research taste/next-question-generating skill of the sort my prior suggests generally trades off against strong focus on quickly-checkable claims. So maybe my prior is missing something!

Comment by Adam Scholl (adam_scholl) on Feedbackloop-first Rationality · 2023-08-10T22:34:28.527Z · LW · GW

Yeah, my impression is similarly that focus on feedback loops is closer to "the core thing that's gone wrong so far with alignment research," than to "the core thing that's been missing." I wouldn't normally put it this way, since I think many types of feedback loops are great, and since obviously in the end alignment research is useless unless it helps us better engineer AI systems in the actual territory, etc. 

(And also because some examples of focus on tight feedback loops, like Faraday's research, strike me as exceedingly excellent, although I haven't really figured out yet why his work seems so much closer to the spirit we need than e.g. thinking physics problems).

Like, all else equal, it clearly seems better to have better empirical feedback; I think my objection is mostly that in practice, focus on this seems to lead people to premature formalization, or to otherwise constraining their lines of inquiry to those whose steps are easy to explain/justify along the way.

Another way to put this: most examples I've seen of people trying to practice attending to tight feedback have involved them focusing on trivial problems, like simple video games or toy already-solved science problems, and I think this isn't a coincidence. So while I share your sense Raemon that transfer learning seems possible here, my guess is that this sort of practice mostly transfers within the domain of other trivial problems, where solutions (or at least methods for locating solutions) are already known, and hence where it's easy to verify you're making progress along the way.

Comment by Adam Scholl (adam_scholl) on Exposure to Lizardman is Lethal · 2023-04-01T06:08:07.048Z · LW · GW

I've been trying to spend a bit more time voting in response to this, to try to help keep thread quality high; at least for now, the size of the influx strikes me as low enough that a few long-time users doing this might help a bunch.

Comment by Adam Scholl (adam_scholl) on Is InstructGPT Following Instructions in Other Languages Surprising? · 2023-02-20T06:03:12.653Z · LW · GW

I agree we don't really understand anything in LLMs at this level of detail, but I liked Jan highlighting this confusion anyway, since I think it's useful to promote particular weird behaviors to attention. I would be quite thrilled if more people got nerd sniped on trying to explain such things!

Comment by Adam Scholl (adam_scholl) on Bing Chat is blatantly, aggressively misaligned · 2023-02-17T00:32:41.009Z · LW · GW

John, it seems totally plausible to me that these examples do just reflect something like “hallucination,” in the sense you describe. But I feel nervous about assuming that! I know of no principled way to distinguish “hallucination” from more goal-oriented thinking or planning, and my impression is that nobody else does either.

I think it’s generally unwise to assume LLM output reflects its internal computation in a naively comprehensible way; it usually doesn’t, so I think it’s a sane prior to suspect it doesn't here, either. But at our current level of understanding of the internal computation happening in these models, I feel wary of confident-seeming assertions that they're well-described in any particular way—e.g., as "hallucinations," which I think is far from a well-defined concept, and which I don't have much confidence carves reality at its joints—much less that they're not dangerous.

So while I would personally bet fairly strongly against the explicit threats produced by Bing being meaningfully reflective of threatening intent, it seems quite overconfident to me to suggest they don’t “at all imply” it! From my perspective, they obviously imply it, even if that's not my lead hypothesis for what's going on.

Comment by Adam Scholl (adam_scholl) on Why Are Bacteria So Simple? · 2023-02-06T19:10:19.284Z · LW · GW

If simple outcompetes complex, wouldn't we expect to see more prokaryotic DNA in the biosphere? Whereas in fact we see 2-3x as much eukaryotic DNA, depending on how you count—hardly a small niche!

Comment by Adam Scholl (adam_scholl) on Can we efficiently distinguish different mechanisms? · 2023-01-12T10:19:18.937Z · LW · GW

I also found the writing way clearer than usual, which I appreciate - it made the post much easier for me to engage with.

Comment by Adam Scholl (adam_scholl) on Would it be good or bad for the US military to get involved in AI risk? · 2023-01-02T03:56:02.536Z · LW · GW

As I understand it, the recent US semiconductor policy updates—e.g., CHIPS Act, export controls—are unusually extreme, which does seem consistent with the hypothesis that they're starting to take some AI-related threats more seriously. But my guess is that they're mostly worried about more mundane/routine impacts on economic and military affairs, etc., rather than about this being the most significant event since the big bang; perhaps naively, I suspect we'd see more obvious signs if they were worried about the latter, a la physics departments clearing out during the Manhattan Project.

Comment by Adam Scholl (adam_scholl) on Let’s think about slowing down AI · 2022-12-27T18:56:47.580Z · LW · GW

Critch, I agree it’s easy for most people to understand the case for AI being risky. I think the core argument for concern—that it seems plausibly unsafe to build something far smarter than us—is simple and intuitive, and personally, that simple argument in fact motivates a plurality of my concern. That said:

  • I think it often takes weirder, less intuitive arguments to address many common objections—e.g., that this seems unlikely to happen within our lifetimes, that intelligence far superior to ours doesn’t even seem possible, that we’re safe because software can’t affect physical reality, that this risk doesn’t seem more pressing than other risks, that alignment seems easy to solve if we just x, etc.
  • It’s also remarkably easy to convince many people that aliens visit Earth on a regular basis, that the theory of evolution via natural selection is bunk, that lottery tickets are worth buying, etc. So while I definitely think some who engage with these arguments come away having good reason to believe the threat is likely, for values of “good” and “believe” and “likely” at least roughly similar those common around here, I suspect most update something more like their professed belief-in-belief, than their real expectations—and that even many who do update their real expectations do so via symmetric arguments that leave them with poor models of the threat.

These factors make me nervous about strategies that rely heavily on convincing everyday people, or people in government, to care about AI risk, for reasons I don’t think are well described as “systematically discounting their opinions/agency.” Personally, I’ve engaged a lot with people working in various corners of politics and government, and decently much with academics, and I respect and admire many of them, including in ways I rarely admire rationalists or EA’s.

(For example, by my lights, the best ops teams in government are much more competent than the best ops teams around here; the best policy wonks, lawyers, and economists are genuinely really quite smart, and have domain expertise few R/EA’s have without which it’s hard to cause many sorts of plausibly-relevant societal change; perhaps most spicily, I think academics affiliated with the Santa Fe Institute have probably made around as much progress on the alignment problem so far as alignment researchers, without even trying to, and despite being (imo) deeply epistemically confused in a variety of relevant ways).

But there are also a number of respects in which I think rationalists and EA’s tend to far outperform any other group I’m aware of—for example, in having beliefs that actually reflect their expectations, trying seriously to make sure those beliefs are true, being open to changing their mind, thinking probabilistically, “actually trying” to achieve their goals as a behavior distinct from “trying their best,” etc. My bullishness about these traits is why e.g. I live and work around here, and read this website.

And on the whole, I am bullish about this culture. But it’s mostly the relative scarcity of these and similar traits in particular, not my overall level of enthusiasm or respect for other groups, that causes me to worry they wouldn’t take helpful actions if persuaded of AI risk.

My impression is that it’s unusually difficult to figure out how to take actions that reduce AI risk without substantial epistemic skill of a sort people sometimes have around here, but only rarely have elsewhere. On my models, this is mostly because:

  • There are many more ways to make the situation worse than better;
  • A number of key considerations are super weird and/or terrifying, such that it's unusually hard to reason well about them;
  • It seems easier for people to grok the potential importance of transformative AI, than the potential danger.

My strong prior is that, to accomplish large-scale societal change, you nearly always need to collaborate with people who disagree with you, even about critical points. And I’m sympathetic to the view that this is true here, too; I think some of it probably is. But I think the above features make this more fraught than usual, in a way that makes it easy for people who grok the (simpler) core argument for concern, but not some of the (typically more complex) ancillary considerations, to accidentally end up making the situation even worse.

Here are some examples of (what seem to me like) this happening:

  • The closest thing I'm aware of to an official US government position on AI risk is described in the 2016 and 2017 National Science and Technology Council reports. I haven't read all of them, but the parts I have read struck me as a strange mix of claims like “maybe this will be a big deal, like mobile phones were,” and “maybe this will be a big deal, in the sense that life on Earth will cease to exist.” And like, I can definitely imagine explanations for this that don't much involve the authors misjudging the situation—maybe their aim was more to survey experts than describe their own views, or maybe they were intentionally underplaying the threat for fear of starting an arms race, etc. But I think my lead hypothesis is more that the authors just didn’t actually, viscerally consider that the sentences they were writing might be true, in the sense of describing a reality they might soon inhabit.
    • I think rationalists and EA's tend to make this sort of mistake less often, since the “taking beliefs seriously”-style epistemic orientation common around here has the effect of making it easier for people to viscerally grasp that trend lines on graphs and so forth might actually reflect reality. (Like, one frame on EA as a whole, is “an exercise in avoiding the ‘learning about the death of a million feels like a statistic, not a tragedy’ error”). And this makes me at least somewhat more confident they won’t do dumb things upon becoming worried about AI risk, since without this epistemic skill, I think it’s easier to make critical errors like overestimating how much time we have, or underestimating the magnitude or strangeness of the threat.
  • As I understand it, OpenAI is named what it is because, at least at first, its founders literally hoped to make AGI open source. (Elon Musk: “I think the best defense against the misuse of AI is to empower as many people as possible to have AI. If everyone has AI powers, then there’s not any one person or a small set of individuals who can have AI superpower.”)
    • By my lights, there are unfortunately a lot of examples of rationalists and EA’s making big mistakes while attempting to reduce AI risk. But it’s at least... hard for me to imagine most of them making this one? Maybe I’m being insufficiently charitable here, but from my perspective, this just fails a really basic “wait, but then what happens next?” sanity check, that I think should have occurred to them more or less immediately, and that I suspect would have to most rationalists and EA's.
  • For me, the most striking aspect of the AI Impacts poll, was that all those ML researchers who reported thinking ML had a substantial chance of killing everyone, still research ML. I’m not sure why they do this; I’d guess some of them are convinced for some reason or another that working on it still makes sense, even given that. But my perhaps-uncharitable guess is that most of them actually don’t—that they don’t even have arguments which feel compelling to them that justify their actions, but that they for some reason press on anyway. This too strikes me as a sort of error R/EA’s are less likely to make.
    • (When Bostrom asked Geoffrey Hinton why he still worked on AI, if he thought governments would likely use it to terrorize people, he replied, "I could give you the usual arguments, but the truth is that the prospect of discovery is too sweet").
  • Sam Altman recently suggested, on the topic of whether to slow down AI, that “either we figure out how to make AGI go well or we wait for the asteroid to hit."
    • Maybe he was joking, or meant "asteroid" as a stand-in for all potentially civilization-ending threats, or something? But that's not my guess, because his follow-up comment is about how we need AGI to colonize space, which makes me suspect he actually considers asteroid risk in particular a relevant consideration for deciding when to deploy advanced AI. Which if true, strikes me as... well, more confused than any comment in this thread strikes me. And it seems like the kind of error that might, for example, cause someone to start an org with the hope of reducing existential risk, that mostly just ends up exacerbating  it.

Obviously our social network doesn't have a monopoly on good reasoning, intelligence, or competence, and lord knows it has plenty of its own pathologies. But as I understand it, most of the reason the rationality project exists is to help people reason more clearly about the strange, horrifying problem of AI risk. And I do think it has succeeded to some degree, such that empirically, people with less exposure to this epistemic environment far more often take actions which seem terribly harmful to me.

Comment by Adam Scholl (adam_scholl) on Common misconceptions about OpenAI · 2022-08-27T15:54:54.544Z · LW · GW

One comment in this thread compares the OP to Philip Morris’ claims to be working toward a “smoke-free future.” I think this analogy is overstated, in that I expect Philip Morris is being more intentionally deceptive than Jacob Hilton here. But I quite liked the comment anyway, because I share the sense that (regardless of Jacob's intention) the OP has an effect much like safetywashing, and I think the exaggerated satire helps make that easier to see.

The OP is framed as addressing common misconceptions about OpenAI, of which it lists five:

  1. OpenAI is not working on scalable alignment.
  2. Most people who were working on alignment at OpenAI left for Anthropic.
  3. OpenAI is a purely for-profit organization.
  4. OpenAI is not aware of the risks of race dynamics.
  5. OpenAI leadership is dismissive of existential risk from AI.

Of these, I think 1, 3, and 4 address positions that are held by basically no one. So by “debunking” much dumber versions of the claims people actually make, the post gives the impression of engaging with criticism, without actually meaningfully doing that. 2 at least addresses a real argument, but at least as I understand it, is quite misleading—while technically true, it seriously underplays the degree to which there was an exodus of key safety-conscious staff, who left because they felt OpenAI leadership was too reckless. So of these, only 5 strikes me as responding non-misleadingly to a real criticism people actually regularly make.

In response to the Philip Morris analogy, Jacob advised caution:

rhetoric like this seems like an excellent way to discourage OpenAI employees from ever engaging with the alignment community.

For many years, the criticism I heard of OpenAI in private was dramatically more vociferous than what I heard in public. I think much of this was because many people shared Jacob’s concern—if we say what we actually think about their strategy, maybe they’ll write us off as enemies, and not listen later when it really counts?

But I think this is starting to change. I’ve seen a lot more public criticism lately, which I think is probably at least in part because it’s become so obvious that the strategy of mincing our words hasn't worked. If they mostly ignore all but the very most optimistic alignment researchers now, why should we expect that will change later, as long as we keep being careful to avoid stating any of our offensive-sounding beliefs?

From talking with early employees and others, my impression is that OpenAI’s founding was incredibly reckless, in the sense that they rushed to deploy their org, before first taking much time to figure out how to ensure that went well. The founders' early comments about accident risk mostly strike me as so naive and unwise, that I find it hard to imagine they thought much at all about the existing alignment literature before deciding to charge ahead and create a new lab. Their initial plan—the one still baked into their name—would have been terribly dangerous if implemented, for reasons I’d think should have been immediately obvious to them had they stopped to think hard about accident risk at all.

And I think their actions since then have mostly been similarly reckless. When they got the scaling laws result, they published a paper about it, thereby popularizing the notion that “just making the black box bigger” might be a viable path to AGI. When they demoed this strategy with products like GPT-3, DALL-E, and CLIP, they described much of the architecture publicly, inspiring others to pursue similar research directions.

So in effect, as far as I can tell, they created a very productive “creating the x-risk” department, alongside a smaller “mitigating that risk” department—the presence of which I take the OP to describe as reassuring—staffed by a few of the most notably optimistic alignment researchers, many of whom left because even they felt too worried about OpenAI’s recklessness.

After all of that, why would we expect they’ll suddenly start being prudent and cautious when it comes time to deploy transformative tech? I don’t think we should.

My strong bet is that OpenAI leadership are good people, in the standard deontological sense, and I think that’s overwhelmingly the sense that should govern interpersonal interactions. I think they’re very likely trying hard, from their perspective, to make this go well, and I urge you, dear reader, not to be an asshole to them. Figuring out what makes sense is hard; doing things is hard; attempts to achieve goals often somehow accidentally end up causing the opposite thing to happen; nobody will want to work with you if small strategic updates might cause you to suddenly treat them totally differently.

But I think we are well past the point where it plausibly makes sense for pessimistic folks to refrain from stating their true views about OpenAI (or any other lab) just to be polite. They didn’t listen the first times alignment researchers screamed in horror, and they probably won’t listen the next times either. So you might as well just say what you actually think—at least that way, anyone who does listen will find a message worth hearing.

Comment by Adam Scholl (adam_scholl) on Common misconceptions about OpenAI · 2022-08-26T11:23:02.984Z · LW · GW

Incorrect: OpenAI leadership is dismissive of existential risk from AI.

Why, then, would they continue to build the technology which causes that risk? Why do they consider it morally acceptable to build something which might well end life on Earth?

Comment by Adam Scholl (adam_scholl) on Common misconceptions about OpenAI · 2022-08-26T11:07:34.344Z · LW · GW

Incorrect: OpenAI is not aware of the risks of race dynamics.

I don't think this is a common misconception. I, at least, have never heard anyone claim OpenAI isn't aware of the risk of race dynamics—just that it nonetheless exacerbates them. So I think this section is responding to a far dumber criticism than the one which people actually commonly make.