Posts

International treaty for global compute caps 2023-11-09T18:17:04.952Z
Priorities for the UK Foundation Models Taskforce 2023-07-21T15:23:34.029Z
Conjecture: A standing offer for public debates on AI 2023-06-16T14:33:43.273Z
Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes 2023-05-01T16:47:41.655Z
Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes 2023-02-24T23:03:04.917Z
Retrospective on the 2022 Conjecture AI Discussions 2023-02-24T22:41:13.131Z
Full Transcript: Eliezer Yudkowsky on the Bankless podcast 2023-02-23T12:34:19.523Z
AGI in sight: our look at the game board 2023-02-18T22:17:44.364Z
Don't accelerate problems you're trying to solve 2023-02-15T18:11:30.595Z
FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3) 2023-02-10T13:55:59.387Z
Conjecture: Internal Infohazard Policy 2022-07-29T19:07:08.491Z

Comments

Comment by Andrea_Miotti (AndreaM) on RSPs are pauses done right · 2023-10-17T00:12:00.008Z · LW · GW

In terms of explicit claims:

"So one extreme side of the spectrum is build things as fast as possible, release things as much as possible, maximize technological progress [...].

The other extreme position, which I also have some sympathy for, despite it being the absolutely opposite position, is you know, Oh my god this stuff is really scary.

The most extreme version of it was, you know, we should just pause, we should just stop, we should just stop building the technology for, indefinitely, or for some specified period of time. [...] And you know, that extreme position doesn't make much sense to me either."

Dario Amodei, Anthropic CEO, explaining his company's "Responsible Scaling Policy" on the Logan Bartlett Podcast on Oct 6, 2023.

Starts at around 49:40.

Comment by Andrea_Miotti (AndreaM) on Priorities for the UK Foundation Models Taskforce · 2023-07-24T11:02:31.905Z · LW · GW

Thanks for the kind feedback! Any suggestions for a more interesting title?

Comment by Andrea_Miotti (AndreaM) on Palantir's AI models · 2023-06-16T18:54:22.533Z · LW · GW

Palantir's recent materials on this show that they're using three open-source LLMs (pretty small by today's frontier standards): Dolly-v2-12B, GPT-NeoX-20B, and Flan-T5 XL.

Comment by Andrea_Miotti (AndreaM) on Critiques of prominent AI safety labs: Conjecture · 2023-06-12T08:56:33.619Z · LW · GW

Apologies for the 404 on the page; it's an annoying cache bug. Try a hard refresh of the page in your browser (Cmd + Shift + R) and it should work.

Comment by Andrea_Miotti (AndreaM) on Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes · 2023-05-03T12:35:50.239Z · LW · GW

The "1000" instead of "10000" was a typo in the summary.

In the transcript Connor states "SLT over the last 10000 years, yes, and I think you could claim the same over the last 150". Fixed now, thanks for flagging!

Comment by Andrea_Miotti (AndreaM) on Japan AI Alignment Conference · 2023-03-11T01:50:35.395Z · LW · GW

Which one? All of them seem to be working for me.

Comment by Andrea_Miotti (AndreaM) on Fighting without hope · 2023-03-01T20:04:37.818Z · LW · GW

Pessimism of the intellect, optimism of the will.

Comment by Andrea_Miotti (AndreaM) on Retrospective on the 2022 Conjecture AI Discussions · 2023-02-28T07:30:06.087Z · LW · GW

People from OpenPhil, FTX FF and MIRI were not interested in discussing at the time. We also talked with MIRI about moderating, but it didn't work out in the end.

People from Anthropic told us their organization is very strict on public communications, and very wary of PR risks, so they did not participate in the end.

In the post I overgeneralized to avoid going into full detail.

Comment by Andrea_Miotti (AndreaM) on Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes · 2023-02-26T18:26:52.666Z · LW · GW

Yes, some people mentioned it was confusing to have two posts (I had originally posted the Summary and Transcript separately because they were very lengthy), so I merged them into one and added headers pointing to the Summary and Transcript for easier navigation.

Comment by Andrea_Miotti (AndreaM) on Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes · 2023-02-25T15:31:04.672Z · LW · GW

Thanks, I was looking for a way to do that but didn't know the space in italics hack!

Another formatting question: how do I make headers and sections collapsible? It would be great to have the "Summary" and "Transcript" sections as collapsible, considering how long the post is.

Comment by Andrea_Miotti (AndreaM) on Full Transcript: Eliezer Yudkowsky on the Bankless podcast · 2023-02-23T18:29:28.082Z · LW · GW

Thanks, fixed them!

Comment by Andrea_Miotti (AndreaM) on Don't accelerate problems you're trying to solve · 2023-02-22T20:40:15.525Z · LW · GW

I really don't think that AI dungeon was the source of this idea (why do you think that?)

We've heard the story from a variety of sources all pointing to AI Dungeon, and to the fact that the idea was kept from spreading for a significant amount of time. This @gwern Reddit comment, and previous ones in the thread, cover the story well.

And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn't part of ChatGPT or Codex or AlphaCode. What exactly is the story?

Regarding the effects of chain-of-thought prompting on progress[1], there are two levels of impact: first-order effects and second-order effects.

On the first order: once chain of thought became public, a large number of groups started using it explicitly to finetune their models.

Aside from non-public examples, big public ones include PaLM, Google's most powerful model to date. Moreover, chain of thought makes models much more useful for internal R&D with just prompting and no finetuning.

We don’t know what OpenAI used for ChatGPT, or future models: if you have some information about that, it would be super useful to hear about it!

On the second order: implementing this straightforwardly improved the impressiveness and capabilities of models, making them more obviously powerful to the outside world and more useful for customers, and leading to an increase in attention and investment in the field.

Due to compounding, the earlier these additional investments arrive, the sooner large downstream effects will happen.

  1. ^

    This is also partially replying to @Rohin Shah's question in another comment:

    Why do you believe this "drastically" slowed down progress?

Comment by Andrea_Miotti (AndreaM) on Don't accelerate problems you're trying to solve · 2023-02-22T19:03:31.977Z · LW · GW

We'd maybe be at our current capability level in 2018, [...] the world would have had more time to respond to the looming risk, and we would have done more good safety research.

It’s pretty hard to predict the outcome of “raising awareness of problem X” ahead of time. While it might be net good right now because we’re in a pretty bad spot, we have plenty of examples from the past where greater awareness of AI risk has arguably led to strongly negative outcomes down the line, due to people channeling their interest in the problem into somehow pushing capabilities even faster and harder.

Comment by Andrea_Miotti (AndreaM) on Don't accelerate problems you're trying to solve · 2023-02-22T19:00:23.894Z · LW · GW

My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.

We fully agree on this, and so it seems like we don’t have large disagreements on externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than 3, or some similar event that established the current paradigm as the dominant one.

Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. This is mostly high level, as going deeper would take a lot of detailed discussion of models of scientific and technological progress, which we might cover in some future posts.

In general, we think you’re treating the current paradigm as over-determined. We don’t think that a paradigm of large, single, generalist DL-scaling language model systems is a necessary trajectory of progress rather than a historical contingency.

While the Bitter Lesson might be true and a powerful driver of the ease of working on singleton, generalist, large monolithic systems over smaller, specialized ones, science doesn’t always (some might say very rarely!) follow the most optimal path.

There are many possible paradigms that we could be in, and the current one is among the worst for safety. For instance, we could be in a symbolic paradigm, or a paradigm that focuses on factoring problems and using smaller LSTMs to solve them. Of course, there do exist worse paradigms, such as a pure RL, non-language-based singleton paradigm.

In any case, we think the trajectory of the field got determined once GPT-2 and GPT-3 brought scaling into the limelight, and if those hadn’t happened, or memetics had gone another way, we could be in a very, very different world.

Comment by Andrea_Miotti (AndreaM) on Don't accelerate problems you're trying to solve · 2023-02-22T18:51:50.480Z · LW · GW

1. Fully agree and we appreciate you stating that.

2. While we are concerned about capability externalities from safety work (that’s why we have an infohazard policy), what we are most concerned about, and what we cover in this post, is deliberate capabilities acceleration justified as being helpful to alignment. Or, to put it in reverse: using the notion that working on systems closer to being dangerous might be more fruitful for safety work to justify actively pushing the capabilities frontier, and thus accelerating the arrival of the dangers themselves.

3. We fully agree that engaging with arguments is good; this is why we’re writing this and other work, and we would love all relevant players to do so more. For example, we would love to hear a more detailed, more concrete story from OpenAI of why they believe accelerating AI development has an altruistic justification. We do appreciate that OpenAI and Jan Leike have published their own approach to AI alignment, even though we disagree with some of its contents, and we would strongly support all other players in the field doing the same.

4.

I think that Anthropic's work also accelerates AI arrival, but it is much easier for it to come out ahead on a cost-benefit: they have significantly smaller effects on acceleration, and a more credible case that they will be safer than alternative AI developers. I have significant unease about this kind of plan, partly for the kinds of reasons you list and also a broader set of moral intuitions. As a result it's not something I would do personally.

But I've spent some time thinking it through as best I can and it does seem like the expected impact is good.

We share your significant unease with such plans. But given what you say here, why is it that you wouldn’t pursue this plan yourself, yet at the same time say that its expected impact seems good to you?

From our point of view, an unease-generating, AI arrival-accelerating plan seems pretty bad unless proven otherwise. It would be great for the field to hear the reasons why, despite these red flags, this is nevertheless a good plan.

And of course, it would be best to hear the reasoning about the plan directly from those who are pursuing it.

Comment by Andrea_Miotti (AndreaM) on AGI in sight: our look at the game board · 2023-02-19T15:42:53.319Z · LW · GW

Good point, and I agree progress has been slower in robotics compared to the other areas.

I just edited the post to add better examples of recent robotics advances (DayDreamer, VideoDex, and RT-1) that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel, who suggested them on Twitter.

Comment by Andrea_Miotti (AndreaM) on My understanding of Anthropic strategy · 2023-02-17T12:55:06.471Z · LW · GW

Was Dario Amodei not the former head of OpenAI’s safety team?

He wrote "Concrete Problems in AI Safety".

I don't see how the claim isn't just true/accurate.

If someone reads "Person X is Head of Safety", they wouldn't assume that the person led the main AI capabilities efforts of the company for the last 2 years.

Only saying "head of the safety team" implies that this was his primary activity at OpenAI, which is just factually wrong. 

According to his LinkedIn, from 2018 until the end of 2020, when he left, he was Director of Research and then VP of Research at OpenAI, where he "set overall research direction" and "led the effort to build GPT2 and 3". He led the safety team before that, between 2016 and 2018.

Comment by Andrea_Miotti (AndreaM) on Don't accelerate problems you're trying to solve · 2023-02-17T12:48:31.087Z · LW · GW

Your graph shows "a small increase" that represents progress that is equal to an advance of a third to a half the time left until catastrophe on the default trajectory. That's not small! That's as much progress as everyone else combined achieves in a third of the time till catastrophic models! It feels like you'd have to figure out some newer efficient training that allows you to get GPT-3 levels of performance with GPT-2 levels of compute to have an effect that was plausibly that large.

In general I wish you would actually write down equations for your model and plug in actual numbers; I think it would be way more obvious that things like this are not actually reasonable models of what's going on.


I'm not sure I get your point here: the point of the graph is just to illustrate that when effects compound, looking only at the short-term difference is misleading. Short-term differences lead to much larger long-term effects due to compounding. The graph was just a quick digital rendition of what I previously drew on a whiteboard to illustrate the concept, and is meant to be an intuition pump.

The model is not implying any more sophisticated mathematical insight than just "remember that compounding effects exist", and the graph is just for illustrative purposes.

Of course, if you had perfect information at the bottom of the curve, you would see that the effect your “small” intervention is having is actually quite big: but that’s precisely the point of the post, it’s very hard to see this normally! We don’t have perfect information, and the post aims to make it more salient in people’s minds that what they perceive as a “small” action in the present moment will likely lead to a “big” impact later on.

To illustrate the point: if you make a discovery now that is worth 2 billion dollars more of investment in AI capabilities, and this investment compounds yearly at a 20% rate, you’ll end up with far more than an extra 2 billion in the final total, e.g., 10 years later. If you make this 2-billion-dollar discovery later, then after those ten years you will not have as much money invested in capabilities as you would have in the first case!
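As a rough worked version of that arithmetic (a minimal sketch; the constant 20% yearly rate and the 2-billion figure are just the illustrative numbers from above, not a model of real investment dynamics):

```latex
% Future value of an extra investment P compounding for t years at rate r:
\[
  FV = P\,(1 + r)^{t}
\]
% Discovery made now, compounding for the full 10 years at 20%:
\[
  2\,\text{B} \times 1.2^{10} \approx 12.4\,\text{B}
\]
% The same discovery made 5 years later compounds for only 5 years:
\[
  2\,\text{B} \times 1.2^{5} \approx 5.0\,\text{B}
\]
```

So by the end of the same ten-year window, the earlier discovery has translated into roughly two and a half times as much capabilities investment as the later one, which is the compounding effect the graph is meant to convey.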

Such effects might be obvious in retrospect with perfect information, but this is indeed the point of the post: when evaluating actions in our present moment it’s quite hard to foresee these things, and the post aims to raise these effects to salience!

We could spend time on more graphs, equations and numbers, but that wouldn’t be a great marginal use of our time. Feel free to spend more time on this if you find it worthwhile (it’s a pretty hard task, since no one has a sufficiently gears-level model of progress!).

Comment by Andrea_Miotti (AndreaM) on Don't accelerate problems you're trying to solve · 2023-02-17T12:45:02.823Z · LW · GW

I think this is pretty false. There's no equivalent to Let's think about slowing down AI, or a tag like Restrain AI Development (both of which are advocating an even stronger claim than just "caution") -- there's a few paragraphs in Paul's post, one short comment by me, and one short post by Kaj. I'd say that hardly any optimization has gone into arguments to AI safety researchers for advancing capabilities. 
[...]
(I agree in the wider world there's a lot more optimization for arguments in favor of capabilities progress that people in general would find compelling, but I don't think that matters for "what should alignment researchers think".)


Thanks for the reply! From what you’re saying here, it seems like we already agree that “in the wider world there's a lot more optimization for arguments in favor of capabilities progress”.

I’m surprised to hear that you "don't think that matters for 'what should alignment researchers think'".

Alignment researchers are part of the wider world too! And conversely, a lot of people in the wider world who don’t work on alignment directly make relevant decisions that will affect alignment and AI, and think about alignment too (likely many more of those exist than “pure” alignment researchers, and this post is addressed to them too!).

I don't buy this separation from the wider world. Most people involved in this live in social circles connected to AI development; they’re sensitive to status, many work at companies directly developing advanced AI systems, they consume information from the broader world, and so on. And the vast majority of the real world’s economy has so far been straightforwardly incentivizing reasons to develop new capabilities, faster. Here are some tweets from Kelsey that illustrate some of this point.

Comment by Andrea_Miotti (AndreaM) on My understanding of Anthropic strategy · 2023-02-16T20:18:37.544Z · LW · GW

Anthropic’s founding team consists of, specifically, people who formerly led safety and policy efforts at OpenAI

This claim seems misleading at best: Dario, Anthropic's founder and CEO, led OpenAI's work on GPT-2 and GPT-3, two crucial milestones in terms of public AI capabilities.
Given that I don't have much time to evaluate each claim one by one, and given Gell-Mann amnesia, I am a bit more skeptical of the other ones.

Comment by Andrea_Miotti (AndreaM) on Gradient hacking is extremely difficult · 2023-01-26T16:17:01.369Z · LW · GW

The link in "started out as a comment on this post", in the first line of the post, is broken.