Thanks! Do you still think the "No AIs improving other AIs" criterion is too onerous after reading the policy enforcing it in Phase 0?
In that policy, we developed the definition of "found systems" so that this measure applies only to AI systems found via mathematical optimization, rather than to AIs (or any other code) written by humans.
This reduces the cost of the policy significantly, as it applies only to a very small subset of all AI activities, and leaves most innocuous software untouched.
In terms of explicit claims:
"So one extreme side of the spectrum is build things as fast as possible, release things as much as possible, maximize technological progress [...].
The other extreme position, which I also have some sympathy for, despite it being the absolutely opposite position, is you know, Oh my god this stuff is really scary.
The most extreme version of it was, you know, we should just pause, we should just stop, we should just stop building the technology for, indefinitely, or for some specified period of time. [...] And you know, that extreme position doesn't make much sense to me either."
Dario Amodei, Anthropic CEO, explaining his company's "Responsible Scaling Policy" on the Logan Bartlett Podcast on Oct 6, 2023.
Starts at around 49:40.
Thanks for the kind feedback! Any suggestions for a more interesting title?
Palantir's recent materials on this show that they're using three (pretty small by today's frontier standards) open-source LLMs: Dolly-v2-12B, GPT-NeoX-20B, and Flan-T5 XL.
Apologies for the 404 on the page, it's an annoying cache bug. Try a hard refresh of your browser page (Cmd + Shift + R) and it should work.
The "1000" instead of "10000" was a typo in the summary.
In the transcript Connor states "SLT over the last 10000 years, yes, and I think you could claim the same over the last 150". Fixed now, thanks for flagging!
Which one? All of them seem to be working for me.
Pessimism of the intellect, optimism of the will.
People from OpenPhil, FTX FF and MIRI were not interested in discussing at the time. We also talked with MIRI about moderating, but it didn't work out in the end.
People from Anthropic told us their organization is very strict on public communications, and very wary of PR risks, so they did not participate in the end.
In the post I overgeneralized so as not to go into full detail.
Yes, some people mentioned it was confusing to have two posts (I had originally posted two separate ones for the Summary and the Transcript due to their length), so I merged them into one and added headers pointing to the Summary and Transcript for easier navigation.
Thanks, I was looking for a way to do that but didn't know the space in italics hack!
Another formatting question: how do I make headers and sections collapsible? It would be great to have the "Summary" and "Transcript" sections as collapsible, considering how long the post is.
Thanks, fixed them!
I really don't think that AI dungeon was the source of this idea (why do you think that?)
We've heard the story from a variety of sources all pointing to AI Dungeon, and to the fact that the idea was kept from spreading for a significant amount of time. This @gwern Reddit comment, and previous ones in the thread, cover the story well.
And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn't part of ChatGPT or Codex or AlphaCode. What exactly is the story?
Regarding the effects of chain of thought prompting on progress[1], there are two levels of impact: first-order effects and second-order effects.
At the first order, once chain of thought became public, a large number of groups started using it explicitly to finetune their models.
Aside from non-public examples, big ones include PaLM, Google's most powerful model to date. Moreover, chain of thought makes models much more useful for internal R&D with just prompting and no finetuning.
We don’t know what OpenAI used for ChatGPT, or future models: if you have some information about that, it would be super useful to hear about it!
At the second order: implementing this straightforwardly improved the impressiveness and capabilities of models, making them more obviously powerful to the outside world and more useful for customers, which led to an increase in attention and investment in the field.
Due to compounding, the earlier these additional investments arrive, the sooner large downstream effects will happen.
[1] This is also partially replying to @Rohin Shah's question in another comment:
Why do you believe this "drastically" slowed down progress?
We'd maybe be at our current capability level in 2018, [...] the world would have had more time to respond to the looming risk, and we would have done more good safety research.
It’s pretty hard to predict the outcome of “raising awareness of problem X” ahead of time. While it might be net good right now because we’re in a pretty bad spot, we have plenty of examples from the past where greater awareness of AI risk has arguably led to strongly negative outcomes down the line, due to people channeling their interest in the problem into somehow pushing capabilities even faster and harder.
My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.
We fully agree on this, so it seems like we don't have large disagreements on the externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than GPT-3, or some similar event that established the current paradigm as the dominant one.
Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. This is mostly high level, as going deeper would require a lot of detailed discussion of models of scientific and technological progress, which we might cover in future posts.
In general, we think you're treating the current paradigm as over-determined. We don't think that a paradigm of large, single, generalist systems built by scaling deep learning language models was a necessary trajectory of progress rather than a historical contingency.
While the Bitter Lesson might be true and a powerful driver toward singleton, generalist, large monolithic systems over smaller, specialized ones, science doesn't always (some might say very rarely!) follow the optimal path.
There are many possible paradigms that we could be in, and the current one is among the worse ones for safety. For instance, we could be in a symbolic paradigm, or a paradigm that focuses on factoring problems and using smaller LSTMs to solve them. Of course, there do exist worse paradigms, such as a pure-RL, non-language-based singleton paradigm.
In any case, we think the trajectory of the field was determined once GPT-2 and GPT-3 brought scaling into the limelight; if those hadn't happened, or if memetics had gone another way, we could be in a very different world.
1. Fully agree and we appreciate you stating that.
2. While we are concerned about capability externalities from safety work (that's why we have an infohazard policy), what we are most concerned about, and what we cover in this post, is deliberate capabilities acceleration justified as being helpful to alignment. Or, to put it in reverse: using the notion that working on systems closer to being dangerous might be more fruitful for safety work as a justification for actively pushing the capabilities frontier, thus accelerating the arrival of the dangers themselves.
3. We fully agree that engaging with arguments is good; this is why we're writing this and other work, and we would love all relevant players to do so more. For example, we would love to hear a more detailed, more concrete story from OpenAI of why they believe accelerating AI development has an altruistic justification. We do appreciate that OpenAI and Jan Leike have published their own approach to AI alignment, even though we disagree with some of its contents, and we would strongly support all other players in the field doing the same.
4.
I think that Anthropic's work also accelerates AI arrival, but it is much easier for it to come out ahead on a cost-benefit: they have significantly smaller effects on acceleration, and a more credible case that they will be safer than alternative AI developers. I have significant unease about this kind of plan, partly for the kinds of reasons you list and also a broader set of moral intuitions. As a result it's not something I would do personally.
But I've spent some time thinking it through as best I can and it does seem like the expected impact is good.
We share your significant unease with such plans. But given what you say here, why would you not pursue this plan yourself while at the same time saying that its expected impact seems good to you?
From our point of view, an unease-generating, AI arrival-accelerating plan seems pretty bad unless proven otherwise. It would be great for the field to hear the reasons why, despite these red flags, this is nevertheless a good plan.
And of course, it would be best to hear the reasoning about the plan directly from those who are pursuing it.
Good point, and I agree progress has been slower in robotics compared to the other areas.
I just edited the post to add better examples (DayDreamer, VideoDex and RT-1) of recent robotics advances that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel who suggested them on Twitter.
Was Dario Amodei not the former head of OpenAI’s safety team?
He wrote "Concrete Problems in AI Safety".
I don't see how the claim isn't just true/accurate.
If someone reads "Person X is Head of Safety", they wouldn't assume that the person led the main AI capabilities efforts of the company for the last 2 years.
Only saying "head of the safety team" implies that this was his primary activity at OpenAI, which is just factually wrong.
According to his LinkedIn, from 2018 until the end of 2020, when he left, he was Director of Research and then VP of Research at OpenAI, where he "set overall research direction" and "led the effort to build GPT2 and 3". He led the safety team before that, between 2016 and 2018.
Your graph shows "a small increase" that represents progress that is equal to an advance of a third to a half the time left until catastrophe on the default trajectory. That's not small! That's as much progress as everyone else combined achieves in a third of the time till catastrophic models! It feels like you'd have to figure out some newer efficient training that allows you to get GPT-3 levels of performance with GPT-2 levels of compute to have an effect that was plausibly that large.
In general I wish you would actually write down equations for your model and plug in actual numbers; I think it would be way more obvious that things like this are not actually reasonable models of what's going on.
I'm not sure I get your point here: the point of the graph is just to illustrate that when effects compound, looking only at the short-term difference is misleading. Short-term differences lead to much larger long-term effects due to compounding. The graph was just a quick digital rendition of what I previously drew on a whiteboard to illustrate the concept, and is meant to be an intuition pump.
The model is not implying any more sophisticated complex mathematical insight than just "remember that compounding effects exist", and the graph is just for illustrative purposes.
Of course, if you had perfect information at the bottom of the curve, you would see that the effect of your "small" intervention is actually quite big: but that's precisely the point of the post; it's very hard to see this normally! We don't have perfect information, and the post aims to make it salient that what people perceive as a "small" action in the present moment will likely lead to a "big" impact later on.
To illustrate the point: if you make a discovery now that draws 2 billion dollars of additional investment into AI capabilities, and this compounds yearly at a 20% rate, the total after, say, 10 years will be far more than +2 billion. If you make that same 2-billion-dollar discovery later, then 10 years from now there will be much less money invested in capabilities than in the first case!
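As a very rough sketch of that arithmetic (the 2-billion figure, the 20% growth rate, and the 5-year delay below are illustrative assumptions, not estimates):

```python
# Illustrative sketch only: the same $2B discovery, made now vs. 5 years later,
# both evaluated 10 years from now under an assumed 20% yearly compounding rate.
initial_investment_b = 2.0   # assumed extra investment triggered by the discovery ($B)
growth_rate = 0.20           # assumed yearly compounding rate
horizon_years = 10
delay_years = 5              # assumed delay for the "later" discovery

discovery_now = initial_investment_b * (1 + growth_rate) ** horizon_years
discovery_later = initial_investment_b * (1 + growth_rate) ** (horizon_years - delay_years)

print(f"Discovery made now:   ~${discovery_now:.1f}B after {horizon_years} years")    # ~$12.4B
print(f"Discovery made later: ~${discovery_later:.1f}B after {horizon_years} years")  # ~$5.0B
```

The exact numbers don't matter; the point is that an early boost compounds into a much larger gap down the line.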
Such effects might be obvious in retrospect with perfect information, but this is indeed the point of the post: when evaluating actions in the present moment it's quite hard to foresee these things, and the post aims to make these effects salient!
We could spend time on more graphs, equations and numbers, but that wouldn’t be a great marginal use of our time. Feel free to spend more time on this if you find it worthwhile (it’s a pretty hard task, since no one has a sufficiently gears-level model of progress!).
I think this is pretty false. There's no equivalent to Let's think about slowing down AI, or a tag like Restrain AI Development (both of which are advocating an even stronger claim than just "caution") -- there's a few paragraphs in Paul's post, one short comment by me, and one short post by Kaj. I'd say that hardly any optimization has gone into arguments to AI safety researchers for advancing capabilities.
[...]
(I agree in the wider world there's a lot more optimization for arguments in favor of capabilities progress that people in general would find compelling, but I don't think that matters for "what should alignment researchers think".)
Thanks for the reply! From what you’re saying here, it seems like we already agree that “in the wider world there's a lot more optimization for arguments in favor of capabilities progress”.
I’m surprised to hear that you “don't think that matters for "what should alignment researchers think".”
Alignment researchers are part of the wider world too! And conversely, many people in the wider world who don't work on alignment directly still make decisions that will affect alignment and AI, and think about alignment too (there are likely many more of them than "pure" alignment researchers, and this post is addressed to them as well!)
I don't buy this separation from the wider world. Most people involved in alignment live in social circles connected to AI development, are sensitive to status, often work at companies directly developing advanced AI systems, consume information from the broader world, and so on. And the vast majority of the real-world economy has so far been straightforwardly incentivizing the development of new capabilities, faster. Here are some tweets from Kelsey that illustrate some of this point.
Anthropic’s founding team consists of, specifically, people who formerly led safety and policy efforts at OpenAI
This claim seems misleading at best: Dario, Anthropic's founder and CEO, led OpenAI's work on GPT-2 and GPT-3, two crucial milestones in terms of public AI capabilities.
Given that I don't have much time to evaluate each claim one by one, and given Gell-Mann amnesia, I am a bit more skeptical of the other claims.
The link in "started out as a comment on this post", in the first line of the post, is broken.