post by [deleted] · GW

This is a link post for

Comments sorted by top scores.

comment by Joe Collman (Joe_Collman) · 2023-11-23T06:29:50.262Z · LW(p) · GW(p)

This seems great in principle.
The below is meant in the spirit of [please consider these things while moving forward with this], and not [please don't move forward until you have good answers on everything].

That said:

First, I think it's important to clearly distinguish:

  1. A great world would have a lot more AI safety orgs. (yes)
  2. Conditional on many new AI safety orgs starting, the world is in a better place. (maybe)
  3. Intervening to facilitate the creation of new AI safety orgs makes the world better. (also maybe)

This program would be doing (3), so it's important to be aware that (1) is not in itself much of an argument. I expect that it's very hard to do (3) well, and that even a perfect version doesn't allow us to jump to the (1) of our dreams. But I still think it's a good idea!

Some thoughts that might be worth considering (very incomplete, I'm sure):

  1. Impact of potential orgs will vary hugely.
    1. Your impact will largely come down to [how much you increase (/reduce) the chance that positive (/negative) high-impact orgs get created].
    2. This may be best achieved by aiming to create many orgs. It may not.
      1. Of course [our default should be zero new orgs] is silly, but so would be [we're aiming to create as many new orgs as possible].
      2. You'll have a bunch of information, time and levers that funders don't have, so I don't think such considerations can be left to funders.
    3. In the below I'll be mostly assuming that you're not agnostic to the kind of orgs you're facilitating (since this would be foolish :)). However, I note that even if you were agnostic, you'd inevitably make choices that imply significant tradeoffs.
  2. Consider the incentive landscape created by current funding sources.
    1. Consider how this compares to a highly-improved-by-your-lights incentive landscape.
    2. Consider what you can do to change things for the better in this regard.
      1. If anything seems clearly suboptimal as things stand, consider spending significant effort making this case to funders as soon as possible.
      2. Consider what data you could gather (anonymized appropriately) on potential failure modes, or simply on dynamics that are non-obvious at the outset. Gather as much data as possible.
        1. If you don't have the resources to do a good job at experimentation, data gathering etc., make this case to funders and get those resources. Make the case that the cost of this is trivial relative to the opportunity cost of failing to gather the information.
  3. The most positive-for-the-world orgs are likely among the hardest to create.
    1. By default, orgs created are likely to be doing not-particularly-neglected things (the same selection pressures that shaped the current field also act on new orgs; non-neglected areas of the field correlate positively with available jobs and in-demand skills...).
    2. By default, selection will favor [org that moves efficiently in some direction] over [org that picks a high-EV-given-what's-currently-known direction].
      1. Given that impact can easily vary by a couple of orders of magnitude (and can be negative), direction is important.
      2. It's long-term direction that's important. In principle, an org that moves efficiently in some direction could radically alter that direction later. In practice, that's uncommon - unless this mindset existed at the outset.
        1. Perhaps facilitating this is another worthwhile intervention? I.e. ensuring that safety orgs have an incentive to pivot to higher-EV approaches, rather than to continue with a [low EV-relative-to-counterfactual, but high comparative advantage] approach.
    3. Making it easier to create any kind of safety org doesn't change the selection pressures much (though I do think it's a modest improvement). If all the branches are a little lower, it's still the low-hanging-fruit that tends to be picked first. It may often be easier to lower the low branches too.
      1. If possible, you'd want to disproportionately lower the highest branches. Clearly this is easier said than done. (e.g. spending a lot of resources on helping those with hard-to-make-legible ideas achieve legibility, [on a process level, if necessary], so that there's not strong selection for [easy to make legible])
  4. Ground truth feedback on the most important kinds of progress is sparse-to-non-existent.
    1. You'll be using proxies (for [what seems important], [what impact we'd expect org x to have], [what impact direction y has had], [what impact new org z has had] etc. etc.).
      1. Most proxies aren't great.
      2. The most natural proxies and metrics will tend to be the same ones others are using. This may help to get a project funded. It tends to act against neglectedness.
      3. Using multiple, non-obvious proxies is worth a thought.
        1. However, note that you don't have the True Name of [AI safety], [alignment]... in your head: you have a vague, confused proxy.
        2. One person coming up with multiple proxies will tend to mean creating various proxies for their own internal proxy. That's still a single point of failure.
        3. If you all clearly understand the importance of all the proxies you're using, that's probably a bad sign.
  5. It's much better to create a great org slowly than a mediocre org quickly. The latter can easily happen with (some of) the same people, entailing a high opportunity cost.
    • I think one of the most harmful dynamics at present is the expectation that people/orgs should have a concretely mapped out agenda/path-to-impact within a few months. This strongly selects against neglectedness.
    • Even Marius' response to this [LW · GW] seems to have the wrong emphasis:
      "Second, a great agenda just doesn't seem like a necessary requirement. It seems totally fine for me to replicate other people’s work, extend existing agendas, or ask other orgs if they have projects to outsource (usually they do) for a year or so and build skills during that time. After a while, people naturally develop their own new ideas and then start developing their own agendas."
      I.e. the implication is that the options are:
      1. Have a great agenda.
      2. Replicate existing work, extend existing agenda, grab existing ideas to work on.
    • Where is the [spend time focusing on understanding the problem more deeply, and forming new ideas / approaches] option? Of course this may sometimes entail some replication/extension, but that shouldn't be the motivation.
    • Financial pressures and incentives are important here: [We'll fund you for six months to focus on coming up with new approaches] amounts to [if you pick a high-variance approach, your org may well cease to exist in six months]. If the aim is to get an org to focus on exploration for six months, guaranteed funding for two years is a more realistic minimum.
      • Of course this isn't directly within your control - but it's the kind of thing you might want to make a case for to funders.
      • Again, the more you're able to shape the incentive landscape for future orgs, the more you'll be able to avoid unhelpful instrumental constraints, and focus on facilitating the creation of the kind of orgs that should exist.
      • Also worth considering that the requirement for this kind of freedom is closer to [the people need near-guaranteed financial support for 2+ years]. Where an org is uncertain/experimental, it may still make sense to give the org short-term funding, but all the people involved medium-term funding.
Replies from: AlexandraB
comment by Alexandra Bos (AlexandraB) · 2023-11-29T12:23:31.325Z · LW(p) · GW(p)

Hi Joe, thanks a lot for your thoughtful comment! We think you're making some valid points here and will take your suggestions and questions into consideration.

comment by Gurkenglas · 2023-11-21T17:49:46.301Z · LW(p) · GW(p)

All the leading AI labs so far seem to have come from attempts to found AI safety orgs. Do you have a plan against that failure case?

Replies from: LawChan, AlexandraB
comment by LawrenceC (LawChan) · 2023-11-22T00:18:23.922Z · LW(p) · GW(p)

I don't think that's actually true at all; Anthropic was explicitly a scaling lab when it was founded, for example, and DeepMind does not seem like it was "an attempt to found an AI safety org".

It is the case that Anthropic/OAI/DeepMind did feature AI safety people supporting the orgs, and the motivation behind the orgs was indeed safety, but the people involved did know that they were also going to build SOTA AI models.

comment by Alexandra Bos (AlexandraB) · 2023-11-29T12:18:44.650Z · LW(p) · GW(p)

Hi there, thanks for bringing this up. There are a few ways we're planning to reduce the risk of us incubating orgs that end up fast-tracking capabilities research over safety research. 

Firstly, we want to select for a strong impact focus & value-alignment in participants. 

Secondly, we want to assist the founders in setting up their organization in a way that limits the potential for value drift (e.g. a charter for the forming organization that would legally make this more difficult, choosing the right legal structure, and helping them with vetting or suggestions for whom they can best take on as an investor or board member).

If you have additional ideas around this we'd be happy to hear them.

Replies from: Gurkenglas
comment by Gurkenglas · 2023-11-29T22:57:45.689Z · LW(p) · GW(p)

Retain an option to buy the org later for a billion dollars, reducing their incentive to become worth more than a billion dollars.
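
One way to formalize the incentive effect this suggestion points at (a minimal sketch, assuming the incubator holds a call-style option to acquire the org at a fixed strike price K, here $1B, and exercises it whenever the org's value V exceeds K):

$$\text{founders' payoff}(V) \;=\; \min(V,\,K), \qquad \text{so}\quad \frac{\partial}{\partial V}\,\text{payoff}(V) = 0 \ \text{ for } V > K.$$

Under these assumptions, any value created beyond K accrues to the option holder rather than to the founders, so the marginal financial incentive to grow past $1B disappears.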