Alignment researchers, how useful is extra compute for you?

post by Lauro Langosco · 2022-02-19T15:35:31.751Z · LW · GW · 4 comments

Contents

  TLDR
  Introduction
  Main idea
    Potential benefits
    Potential problems
  Form
4 comments

TLDR

If you work in AI alignment / safety research, please fill out this form on how useful access to extra compute would be for your research. This should take under 10 minutes, and you don't need to read the rest of this post beforehand—in fact it would be great if you could fill out the form right now.

Introduction

I want to get an idea of how much demand there is for a university-independent organization that manages a compute cluster for academic AI alignment groups and independent researchers. Currently I don’t know anybody who is willing to run such an organization, but if demand is large, one could either actively look for people to run such a project or find an existing organization willing to take it on.

Main idea

Non-industry AI safety research organizations have a hard time procuring compute. Groups spend many researcher-hours on managing servers on a relatively small scale. Common obstacles are 1) having to deal with university bureaucracy (e.g. regarding hiring, engineer wages, and procurement) and 2) missing out on economies of scale.

Proposal: A university-independent organization that provides access to compute for academic AI alignment research groups as well as independent researchers. Such an organization could pay high wages for its engineers (compared to academic labs) and benefit from economies of scale.

Potential benefits

Potential problems

Form

If you haven't already, please fill out this form about how much extra compute might accelerate your research (<10 mins).

4 comments

Comments sorted by top scores.

comment by gwern · 2022-02-19T20:03:17.100Z · LW(p) · GW(p)

One suggestion I would make: don't run your own cluster at all (is that really a core competency or skill of yours compared to the hyperscalers?), and simply give alignment researchers GCP funding. TRC (TPU Research Cloud) has a ton of TPUs which they give out like candy for research purposes, but which go highly underused in considerable part because they are unable to cover GCP costs, only TPU costs. (It is bizarre and perverse and there is no sign of the situation changing no matter who I complain to at Google.) There turn out to be a lot of people out there who can't just pony up <$500/month for the GCP costs, even to unlock >$50k/month of TPU time. So, they don't.

You could probably do this inside GCP itself by setting up an org or group where you pay the bills and people apply to have their accounts added as quasi-root users who can spin up whatever buckets or VMs they need to drive their TRC quotas. (This is sort of how Tensorfork operated.)

Replies from: Lauro Langosco
comment by Lauro Langosco · 2022-02-20T11:40:15.859Z · LW(p) · GW(p)

Thanks for the rec! I knew TRC was awesome but wasn't aware you could get that much compute.

Still, beyond short-term needs it seems like this is a risky strategy. TRC is basically a charity project that AFAIK could be shut down at any time.

Overall this updates me towards "we should very likely do the GCP funding thing. If this works out fine, setting up a shared cluster is much less urgent. A shared cluster still seems like the safer option in the mid to long term, if there is enough demand for it to be worthwhile."

Curious if you disagree with any of this.

Replies from: gwern
comment by gwern · 2022-02-20T15:37:25.257Z · LW(p) · GW(p)

Yes, it's possible TRC could shut down or scale back its grants. But then you are no worse off than you are now. If you instead begin building up a shared cluster as a backup or alternative, you lose the time-value of the money/research, and the hardware will be increasingly obsolete in terms of power or efficiency. Relying on TRC doesn't really put you at much 'risk': a shutdown means that a researcher switches gears for a bit or has to pay normal prices like everyone else, but there's no really catastrophic outcome like going bankrupt. OK, you lose the time and effort you invested in learning GCP and setting up such an 'org' in it, but that's small potatoes; buying a single A100 probably costs more! For DL researchers, the rent vs buy dichotomy is always heavily skewed towards 'rent'. (Going the GCP route has other advantages in terms of getting running faster and building up a name, practical experience, and a community who would even be interested in using your hypothetical shared cluster.)

comment by leogao · 2022-02-21T07:05:26.466Z · LW(p) · GW(p)

Looking forward to seeing the survey results!

By the way, if you're an alignment researcher and compute is your bottleneck, please send me a DM. EleutherAI already has a lot of compute resources (as well as a great community for discussing alignment and ML!), and we're very interested in providing compute for alignment researchers with minimal bureaucracy required.