How To Get Into Independent Research On Alignment/Agency
post by johnswentworth · 2021-11-19T00:00:21.600Z · LW · GW · 38 comments
I’m an independent researcher working on AI alignment and the theory of agency. I’m 29 years old, will make about $90k this year, and set my own research agenda. I deal with basically zero academic bullshit - my grant applications each take about one day’s attention to write (and decisions typically come back in ~1 month), and I publish the bulk of my work right here on LessWrong/AF. Best of all, I work on some really [? · GW] cool [LW · GW] technical [LW · GW] problems [LW · GW] which I expect are central to the future of humanity.
If your reaction to that is “Where can I sign up?”, then this post is for you.
Background Models
Independence
First things first: the “independent” part of “independent research” means self-employment, and everything that goes with it. It means the onus is on you to figure out what to do, how to provide value, what to prioritize, and what to aim for. In practice, it also usually means “independent” in a broader sense: you won’t have a standard template or agenda to follow. If you go down this path, assume that you will need to chart your own course - in particular, your own research agenda.
For the sort of person this post is aimed at, that will be a very big upside, not a downside.
Disclaimer: there are ways to get into alignment research which don’t involve quite so much figuring-it-all-out-on-your-own. Some people receive mentorship from existing researchers. Some people go work for alignment research organizations. Either of those paths can involve “independent research” in the sense that you are technically self-employed, but those paths aren’t “independent” in the broader sense of the word, and they’re not the main topic of this post.
Preparadigmicity
As a field, the study of alignment and agency is especially well-suited to independent research, because it centers on problems we don’t understand [LW · GW]. It’s not just that we don’t have the answers; we don’t even have the right frames for thinking about the problems. Agency is an area where we are fundamentally confused. AI alignment is largely a problem which hasn’t happened yet, on technology which hasn’t been invented yet, which we nonetheless want to solve in advance [LW · GW]. Figuring out the right frames - the right paradigm - is itself a central part of the job.
The field needs people who are going to come up with new frames/approaches/models/paradigms/etc, because we’re pretty sure the current frames/approaches/models/paradigms/etc aren’t enough. Thus the great fit for independent research: as an independent researcher, you’re not beholden to some existing agenda based on existing frames. Coming up with your own idea of what the key problems are, how to frame them, what tools to apply… that sort of thing is exactly what we need, and it requires people who aren’t committed to the strategies of existing senior researchers and organizations. It requires people who have an independent high-level understanding of the field and their own angles of looking at it, and who can pick out the key problems and paths from that perspective.
Again, for the sort of person this post is aimed at, that will be a very big upside.
… but it comes with some trade-offs. As a historical example of preparadigmatic research, here’s Kuhn talking about optics before Newton:
Being able to take no common body of belief for granted, each writer on physical optics felt forced to build his field anew from its foundations. In doing so, his choice of supporting experiment and observation was relatively free, for there was no standard set of methods or of phenomena that every optical writer felt forced to employ and explain. Under these circumstances, the dialogue of the resulting books was often directed as much to the members of other schools as it was to nature.
This very much applies to alignment research. Because the field does not already have a set of shared frames [LW · GW] - i.e. a paradigm - you will need to spend a lot of effort explaining your frames, tools, agenda, and strategy. For the field, such discussion is a necessary step to spreading ideas and eventually creating a paradigm. For you, it’s a necessary step to get paid, and to get useful engagement with your work from others.
In particular, you will probably need to both think and write a lot about your strategy: the models and intuitions which inform why you’re working on the particular problems you’ve chosen, why the tools you’re using seem promising, what kinds of results you expect, and what your long-term vision looks like. Inevitably, a lot of this will rely on informal arguments or intuitions; you will need to figure out how to trace the sources of those intuitions and explain them to other people, without having to formalize everything. Explain the actual process which led to an idea/decision/approach [LW · GW], without going down the bottomless rabbit hole of deeply researching every single claim.
The current version of LessWrong was built in large part to support exactly that sort of discussion, and I strongly recommend using it.
Getting Paid
Right now, the best grantmaker in this space is the Long-Term Future Fund (LTFF). There are other options [LW(p) · GW(p)], but none are quite as good a fit for the sort of work we’re talking about here.
I’ve received a few LTFF grants myself and know some of the people involved in the grantmaking decisions, so I’ll give some thoughts on the most important things you’ll need in order to get paid. Bear in mind that this is inherently speculative and not endorsed by anyone at LTFF. I’d also recommend looking at LTFF’s past grants to get a more direct idea of what kinds of things they fund.
Don’t Bullshit
A low-bullshit grantmaking process works both ways. The LTFF wants to do object-level useful things, not just Look Prestigious, so they keep the application simple and the turnaround time relatively fast. The flip side is that I expect them to look very unkindly on bullshit - i.e. attempts to make the applicant/application Sound Prestigious without actually doing object-level useful things.
In academia, it’s common practice to make up some bullshit about how your research is going to help the world. During my undergrad, this sort of bullshit was explicitly taught. Of course, it’s not like anyone is ever going to hire an economist or statistician (let alone consult a prediction market) to figure out whether the research is actually likely to impact the world in the manner claimed. The goal is just to make the proposal sound good. If you’re coming from academia, this sort of bullshit may be an ingrained habit which takes effort to break.
If you want to make it in alignment/agency research, you’re going to need an actual object-level strategy.
We’ll talk more in the next sections about how to come up with a strategy, but the first stop is The Bottom Line [LW · GW]: once you’ve chosen a strategy, anything you say to justify it will not make it any more correct. All that matters is the process which originally made you choose that strategy, or made you stick to it at times when you might realistically have changed course. So first things first, forget whatever clever idea you already have cached, and let’s start from a blank slate.
Reading
Preparadigmicity means you’ll need to spend a lot of time explaining your choice of vision, strategy, models, tools, etc. The flip side of that coin is reading: you’ll probably need to read quite a bit of material from others in the field. This is often nontechnical or semi-technical background material, explanations of intuitions, vague gesturing at broad ideas, etc - you can see plenty of it here on LessWrong and the Alignment Forum. The more of this you read, the better you’ll understand other researchers’ frames (or at least know which frames you don’t understand), and the better you’ll be able to explain your own material in terms others can readily understand.
Early on, there are two main motivators for reading:
- To understand which strategies have already been tried, and failed, to avoid retreading that ground
- To understand a bit of the existing jargon (definitely not all of it!), in order to explain your own ideas in terms already familiar to others
To understand (some) existing approaches and jargon, I’d recommend at least skimming these sequences/posts, and diving deeper into whichever most resemble the directions you want to pursue:
- Embedded Agency [? · GW]
- Value Learning [? · GW]
- 11 Proposals For Building Safe Advanced AI [LW · GW]
- Risks From Learned Optimization [? · GW]
To understand barriers (other than what’s discussed in the above links), this talk and the Rocket Alignment Problem [LW · GW] are probably the best starting points. Note that lots of people disagree with those last two links (as well as 11 Proposals), but you probably want to be at least familiar enough to have an informed disagreement.
Note that this is all on LessWrong, which means you can leave comments with questions, attempts to summarize, disagreements, etc. Often people will reply. This helps a lot for actually absorbing the ideas. (h/t Adam Shimi for pointing this out.)
I invite others to leave suggested reading in the comments. (This does risk turning into a big debate over whether X or Y is actually a good idea for new people, but at least then we’ll have a realistic demonstration of how much everybody disagrees over all this. I did warn you that the field is preparadigmatic!)
Finally, there’s The Sequences [? · GW]. They are long, but if you haven’t read them, then you definitely risk various failure modes which will be obvious to people who have read them and very confusing to you. I wouldn’t quite say they’re required reading, especially if you’re on the more technical end of the spectrum and already somewhat familiar with alignment discussions, but there are definitely many people who will be somewhat surprised if you do technical alignment/agency research and haven’t read them.
Again, I want to emphasize that everyone disagrees on all this stuff. Roughly speaking, assume that the grantmakers care more about your research having some plausible path to usefulness than about agreeing with any particular position in any of the field’s ongoing arguments.
The Hamming Question
Over on the other side of the dining hall was a chemistry table. I had worked with one of the fellows, Dave McCall; furthermore he was courting our secretary at the time. I went over and said, "Do you mind if I join you?" They can't say no, so I started eating with them for a while. And I started asking, "What are the important problems of your field?" And after a week or so, "What important problems are you working on?" And after some more time I came in one day and said, "If what you are doing is not important, and if you don't think it is going to lead to something important, why are you at Bell Labs working on it?" I wasn't welcomed after that; I had to find somebody else to eat with!
Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks. The standard (and strongly recommended) exercise to alleviate that problem is to start from the Hamming Questions [? · GW]:
- What are the most important problems in your field (i.e. alignment/agency)?
- How are you going to solve them?
At this point, somebody usually complains that minor contributions are important or some such. I’m not going to argue with that, because I expect the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.
If you have decent answers to the Hamming Questions, and you make those answers clear to other people, that is probably a sufficient condition for your grant application to not end up in the giant pile of applications from people who don’t even have a model of how their proposal will help. It’s not quite a sufficient condition to get paid, but I would guess that a large majority of people who can clearly answer the Hamming Questions do get paid.
I want to emphasize that I think clear answers to the Hamming Questions are an approximately-sufficient condition, not an approximately-necessary condition; there are definitely other paths. Steve’s story [LW(p) · GW(p)] in the comments below is a good example; in his words:
If you're a kinda imposter-syndrome-y person who just constitutionally wouldn't dream of looking themselves in the mirror and saying "I am aiming for a major contribution!", well me too, and don't let John scare you off. :-P
Use Your Pareto Frontier
A great line from Adam Shimi:
Most people who try to go in a direction 'no one else has tried' end up going in the most obvious direction which everyone else has tried.
My main advice to avoid this failure mode is to leverage your Pareto frontier [LW · GW]. Apply whatever knowledge, or combination of knowledge, you have which others in the field don’t. Personally, I’ve gained a lot of insight into agency by drawing on systems biology, economics, statistical mechanics, and chaos theory. Others draw heavily on abstract math, like category theory or model theory. Evolutionary biology and user interface design are both rich sources.
This is one reason why it helps to have a broad technical background: the more frames and tools you have to draw on, the more likely you’ll find a novel and promising combination to apply to the most important problems in the field. (Or, just as good: the more frames and tools you have to draw on, the more likely you’ll notice that one of the most important problems has been overlooked.)
Flip side of this: if you have a novel-seeming idea which involves the same kinds of frames and tools which most people in alignment have (i.e. programming expertise, some ML experience, reading Astral Codex Ten) then do write it up, but don’t be surprised if it’s already been done.
If you read through some existing alignment work, and the strategy seems obviously wrong to you in a way which would not be obvious to the median LessWrong user, then that’s a very promising sign.
Legibility
Part of getting a grant is not just having a good plan and the skills to execute it, but also making your plan and skills legible to the people reviewing the grant.
Here’s (my summary of) a rough model from Oli, who’s one of the fund managers for LTFF. In order to get a grant for alignment research, usually someone needs to do one of these three:
- Write a grant application which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (This is rare/difficult.)
- Have a reference from someone the fund managers know and trust (i.e. the existing alignment research community).
- Have some visible online material which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (LessWrong posts/comments are a central example.)
As a new entrant to the field, I expect that option #3 is probably your main path. Write up not just your research strategy, but the intuitions, models and arguments behind that strategy. Give examples. Explain what you consider the key problems, why those problems seem central, and the frames and generators behind that reasoning. Again, give examples. Explain conjectures or tools you think are relevant, ideally with examples. If you’re on the theory side, sketch potential empirical tests; if on the empirical side, sketch the conceptual theory behind the ideas. And include examples. Explain your vision of success, and expected applications of your research (if it succeeds). At all stages, focus on giving accessible, intuitive explanations and lots of examples; even people who have lots of technical background will often skip over sections with just dense math, and not everyone has the same technical background as you. And put the examples at the beginnings of the posts, before the abstract/general explanations [LW · GW].
Remember: this is preparadigmatic work. Writing up the ideas, and the generators of the ideas, and the frames, and the tools, and making it all clear and accessible to people with totally different frames and tools, is a central part of the job.
All this writing will also make options #1 and #2 easier over time: writing a lot of posts and comments will eventually generate social connections (though this takes quite a bit of time, especially if you’re not in the Bay Area), and discussion/feedback will give some idea of how to explain things in a way which signals the kinds-of-things LTFF looks for.
(On the topic of feedback: a lot of more experienced researchers ignore most posts which they don't find very promising, partly because it’s a lot of work to explain/argue about problems and partly because there are too many posts to read it all anyway. If you explicitly reach out - e.g. send a message on LessWrong - and ask for feedback, people are much more likely to tell you what they think.)
By the time all that is written up and posted, the grant application itself is a drop in the bucket; that’s a big part of why it only takes a day to write up. A quote from Oli regarding the actual application:
I really wish people would just pretend they’re writing me an email explaining what they plan to do, rather than something aimed at the general public.
This is part of why option #1 is rare - people try to write the LTFF application like it’s an academic grant application or something, and it really isn’t. But also, clear communication is just pretty hard in general, even when you do understand the problem and have a non-bullshitted strategy.
When To Start
This post was mostly written for people who already have the technical skills they need. That probably means grad-level education, though a PhD is definitely not a formal requirement. I know at least a few people who think less-than-a-full-undergrad can suffice. Personally, I never went to grad school (though admittedly my undergrad coursework looks an awful lot like a PhD program; I got an unusually large amount of mileage out of it).
In terms of specific skills, I recently wrote a study guide [LW · GW] with a bunch of technical topics I’ve found useful, but the more important point is that we don’t currently know what the right combination of background knowledge is. If you already have a broad technical background, then my advice is to take a stab at the problem and see how it goes.
If you are currently in high school or undergrad, the study guide [LW · GW] has some recommendations for what to study (and why). The larger your knowledge base, the more tools and frames you’ll have to draw on later. You could also apply for a grant to e.g. pursue some alignment/agency research project over the summer; taking a stab at it will give you some firsthand data on what kinds of tools/frames are useful.
Runway
The grant application takes maybe a day, but there will probably be some groundwork before you’re ready for that. You’ll probably want to read a bunch, figure out a strategy, put up a few posts on it, and maybe update in response to feedback.
Personally, I quit my job as a data scientist in late 2018, and tried out a few different things over the course of the next year before settling into alignment/agency research. I got my first grant in late 2019. If someone with roughly my 2018 level of background knew up front that they wanted to enter the field, I think it would take a lot less time than that; a few months would be my guess. That said, my level of background in 2018 was already well above zero.
I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback. That said, you should probably plan on going full time at latest by the time you get a grant, and possibly sooner. If you’re in academia, then you’ll probably have more room to aim the bulk of your research at alignment without striking out on your own. (Though you should still totally strike out on your own and enjoy the no-academic-bullshit lifestyle.)
Meta
Historically, EA causes (including alignment) have largely drawn from very young populations (mostly undergrads). I believe this is mostly because (a) those are the people who don’t need to be drawn away from a different path which they’re already on, (b) they’re willing to work for peanuts, and (c) they don’t have to unlearn how to bullshit. Unfortunately, a lot of alignment research benefits from a broad technical background, which takes time to build up. So I think we’ve historically had fewer researchers with that sort of broad knowledge than would be ideal, just because we tend to recruit young people.
But conditions have changed in recent years, and I think there’s now room for a different kind of recruitment, aimed at (somewhat) older people with more knowledge and experience.
First: the Sequences are about ten years old, so right about now there are probably a bunch of postgrads and adjunct professors with lots of technical skills who have already read them, have decent epistemic habits (i.e. know how to not bullshit), and have a rough understanding of what the alignment problem is.
Second: nowadays, we have money. If you’re a postgrad or adjunct professor or whatever, and you can do good technical alignment research, you can probably make more money as an independent researcher in alignment than you do now. Our main grantmaker has an application form which takes maybe a few hours at most, usually comes back with a decision in under a month, and complains that it doesn’t have enough good projects to spend its money on.
So if you’re the sort of person who:
- Wants to tackle big open research problems
- … in a field where everyone is confused and we don’t have a paradigm yet and you have to basically chart your own course
- … and the stakes are literally astronomical
- … and you have a bunch of technical skills, maybe read the sequences ten years ago, and have a basic understanding of what AI alignment is and why it’s hard
… then now is a good time to sit down with a notebook and think about how you’d go about understanding alignment/agency. If you have any promising ideas, write them up, post them here on LessWrong, and apply for a grant to pursue this research full-time.
I can attest that it’s an awesome job.
38 comments
comment by Steven Byrnes (steve2152) · 2021-11-19T02:59:09.664Z · LW(p) · GW(p)
the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.
For me, there were two separate decisions. (1) Around March 2019, having just finished my previous intense long-term internet hobby, I figured my next intense long-term internet hobby was gonna be AI alignment; (2) later on, around June 2020, I started trying to get funding for full-time independent work. (I couldn't work at an org because I didn't want to move to a different city.)
I want to emphasize that at the earlier decision-point, I was absolutely "aiming for minor contributions". I didn't have great qualifications, or familiarity with the field, or a lot of time. But I figured that I could eventually get to a point where I could write helpful comments on other people's blog posts. And that would be my contribution!
Well, I also figured I should be capable of pedagogy and outreach. And that was basically the first thing I did—I wrote a little talk [LW · GW] summarizing the field for newbies, and gave it to one audience, and tried and failed to give it to a second audience.
(I find it a lot easier to "study topic X, in order to do Y with that knowledge", compared to "study topic X" full stop. Just starting out on my new hobby, I had no Y yet, so "giving a pedagogical talk" was an obvious-to-me choice of Y.)
Then I had some original ideas! And blogged about them. But they turned out to be bad.
Then I had different original ideas! And blogged about them in my free time for like a year before I applied for LTFF.
…and they rejected me. On the plus side, their rejection came with advice about exactly what I was missing if I wanted to reapply. On the minus side, the advice was pretty hard to follow, given my time constraints. So I started gradually chipping away at the path towards getting those things done. But meanwhile, my rejected LTFF application got forwarded around, and I got a grant offer from a different source a few months later (yay).
With that background, a few comments on the post:
I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback.
I also went down the "ease into it" path. It's especially (though not exclusively) suitable for people like me who are OK with long-term intense internet hobbies. (AI alignment was my 4th long-term intense internet hobby in my lifetime. Probably last. They are frankly pretty exhausting, especially with a full-time job and kids.)
Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks.
Just to clarify:
This quote makes sense to me if you read "when first attempting to enter the field" as meaning "when first attempting to enter the field as a grant-funded full-time independent researcher".
On the other hand, when you're first attempting to learn about and maybe dabble in the field, well obviously you won't have a good model of the field yet.
One more thing:
the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.
If you're a kinda imposter-syndrome-y person who just constitutionally wouldn't dream of looking themselves in the mirror and saying "I am aiming for a major contribution!", well me too, and don't let John scare you off. :-P
I can attest that it’s an awesome job.
I agree!
comment by tryactions · 2021-11-19T17:43:10.162Z · LW(p) · GW(p)
Thanks for this post, these kinds of details seem very useful for anyone wanting to attempt this path!
A worry I have: there are people who long for the imagined lifestyle and self-description of being an independent AI alignment/agency researcher. I would categorize some of my past selves this way.
For many such people, trying to follow this path too enthusiastically would be bad for them -- but they might not have the memetic immunities that protect them from those bad decisions. For instance, their social safety net might be insufficient for the level of financial risk, or the career tradeoffs might be very large. This post is enthusiastic, but I think many people need to be urged toward caution when making major life changes -- especially around such high-stakes causes, where emotions run high.
So for my past selves, I'd disclaim:
- It's ok (and good) to prioritize your own financial and social safety net. You can revisit your ability to contribute from a better position in the future. The risks of things not going as well for you are very real.
- When starting down such a path, you should have a clear fallback plan that does not involve immense suffering. For instance: put in effort for X time period, then look for an alternate job if you have not achieved Y income by then. Do this only if you are confident you will not take too large a psychological hit from the failure.
↑ comment by Steven Byrnes (steve2152) · 2021-11-19T18:05:09.863Z · LW(p) · GW(p)
Yeah, I took the extremely-low-risk option of tinkering away as a hobby, while working a normal industry job, until I had a new income source in hand. So I had no employment gap. That turned out to be a viable option for me, but YMMV. For example, some jobs suck up all your time or energy, leaving no slack [LW · GW] for side-projects. Anyone can DM me for other tips and tricks. :)
comment by Rob Bensinger (RobbBB) · 2021-11-19T05:23:21.603Z · LW(p) · GW(p)
I love this post. Thanks, John.
comment by Koen.Holtman · 2021-11-23T14:36:20.433Z · LW(p) · GW(p)
As nobody else has mentioned it yet in this comment section: AI Safety Support is a resource hub specifically set up to help people get into the alignment research field.
I am a 50 year old independent alignment researcher. I guess I need to mention for the record that I never read the sequences, and do not plan to. The piece of Yudkowsky's writing that I'd recommend everybody interested in alignment read is Corrigibility. But in general: read broadly, and also beyond this forum.
I agree with John's observation that some parts of alignment research are especially well-suited to independent researchers, because they are about coming up with new frames/approaches/models/paradigms/etc.
But I would like to add a word of warning. Here are two somewhat equally valid ways to interpret LessWrong/Alignment Forum:
- It is a very big tent that welcomes every new idea
- It is a social media hang-out for AI alignment researchers who prefer to engage with particular alignment sub-problems and particular styles of doing alignment research only.
So while I agree with John's call for more independent researchers developing good new ideas, I need to warn you that your good new ideas may not automatically trigger a lot of interest or feedback on this forum. Don't tie your sense of self-worth too strongly to this forum.
On avoiding bullshit: discussions on this forum are often a lot better than on some other social media sites, but Sturgeon's law still applies.
comment by AlexMennen · 2021-12-31T09:04:00.769Z · LW(p) · GW(p)
This post claims that having the necessary technical skills probably means grad-level education, and also that you should have a broad technical background. While I suppose these claims are probably both true, it's worth pointing out that there's a tension between them, in that PhD programs typically aim to develop narrow skillsets, rather than broad ones. Often the first year of a PhD program will focus on acquiring a moderately broad technical background, and then rapidly get progressively more specialized, until you're writing a thesis, at which point whatever knowledge you're still acquiring is highly unlikely to be useful for any project that isn't very similar to your thesis.
My advice for people considering a PhD as preparation for work in AI alignment is that only the first couple years should really be thought of as preparation, and for the rest of the program, you should be actually doing the work that the beginning of the PhD was preparation for. While I wouldn't discourage people from starting a PhD as preparation for work in AI alignment work, I would caution that finishing the program may or may not be a good course of action for you, and you should evaluate this while in the program. Don't end up like me, a seventh-year PhD student working on a thesis project highly unlikely to be applicable to AI alignment despite harboring vague ambitions of working in the field.
↑ comment by johnswentworth · 2021-12-31T16:36:20.180Z · LW(p) · GW(p)
Strong agree. A lot of the technical material which I think is relevant is typically not taught until the grad level, but that does not mean that actually finishing a PhD program is useful. Indeed, I sometimes joke that dropping out of a PhD program is one of the most widely-recognized credentials among people currently in the field - you get the general technical background skills, and also send a very strong signal of personal agency.
comment by Akash (akash-wasil) · 2023-01-16T07:49:51.031Z · LW(p) · GW(p)
Reviewing this quickly because it doesn't have a review.
I've linked this post to several people in the last year. I think it's valuable for people (especially junior researchers or researchers outside of major AIS hubs) to be able to have a "practical sense" of what doing independent alignment research can be like, how the LTFF grant application process works, and some of the tradeoffs of doing this kind of work.
This seems especially important for independent conceptual work, since this is the path that is least well-paved (relative to empirical work, which is generally more straightforward to learn, or working at an organization, where one has colleagues and managers to work with).
I also appreciate John's emphasis of focusing on core problems & his advice to new researchers:
Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks. The standard (and strongly recommended) exercise to alleviate that problem is to start from the Hamming Questions [? · GW]:
- What are the most important problems in your field (i.e. alignment/agency)?
- How are you going to solve them?
I expect I'll continue to send this to people interested in independent alignment work & it'll continue to help people go from "what the heck does it mean to get a grant to do conceptual AIS work?" to "oh, gotcha... I can kinda see what that might look like, at least in this one case... but seeing even just one case of this makes the idea feel much more real."
comment by Raemon · 2021-11-27T04:25:18.276Z · LW(p) · GW(p)
Curated. This post matched my own models of how folk tend to get into independent alignment research, and I've seen some people whose models I trust more endorse the post as well. Scaling good independent alignment research seems very important.
I do like that the post also specifies who shouldn't be going into independent research.
comment by bvbvbvbvbvbvbvbvbvbvbv · 2021-11-19T09:48:34.348Z · LW(p) · GW(p)
I'm currently a medical student who's very into AI, and a dream of mine is to be an independent researcher in computational psychiatry.
Your post is very inspiring.
comment by PabloAMC · 2022-02-15T09:55:33.113Z · LW(p) · GW(p)
While I enjoyed this post, I wanted to indicate a couple of reasons why you may instead want to stay in academia or industry, rather than becoming an independent researcher:
- The first one is that it gives more financial stability.
- The second is that academia and industry set the bar high. If you get to publish at a good conference and get substantial citations, you know that you are making progress.
Now, many will argue that Safety is still preparadigmatic and consequently there might be contributions that do not really fit well into standard academic journals or conferences. My answer to this point is that we should aim to make AI Safety paradigmatic. We should really try to get our hands dirty with technical problems and solve them. I think there is a risk of staying at the conceptual level of research agendas for too long and not getting much done. In fact, I have anecdotal evidence (https://twitter.com/CraigGidney/status/1489803239956508672?s=20&t=nNSjfZjqYfbUQ4hvmghoHw) of well-known researchers in fields other than AI Safety who do not get to work on it because they find it hard to measure progress or to have intuitions about what works. I argue: we want to make the field paradigmatic so that it becomes just another academic research field.
I also want to cite another important point against becoming an independent researcher: if you work alone, it may take you longer to get any high-quality research done. Developing intuitions takes time, and supervision makes everything so much easier. I know that the community is short on supervision, but perhaps finding a good supervisor who does not directly work on AI Safety, but is happy for you to do so and whose research provides useful tools, might be a great idea.
So in summary: we want high-quality research, we want to be able to measure its high quality, and we want to make the field more concrete and grounded so that we can attract tons of academics.
comment by [deleted] · 2021-11-26T12:47:02.087Z · LW(p) · GW(p)
Hi John, thanks a lot.
Your posts are coming at the perfect time. I just gave my notice at my current job, I have about 3 years of runway ahead of me in which I can do whatever I want. I should definitely at least evaluate AI Safety research. My background is a bachelor's in AI (that's a thing in the Netherlands). The little bits of research I did try got good feedback.
Even though I'm in a great position to try this, it still feels like a huge gamble. I'm aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I'm cut out for this?
Not just asking to reduce financial risk, but also because I feel like my learning trajectory would be quite different if I already knew that it was going to work out in the long run. I'd be able to study the fundamentals a lot more before trying research.
↑ comment by johnswentworth · 2021-11-28T03:29:04.675Z · LW(p) · GW(p)
Man, this is a tough question. Evaluating the quality of research in the field is already a tough problem that everybody disagrees on, and as a result people disagree on what sort of people are well-suited to the work. Evaluating it for yourself without already being an expert in the field is even harder. With that in mind, I'll give an answer which I think a reasonably-broad chunk of people would agree with, but with the caveat that it is very very incomplete.
I had a chat with Evan Hubinger a few weeks ago where we were speculating on how our evaluations of grant applications would compare. (I generally don't evaluate grant applications, but Evan does.) We have very different views on what-matters-most in alignment, and agreed that our rankings would probably differ a lot. But we think we'd probably mostly agree on the binary cutoff - i.e. which applications are good enough to get funding at all. That's because at the moment, money is abundant enough that it makes sense to invest in projects based on views which I think are probably wrong but at least have some plausible model under which they could be valuable. If there's a project where Evan would assign it high value, and Evan's model is itself a model-which-I-think-is-probably-wrong-but-still-plausible, then that's enough to merit a grant. (It's a hits-based grantmaking model.) Likewise, I'd expect Evan to view things-I'd-consider-high-value in a similar way.
Assuming that speculation is correct, the main grants which would not be funded are those which (as far as the grant evaluator can tell) don't have any plausible model under which they'd be valuable. Thus the importance of building your own understanding of the whole high-level problem and answering the Hamming Questions: if you can do that, then you have a model under which your research will be valuable, and all that's left is to communicate that model and your plan.
Now back to your perspective. You're already hanging around and commenting on LessWrong, so right out the gate I have a somewhat-higher-than-default prior that you can evaluate the "some model under which the research is valuable" criterion. You're likely to already have the concepts of Bottom Line [LW · GW] and Trying to Try [LW · GW] and so forth (even if you haven't read those exact posts); you probably already have some intuition for the difference between a plan designed to actually-do-the-thing, versus a plan designed to look-like-it's-doing-the-thing or to look-like-it's-trying-to-do-the-thing. That doesn't mean you already have enough of a model of the alignment/agency problems or a promising thread to tackle them, but hopefully you can at least tell if and when you do have those things.
↑ comment by [deleted] · 2021-11-28T11:24:00.961Z · LW(p) · GW(p)
Based on your comment, I'm more motivated to just sit down and (actually) try to solve AI Safety for X weeks, write up my results and do an application. What is your 95% confidence interval for what X needs to be to reduce the odds of a false negative (i.e. my grant gets rejected but shouldn't have been) to a single digit?
I'm thinking of doing maybe 8 weeks. Maybe more if I can fall back on research engineering so that I haven't wasted my time completely.
↑ comment by johnswentworth · 2021-11-28T17:18:47.662Z · LW(p) · GW(p)
My main modification to that plan would be "writing up your process is more important than writing up your results"; I think that makes a false negative much less likely.
8 weeks seems like it's on the short end to do anything at all, especially considering that there will be some ramp-up time. A lot of that will just be making your background frames/approach more legible. I guess viability depends on exactly what you want to test:
- If your goal is write up your background models and strategy well enough to see if grantmakers want to fund your work based on them, 8 weeks is probably sufficient
- If your goal is to see whether you have any large insights or make any significant progress, that usually happens for me on a timescale of ~3 months
It sounds like you want to do something closer to the latter, so 12-16 weeks is probably more appropriate?
↑ comment by Koen.Holtman · 2021-11-28T12:06:26.541Z · LW(p) · GW(p)
I'm aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I'm cut out for this?
My key comment here is that, to be an independent researcher, you will have to rely day-by-day on your own judgement on what has quality and what is valuable. So do you think you have such judgement and could develop it further?
To find out, I suggest you skim a bunch of alignment research agendas, or research overviews like this one [LW · GW], and then read some abstracts/first pages of papers mentioned in there, while trying to apply your personal, somewhat intuitive judgement to decide
- which agenda item/approach looks most promising to you as an actual method for improving alignment
- which agenda item/approach you feel you could contribute most to, based on your own skills.
If your personal intuitive judgement tells you nothing about the above questions, if it all looks the same to you, then you are probably not cut out to be an independent alignment researcher.
comment by Cedar (xida-ren) · 2021-11-20T03:11:35.277Z · LW(p) · GW(p)
Ty John for writing this up. This post and the comments really help me find my own place and direction in terms of doing what I want to do.
Currently in academia and VERY unhappy about the bullshit I have to ingest and create. But I'm still waiting on my social, political, and financial safety nets before I can do anything remotely brave, kinda like tryactions mentioned in his comment.
So the most I'll do is probably just read and write and talk to people on the side.
Speaking of talking to people...
My current research involves (manually) using CPU architectural artifacts to break sandboxing and steal data. I've been wondering whether I could do something along the lines of "make a simple AI that tries to break out of sandboxes, then make an unbreakable sandbox to contain it".
Do shoot me a message if you have any thoughts, or is just curious. I would love to chat.
↑ comment by [deleted] · 2021-11-30T23:10:14.949Z · LW(p) · GW(p)
I'm guessing you're aware but Jim Babcock and others have thought a bit about AI containment and wrote about it in Guidelines for AI Containment.
↑ comment by Cedar (xida-ren) · 2022-04-01T04:44:59.791Z · LW(p) · GW(p)
I went ahead and took a look! I am actually very new to the community and was not at all aware of this.
I have some thoughts on this and I would love it if you would hop on a zoom call with me and help brainstorm a bit. You can find me at cedar.ren@gmail.com
Others are welcome too! I'm just a little lonely and a little lost, and would love to chat with people from lesswrong about these ideas
comment by CatGoddess · 2023-04-05T06:33:00.025Z · LW(p) · GW(p)
My main advice to avoid this failure mode is to leverage your Pareto frontier [LW · GW]. Apply whatever knowledge, or combination of knowledge, you have which others in the field don’t.
This makes sense if you already have knowledge which other people don't, but what about if you don't? How much should "number of people in the alignment community who already know X thing" factor into what you decide to study, relative to other factors like "how useful is X thing, when you ignore what everyone else is doing?" For instance, there are probably fewer people who know a lot about geology than who know a lot about economics, but I would expect that learning about economics would still be more valuable for doing agent foundations research.
(My guess is that the answer is "don't worry much at all about the Pareto frontier stuff when deciding what to study," especially because there aren't that many alignment researchers anyways, but I'm not actually sure.)
↑ comment by johnswentworth · 2023-04-05T15:24:22.935Z · LW(p) · GW(p)
My expectation is that if you do the Alignment Game Tree exercise and maybe a few others like it relatively early, and generally study what seems useful from there, and update along the way as you learn more stuff, you'll end up reasonably-differentiated from other researchers by default. On the other hand, if you find yourself literally only studying ML, then that would be a clear sign that you should diversify more (and also I would guess that's an indicator that you haven't gone very deep into the Game Tree).
comment by Nicholas / Heather Kross (NicholasKross) · 2021-11-20T00:49:46.369Z · LW(p) · GW(p)
This is super-encouraging given my circumstances/desires. Thank you so much for posting this!
comment by Raemon · 2023-01-16T07:43:09.477Z · LW(p) · GW(p)
I'd ideally like to see a review from someone who actually got started on Independent Alignment Research via this document, and/or grantmakers or senior researchers who have seen up-and-coming researchers who were influenced by this document.
But, from everything I understand about the field, this seems about right to me, and seems like a valuable resource for people figuring out how to help with Alignment. I like that it both explains the problems the field faces, and it lays out some of the realpolitik of getting grants.
Actually, rereading this, it strikes me as a pretty good "intro to the John Wentworth worldview", weaving a bunch of disparate posts together into a clear frame.
comment by Casey · 2022-09-27T10:36:27.873Z · LW(p) · GW(p)
Thanks for this John, I found it really useful!
Have you seen any research or discussion on whether independent researchers are more likely to develop unique moral views (because they aren't part of a centralized entity), and therefore bring diversity to the global research effort?
↑ comment by johnswentworth · 2022-09-27T16:01:10.575Z · LW(p) · GW(p)
That is something I (and I think most others) expect, but mostly on priors.
comment by Portia (Making_Philosophy_Better) · 2023-03-04T21:38:53.486Z · LW(p) · GW(p)
Thank you for writing this out. Being able to do the things that drew me to academia (understanding important problems; sharing this understanding with others; impacting important decisions), without the fucked up shit that characterises academia (publication rut, endless grant applications, financial precarity, being stuck on minor easily graspable problems rather than the actual bigger problems, blinkers on between fields, resistance to change, idealising authority, elitism, slow turnaround times, etc.) would be fucking fantastic, and I have long wondered how I could get it done.
comment by Alex Mikhalev (alex-mikhalev) · 2021-11-27T10:22:40.922Z · LW(p) · GW(p)
Thank you for this post, very encouraging - I was thinking about applying to LTFF. I have all the pre-requisites, and now I feel it's worth a try.
comment by kylefox1 · 2021-11-20T00:35:30.135Z · LW(p) · GW(p)
Hi johnswentworth,
Would you have some spare time in the next few weeks to discuss with me the benefits an independent researcher should expect to receive from joining an independent researcher institute?
You can reach me at kylethefox1@gmail.com.
Thank you for your time and consideration.
↑ comment by Steven Byrnes (steve2152) · 2021-11-20T01:03:14.244Z · LW(p) · GW(p)
If you're talking about Theiss or Ronin or IGDORE or things like that, see discussion here [EA · GW].
↑ comment by johnswentworth · 2021-11-20T01:16:18.556Z · LW(p) · GW(p)
Oh perfect, I hadn't seen that. Strong upvote, very helpful.
↑ comment by johnswentworth · 2021-11-20T01:01:05.683Z · LW(p) · GW(p)
It's a very short discussion: there is no independent researcher institute. There are independent researchers, and we have no institute; that's what independent research (in the most literal sense) means.
... ok, actually, there is kind of an independent researcher institute. It's called the Ronin Institute. I'm not affiliated with them at all, and don't really know much about them. My understanding is that they provide an Official-Sounding Institute for independent researchers (in basically any academic field) to affiliate with, and can provide a useful social circle of other independent researchers. Again, I have no connection to them at all, and no particular advice about them.
[EDIT: never mind, go follow Steve's links above.]
My guess is that you intended to ask a different question than that. Can you give an example of the sort of thing you're asking about?
↑ comment by kylefox1 · 2021-11-20T04:00:40.289Z · LW(p) · GW(p)
Almost there. My question was actually concerning the expected benefits from affiliating with an independent researcher institute. For example, an independent researcher would expect to receive grant administration (if funded) and virtual infrastructure services as benefits from Theiss Research in exchange for their affiliation.
Please let me know if there is a need for further clarification.
↑ comment by johnswentworth · 2021-11-20T04:25:53.674Z · LW(p) · GW(p)
Oh cool, that is what you were asking. I guess Steve's got you covered, then; I don't really know any more about it.
↑ comment by kylefox1 · 2021-11-20T05:00:46.101Z · LW(p) · GW(p)
Yes. Thank you for your time and replying to my question.
↑ comment by adamShimi · 2021-11-20T14:38:22.725Z · LW(p) · GW(p)
Giving a perspective from another country that is far more annoying in administrative terms (France), grant administration can be a real plus. I go through a non-profit in France, and they can take care of the taxes and the declarations, which would be a hassle. In addition, here being self-employed is really bad for many things you might want to do (rent a flat, get a loan, pay for unemployment funds), and having a real contract helps a lot with that.