Why I'm Not (Yet) A Full-Time Technical Alignment Researcher

post by NicholasKross · 2023-05-25T01:26:49.378Z · LW · GW · 17 comments
This is a link post for https://www.thinkingmuchbetter.com/nickai/other/not-yet-fulltime.html
I have bills to pay (rent and food). For those, I need money. Most full-time technical alignment researchers solve this problem by looking for outside funding. (I.e., most researchers are not independently wealthy; they go to someone else to afford to work on alignment full-time.)
To get funding in alignment, you generally apply for a grant or a job. In both cases, anyone who'd give you those things will want to see evidence, beforehand, that you know what you're doing. To a near-tautological degree, this evidence must be "legible".
How do I make my skills/ideas legible? The recommended route [AF · GW] is to write my ideas on a blog/LessWrong, read and interact with other alignment materials, and then... eventually it's enough, I assume. For reasons I may or may not write about in the near future, many ideas about alignment (especially anything that could be done with today's systems) could very well accelerate capabilities work. There are at least some [LW · GW] types of [LW · GW] alignment research [LW · GW] that are also easy to use for increasing capabilities. Since I'm especially interested in John Wentworth's "abstraction" ideas, anything good I come up with might also be like that. In other words, legibility and security conflict at least some of the time, especially on the sorts of ideas I'm personally likely to have.
OK, fine, maybe it's hard to publish legible non-exfohazardous original/smart thoughts about AI alignment. Luckily, funders and hirers don't expect everyone coming in to have already published papers! Perhaps I can simply demonstrate my skills, instead?
Firstly, many technical skills relevant to AI alignment are hard to demonstrate efficiently. Say you develop a cool new ML algorithm. Did you just speed up capabilities? Okay, just take public notes on a large amount of technical reading... but that mostly signals conscientiousness, not research Talent™! Well, how about you do another technical project... well, now you're wasting precious time that maybe should've been spent on original research. How long are your timelines? (This also applies to the "become independently wealthy to fund yourself" strategy, only more so. I spent an embarrassingly long time on that route in my spare time...)
My skills, themselves, are not always legible!
Some of my skills are legible enough for my resume: I've engineered some software at some companies, I know how to debug things, and I list more in a section below. I'm pretty okay at these things.
However, I think the stuff I'm best at is currently under-measured [LW · GW]. These skills include (but aren't limited to): fast learning (assuming I have the energy and sleep and hopefully a bit of prior exposure to the topic), some technical intuition, absurdly-general knowledge, a good bit of security mindset, having-read-and-understood-most-of-The-Sequences, thinking clearly (again modulo sleep/energy), noticing some things, curiosity, and the oft-mocked-but-probably-underrated "creativity" or "idea generation". I wish there were something like Human Benchmark but for the kinds of "mental motions" needed in AI alignment research.
Even when my skills are legible, they don't seem to be "world-class" in the way that MIRI or OpenAI seem to select for. I got a Bachelor's degree in Computer Science (with a minor in Mathematics) from RIT, in upstate New York. Is that impressive? I got a math-SAT score of over 700, IIRC, and the paper I got back said I was in the 98th or 99th percentile. Is that interesting? I worked with Tensorflow at an internship, and have learned (and often forgotten) the basics of ML coding in classes and online courses. Is that enough for more theoretical/mathematical/conceptual work in alignment? I list more of these in a section below, but many of them have caveats that make them even less good at signaling my abilities!
I seem to have a weird sleep cycle. IIRC, Yudkowsky claims to have a non-24-hour sleep-cycle. If I just went to sleep / woke up "when I felt like it", I would probably keep going to bed and waking up later and later, until it loops around again. This is consistent with (though not sufficient for a Full Diagnosis of) a weird sleep cycle. What this means is that I'm tired after work, despite working a comfortable remote programming job full-time. And writing, with anywhere near the thoroughness/clarity/etc to help anything around AI alignment, is Work. And it requires thorough/clear/etc thinking, which is also work. I enjoy thinking, and I don't find at least some kinds of "advanced" thought hard (see above), but there are parts that are Work.
I have mostly-inattentive ADHD, and my medication for it (while helpful!) screws up my sleep if I try to use it for after-work activities... like AI alignment research/upskilling/signaling. (Did I mention how screwed-up my sleep is?)
Relatedly, my working-memory is either poor, or somehow seems poor to me. I think it's mostly "brain fog" from the poor sleep.
I don't know how common this is, but at the risk of saying something common: I'd love to have one of those working-relationships where I talk with another smart person, who knows more formalisms than I do, who could help with the math/writing. "Isn't that just 'I want somebody to do the work while I just have the ideas'?". A little, sure! Do I think that's needed to unlock my potential to help AI alignment? Potentially, yeah! Would I get an alignment job or grant based on that? I don't know!
Despite all of the above, I remain cautiously optimistic about being able to do technical alignment research full-time, hopefully starting within the next year or two. One cause for hope was seeing another researcher, Tamsin Leake, go from indie game dev to being grant-funded and running an alignment nonprofit within a shockingly short timespan.
My legible qualifications so far (as of 24 May 2023):
participated in an AGI Safety Fundamentals session. The meetings only, not the project... and I neglected much of the readings because I was busy that summer with...
some high-level research/writing for Nonlinear.
That one thingy I wrote for EleutherAI's lm-evaluation-harness (and which I think had to be rewritten by Leo Gao?)
SAT scores that IIRC corresponded to 125ish IQ (using that one dodgy numerical-table online).
some commercial software-engineering/testing experience, including my current full-time job.
Limited experience with TensorFlow and PyTorch. Like, I've built a tiny neural net in Python, but I'd be hard-pressed to do it from memory. (I did well in a class assignment of that Kaggle "Titanic" thing, but most of that was data-cleaning, organization, and visuals.)
I'm writing posts to enter into the Open Philanthropy AI Worldviews Contest [EA · GW].
I'm about to start in the online section of John Wentworth's "stream" of this summer's SERI MATS!
What do you recommend? Am I being too paranoid/modest, am I missing 1-2 key things, or am I doomed to be unhelpful/annoying to any alignment project I join?
Comments
comment by Thomas Larsen (thomas-larsen) · 2023-05-25T06:26:27.634Z · LW(p) · GW(p)
I'm a guest fund manager for the LTFF, and wanted to say that my impression is that the LTFF is often pretty excited about giving people ~6 month grants to try out alignment research at 70% of their industry counterfactual pay (the reason for the 70% is basically to prevent grift). Then, the LTFF can give continued support if they seem to be doing well. If getting this funding would make you excited to switch into alignment research, I'd encourage you to apply.
I also think that there's a lot of impactful stuff to do for AI existential safety that isn't alignment research! For example, I'm quite into people doing strategy [LW · GW], policy outreach to relevant people in government, actually writing policy, capability evaluations, and leveraged community building like CBAI.
↑ comment by Adele Lopez (adele-lopez-1) · 2023-05-25T22:04:28.644Z · LW(p) · GW(p)
If the initial grant goes well, do you give funding at the market price for their labor?
↑ comment by Thomas Larsen (thomas-larsen) · 2023-05-26T19:09:17.637Z · LW(p) · GW(p)
Sometimes, but the norm is to do 70%. This is mostly done on a case-by-case basis, but salient factors to me include:
- Does the person need the money? (How high is their cost of living, do they have a family, etc.?)
- What is the industry counterfactual? If someone would make 300k, we likely wouldn't pay them 70%, while if their counterfactual was 50k, it feels more reasonable to pay them 100% (or even more).
- How good is the research?
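For concreteness, the first two factors could be sketched as a toy salary rule. The 70% discount and the $300k/$50k cases are from the comment above; the $70k living-cost floor and the rule itself are my own illustrative assumptions, not LTFF policy:

```python
def grant_salary(counterfactual: float, floor: float = 70_000) -> float:
    """Toy sketch of the pay heuristic described above (not LTFF policy):
    pay ~70% of the industry counterfactual, but don't discount below a
    rough living-cost floor. The floor value is an illustrative assumption."""
    discounted = 0.70 * counterfactual
    # Someone with a low counterfactual may effectively get 100% "or even
    # more", since the floor can exceed their counterfactual salary.
    return min(max(discounted, floor), max(counterfactual, floor))

# A $300k counterfactual is discounted to $210k,
# while a $50k counterfactual is topped up to the $70k floor.
```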
↑ comment by NicholasKross · 2023-05-26T19:27:55.312Z · LW(p) · GW(p)
Quite informative, thanks!
↑ comment by NicholasKross · 2023-05-25T23:03:25.740Z · LW(p) · GW(p)
Ah, thanks! LTFF was definitely on my list of things to apply for, I just wasn't sure if that upskilling/trial period was still "a thing" these days. Very glad that it is!
comment by Quinn (quinn-dougherty) · 2023-05-25T12:55:22.103Z · LW(p) · GW(p)
I would encourage a taboo on "independently wealthy"; I think it's vague and obscurantist, and doesn't actually capture real-life runway considerations. "How long can I sustain which burn rate, and which burn rate works with my lifestyle?" is the actual question!
↑ comment by NicholasKross · 2023-05-25T23:14:49.762Z · LW(p) · GW(p)
Good point, yeah. That very unclarity, itself, contributed to me wasting so much time on that route.
comment by Stephen McAleese (stephen-mcaleese) · 2023-05-28T12:51:15.053Z · LW(p) · GW(p)
For context, I have a very similar background to you - I'm a software engineer with a computer science degree interested in working on AI alignment.
LTFF granted about $10 million last year. Even if all that money were spent on independent AI alignment researchers, at a cost of $100k per researcher per year, there would only be enough money to fund about 100 researchers in the world per year, so I don't see LTFF as a scalable solution.
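The back-of-the-envelope estimate above works out as follows (both figures are the comment's rough approximations, not official LTFF numbers):

```python
# Rough figures from the comment above, not official LTFF numbers.
annual_grants = 10_000_000      # ~$10M granted last year
cost_per_researcher = 100_000   # ~$100k/year per independent researcher

# Upper bound on researchers fundable per year if every dollar went
# to independent alignment research.
max_funded_researchers = annual_grants // cost_per_researcher
print(max_funded_researchers)  # → 100
```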
Unlike software engineering, AI alignment research tends to be neglected and underfunded because it's not an activity that can easily be made profitable. That's one reason why there are far more software engineers than AI alignment researchers.
Work that is unprofitable but beneficial such as basic science research has traditionally been done by university researchers who, to the best of my knowledge, are mainly funded by government grants.
I have also considered becoming independently wealthy to work on AI alignment in the past but that strategy seems too slow if AGI will be created relatively soon.
So my plan is to apply for jobs at organizations like Redwood Research or apply for funding from LTFF (like the 100 people I met at EAG with the same plan), and if those plans fail, I will consider getting a PhD and getting funding from the government instead, which seems more scalable.
comment by JNS (jesper-norregaard-sorensen) · 2023-05-25T08:29:47.083Z · LW(p) · GW(p)
I don't think I have much actionable advice.
Personally I am sort of in the same boat, except I am in a situation where the entire 6-12-month-grants thing is way too insecure (financially).
Being married with two kids, I have too many obligations to venture far into "how do I pay rent this month?" territory. Also, it's antithetical to the kind of person I am in general.
Anyway, if you have few obligations, keep it that way and if possible get rid of some, and then throw yourself at it.
comment by RGRGRG · 2023-05-25T17:35:17.836Z · LW(p) · GW(p)
Just wanted to say that I have similar questions about how to best (try to) get funding for mechanistic interpretability research. Might send a bunch of apps out come early June; but like OP, I don't have any technical results in alignment (though like OP, I like to think I have a solid (yet different) background).
comment by Christopher King (christopher-king) · 2023-05-25T02:44:48.519Z · LW(p) · GW(p)
For reasons I may/not write about in the near future, many ideas about alignment (especially anything that could be done with today's systems) could very well accelerate capabilities work.
If it's too dangerous to publish, it's not effective to research. From Some background for reasoning about dual-use alignment research [AF · GW]
If research would be bad for other people to know about, you should mainly just not do it.
↑ comment by NicholasKross · 2023-05-25T04:28:35.614Z · LW(p) · GW(p)
Counterpoint: at least one kind of research, mechanistic interpretability, could very well be both dangerous [LW · GW] by helping capabilities and also essential [LW · GW] for alignment. My current intuition is that the same could be said of other research avenues.
Yes, there are plenty of dangerous ideas that aren't so coupled with alignment, but they're not the frustrating edge-case I'm writing about. (And, of course, I'm not doing or publishing that type of research.)
↑ comment by Christopher King (christopher-king) · 2023-05-25T13:13:09.595Z · LW(p) · GW(p)
Right, and that article makes the case that in those cases you should publish. The reasoning is that the value of unpublished research decays rapidly, so if it could help alignment, publish before it loses its value.
↑ comment by NicholasKross · 2023-05-25T23:04:10.268Z · LW(p) · GW(p)
Good catch, that certainly motivates me even more to finish my current writings!
↑ comment by Christopher King (christopher-king) · 2023-05-26T16:01:00.573Z · LW(p) · GW(p)
Yeah exactly! Not telling anyone until the end just means you missed the chance to push society towards alignment and build on your work. Don't wait!
↑ comment by junk heap homotopy (zrkrlc) · 2023-05-25T04:46:56.825Z · LW(p) · GW(p)
I don't know. It seems to me that we have to make the graphs of progress in alignment vs. capabilities meet somewhere, and part of that would probably involve really thinking about which parts of which bottlenecks are really blockers vs. just epiphenomena that tag along but can be optimised away. For instance, in your statement:
If research would be bad for other people to know about, you should mainly just not do it
Then maybe doing research but not having the wrong people know about it is the right intervention, rather than just straight-up not doing it at all?