Posts

MATS Winter 2023-24 Retrospective 2024-05-11T00:09:17.059Z
MATS AI Safety Strategy Curriculum 2024-03-07T19:59:37.434Z
Announcing the London Initiative for Safe AI (LISA) 2024-02-02T23:17:47.011Z
MATS Summer 2023 Retrospective 2023-12-01T23:29:47.958Z
Apply for MATS Winter 2023-24! 2023-10-21T02:27:34.350Z
[Job Ad] SERI MATS is (still) hiring for our summer program 2023-06-06T21:07:07.185Z
How MATS addresses “mass movement building” concerns 2023-05-04T00:55:26.913Z
SERI MATS - Summer 2023 Cohort 2023-04-08T15:32:56.737Z
Aspiring AI safety researchers should ~argmax over AGI timelines 2023-03-03T02:04:51.685Z
Would more model evals teams be good? 2023-02-25T22:01:31.568Z
Air-gapping evaluation and support 2022-12-26T22:52:29.881Z
Probably good projects for the AI safety ecosystem 2022-12-05T02:26:41.623Z
Ryan Kidd's Shortform 2022-10-13T19:12:47.984Z
SERI MATS Program - Winter 2022 Cohort 2022-10-08T19:09:53.231Z
Selection processes for subagents 2022-06-30T23:57:25.699Z
SERI ML Alignment Theory Scholars Program 2022 2022-04-27T00:43:38.221Z
Ensembling the greedy doctor problem 2022-04-18T19:16:00.916Z
Is Fisherian Runaway Gradient Hacking? 2022-04-10T13:47:16.454Z
Introduction to inaccessible information 2021-12-09T01:28:48.154Z

Comments

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-16T21:07:06.575Z · LW · GW

Do you think there are some cultural things that ought to be examined to figure out why scaling labs are so much more attractive than options that at-least-to-me seem more impactful in expectation?


As a naive guess, I would consider the main reasons to be:

  • People seeking jobs in AI safety often want to take on "heroic responsibility." Work on evals and policy, while essential, might be seen as "passing the buck" onto others, often at scaling labs, who have to "solve the wicked problem of AI alignment/control" (quotes indicate my caricature of a hypothetical person). Anecdotally, I've often heard people in-community disparage AI safety strategies that primarily "buy time" without "substantially increasing the odds AGI is aligned." Programs like MATS emphasizing the importance of AI governance and including AI strategy workshops might help shift this mindset, if it exists.
  • Roles in AI gov/policy, while impactful at reducing AI risk, likely have worse quality-of-life features (e.g., wages, benefits, work culture) than similarly impactful roles in scaling labs. People seeking jobs in AI safety might choose between two high-impact roles based on these salient features without considering how many others making the same decision will affect the talent flow en masse. Programs like MATS might contribute to this problem, but only if the labs keep hiring talent (unlikely given poor returns to scale) and the AI gov/policy orgs don't make attractive offers (unlikely given that METR and Apollo offer pretty good wages, high status, and work cultures comparable to labs; AISIs might be limited because government roles don't typically pay well, but it seems there are substantial status benefits to working there).
  • AI risk might be particularly appealing as a cause area to people who are dispositionally and experientially suited to technical work, and scaling labs might be the most impactful place to do many varieties of technical work. Programs like MATS are definitely not a detriment here, as they mostly attract individuals who were already going to work in technical careers, expose them to governance-adjacent research like evals, and recommend potential careers in AI gov/policy.
Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-16T20:43:43.454Z · LW · GW

Cheers, Akash! Yep, our confirmed mentor list updated in the days after publishing this retrospective. Our website remains the best up-to-date source for our Summer/Winter plans.

Do you think this is the best thing for MATS to be focusing on, relative to governance/policy?

MATS is not currently bottlenecked on funding for our current Summer plans and hopefully won't be for Winter either. If additional high-impact AI gov mentors express interest in the next month or two (and some already seem to be appearing), we will boost this component of our Winter research portfolio. If ERA disappeared tomorrow, we would do our best to support many of their AI gov mentors. In my opinion, MATS is currently not sacrificing opportunities to significantly benefit AI governance and policy; rather, we are rate-limited by factors outside of our control and are taking substantial steps to circumvent these, including:

  • Substantial outreach to potential AI gov mentors;
  • Pursuing institutional partnerships with key AI gov/policy orgs;
  • Offering institutional support and advice to other training programs;
  • Considering alternative program forms less associated with rationality/longtermism;
  • Connecting scholars and alumni with recommended opportunities in AI gov/policy;
  • Regularly recommending scholars and alumni to AI gov/policy org hiring managers.

We appreciate further advice to this end!

Do you think there are some cultural things that ought to be examined to figure out why scaling labs are so much more attractive than options that at-least-to-me seem more impactful in expectation?

I think this is a good question, but it might be misleading in isolation. I would additionally ask:

  • "How many people are the AISIs, METR, and Apollo currently hiring and are they mainly for technical or policy roles? Do we expect this to change?"
  • "Are the available job opportunities for AI gov researchers and junior policy staffers sufficient to justify pursuing this as a primary career pathway if one is already experienced at ML and particularly well-suited (e.g., dispositionally) for empirical research?"
  • "Is there a large demand for AI gov researchers with technical experience in AI safety and familiarity with AI threat models, or will most roles go to experienced policy researchers, including those transitioning from other fields? If the former, where should researchers gain technical experience? If the latter, should we be pushing junior AI gov training programs or retraining bootcamps/workshops for experienced professionals?"
  • "Are existing talent pipelines into AI gov/policy meeting the needs of established research organizations and think tanks (e.g., RAND, GovAI, TFS, IAPS, IFP, etc.)? If not, where can programs like MATS/ERA/etc. best add value?"
  • "Is there a demand for more organizations like CAIP? If so, what experience do the founders require?"
Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-15T20:23:24.818Z · LW · GW

Of the scholars ranked 5/10 and lower on value alignment, 63% worked with a mentor at a scaling lab, compared with 27% of the scholars ranked 6/10 and higher. The average scaling lab mentor rated their scholars' value alignment at 7.3/10 and rated 78% of their scholars at 6/10 and higher, compared to 8.0/10 and 90% for the average non-scaling lab mentor. This indicates that our scaling lab mentors were, on average, either more discerning of value alignment than non-scaling lab mentors or had a higher base rate of scholars with low value alignment (probably both).

I also want to push back a bit against an implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research; this seems manifestly false from my conversations with mentors, their scholars, and the broader community.

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-15T17:36:15.829Z · LW · GW

It seems plausible to me that at least some MATS scholars are somewhat motivated by a desire to work at scaling labs for money, status, etc. However, the value alignment of scholars towards principally reducing AI risk seems generally very high. In Winter 2023-24, our most empirical-research-dominated cohort, mentors rated the median scholar's value alignment at 8/10 and rated 85% of scholars at 6/10 or above, where 5/10 was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.” To me, this is a very encouraging statistic, but I’m sympathetic to concerns that well-intentioned young researchers who join scaling labs might experience value drift, or find it difficult to promote safety culture internally or sound the alarm if necessary; we are consequently planning a “lab safety culture” workshop in Summer. Notably, only 3.7% of surveyed MATS alumni say they are working on AI capabilities; in one case, an alumnus joined a scaling lab capabilities team and transferred to working on safety projects as soon as they were able. As with all things, maximizing our impact is about striking the right balance between trust and caution, and I’m encouraged by the high apparent value alignment of our alumni and scholars.

We additionally believe:

  1. Advancing researchers to get hired at lab safety teams is generally good;
  2. We would prefer that the people on lab safety teams have more research experience and are more value-aligned, all else equal, and we think MATS improves scholars on these dimensions;
  3. We would prefer lab safety teams to be larger, and it seems likely that MATS helps create a stronger applicant pool for these jobs, resulting in more hires overall;
  4. MATS creates a pipeline for senior researchers on safety teams to hire people they have worked with for up to 6.5 months in-program, observing their competency and value alignment;
  5. Even if MATS alumni defect to work on pure capabilities, we would still prefer them to be more value-aligned than otherwise (though of course this has to be weighed against the boost MATS gave to their research abilities).

Regarding “AI control,” I suspect you might be underestimating the support that this metastrategy has garnered in the technical AI safety community, particularly among prosaic AGI safety thought leaders. I see Paul’s decision to leave ARC in favor of the US AISI as a potential endorsement of the AI control paradigm over intent alignment, rather than necessarily an endorsement of an immediate AI pause (I would update against this if he pushes more for a pause than for evals and regulations). I do not support AI control to the exclusion of other metastrategies (including intent alignment and Pause AI), but I consider it a vital and growing component of my strategy portfolio.

It’s true that many AI safety projects are pivoting towards AI governance. I think the establishment of AISIs is wonderful; I am in contact with MATS alumni Alan Cooney and Max Kauffman at the UK AISI and similarly want to help the US AISI with hiring. I would have been excited for Vivek Hebbar’s, Jeremy Gillen’s, Peter Barnett’s, James Lucassen’s, and Thomas Kwa’s research in empirical agent foundations to continue at MIRI, but I am also excited about the new technical governance focus that MATS alumni Lisa Thiergart and Peter Barnett are exploring. I additionally have supported AI safety org accelerator Catalyze Impact as an advisor and Manifund Regrantor and advised several MATS alumni founding AI safety projects; it's not easy to attract or train good founders!

MATS has been interested in supporting more AI governance research since Winter 2022-23, when we supported Richard Ngo and Daniel Kokotajlo (although both declined to accept scholars past the training program) and offered support to several more AI gov researchers. In Summer 2023, we reached out to seven handpicked governance/strategy mentors (some of whom you recommended, Akash), though only one was interested in mentoring. In Winter 2023-24 we tried again, with little success. In preparation for the upcoming Summer 2024 and Winter 2024-25 Programs, we reached out to 25 AI gov/policy/natsec researchers (whom we asked to also share with their networks) and received expressions of interest from 7 further AI gov researchers. As you can see from our website, MATS is supporting four AI gov mentors in Summer 2024 (six if you count Matija Franklin and Philip Moreira Tomei, who are primarily working on value alignment). We’ve additionally reached out to RAND, IAPS, and others to provide general support. MATS is considering a larger pivot, but available mentors are clearly a limiting constraint. Please contact me if you’re an AI gov researcher and want to mentor!

Part of the reason that AI gov mentors are harder to find is that programs like the RAND TASP, GovAI, IAPS, Horizon, and ERA fellowships seem to be doing a great job collectively of leveraging the available talent. It’s also possible that AI gov researchers are discouraged from mentoring at MATS because of our obvious associations with AI alignment (it’s in the name) and the Berkeley longtermist/rationalist scene (we’re talking on LessWrong and operate in Berkeley). We are currently considering ways to support AI gov researchers who don’t want to affiliate with the alignment, x-risk, longtermist, or rationalist communities.

I’ll additionally note that MATS has historically supported much research that indirectly contributes to AI gov/policy, such as Owain Evans’, Beth Barnes’, and Francis Rhys Ward’s capabilities evals, Evan Hubinger’s alignment evals, Jeffrey Ladish’s capabilities demos, Jesse Clifton’s and Caspar Oesterheld’s cooperation mechanisms, etc.

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-12T22:30:10.811Z · LW · GW

Yeah, that amount seems reasonable, if on the low side, for founding a small org. What makes you think $300k is reasonably easy to raise in this current ecosystem? Also, I'll note that larger orgs need significantly more.

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-12T21:44:46.713Z · LW · GW

I think the high interest in working at scaling labs relative to governance or nonprofit organizations can be explained by:

  1. Most of the scholars in this cohort were working on research agendas for which there are world-leading teams based at scaling labs (e.g., 44% interpretability, 17% oversight/control). Fewer total scholars were working on evals/demos (18%), agent foundations (8%), and formal verification (3%). Therefore, I would not be surprised if many scholars wanted to pursue interpretability or oversight/control at scaling labs.
  2. There seems to be an increasing trend in the AI safety community towards the belief that most useful alignment research will occur at scaling labs (particularly once there are automated research assistants) and external auditors with privileged frontier model access (e.g., METR, Apollo, AISIs). This view seems particularly strongly held by proponents of the "AI control" metastrategy.
  3. Anecdotally, scholars seemed generally in favor of careers at an AISI or evals org, but would prefer to continue pursuing their current research agenda (which might be overdetermined given the large selection pressure they faced to get into MATS to work on that agenda).
  4. Starting new technical AI safety orgs/projects seems quite difficult in the current funding ecosystem. I know of many alumni who have founded, or are trying to found, projects and who report substantial difficulties with securing sufficient funding.

Note that the career fair survey might tell us little about how likely scholars are to start new projects, as it primarily asked which organizations scholars wanted to attend the fair, not whether scholars would rather join existing orgs or found their own.

Comment by Ryan Kidd (ryankidd44) on Key takeaways from our EA and alignment research surveys · 2024-05-11T20:13:35.397Z · LW · GW

Can you estimate dark triad scores from the Big Five survey data?

Comment by Ryan Kidd (ryankidd44) on Key takeaways from our EA and alignment research surveys · 2024-05-11T19:26:13.914Z · LW · GW

You might be interested in this breakdown of gender differences in the research interests of the 719 applicants to the MATS Summer 2024 and Winter 2024-25 Programs who shared their gender. For each research direction, the plot shows the percentage of male applicants who indicated interest minus the percentage of female applicants who did.

The most male-dominated research interest is mech interp, possibly due to the high male representation in software engineering (~80%), physics (~80%), and mathematics (~60%). The most female-dominated research interest is AI governance, possibly due to the high female representation in the humanities (~60%). Interestingly, cooperative AI was a female-dominated research interest, which seems to match the result from your survey where female respondents were less in favor of "controlling" AIs than male respondents and more in favor of "coexistence" with AIs.

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-11T19:17:36.397Z · LW · GW

This is potentially exciting news! You should definitely visit the LISA office, where many MATS extension program scholars are currently located.

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-11T19:11:20.688Z · LW · GW

Last program, 44% of scholar research was on interpretability, 18% on evals/demos, 17% on oversight/control, etc. In summer, we intend for 35% of scholar research to be on interpretability, 17% on evals/demos, 27% on oversight/control, etc., based on our available mentor pool and research priorities. Interpretability will still be the largest research track and still has the greatest interest from potential mentors and applicants. The plot below shows the research interests of 1331 MATS applicants and 54 potential mentors who have applied for our Summer 2024 or Winter 2024-25 Programs.

Comment by Ryan Kidd (ryankidd44) on MATS Winter 2023-24 Retrospective · 2024-05-11T05:11:25.699Z · LW · GW

Oh, I think we forgot to ask scholars if they wanted Microsoft at the career fair. Is Microsoft hiring AI safety researchers?

Comment by Ryan Kidd (ryankidd44) on Key takeaways from our EA and alignment research surveys · 2024-05-04T21:42:27.601Z · LW · GW

Thank you so much for conducting this survey! I want to share some information on behalf of MATS:

  • In comparison to the AIS survey gender ratio of 9 M:F, MATS Winter 2023-24 scholars and mentors were 4 M:F and 12 M:F, respectively. Our Winter 2023-24 applicants were 4.6 M:F, whereas our Summer 2024 applicants were 2.6 M:F, closer to the EA survey ratio of 2 M:F. This data seems to indicate a large recent change in gender ratios of people entering the AIS field. Did you find that your AIS survey respondents with more AIS experience were significantly more male than newer entrants to the field?
  • MATS Summer 2024 applicants and interested mentors similarly prioritized research to "understand existing models", such as interpretability and evaluations, over research to "control the AI" or "make the AI solve it", such as scalable oversight and control/red-teaming, over "theory work", such as agent foundations and cooperative AI (note that some cooperative AI work is primarily empirical).
  • The forthcoming summary of our "AI safety talent needs" interview series generally agrees with this survey's findings regarding the importance of "soft skills" and "work ethic" in impactful new AIS contributors. Watch this space!
  • In addition to supporting core established AIS research paradigms, MATS would like to encourage the development of new paradigms. For better or worse, the current AIS funding landscape seems to have a high bar for speculative research into new paradigms. Has AE Studio considered sponsoring significant bounties or impact markets for scoping promising new AIS research directions?
  • Did survey respondents mention how they proposed making AIS more multidisciplinary? Which established research fields are more needed in the AIS community?
  • Did EAs consider AIS exclusively a longtermist cause area, or did they anticipate near-term catastrophic risk from AGI?
  • Thank you for the kind donation to MATS as a result of this survey!
Comment by Ryan Kidd (ryankidd44) on Estimating the Current and Future Number of AI Safety Researchers · 2024-04-24T16:42:39.861Z · LW · GW

I found this article useful. Any plans to update this for 2024?

Comment by Ryan Kidd (ryankidd44) on Shallow review of live agendas in alignment & safety · 2023-12-05T21:43:24.748Z · LW · GW

Wow, high praise for MATS! Thank you so much :) This list is also great for our Summer 2024 Program planning.

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-05T17:11:43.210Z · LW · GW

Another point: despite our broad call for mentors, only ~2 individuals expressed interest in mentorship whom we did not ultimately decide to support. It's possible our outreach could be improved, and I'm happy to discuss in DMs.

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-03T23:28:26.380Z · LW · GW

I don't see this distribution of research projects as "Goodharting" or "overfocusing" on projects with clear feedback loops. As MATS is principally a program for prosaic AI alignment at the moment, most research conducted within the program should be within this paradigm. We believe projects that frequently "touch reality" often offer the highest expected value in terms of reducing AI catastrophic risk, and principally support non-prosaic, "speculative," and emerging research agendas for their “exploration value," which might aid potential paradigm shifts, as well as to round out our portfolio (i.e., "hedge our bets").

However, even with the focus on prosaic AI alignment research agendas, our Summer 2023 Program supported many emerging or neglected research agendas, including projects in agent foundations, simulator theory, cooperative/multipolar AI (including s-risks), the nascent "activation engineering" approach our program helped pioneer, and the emerging "cyborgism" research agenda.

Additionally, our mentor portfolio is somewhat conditioned on the preferences of our funders. While we largely endorse our funders' priorities, we are seeking additional funding diversification so that we can support further speculative "research bets". If you are aware of large funders willing to support our program, please let me know!

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T20:54:31.910Z · LW · GW

There seems to be a bit of pushback against "postmortem" and our team is ambivalent, so I changed it to "retrospective."

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T19:15:25.286Z · LW · GW

Thank you!

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T19:14:56.510Z · LW · GW

Ok, added!

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T04:30:30.475Z · LW · GW

FYI, the Net Promoter score is 38%.

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T04:17:18.067Z · LW · GW

Ok, graph is updated!

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T04:09:23.934Z · LW · GW

Do you think "46% of scholar projects were rated 9/10 or higher" is better? What about "scholar projects were rated 8.1/10 on average" ?

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T03:45:14.937Z · LW · GW

We also asked mentors to rate scholars' "depth of technical ability," "breadth of AI safety knowledge," "research taste," and "value alignment." We omitted these results from the report to prevent bloat, but your comment makes me think we should re-add them.

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T03:33:30.801Z · LW · GW

Yeah, I just realized the graph is wrong; it seems like the 10/10 scores were truncated. We'll upload a new graph shortly.

Comment by Ryan Kidd (ryankidd44) on MATS Summer 2023 Retrospective · 2023-12-02T03:05:08.270Z · LW · GW

Cheers, Vaniver! As indicated in the figure legend for "Mentor ratings of scholar research", mentors were asked, “Taking the above [depth/breadth/taste ratings] into account, how strongly do you support the scholar's research continuing?” and prompted with:

  • 10/10 = Very disappointed if [the research] didn't continue;
  • 5/10 = On the fence, unsure what the right call is;
  • 1/10 = Fine if research doesn't continue.

Mentors rated 18% of scholar research projects as 10/10 and 28% as 9/10.

Comment by Ryan Kidd (ryankidd44) on Apply for MATS Winter 2023-24! · 2023-11-08T01:22:45.383Z · LW · GW

Also, last year's program was 8 weeks long and this year's program is 10 weeks long.

Comment by Ryan Kidd (ryankidd44) on Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter · 2023-11-08T01:14:51.374Z · LW · GW

Buck Shlegeris, Ethan Perez, Evan Hubinger, and Owain Evans are mentoring in both programs. The links show their MATS projects, "personal fit" for applicants, and (where applicable) applicant selection questions, designed to mimic the research experience.

Astra seems like an obviously better choice for applicants principally interested in:

  • AI governance: MATS has no AI governance mentors in the Winter 2023-24 Program, whereas Astra has Daniel Kokotajlo, Richard Ngo, and associated staff at ARC Evals and Open Phil;
  • Worldview investigations: Astra has Ajeya Cotra, Tom Davidson, and Lukas Finnveden, whereas MATS has no Open Phil mentors;
  • ARC Evals: While both programs feature mentors working on evals, only Astra is working with ARC Evals;
  • AI ethics: Astra is working with Rob Long.
Comment by Ryan Kidd (ryankidd44) on Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter · 2023-11-08T00:10:58.900Z · LW · GW

MATS has the following features that might be worth considering:

  1. Empowerment: Emphasis on empowering scholars to develop as future "research leads" (think accelerated PhD-style program rather than a traditional internship), including research strategy workshops, significant opportunities for scholar project ownership (though the extent of this varies between mentors), and a 4-month extension program;
  2. Diversity: Emphasis on a broad portfolio of AI safety research agendas and perspectives with a large, diverse cohort (50-60) and comprehensive seminar program;
  3. Support: Dedicated and experienced scholar support + research coach/manager staff and infrastructure;
  4. Network: Large and supportive alumni network that regularly sparks research collaborations and AI safety start-ups (e.g., Apollo, Leap Labs, Timaeus, Cadenza, CAIP);
  5. Experience: Have run successful research cohorts with 30, 58, and 60 scholars, plus three extension programs with about half as many participants.
Comment by Ryan Kidd (ryankidd44) on Apply for MATS Winter 2023-24! · 2023-10-29T22:30:32.496Z · LW · GW

Our alumni were surveyed about what amount of funding they would consider sufficient to participate in MATS and the median (discounting the $0 responses) was quite low.

Comment by Ryan Kidd (ryankidd44) on Apply for MATS Winter 2023-24! · 2023-10-29T22:24:03.134Z · LW · GW

Yes

Comment by Ryan Kidd (ryankidd44) on How MATS addresses “mass movement building” concerns · 2023-05-23T16:38:01.361Z · LW · GW

We agree, which is why we note, "We think that ~1 more median MATS scholar focused on AI safety is worth 5-10 more median capabilities researchers (because most do pointless stuff like image generation, and there is more low-hanging fruit in safety)."

Comment by Ryan Kidd (ryankidd44) on Ryan Kidd's Shortform · 2023-05-23T16:14:08.758Z · LW · GW

MATS' goals:

  • Find + accelerate high-impact research scholars:
    • Pair scholars with research mentors via specialized mentor-generated selection questions (visible on our website);
    • Provide a thriving academic community for research collaboration, peer feedback, and social networking;
    • Develop scholars according to the “T-model of research” (breadth/depth/epistemology);
    • Offer opt-in curriculum elements, including seminars, research strategy workshops, 1-1 researcher unblocking support, peer study groups, and networking events;
  • Support high-impact research mentors:
    • Scholars are often good research assistants and future hires;
    • Scholars can offer substantive new critiques of alignment proposals;
    • Our community, research coaching, and operations free up valuable mentor time and increase scholar output;
  • Help parallelize high-impact AI alignment research:
    • Find, develop, and refer scholars with strong research ability, value alignment, and epistemics;
    • Use alumni for peer-mentoring in later cohorts;
    • Update mentor list and curriculum as the alignment field’s needs change.
Comment by Ryan Kidd (ryankidd44) on Ryan Kidd's Shortform · 2023-05-23T15:43:32.886Z · LW · GW

Types of organizations that conduct alignment research, differentiated by funding model and associated market forces:

Comment by Ryan Kidd (ryankidd44) on SERI MATS - Summer 2023 Cohort · 2023-05-15T22:11:55.687Z · LW · GW

The Summer 2023 Cohort has 460 applicants. Our last cohort included 57 scholars.

Comment by Ryan Kidd (ryankidd44) on SERI MATS - Summer 2023 Cohort · 2023-05-15T17:36:17.152Z · LW · GW

As an educational seminar and independent research program, MATS cannot offer J1 visas. We can support scholars' ESTA and B1/B2 visa applications, however.

Comment by Ryan Kidd (ryankidd44) on SERI MATS - Summer 2023 Cohort · 2023-05-13T00:20:01.334Z · LW · GW

John's scholars have historically only had to seek LTFF funding for the 4-month extension program subsequent to the in-person Scholars Program. They are otherwise treated like other scholars.

Comment by Ryan Kidd (ryankidd44) on SERI MATS - Summer 2023 Cohort · 2023-05-12T20:33:35.199Z · LW · GW

Hi Pulkit. Unfortunately, applications have closed for our Summer 2023 Cohort. Hopefully, we will launch applications for our Winter Cohort soon!

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-05-05T20:19:32.126Z · LW · GW

I'm somewhere in the middle of the cognitivist/enactivist spectrum. I think that e.g. relaxed adversarial training is motivated by trying to make an AI robust, before it leaves the box, to arbitrary inputs it will receive in the world. I'm sympathetic to the belief that this is computationally intractable; however, it feels more achievable than altering the world in the way I imagine would be necessary without it.

I'm not an idealist here: I think that some civilizational inadequacies should be addressed (e.g., better cooperation and commitment mechanisms) concurrent with in-the-box alignment strategies. My main hope is that we can build an in-the-box corrigible AGI that allows in-deployment modification.

Comment by Ryan Kidd (ryankidd44) on How MATS addresses “mass movement building” concerns · 2023-05-05T20:06:53.669Z · LW · GW

I agree with you that AI is generally seen as "the big thing" now, and we are very unlikely to be counterfactual in encouraging AI hype. This was a large factor in our recent decision to advertise the Summer 2023 Cohort via a Twitter post and a shout-out on Rob Miles' YouTube and TikTok channels.

However, because we provide a relatively simple opportunity to gain access to mentorship from scientists at scaling labs, we believe that our program might seem attractive to aspiring AI researchers who are not fundamentally directed toward reducing x-risk. We believe that accepting such individuals as scholars is bad because:

  • We might counterfactually accelerate their ability to contribute to AI capabilities;
  • They might displace an x-risk-motivated scholar.

Therefore, while we intend to expand our advertising approach to capture more out-of-network applicants, we do not currently plan to reduce the selection pressures for x-risk-motivated scholars.

Another crux here is that I believe the field is in a nascent stage where new funders and the public might be swayed by fundamentally bad "AI safety" projects that make AI systems more commercialisable without reducing x-risk. Empowering founders of such projects is not a goal of MATS. After the field has grown a bit larger while maintaining its focus on reducing x-risk, there will hopefully be less "free energy" for naive AI safety projects, and we can afford to be less choosy with scholars.

Comment by Ryan Kidd (ryankidd44) on How MATS addresses “mass movement building” concerns · 2023-05-05T19:53:41.528Z · LW · GW

Mentorship is critical to MATS. We generally haven't accepted mentorless scholars because we believe that mentors' accumulated knowledge is extremely useful for bootstrapping strong, original researchers.

Let me explain my chain of thought better:

  1. A first-order failure mode would be "no one downloads experts' models, and we grow a field of naive, overconfident takes." In this scenario, we have maximized exploration at the cost of accumulated knowledge transmission (and probably useful originality, as novices might make the same basic mistakes). We patch this by creating a mechanism by which scholars are selected for their ability to download mentors' models (and encouraged to do so).
  2. A second-order failure mode would be "everyone downloads and defers to mentors' models, and we grow a field of paradigm-locked, non-critical takes." In this scenario, we have maximized the exploitation of existing paradigms at the cost of epistemic diversity or critical analysis. We patch this by creating mechanisms for scholars to critically examine their assumptions and debate with peers.
Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-05-04T21:02:54.001Z · LW · GW

I think we agree on a lot more than I realized! In particular, I don't disagree with your general claims about pathways to HRAD through Alignment MVPs (though I hold some credence that this might not work). Things I disagree with:

  • I generally disagree with the claim "alignment approaches don't limit agentic capability." This was one subject of my independent research before I started directing MATS. Hopefully, I can publish some high-level summaries soon, time permitting! In short, I think "aligning models" generally trades off bits of optimization pressure with "making models performance-competitive," which makes building aligned models less training-competitive for a given degree of performance.
  • I generally disagree with the claim "corrigibility is not a useful, coherent concept." I think there is a (narrow) attractor basin around "corrigibility" in cognition space. Happy to discuss more and possibly update.
  • I generally disagree with the claim "provably-controllable highly reliable agent design is impossible in principle." I think it is possible to design recursively self-improving programs that are robust to adversarial inputs, even if this is vanishingly hard in practice (which informs my sense of alignment difficulty only insomuch as I hope we don't hit that attractor well before CEV value-loading is accomplished). Happy to discuss and possibly update.
  • I generally disagree with the implicit claim "it's useful to try aligning AI systems via mechanism design on civilization." This feels like a vastly clumsier version of trying to shape AGIs via black-box gradient descent. I also don't think that realistic pre-AGI efficient markets we can build are aligned with human-CEV by default.
Comment by Ryan Kidd (ryankidd44) on How MATS addresses “mass movement building” concerns · 2023-05-04T18:22:12.545Z · LW · GW
  • We broadened our advertising approach for the Summer 2023 Cohort, including a Twitter post and a shout-out on Rob Miles' YouTube and TikTok channels. We expected some lowering of average applicant quality as a result but have yet to see a massive influx of applicants from these sources. We additionally focused more on targeted advertising to AI safety student groups, given their recent growth. We will publish updated applicant statistics after our applications close.
  • In addition to applicant selection and curriculum elements, our Scholar Support staff, introduced in the Winter 2022-23 Cohort, supplement the mentorship experience by providing 1-1 research strategy and unblocking support for scholars. This program feature aims to:
    • Supplement and augment mentorship with 1-1 debugging, planning, and unblocking;
    • Allow air-gapping of evaluation and support, improving scholar outcomes by resolving issues they would not take to their mentor;
    • Solve scholars’ problems, giving more time for research.
  • Defining "good alignment research" is very complicated and merits a post of its own (or two, if you also include the theories of change that MATS endorses). We are currently developing scholar research ability through curriculum elements focused on breadth, depth, and epistemology (the "T-model of research"):
  • Our Alumni Spotlight includes an incomplete list of projects we highlight. Many more past scholar projects seem promising to us but have yet to meet our criteria for inclusion here. Watch this space.
  • Since Summer 2022, MATS has explicitly been trying to parallelize the field of AI safety as much as is prudent, given the available mentorship and scholarly talent. In longer-timeline worlds, more careful serial research seems prudent, as growing the field rapidly is a risk for the reasons outlined in the above article. We believe that MATS' goals have grown more important as timelines have shortened (though MATS management has not updated much on timelines, as they were already fairly short in our estimation).
  • MATS would love to support senior research talent interested in transitioning into AI safety! Our scholars generally comprise 10% postdocs, and we would like this number to rise. Currently, our advertising strategy assumes the AI safety community is adequately targeting these populations (which seems false), so it might change for future cohorts.
Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-05-03T23:57:07.079Z · LW · GW

I don't disagree with Shimi as strongly as you do. I think there's some chance we need radically new paradigms of aligning AI than "build alignment MVPs via RLHF + adversarial training + scalable oversight + regularizers + transparency tools + mulligans."

While I do endorse some anthropocentric "value-loading"-based alignment strategies in my portfolio, such as Shard Theory and Steve Byrnes' research, I worry about overly investing in anthropocentric AGI alignment strategies. I don't necessarily think that RLHF shapes GPT-N in a manner similar to how natural selection and related processes shaped humans to be altruistic. I think it's quite likely that the kind of cognition GPT-N learns in order to predict tokens is more akin to an "alien god" than it is to human cognition. I think that trying to value-load an alien god is pretty hard.

In general, I don't highly endorse the framing of alignment as "making AIs more human." I think this kind of approach fails in some worlds and might produce models that are not performance-competitive enough to outcompete the unaligned models others deploy. I'd rather produce corrigible models with superhuman cognition coupled with robust democratic institutions. Nevertheless, I endorse at least some research along this line, but this is not the majority of my portfolio.

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-26T22:43:22.128Z · LW · GW

Thank you for recommending your study guide; it looks quite interesting.

MATS does not endorse “maximizing originality” in our curriculum. We believe that good original research in AI safety comes from a combination of broad interdisciplinary knowledge, deep technical knowledge, and strong epistemological investigation, which is why we emphasize all three. I’m a bit confused by your reference to Adam’s post. I interpret his post as advocating for more originality, not less, in terms of diverse alignment research agendas.

I think that some of the examples you gave of "non-alignment" research areas are potentially useful subproblems for what I term "impact alignment." For example, strong anomaly detection (e.g., via mechanistic interpretability, OOD warning lights, or RAT-style acceptability verification) can help ensure "inner alignment" (e.g., through assessing corrigibility or myopia) and infosec/governance/meme-curation can help ensure "outer alignment," "paying the alignment tax," and mitigating "vulnerable world" situations with stolen/open sourced weaponizable models. I think this inner/outer distinction is a useful framing, though not the only way to carve reality at the joints, of course.

I think Metzinger or Clark could give interesting seminars, though I'm generally worried about encouraging the anthropomorphism of AGI. I like the "shoggoth" or "alien god" memetic framings of AGI, as these (while wrong) permit a superset of human-like behavior without restricting assumptions of model internals to (unlikely and optimistic, imo) human-like cognition. In this vein, I particularly like Steve Byrnes' research as I feel it doesn't overly anthropomorphize AGI and encourages the "competent psychopath with alien goals" memetic framing. I'm intrigued by this suggestion, however. How do you think Metzinger or Clark would specifically benefit our scholars?

(Note: I've tried to differentiate between MATS' organizational position and mine by using "I" or "we" when appropriate.)

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-26T22:10:59.514Z · LW · GW

MATS' framing is that we are supporting a "diverse portfolio" of research agendas that might "pay off" in different worlds (i.e., your "hedging bets" analogy is accurate). We also think the listed research agendas have some synergy you might have missed. For example, interpretability research might build into better AI-assisted white-box auditing, white/gray-box steering (e.g., via ELK), or safe architecture design (e.g., "retargeting the search").

The distinction between "evaluator" and "generator" seems fuzzier to me than you portray. For instance, two "generator" AIs might be able to red-team each other for the purposes of evaluating an alignment strategy.

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-24T02:22:08.208Z · LW · GW

MATS aims to find and accelerate alignment research talent, including:

  • Developing scholar research ability through curriculum elements focused on breadth, depth, and originality (the "T-model of research");
  • Assisting scholars in producing impactful research through research mentorship, a community of collaborative peers, dedicated 1-1 support, and educational seminars;
  • Aiding the creation of impactful new alignment organizations (e.g., Jessica Rumbelow's Leap Labs and Marius Hobbhahn's Apollo Research);
  • Preparing scholars for impactful alignment research roles in existing organizations.

Not all alumni will end up in existing alignment research organizations immediately; some return to academia, pursue independent research, or potentially skill-up in industry (to eventually aid alignment research efforts). We generally aim to find talent with existing research ability and empower it to work on alignment, not necessarily through existing initiatives (though we certainly endorse many).

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-24T02:13:04.673Z · LW · GW

In general, MATS gives our chosen mentors the final say on certain curriculum elements (e.g., research projects offered, mentoring style, mentor-specific training program elements) but offers a general curriculum (e.g., Alignment 201, seminars, research strategy/tools workshops) that scholars can opt into.

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-24T02:05:17.452Z · LW · GW

I think there should be much more seminars about philosophy of science, methodology of science, and strategy of research in the program. Perhaps, not at the expense of other seminar content, but via increasing the number of seminar hours. Talks about ethics, and in particular about ethical naturalism (such as Oliver Scott Curry’s “Morality as Cooperation”), or interaction of ethics and science more generally, also seem essential.

MATS is currently focused on developing scholars as per the "T-model of research" with three main levers:

  • Mentorship (weekly meetings + mentor-specific curriculum);
  • Curriculum (seminars + workshops + Alignment 201 + topic study groups);
  • Scholar support (1-1s for unblocking + research strategy).

The "T-model of research" is:

In the Winter 2022-23 Cohort, we ran several research strategy workshops (focusing on problem decomposition and strategy identification) and had dedicated scholar support staff who offered regular, air-gapped 1-1 support for research strategy and researcher unblocking. We will publish some findings from our Scholar Support Post-Mortem soon. We plan to run further research strategy and “originality” workshops in the Summer 2023 Cohort and are currently reviewing our curriculum from the ground up.

Currently, we are not focused on “ethical naturalism” or similar as a curriculum element as it does not seem broadly useful for our cohort compared to project-specific, scholar-driven self-education. We are potentially open to hosting a seminar on this topic.

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-24T02:04:54.398Z · LW · GW

Independent researchers sometimes seem to think that if they come up with a particularly brilliant idea about alignment, leading AGI labs will adopt it. Not realising that this won’t happen no matter what partially for social reasons

MATS generally thinks that separating the alignment problem into “lowering the alignment tax” via technical research and “convincing others to pay the alignment tax” via governance interventions is a useful framing. There are worlds in which the two are not so cleanly separable, of course, but we believe that making progress toward at least one of these goals is probably useful (particularly if more governance-focused initiatives exist). We also support several mentors whose research crosses this boundary (e.g., Dan Hendrycks, Jesse Clifton, Daniel Kokotajlo).

Comment by Ryan Kidd (ryankidd44) on An open letter to SERI MATS program organisers · 2023-04-24T02:01:05.368Z · LW · GW

I see in the previous SERI MATS cohort, there was a debate about this, titled “How much should be invested in interpretability research?", but apparently there is no recording.

We did record this debate but have not yet published the recording, as the speakers have not given approval. MATS believes that interpretability research might have many uses.