The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective
post by Alvin Ånestrand (alvin-anestrand), Jonas Hallgren · 2025-01-10T16:22:16.905Z · LW · GW
The AI safety field faces a critical challenge: we need researchers who can not only implement existing solutions but also forge new, independent paths. In 2023, inspired by John Wentworth's work on agency and by researchers like Rohin Shah and Adam Shimi, who have highlighted the limitations of standard AI safety education, we launched the Alignment Mapping Program (AMP). Though the curriculum is still a work in progress, you can explore it here. This post reflects on our 2024 pilot, sharing data-driven insights, key program changes, and a call to action for the LessWrong community.
The Problem: Beyond Rote Learning
Traditional AI safety education often emphasizes existing frameworks. While valuable, this approach can inadvertently stifle the development of truly independent thought—a crucial skill in a pre-paradigmatic field like ours. We need researchers who can critically evaluate prevailing paradigms, identify their shortcomings, and generate novel approaches to the alignment problem.
Our Solution: The Alignment Mapping Program (AMP)
AMP is an 8-week intensive program designed to bridge the gap between foundational courses (like AISF) and advanced research programs (like MATS). It's built on the core premise that actively constructing and refining one's own mental models of the alignment problem is key to a deep, gears-level understanding.
How AMP Works: A Three-Phase Process
- Phase 1: Building Your Own Maps (Weeks 1-3): Participants create comprehensive visual maps of the AI alignment problem space, starting from first principles.
- Week 1: Map the Problems. Participants exhaustively list potential risks from misaligned AI, then iteratively group these into categories and subproblems using visual tools like Excalidraw. The goal is to create a structured, hierarchical representation of the entire problem space.
- Week 2: Map Potential Solutions. Participants identify the most critical subproblems and brainstorm potential solutions, developing high-level solution plans. They are encouraged to use techniques like Murphyjitsu to stress-test their solutions and identify potential failure points.
- Week 3: Map Your Path. Participants reflect on their problem and solution maps to define a personalized roadmap for contributing to AI safety research. This involves identifying their strengths, interests, and the specific areas where they feel best positioned to make an impact.
- Phase 2: Engaging with Existing Research (Weeks 4-7): Participants analyze the work of established researchers (e.g., Paul Christiano, Chris Olah, Victoria Krakovna) by actively comparing their models to the participant's own maps.
- This involves creating what we call "shoulder mentors" - simplified but functional models of how these researchers approach alignment. For example, a participant studying Christiano might ask, "How does his emphasis on iterative amplification and distillation challenge or refine my own model of ensuring safe learning at scale?"
- Note: This phase is undergoing significant revision based on pilot feedback.
- Phase 3: Planning Next Steps (Week 8): Participants identify the most promising directions from their maps and create concrete, actionable plans, outlining specific research projects, necessary skills and resources, and defining short-term and long-term goals.
2024 Pilot: Data, Insights, and Improvements
We ran five cohorts (four online, one in-person in Gothenburg) with approximately 25 participants in total.
Key Successes:
- High Engagement: The first three weeks received very positive feedback. One participant shared: "The mapping exercises were incredibly helpful for organizing my thoughts and gaining a clearer picture of the alignment landscape." Another said: "Really enjoyed the format of the program. Having the liberty to actually learn and read more about what we want to, pushed us further and closer to our goals. The entire process taught me a lot."
- 3 out of 5 survey respondents believe that the program should be a core recommended part of every AI safety researcher's path.
Key Challenges and Data-Driven Changes:
- Significant Drop-off After Week 3: Approximately 30% of participants dropped out after the first three weeks, with a noticeable decline in engagement during Weeks 4-7. A participant shared: "Weeks 4 to 7 could focus on two researchers instead of 4", while another said: "If personal believe are raw and highly susceptible of changes (most of the cases for newcomers on AGI plans) it's not good to continue to stick to the first problem+solution plan."
- Solution: We're restructuring Weeks 4-7, potentially focusing on fewer researchers, having the group collectively analyze one researcher per week, or shifting to a more project-focused approach.
- Reading Volume: Initial reading requirements were deemed excessive. A participant shared, "Some tasks took more time than expected. Sometimes I felt uncertainty about whether my homework was good enough."
- Solution: We're curating more focused reading selections (2-3 hours/week) and integrating them more directly with the mapping exercises.
- Exercise Clarity: Some participants found certain exercises, particularly in the solution-planning phase, to be somewhat vague.
- Solution: We're developing clearer instructions, more detailed examples of problem maps and solution plans, and progress milestones for each exercise.
What's Next for AMP?
- Refine and Scale: We are looking for partners to further develop the program and extend its reach. We’re interested in working with other initiatives to incorporate AMP into their curricula.
- Pilot New Formats: We're exploring an in-person, workshop-based version, as well as a 5-week version to address the drop-off issue while maintaining impact.
Call to Action:
If you're interested in any of the following, please fill out this form.
- Run AMP at Your Organization: If you're part of an AI safety group or research organization interested in running AMP, please reach out.
- Participate: If there is enough interest, we plan to run the program again next year—let us know if you’d like to join.
- Share Your Expertise: We're particularly interested in feedback on the "shoulder mentors" concept and strategies for developing independent thinking. Feel free to share your insights in the comments section.
Questions for the Community:
- How might we refine the "shoulder mentors" concept to make it more effective? Are there alternative approaches to engaging with existing research that we should consider?
- What specific exercises, resources, or frameworks have you found most effective for developing independent thinking in AI safety?
- What do you perceive as effective ways to create support structures around this sort of program?
- How much do you expect this type of program will help aspiring AI safety researchers? What factors might influence its effectiveness?
Developed by: AI Safety Collab's Program Development Group