The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective

post by Alvin Ånestrand (alvin-anestrand), Jonas Hallgren · 2025-01-10T16:22:16.905Z

The AI safety field faces a critical challenge: we need researchers who can not only implement existing solutions but also forge new, independent paths. In 2023, inspired by John Wentworth's work on agency and by researchers like Rohin Shah and Adam Shimi, who have highlighted the limitations of standard AI safety education, we launched the Alignment Mapping Program (AMP). Though the curriculum is still a work in progress, you can explore it here. This post reflects on our 2024 pilot, sharing data-driven insights, key program changes, and a call to action for the LessWrong community.

The Problem: Beyond Rote Learning

Traditional AI safety education often emphasizes existing frameworks. While valuable, this approach can inadvertently stifle the development of truly independent thought—a crucial skill in a pre-paradigmatic field like ours. We need researchers who can critically evaluate prevailing paradigms, identify their shortcomings, and generate novel approaches to the alignment problem.

Our Solution: The Alignment Mapping Program (AMP)

AMP is an 8-week intensive program designed to bridge the gap between foundational courses (like AI Safety Fundamentals) and advanced research programs (like MATS). It's built on the core premise that actively constructing and refining one's own mental models of the alignment problem is key to a deep, gears-level understanding.

How AMP Works: A Three-Phase Process

  1. Phase 1: Building Your Own Maps (Weeks 1-3): Participants create comprehensive visual maps of the AI alignment problem space, starting from first principles.
    • Week 1: Map the Problems. Participants exhaustively list potential risks from misaligned AI, then iteratively group them into categories and subproblems using visual tools like Excalidraw. The goal is a structured, hierarchical representation of the entire problem space (see the sketch after this list).
    • Week 2: Map Potential Solutions. Participants identify the most critical subproblems and brainstorm potential solutions, developing high-level solution plans. They are encouraged to use techniques like Murphyjitsu to stress-test their solutions and identify potential failure points.
    • Week 3: Map Your Path. Participants reflect on their problem and solution maps to define a personalized roadmap for contributing to AI safety research. This involves identifying their strengths, interests, and the specific areas where they feel best positioned to make an impact.
  2. Phase 2: Engaging with Existing Research (Weeks 4-7): Participants analyze the work of established researchers (e.g., Paul Christiano, Chris Olah, Victoria Krakovna), actively comparing each researcher's models against their own maps.
    • This involves creating what we call "shoulder mentors": simplified but functional models of how these researchers approach alignment. For example, a participant studying Christiano might ask, "How does his emphasis on iterative amplification and distillation challenge or refine my own model of ensuring safe learning at scale?" (A minimal data sketch of this idea also follows the list.)
    • Note: This phase is undergoing significant revision based on pilot feedback.
  3. Phase 3: Planning Next Steps (Week 8): Participants identify the most promising directions from their maps and create concrete, actionable plans: outlining specific research projects, identifying necessary skills and resources, and defining short-term and long-term goals.
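
To make the week-1 artifact more concrete: participants build their maps in visual tools like Excalidraw, but the same hierarchical structure can be captured in code. Below is a minimal, purely illustrative Python sketch; the `Node` class, the `render` helper, and the category labels are hypothetical examples of ours, not part of the AMP curriculum.

```python
# A minimal sketch of one way a participant's week-1 problem map could
# be represented as data. All labels are illustrative examples.
from dataclasses import dataclass, field

@dataclass
class Node:
    """A problem, subproblem, or concrete risk in the map."""
    label: str
    children: list["Node"] = field(default_factory=list)

def render(node: Node, depth: int = 0) -> None:
    """Print the map as an indented hierarchy."""
    print("  " * depth + node.label)
    for child in node.children:
        render(child, depth + 1)

# Hypothetical fragment of a problem map.
problem_map = Node("Risks from misaligned AI", [
    Node("Specification problems", [
        Node("Reward hacking"),
        Node("Goal misgeneralization"),
    ]),
    Node("Oversight problems", [
        Node("Scalable oversight of superhuman systems"),
        Node("Deceptive alignment"),
    ]),
])

render(problem_map)
```

Running the example prints the map as an indented tree, mirroring the grouping exercise: every listed risk must find an explicit place in the hierarchy.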
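Similarly, the "shoulder mentor" idea can be sketched as a small data structure: a researcher reduced to a core frame plus the stress-test questions they would reliably ask of a plan. Again, this is a hedged illustration; the `ShoulderMentor` class and the example questions are our own simplifications, not claims about Paul Christiano's actual views.

```python
# A hypothetical sketch of a "shoulder mentor" as data. The frame and
# questions below are illustrative simplifications only.
from dataclasses import dataclass

@dataclass
class ShoulderMentor:
    name: str
    core_frame: str               # the lens this researcher tends to apply
    stress_questions: list[str]   # questions to run a plan through

    def review(self, plan: str) -> list[str]:
        """Apply the mentor's standard questions to a plan."""
        return [f"[{self.name} on '{plan}'] {q}" for q in self.stress_questions]

christiano_model = ShoulderMentor(
    name="Paul Christiano (simplified)",
    core_frame="iterative amplification and distillation",
    stress_questions=[
        "Does this approach keep working as capabilities scale?",
        "Where does human oversight enter the training loop?",
    ],
)

for question in christiano_model.review("my safe-learning-at-scale plan"):
    print(question)
```

In the program itself this modeling happens in prose and discussion rather than code; the structure above just makes explicit what a "functional model" of a researcher has to contain: a lens and a set of questions.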

2024 Pilot: Data, Insights, and Improvements

We ran five cohorts (four online, one in-person in Gothenburg) with approximately 25 participants.

Key Successes:

Key Challenges and Data-Driven Changes:

What's Next for AMP?

Call to Action:

If you're interested in any of the following, please fill out this form.

Questions for the Community:

  1. How might we refine the "shoulder mentors" concept to make it more effective? Are there alternative approaches to engaging with existing research that we should consider?
  2. What specific exercises, resources, or frameworks have you found most effective for developing independent thinking in AI safety?
  3. What do you perceive as effective ways to create support structures around this sort of program?
  4. How much do you expect this type of program will help aspiring AI safety researchers? What factors might influence its effectiveness?

Curriculum Overview (WIP)

Developed by: AI Safety Collab's Program Development Group
