Call for Collaboration: Renormalization for AI safety
post by Lauren Greenspan (LaurenGreenspan) · 2025-03-31
We invite proposals that probe aspects of renormalization [LW · GW] in AI systems and that will help us predict, explain, and interpret neural network behavior at different levels of abstraction. We import ‘renormalization’ from physics, where it is a technique for coarse-graining theoretical descriptions of complex interactions so as to focus on those most relevant for describing physical reality. We view this direction as a vast ‘opportunity space’ with many possible points of entry, and have identified a few research programmes as preliminary ‘ways in’. A detailed roadmap of this space, along with discussion of the programmes, can be found here [LW · GW] and here [LW · GW]. Our goal is to narrow the theory-practice gap by grounding an abstract analogy in a practical framework capable of directly impacting real-world interpretability and informing better scientific foundations [LW · GW] for AI safety. A QFT framework for AI systems could give us a toolkit for finding principled features, modeling their interactions at different levels of granularity, and ensuring a well-grounded separation between ‘safe’ and ‘unsafe’ behaviors in AI systems. Physical and AI systems differ in important ways, however, and we ask that proposals keep these differences in mind and be clear about which methods or analogies are immediately useful, and which require new development.
We posit that progress depends on clarifying the link between implicit renormalization – the way networks coarse-grain information into representations by organizing data into network-meaningful structures – and explicit renormalization, which we operationalize as an interpretability tool capable of probing that structure at a granularity meaningful to us. A growing community is already studying how neural networks implicitly renormalize (e.g., Roberts, Berman, Erbin, Halverson) to organize information (i.e., from data into features) during training and inference. We stress that both notions are important, and likely related, even if there is some fuzziness in defining a scale of ‘human interpretation’ and relating it to a network’s implicit notion of scale. Shedding light on this relationship, and leveraging our insights to perform explicit renormalization over neural network representations, is a core goal of this call.
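As a purely illustrative sketch (not a method proposed by this call), coarse-graining can be pictured as repeated block averaging: each step discards fine-scale detail while preserving large-scale structure, which is the sense in which renormalization isolates the degrees of freedom that are ‘most relevant’ at a given scale. The function and signal below are hypothetical toy examples, not anything from the programmes themselves.

```python
import numpy as np

def block_coarse_grain(x, block=2):
    """One renormalization-style step: replace each block of `block`
    adjacent values with their mean, discarding fine-scale detail."""
    n = (len(x) // block) * block          # drop any remainder
    return x[:n].reshape(-1, block).mean(axis=1)

# Toy "activations": slow large-scale structure plus fast noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
signal = np.sin(2 * np.pi * t)             # large-scale (relevant) structure
noise = 0.3 * rng.standard_normal(256)     # small-scale (irrelevant) detail
x = signal + noise

# Iterating the step suppresses the noise much faster than the trend:
coarse = x
for _ in range(4):                         # 256 -> 128 -> 64 -> 32 -> 16
    coarse = block_coarse_grain(coarse)

print(len(coarse))  # 16
```

The point of the toy is only the flow itself: iterating a coarse-graining map and asking which structure survives is the basic move that both implicit renormalization (inside the network) and explicit renormalization (as an interpretability probe) would need to make precise.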
About this Call
At PIBBSS, our goal is to improve our scientific understanding of AI systems by adding more high-variance ideas to the AI safety research landscape. Such ideas tend to face a high barrier to entry: they are out of scope for academic groups, AI safety research labs, and field-building initiatives, or outside the range of their expertise. We address this by identifying an opportunity space within a research community external to AI safety, distilling the key information for an AI safety audience, refining the scope of the opportunity space with input from both AI safety and the external community, and finally orienting an interdisciplinary collaboration toward making progress in that space.
We are currently looking to hire affiliates to lead projects in one of our open programme calls. We welcome contributions from across disciplines, ranging from a few months (for work leading up to a single paper) to a year (for more developed research plans). Depending on scope and situation, we will provide a combination of funding, engineering support, and compute resources. We can offer a monthly salary in the range of $5,000–$10,000 USD, commensurate with experience, to be adjusted for part-time affiliates.
Reporting and Community Engagement
As an affiliate, you will meet regularly (at least once per week) with programme leads Lauren Greenspan and Dmitry Vaintrob, as well as other members of the research network. Research teams will also be responsible for brief periodic reports to monitor progress and address any bottlenecks that may arise.
To promote active engagement between affiliates and teams pursuing different projects, we will host periodic workshops and research retreats. To broaden their AI safety education, affiliates will also participate in a reading group with members of the wider community.
Research Programmes
The goal of the following programmes is to determine which theories, methods, and frameworks used to study the renormalization of physical systems can be useful for understanding AI systems, and in which contexts. We want to build theories that support and explain the behavior of realistic networks, and we stress the importance of maintaining empirical relevance as the theory advances. More details on each programme can be found here [LW · GW].
Programme 1: Development of unsupervised techniques to identify features in NNs
Programme 2: Model organisms of implicit renormalization: Relating and comparing different notions of scale
Eligibility Requirements
We invite proposals from researchers across sectors, including academic institutions, startups, independent researchers, and industry labs. We invite proposals that clearly outline interdisciplinary perspectives, even if you do not yet have a fully formed project within the AI safety context. PIBBSS is prepared to assist in refining and concretizing promising project ideas.
Ideal candidates demonstrate:
- Scientific Expertise: A strong record in relevant areas such as quantum field theory, statistical mechanics, condensed matter physics, or information theory.
- A desire to apply their skills in AI safety
- Demonstrated interest in interdisciplinary collaborations
- Excellent communication skills
- A commitment to pursuing a career or continued research in AI safety upon completion of this project. In particular, PIBBSS affiliates have gone on to start their own AI safety organization leveraging computational mechanics for AI interpretability. We would be very excited to support a similar outcome for one or more of our affiliates.
- A theoretical or practical (but not necessarily expert level) understanding of NNs
- Programming experience
Alternatively, we may hire affiliates with deep AI safety expertise, as indicated by:
- Prior research in ML interpretability, safety frameworks, or large language model analysis
- Expert level proficiency in Python and standard ML frameworks
- A theoretical understanding of ML foundations.
Our goal is to form research collaborations with complementary expertise, understanding that applicants might not individually satisfy every listed criterion. We encourage you to err on the side of applying, even if you don’t tick all of the boxes.
Proposal Requirements & Evaluation Criteria
Apply Here by April 27th, 2025 (11:59 PM AoE). We will review applications on a rolling basis, and encourage you to apply as early as possible. The initial application includes:
- A concise summary of the proposed project’s central idea and objectives, including clear statements of expected outcomes, potential risks, and pathways to success or failure.
- A rationale connecting your project directly to one or both research programmes and explaining its relevance to AI interpretability and safety.
- A brief description of your background and relevant expertise (scientific, AI safety, or interdisciplinary).
We understand that many applicants may not come from an AI safety background; PIBBSS staff will actively support successful round 1 applicants in refining initial ideas into actionable, well-scoped research projects aligned with the aims of our programmes. We therefore recommend that you prioritize clarity, potential, and reasoning transparency in the initial application. Successful applicants will be contacted for an interview and invited to submit a more detailed proposal.
Proposals will be evaluated based on:
- Innovation: The ability to use renormalization concepts or interdisciplinary methods in new or under-developed ways.
- Feasibility: Practical viability, clear identification of risks, and realistic assessment of the resources and expertise required.
- Researcher Background and Experience: Including complementary strengths of applicant or team relative to the project's goals.
- Impact Potential: Likelihood that outcomes will meaningfully advance understanding of neural networks, enhance interpretability methods, or improve theoretical foundations underpinning AI safety.
We will not support proposals that:
- Fall outside the scope of the Opportunity Space [LW · GW].
- Prioritize advancing understanding in external fields (such as physics) over AI safety.
- Lack a concrete rationale for how proposed methods will contribute to interpretability and the safety of AI systems.
Want to participate, but don’t have an idea in mind? Fill out our Expression of Interest form, and we’ll contact you if an opportunity arises that fits your skills.
Contact Lauren Greenspan (lauren@pibbss.ai) or Dmitry Vaintrob (dmitry@pibbss.ai) with any questions.