Apply to the Conceptual Boundaries Workshop for AI Safety
post by Chipmonk · 2023-11-27T21:04:59.037Z
Do you have experience with formal computer security, Active Inference, Embedded Agency, biological gap junctions, or other frameworks that distinguish agents from their environment? Apply to the Conceptual Boundaries Workshop for AI Safety, to be held in February in Austin, TX.
Website, more details, and application
Apply by December 22
The workshop is for identifying, discussing, and strategizing about promising AI safety research directions pertaining to the boundaries that causally distance agents from their environment.
What are agent boundaries?
A few examples:
- A bacterium uses its membrane to protect its internal processes from external influences.
- A nation maintains its sovereignty by defending its borders.
- A human protects their mental integrity by selectively filtering the information that flows into and out of their mind.
…a natural abstraction for safety?
Agent boundaries seem to be a natural abstraction representing the safety and autonomy of agents.
- A bacterium survives only if its membrane is preserved.
- A nation maintains its sovereignty only if its borders aren’t invaded.
- A human mind maintains mental integrity only if it can hold off informational manipulation.
Maybe the safety of agents could be largely formalized as the preservation of their membranes.
These boundaries can then be formalized via Markov blankets.
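As a rough sketch of what such a formalization could look like: a Markov blanket is a set of variables that screens an inside off from an outside. The symbols below are assumed here for illustration and are not taken from the post: $I$ for the agent's internal states, $B$ for the blanket (boundary) states, and $E$ for the external environment. The blanket condition is then the conditional independence

$$
P(I, E \mid B) = P(I \mid B)\, P(E \mid B), \qquad \text{i.e. } I \perp E \mid B .
$$

Read this way, any influence between the environment and the agent's interior has to pass through the boundary states $B$. This is only the standard textbook condition; the linked posts under Related work discuss how well it fits agent boundaries in practice.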
Boundaries are also cool because they show a way to respect agents without needing to talk about their preferences or utility functions. Andrew Critch has said the following about this idea:
my goal is to treat boundaries as more fundamental than preferences, rather than as merely a feature of them. In other words, I think boundaries are probably better able to carve reality at the joints than either preferences or utility functions, for the purpose of creating a good working relationship between humanity and AI technology («Boundaries» Sequence, Part 3b)
For instance, respecting the boundary of a bacterium would probably mean “preserving or not disrupting its membrane” (as opposed to knowing its preferences and satisfying them).
Protecting agents and infrastructure
By formalizing and preserving the important boundaries in the world, we could be in a better position to protect humanity from AI threats.
For example, critical computing infrastructure could be secured by creating strong boundaries around it. This can be enforced by cryptography and formal methods such that only the subprocesses that need read and/or write access to a particular resource (like memory) hold the encryption keys to do so. Related: Object-capability model, Principle of least privilege, Evan Miyazono’s Atlas Computing, Davidad’s Open Agency Architecture.
And it may also be possible to do something similar with physical property rights.
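The computing-infrastructure example above can be illustrated with a minimal object-capability sketch. This is a toy under stated assumptions, not a secure implementation: the names `Resource`, `Capability`, `BoundaryViolation`, and `logger_subprocess` are hypothetical, and plain Python object references stand in for the encryption keys mentioned above (Python itself does not prevent code from reaching into private attributes).

```python
class BoundaryViolation(Exception):
    """Raised when a capability is used for an operation it does not grant."""


class Resource:
    """Hypothetical protected resource, standing in for e.g. a memory region."""

    def __init__(self):
        self._data = {}


class Capability:
    """Token granting specific rights on one resource (least privilege)."""

    def __init__(self, resource, can_read=False, can_write=False):
        self._resource = resource
        self._can_read = can_read
        self._can_write = can_write

    def read(self, key):
        if not self._can_read:
            raise BoundaryViolation("capability does not grant read access")
        return self._resource._data.get(key)

    def write(self, key, value):
        if not self._can_write:
            raise BoundaryViolation("capability does not grant write access")
        self._resource._data[key] = value

    def attenuate(self, can_read=False, can_write=False):
        """Derive a weaker capability; rights can be dropped but never added."""
        return Capability(
            self._resource,
            can_read=self._can_read and can_read,
            can_write=self._can_write and can_write,
        )


def logger_subprocess(cap):
    """Hypothetical subprocess that only needs to read one value."""
    return cap.read("status")


memory = Resource()
owner_cap = Capability(memory, can_read=True, can_write=True)
owner_cap.write("status", "ok")

# Hand the subprocess only the access it needs: read, not write.
read_only_cap = owner_cap.attenuate(can_read=True)
status = logger_subprocess(read_only_cap)

try:
    read_only_cap.write("status", "tampered")
    write_blocked = False
except BoundaryViolation:
    write_blocked = True
```

The design choice worth noting is `attenuate`: a holder can pass on a subset of its own rights, so access narrows as it is delegated, which is the core of the principle of least privilege.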
Attendees
Confirmed:
- David ‘davidad’ Dalrymple
- Scott Garrabrant
- TJ (Tushant Jha)
- Andrew Critch
- Allison Duettmann
- Alex Zhu
- Chris Lakin (main organizer)
- Evan Miyazono (co-organizer)
Seeking 4-6 more:
Do you have experience with formal computer security, Active Inference, Embedded Agency, biological gap junctions, or other frameworks that distinguish agents from their environment?
Note: Even if there isn’t space for you at this February workshop, we will likely be running larger boundaries workshops in mid-2024.
Questions
- How can boundaries help with safety?
- What, formally, is a "boundary protocol" which describes the conditions under which exceptions can be made to the default prohibition on boundary violations?
- Andrew Critch's current formalization of boundaries is fundamentally dependent on physical time. How can this be generalized to logical time?
- What fields already have theories and implementations of the kind of boundaries we mean?
- What empirical projects could help make progress on verifying and implementing boundaries-based safety approaches ASAP?
Intended output
To identify promising research directions and empirical projects for formalizing boundaries and applying boundaries to safety.
For example, what would be needed to specify a formal language for describing boundaries-based ethics?
Related work
- Active Inference, Markov blankets
- Andrew Critch’s «Boundaries» Sequence
- Cell gap junctions; Michael Levin’s work on cell cooperation
- Scott Garrabrant’s Cartesian Frames
- «Boundaries/Membranes» and AI safety compilation
- Agent membranes and formalizing “safety”
- Agent membranes and causal distance
- Formalizing «Boundaries» with Markov blankets
- Boundaries-based security; Object-capability model
Website, more details, and application
Apply by December 22
Conceptual Boundaries Workshop is financially supported by the Foresight Institute, Blake Borgeson, and LTFF.
Get notified about future boundaries events
We are also considering running other boundaries-related workshops in mid-2024: for example, a larger, more general workshop, or domain-specific workshops (e.g., boundaries in biology, boundaries in computer security). If you would like to be notified about potential future events, sign up via the form in the footer of the website.
How you can help
- Repost this workshop on Twitter.
- Share with anyone you think might be a good fit.
- Let me know if you have ideas for other places I can advertise this.
- The workshop is fully financially supported, but we’re also looking for seed funding for projects ideated at the workshop. More details on Manifund. You may also email Chris and Evan directly.