Comments

Comment by Angie Normandale (palindrome) on Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well) · 2025-01-13T18:33:59.057Z · LW · GW

I've been exploring this for the last year, and I think it's a promising avenue for solving some key alignment issues. Homeostatic approaches are well documented in neuroscience but surprisingly neglected in alignment.

Research in support:

Managing competing drives (homeostatic approaches ensure safety wins out):
- Mathematical formalisations by Laurençon et al. and by Keramati and Gutkin
- Robotics researchers have successfully implemented homeostatic approaches to help systems manage competing drives
- Friston suggests homeostasis is a way to minimise free energy
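The core idea in Keramati and Gutkin's homeostatic RL framework can be sketched in a few lines: the agent's "drive" is a distance between its internal state and a setpoint, and reward is the reduction in drive produced by an action. A minimal illustration (the setpoints and exponent values below are illustrative assumptions, not from any specific paper's experiments):

```python
import numpy as np

def drive(h, h_star, n=4, m=3):
    """Drive: d(h) = (sum_i |h*_i - h_i|^n)^(m/n), a distance to the setpoint."""
    return np.sum(np.abs(h_star - h) ** n) ** (m / n)

def homeostatic_reward(h_before, h_after, h_star):
    """Reward is drive reduction: r = d(h_before) - d(h_after)."""
    return drive(h_before, h_star) - drive(h_after, h_star)

h_star = np.array([0.5, 0.7])   # setpoints for two internal variables
h0 = np.array([0.2, 0.9])       # internal state before acting
h1 = np.array([0.4, 0.75])      # state after an action that moves toward the setpoint

print(homeostatic_reward(h0, h1, h_star) > 0)  # moving toward the setpoint is rewarding
```

Because reward is tied to restoring setpoints rather than to unbounded maximisation, competing drives naturally trade off against each other, which is the property that makes this attractive for safety.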

Scaling to group behaviour
This could mathematically underpin Joel Leibo and team's appropriateness agenda, and provide a mechanism for the unsolved problem of alignment with reward functions that change and can be influenced.

Declaration of interest: I recently joined Roland at Aintelope to support this agenda alongside other applications from neuroscience to alignment!

Comment by Angie Normandale (palindrome) on Inducing Unprompted Misalignment in LLMs · 2024-04-21T08:56:50.274Z · LW · GW

Great paper! Important findings.

What’s your intuition re ways to detect and control such behaviour?

An interesting extension would be training a model on a large dataset that includes a low-level but consistent fraction of primed data. Do the harmful behaviours persist and generalise? If so, this could be used to exploit existing 'aligned' models that update on publicly modifiable datasets.
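The experimental setup suggested above amounts to mixing a small, consistent fraction of primed examples into an otherwise clean training set. A hypothetical sketch (function name, example strings, and the 2% rate are all illustrative assumptions):

```python
import random

def mix_primed_data(clean_examples, primed_examples, rate=0.02, seed=0):
    """Return a training set where roughly `rate` of the examples are
    drawn from the primed set, shuffled into random positions."""
    rng = random.Random(seed)
    n_primed = int(len(clean_examples) * rate)
    mixed = list(clean_examples) + [rng.choice(primed_examples) for _ in range(n_primed)]
    rng.shuffle(mixed)
    return mixed

clean = [f"clean_{i}" for i in range(1000)]
primed = ["primed_a", "primed_b"]
dataset = mix_primed_data(clean, primed, rate=0.02)
print(sum(x.startswith("primed") for x in dataset))  # prints 20
```

Training on such a mixture and then probing for the primed behaviour would test whether it persists and generalises at realistic contamination rates.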