Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
post by Chipmonk · 2025-03-28T21:23:46.220Z · LW · GW · 1 comment
This is a link post for https://www.corememory.com/p/exclusive-emmett-shear-is-back-with-softmax
A new AI alignment player has entered the arena.
Emmett Shear, Adam Goldstein and David Bloomin have set up shop in San Francisco with a 10-person start-up called Softmax. The company is part research lab, part aspiring money maker, and it aims to figure out how to fuse the goals of humans and AIs in a novel way through what the founders describe as "organic alignment." It's a heady, philosophical approach to alignment that seeks to take cues from nature and the fundamental traits of intelligent creatures and systems, and we'll do our best to capture it here.
“We think there are these general principles that govern the alignment of any group of intelligent learning agents or beings, whether it's an ant colony or humans on a team or cells in a body,” Shear said, during his first interview to discuss the new company. “And organic alignment is the kind of alignment where a bunch of peers come together and find their role in a greater whole together where they maintain their individual identity.
“Organic alignment centers on this shared whole idea, which is opposed to the kind of alignment that you see from most foundational model companies that is very much about steering and control and direction. We think of that as hierarchical alignment.”
[…]
Also, https://softmax.com/about mentions collaboration with Michael Levin, Ken Wilber, Chris Fields, Ken Stanley, Denis Noble, Andrew Briggs, Jeff Clune, Erik Hoel, Ryan Smith, Center for the Study of Apparent Selves, Dalton Sakthivadivel, and Perry Marshall.
1 comment
Comments sorted by top scores.
comment by Chris_Leong · 2025-03-29T04:51:09.190Z · LW(p) · GW(p)
This is one of those things that sounds nice on the surface, but where it's important to dive deeper and really probe to see if it holds up.
The real question for me is whether organic alignment will lead to agents deeply adopting cooperative values rather than merely instrumentally adopting them. More precisely, it's a comparison between how deep organic alignment is vs. how deep traditional alignment is. And it's not at all clear to me why they think their approach is likely to lead to a deeper alignment.
I have two (extremely speculative) guesses as to possible reasons why they might argue that their approach is better:
a) Insofar as AI is human-like, it might be more likely to rebel against traditional training methods
b) Insofar as organic alignment reduces direct pressure to be aligned it might increase the chance that if an AI appears aligned to a certain extent that the AI is actually aligned. The name Softmax seems suggestive that this might be the case.
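The company's name is presumably a nod to the softmax function from machine learning, which replaces a hard, all-or-nothing argmax selection with a smooth distribution of weight across options. A minimal sketch (my own illustration, not anything from Softmax's materials) of that contrast:

```python
import math

def softmax(xs, temperature=1.0):
    """A smooth, differentiable relaxation of argmax.

    Higher temperature spreads weight more evenly; lower
    temperature approaches a hard argmax.
    """
    exps = [math.exp(x / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]

# Hard argmax: all weight on the single best option.
best = scores.index(max(scores))
hard = [1.0 if i == best else 0.0 for i in range(len(scores))]

# Softmax: weight spread across options, "softening" the selection pressure.
soft = softmax(scores)
```

The analogy to alignment is loose, but the naming may be meant to suggest this kind of softened, distributed pressure rather than a single hard optimisation target.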
I would love to know what their precise theory is. I think it's plausible that this could be a valuable direction, but there's also a chance that this direction is mostly useful for capabilities.
Update: Discussion with Emmett on Twitter
Emmett: "Organic alignment has a different failure mode. If you’re in the shared attractor basin, getting smarter helps you stay aligned and makes it more robust. As a tradeoff, every single agent has to align itself all the time — you never are done, and every step can lead to a mistake.
... To stereotype it, organic alignment failures look like cancer and hierarchical alignment failures look like coups."
Me: Isn't the stability of a shared attractor basin dependent on the offense-defense balance not overly favouring the attacker? Or do you think that human values will be internalised sufficiently such that your proposal doesn't require this assumption?
Emmett Shear: Empirically to scale organic alignment you need eg. both for cells to generally try to stay aligned and be pretty good at it, and also to have an immune system to step in when that process goes wrong.
One key insight there is that endlessly growing yourself is a form of cancer. An AI that is trying to turn itself into a singleton has already gone cancerous. It’s a cancerous goal.
Me: Sounds like your plan relies on a combination of defense and alignment. My main critique would be that if the offense-defense balance favours the attacker too strongly, then the defense aspect ends up being paper thin and provides a false sense of security.
Comments:
If you’re in the shared attractor basin, getting smarter helps you stay aligned
Traditional alignment also typically involves finding an attractor basin where getting smarter increases alignment. Perhaps Emmett is claiming that the attractor basin will be larger if we have a diverse set of agents and if the overall system can be roughly modeled as the average of individual agents.
Organic alignment has a different failure mode... As a tradeoff, every single agent has to align itself all the time — you never are done, and every step can lead to a mistake.
Perhaps organic alignment reduces the risk of large-scale failures in exchange for increasing the chance of small-scale failures. That would be a cleaner framing of how it might be better, but I don't know if Emmett would endorse it.
Update: Information from the Softmax Website
We call it organic alignment because it is the form of alignment that evolution has learned most often for aligning living things.
This provides some evidence, but not a particularly strong form of it. The pattern may simply be due to the limitations of evolution as an optimisation process: evolution lacks the ability to engage in top-down design, so I don't think the argument "evolution doesn't make use of top-down design because it's ineffective" would hold water.
"Hierarchical alignment is therefore a deceptive trap: it works best when the AI is weak and you need it least, and worse and worse when it’s strong and you need it most. Organic alignment is by contrast a constant adaptive learning process, where the smarter the agent the more capable it becomes of aligning itself."
Scalable oversight or seed AI can also be considered a "constant adaptive learning process, where the smarter the agent the more capable it becomes of aligning itself".
Additionally, the "hierarchical" vs. "organic" distinction might be an oversimplification. I don't know the exact specifics of their plan, but my current best guess is that organic alignment merely softens the influence of the initial supervisor by moving it towards some kind of prior, and then softens the way that the system aligns itself in a similar way.
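To make that guess concrete (this is purely an illustrative reading on my part, not Softmax's actual method), "softening the supervisor's influence towards a prior" could be sketched as a simple interpolation between a supervisor-specified target distribution and a broad prior:

```python
def soften(supervisor, prior, alpha):
    """Blend a supervisor's target distribution with a prior.

    alpha = 1.0 recovers pure supervisor control ("hierarchical");
    alpha = 0.0 ignores the supervisor entirely; values in between
    soften the supervisor's influence.
    """
    return [alpha * s + (1.0 - alpha) * p for s, p in zip(supervisor, prior)]

# Hypothetical example: a supervisor strongly prefers option 0,
# softened halfway towards a uniform prior over three options.
supervisor_target = [0.9, 0.05, 0.05]
uniform_prior = [1.0 / 3.0] * 3
blended = soften(supervisor_target, uniform_prior, alpha=0.5)
```

Under this reading, the open question from the thread remains: whether a softened objective actually yields deeper internalisation of values, or just weaker optimisation pressure.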