An Analysis of the ‘Digital Gaia’ Proposal from a Safety Perspective

post by marc/er · 2023-05-31T12:21:39.390Z · LW · GW · 1 comments



This document addresses the Natural Intelligence whitepaper from the Digital Gaia project, but only those aspects relevant to the safe deployment of advanced artificial intelligence (the project is much broader in scope). The whitepaper is currently incomplete, and my criticisms may be addressed in later iterations. My interpretation may also be incorrect, and this whitepaper has been my only reading into the project.


Overview of the Gaia Architecture

Gaia is planned as a decision-making support and automation system, composed of Fangorn (a decentralized network of agents called Ents) and GaiaHub (humans). Ents are glass-box Bayesian reasoners wired to various input sources and actuators (including other Ents). Via access to skills (functions that specify 'beliefs' as random variables) they can obtain an understanding of things like scientific theories, and through a compositional active inference loop over a meta-model they are expected to parameterize the state/model space and update their beliefs continuously.
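As I read it, one cycle of that loop amounts to: update beliefs on a new observation, then pick the action whose predicted outcomes minimize expected free energy (risk plus ambiguity). Here is a minimal discrete sketch of my reading; the state spaces, distributions, and function names are my own illustrative assumptions, not anything the whitepaper specifies:

```python
import math

def normalize(v):
    z = sum(v)
    return [x / z for x in v]

def bayes_update(prior, likelihood, obs):
    """Posterior over hidden states after observing `obs` (an observation index)."""
    return normalize([p * likelihood[s][obs] for s, p in enumerate(prior)])

def expected_free_energy(belief, transition, likelihood, log_pref):
    """Risk + ambiguity for one action's transition model."""
    n_s = len(belief)
    n_o = len(likelihood[0])
    # predicted state distribution after taking this action
    q_s = normalize([sum(belief[s] * transition[s][s2] for s in range(n_s))
                     for s2 in range(n_s)])
    # predicted observation distribution
    q_o = normalize([sum(q_s[s] * likelihood[s][o] for s in range(n_s))
                     for o in range(n_o)])
    # risk: divergence of predicted observations from preferred observations
    risk = sum(q * (math.log(q + 1e-12) - lp) for q, lp in zip(q_o, log_pref))
    # ambiguity: expected observation entropy under the predicted states
    ambiguity = -sum(q_s[s] * sum(likelihood[s][o] * math.log(likelihood[s][o] + 1e-12)
                                  for o in range(n_o))
                     for s in range(n_s))
    return risk + ambiguity

def select_action(belief, transitions, likelihood, log_pref):
    """Pick the action whose predicted outcomes minimize expected free energy."""
    scores = [expected_free_energy(belief, T, likelihood, log_pref) for T in transitions]
    return min(range(len(scores)), key=lambda a: scores[a])
```

In a real Ent these pieces would presumably be composed out of skills rather than hand-written, but the shape of the loop should be similar.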

Skills form a compute graph that determines a posterior distribution in terms of state and structural priors. These would be submitted via GaiaHub; the whitepaper refers to this repository of skills as the 'Skill Universe.' Representations of these computations would be written to an append-only database so that they can be tracked and reproduced, which allows for shared inference. Ents are intended to be proxies for real-world systems, and their underlying logic is intended to be universally similar (called the Natural Intelligence Compute Engine, or 'NICE') and conducted in the language of the Natural Intelligence Protocol, or 'NIP'.
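The whitepaper does not (yet) say how this database would be implemented. A minimal sketch of what 'append-only and reproducible' could mean is a hash-chained log, where each recorded computation commits to everything before it, so later tampering is detectable; all names here are my own, hypothetical:

```python
import hashlib
import json

class AppendOnlyLog:
    """Sketch of an append-only record of computation representations.

    Each entry is chained to the previous one by a SHA-256 hash, so
    modifying any earlier record invalidates the rest of the chain.
    """

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, computation: dict) -> str:
        """Record one computation (as a JSON-serializable dict); return its hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        payload = json.dumps(computation, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.entries.append({"payload": payload, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain from the start; False if any entry was altered."""
        prev_hash = self.GENESIS
        for e in self.entries:
            expected = hashlib.sha256((prev_hash + e["payload"]).encode()).hexdigest()
            if e["hash"] != expected or e["prev"] != prev_hash:
                return False
            prev_hash = e["hash"]
        return True
```

Note that a structure like this only makes tampering detectable; as I argue below in the security discussion, detectability alone does not guarantee the system can recover from corrupted records.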

Skills comprise an Ent's local ontology and priors. Upon submission of a model to the Skill Universe, an Ent assigns a prior to that model and then compiles an ensemble model composed of it and other applicable models from the Skill Universe. The inference algorithm is then run over the ensemble. Afterward, a leave-one-out comparison is performed to produce a contribution score for each model, which is used to update the model priors. Ents can interface with the world through their internal resource pools, information bounties, and recommendations (summaries of desired trajectories for variables of interest). They approximately optimize their expected free energy and have hardwired initial preference priors.
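The whitepaper does not pin down the scoring or prior-update rules, so the following is only my reading of the leave-one-out step: score each model by how much the ensemble's log evidence drops when that model is removed, then reweight priors accordingly. The multiplicative-weights update at the end is my own illustrative choice, not something the document specifies:

```python
import math

def ensemble_log_evidence(models, data):
    """Log of the prior-weighted average likelihood of the data over the ensemble."""
    total = sum(m["prior"] * m["likelihood"](data) for m in models)
    return math.log(total)

def loo_contribution_scores(models, data):
    """Score each model by the log-evidence lost when it is left out."""
    full = ensemble_log_evidence(models, data)
    scores = {}
    for i, m in enumerate(models):
        rest = [x for j, x in enumerate(models) if j != i]
        # renormalize the priors over the remaining models
        z = sum(x["prior"] for x in rest)
        rest_norm = [{"prior": x["prior"] / z, "likelihood": x["likelihood"]}
                     for x in rest]
        scores[m["name"]] = full - ensemble_log_evidence(rest_norm, data)
    return scores

def update_priors(models, scores, lr=1.0):
    """Multiplicative-weights style prior update (hypothetical rule)."""
    weights = {m["name"]: m["prior"] * math.exp(lr * scores[m["name"]])
               for m in models}
    z = sum(weights.values())
    for m in models:
        m["prior"] = weights[m["name"]] / z
    return models
```

Under this reading, models that explain the observed data better than the rest of the ensemble earn positive contribution scores and gain prior mass over successive rounds.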

Safety Analysis

I do not think the proposed architecture does what it claims to do (achieve safety/alignment). It does not seem to me that Gaia solves any core alignment difficulties [LW · GW]. Ents are essentially just model-based RL agents, and I do not see why their design is any more inherently safe than any other. Where I do think Gaia makes worthwhile strides in safety is in multipolar outcomes. Standardized, interpretable communication protocols make multipolar scenarios much easier to analyze, but they also make openly performing even law-abiding computations genuinely dangerous. Even a robustly aligned Ent might be motivated to conceal its computations to prevent their reproduction by a malicious threat actor. All capable agents, regardless of their persuasion, seem incentivized to hide at least some of their evaluations.

In worlds where capabilities progress such that we are able to cause relations between powerful AI systems to cease or be modified, having unified and transparent communication protocols could be invaluable. I just do not see why these transparent communications need to be forcibly open, particularly at the level of the product of each inference loop. My intuition is also that agents with identical designs and alignment protocols are far more likely to cooperate, and this seems self-evident in cases where all Ents were initialized with the same preference priors, which brings me to my next point:

We do not know how to hardcode safe preference priors into model-based RL agents. I am strongly opposed to deploying potentially millions of pure RL agents with powerful actuators in the real world, and especially not with access (however heavily restricted) to what would likely become the most critical piece of digital infrastructure for humanity. Even if an Ent fulfills its role of representing the 'will' of an ecosystem, what stops it from succumbing to instrumental subgoals like power-seeking, other than its hardwired preference priors (which we do not know how to build)?

I fear this is just a reframing of the very same problems MIRI and co. spent so many years trying to solve.

I wish there were more I could say here, but the whitepaper does not really reference any specific alignment difficulties or explain how it solves them. This is something I would strongly suggest incorporating in future whitepaper iterations.

Implementation Analysis

In my opinion, Gaia is 'too weird' to implement. If you aren't aware, it is founded on an almost ecocentric philosophy that draws from animism and places heavy emphasis on agentizing the environment. I do not see how a project like Gaia could realistically garner global backing or governmental support when AI safety and the idea of slowing massive training runs are barely creeping into the Overton Window. Perhaps I am wrong and the Gaia architecture is so efficient that it takes off regardless, but this seems improbable to me given that someone could replicate the powerful elements of the architecture while tuning them purely for efficiency rather than for the project's philosophy. Doing so would also place such a proposal closer to the Overton Window.

As I mentioned earlier, I am strongly opposed to deploying potentially millions of pure RL agents with powerful actuators in the real world. I would not feel confident that a scaled deployment of the system described in this whitepaper would help humanity overcome AI x-risks, and may even elevate them.

The security burden of implementing a technology like Gaia is also immense. Even if all Ents were somehow initialized with perfectly aligned priors, it seems likely to me that enhanced malicious threat actors would have great incentive to manipulate the append-only database to which computation representations are written. Manipulated representations could cause erroneous updating by Ents, and it isn't clear to me that Gaia would be robust enough to self-correct after a failure like this.


Comments sorted by top scores.

comment by Mazianni (john-williams-1) · 2023-06-02T05:03:58.640Z · LW(p) · GW(p)

For my part, this is the most troubling part of the proposed project (which the article assesses; the link to the project is in the article above):

... convincing nearly 8 billion humans to adopt animist beliefs and mores is unrealistic. However, instead of seeing this state of affairs as an insurmountable dead-end, we see it as a design challenge: can we build (or rather grow) prosthetic brains that would interact with us on Nature’s behalf?

Emphasis by original author (Gaia architecture draft v2).

It reads like a strange mix of forced religious indoctrination and anthropomorphism of natural systems, especially when coupled with an earlier paragraph in the same proposal:

... natural entities have “spirits” capable of desires, intentions and capabilities, and where humans must indeed deal with those spirits, catering to their needs, paying tribute, and sometimes even explicitly negotiating with them. ...

Emphasis added by me.