Latecomer, but as this relates to some of my prior work on self- and other-modeling, I thought I'd comment... The consistently high task accuracy shown in Figure D suggests that even your smallest neural network is significantly over-capacity/over-parameterized for the test dataset. Excess capacity seems to be the only way the model can take on the expensive self-modeling task (*) without losing accuracy on the main task. Indeed, this would suggest that the explanation for the regularization benefit of self-modeling here is precisely that it soaks up the excess capacity, avoiding overfitting. But obviously, you can have too much of a good thing -- as the experiments with fewer hidden layers show, the attention weight can take over the model's focus and destroy accuracy. So it seems that, if you up the problem complexity/network size knob, the "maximum allowable attention weight" that doesn't compromise accuracy will tend to zero. On the other hand, one can think of simpler tasks than fully predicting all of a layer's activations -- for example, predicting the activation signs, the maximum-minimum range, the mean activation, etc. I want to say these seem more meaningful anyway, and a way to avoid Borges's "Map of the Empire whose size was that of the Empire", no?
* BTW: Unless I missed it, the paper did not report the accuracy of the self-modeling task, only of the primary task, right? I must imagine it was far from perfect, as perfect self-modeling is only possible in trivial edge cases, right?
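To make the "simpler tasks" suggestion concrete, here is a minimal sketch of the cheaper self-modeling targets (signs, max-min range, mean) for a single layer's activations. The function names, and the assumption that the "attention weight" simply multiplies the auxiliary loss in the total, are illustrative and not taken from the paper.

```python
from typing import Dict, List, Union

def summary_targets(activations: List[float]) -> Dict[str, Union[List[int], float]]:
    """Hypothetical low-dimensional self-modeling targets for one layer.

    Instead of predicting every activation, the network would predict
    these summaries of its own activations.
    """
    # Sign of each activation: -1, 0 or +1
    signs = [(a > 0) - (a < 0) for a in activations]
    return {
        "signs": signs,
        "range": max(activations) - min(activations),
        "mean": sum(activations) / len(activations),
    }

def combined_loss(task_loss: float, self_model_loss: float,
                  self_model_weight: float) -> float:
    """Assumed form of the total loss: the primary-task loss plus the
    self-modeling loss scaled by the weight the comment calls the
    "attention weight"."""
    return task_loss + self_model_weight * self_model_loss

targets = summary_targets([1.0, -2.0, 0.5, 0.0])
total = combined_loss(task_loss=1.0, self_model_loss=2.0, self_model_weight=0.5)
```

The range and mean targets are scalars regardless of layer width, so the auxiliary task no longer grows with network size the way full activation prediction does.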
@Épiphanie Gédéon this is great, very complementary/related to what we've been developing for the Gaia Network. I'm particularly thrilled to see the focus on simplicity and incrementalism, as well as the willingness to roll up one's sleeves and write code (often sorely lacking in LW). And I'm glad that you are taking the map/territory problem seriously; I wholeheartedly agree with the following: "Most safe-by-design approaches seem to rely heavily on formal proofs. While formal proofs offer hard guarantees, they are often unreliable because their model of reality needs to be extremely close to reality itself and very detailed to provide assurance."
A few additional thoughts:
- To scale this approach, one will want to have "structural regularizers" towards modularity, interoperability and parsimony. Two of those we have strong opinions on are:
  - A preference for reusing shared building blocks and building bottom-up. As a decentralized architecture, we implement this preference in terms of credit assignment, specifically free energy flow accounting.
  - Constraints on the types of admissible model code. We have strongly advocated for probabilistic causal models expressed as probabilistic programs. This enables both a shared statistical notion of model grounding (effectively backing the free energy flow accounting as approximate Bayesian inference of higher-order model structure) and a shared basis for defining and evaluating policy spaces (instantly turning any descriptive model into a usable substrate for model-based RL / active inference).
- Learning models from data is super powerful as far as it goes, but it's sometimes necessary -- and often orders of magnitude more efficient -- to leverage prior knowledge. Two simple and powerful ways to do it, which we have successfully experimented with, are:
  - LLM-driven model extraction from scientific literature and other sources of causal knowledge. This is crucial to bootstrap the component library. (See also our friends at system.com.)
  - Collaborative modeling by LLM-assisted human expert groups. This fits and enhances the "pull request" framework perfectly.
- Scaling this to multiple (human or LLM) contributors will require a higher-order model economy of some sort. While one can get away with an implicit, top-down resource economy in the context of a closed contributor group, opening up will require something like a market economy. The free energy flow accounting described above is a suitable primitive for this.
I'd be keen to find ways to collaborate.
Also @Roman Leventov FYI
Hey Steven, I'll answer your question/suggestion below. One upfront request: please let us know if this helps. We'll write a follow-up post on LW explaining this.
As mentioned in the appendix, most of what we wrote up is generalized from concrete people (not made-up, my IRL company Digital Gaia) trying to build a specific concrete AI thing (software to help farmers and leaders of regeneration projects maximize their positive environmental impact and generate more revenue by being able to transparently validate their impact to donors or carbon credit buyers). We talked extensively to people in the ag, climate and nature industries, and came to the conclusion that the lack of transparent, unbiased impact measurement and validation -- i.e., exactly the transaction costs you mention -- is the reason why humanity is massively underinvested in conservation and regeneration. There are gazillions of "climate AI" solutions that purport to measure and validate impact, but they are all fundamentally closed and centralized, and thus can't eliminate those transaction costs. In simple terms, none of the available systems, no matter how much money they spent on data or compute, can give a trustworthy, verifiable, privacy-preserving rationale for either scientific parameters ("why did you assume the soil carbon captured this year in this hectare was X tons?") or counterfactuals ("why did you recommend planting soybeans with an alfalfa rotation instead of a maize monoculture?"). We built the specific affordances that we did -- enabling local decision-support systems to connect to each other forming a distributed hierarchical causal model that can perform federated partial pooling -- as a solution to exactly that problem:
- The first adopters (farmers) already get day-1 benefits (a model-based rationale that is verifiable and privacy-preserving), using models and parameters bootstrapped from the state of the art of open transaction-cost-reduction: published scientific literature, anecdotal field reports on the Web, etc.
- The parameter posteriors contributed by the first adopters drive the flywheel. As more adopters join, network effects kick in and transaction RoI increases: parameter posteriors become both increasingly truthful and easier to verify (posterior estimates from multiple sources mostly corroborate each other, confidence bands get narrower).
- Any remaining uncertainty, in turn, drives incentives for scientists and domain experts to refine models and perform experiments, which will benefit all adopters by making their local impact rationales and recommendations more refined.
- As an open network, models and parameters can be leveraged in adjacent domains, which then generate their own adjacencies, eventually covering the entire spectrum of science and engineering. For instance, we have indoor farms and greenhouses interested in our solution; they would need to incorporate not only agronomic models but also energy consumption and efficiency models. This then opens the door to industrial and manufacturing use cases, and so on and so forth...
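The claim that confidence bands narrow as more adopters contribute can be illustrated with the simplest possible case: independent Gaussian estimates of the same parameter, combined by precision weighting. This is only a sketch under those stated assumptions. It is full pooling with a flat prior, not the hierarchical partial pooling the network performs, and the function name and the numbers are hypothetical.

```python
from typing import List, Tuple

def pool_gaussian_estimates(
    estimates: List[Tuple[float, float]]
) -> Tuple[float, float]:
    """Combine independent Gaussian estimates, given as (mean, std) pairs,
    of one shared parameter. Returns the pooled (mean, std)."""
    # Precision = 1 / variance; more certain sources get more weight
    precisions = [1.0 / (std * std) for _, std in estimates]
    total_precision = sum(precisions)
    pooled_mean = sum(
        p * mean for p, (mean, _) in zip(precisions, estimates)
    ) / total_precision
    # Precisions add, so the pooled std is smaller than any single source's std
    pooled_std = (1.0 / total_precision) ** 0.5
    return pooled_mean, pooled_std

# Two hypothetical farms reporting the same parameter with equal uncertainty
one_farm = pool_gaussian_estimates([(2.0, 1.0)])
two_farms = pool_gaussian_estimates([(2.0, 1.0), (4.0, 1.0)])
```

Under these assumptions, every additional corroborating source adds precision, so the pooled standard deviation can only shrink. Real partial pooling would also estimate how much the sites differ from one another, which limits how much any one site borrows from the rest.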
We validated the first two steps of this theory in a pilot; it worked so well that our pilot users keep ringing us back saying they need us to turn it into production-ready software...
Disclaimer: We did not fully implement or validate two important pieces of the architecture that are alluded to in the post: free energy-based economics and trust models. These are not crucial for a small-scale, controlled pilot, but would be relevant for use at scale in the wild.
people won’t want to prioritise informationally best comments, and that their main motivation for reading comments is confirming their pre-existing worldviews. This is sort of what is customary to expect, but leaning into my optimism bias, I should plan as if this is not the case. (Otherwise, aren’t we all doomed, anyway?)
There are countermoves to this. Preferences and behaviors are malleable. There can be incentives for adopting BetterDiscourse (potentially through public good funding), peer pressure, etc.
I think this should be very valuable already in this completely local regime. However, things may get even more interesting: to recapitulate the “collaborative filtration power” of Community Notes, Pol.is, and Viewpoints.xyz, (active) users’ feedback is aggregated to bubble the best comments up for new users, or for users who choose not to vote actively to tune their predictive model well. Furthermore, when users with a similar state-space have already voted positively for comments that their models didn’t predict, such comments could be shown earlier to other users in the same state-space cluster, overriding the predictions of their models.
I used to think this wouldn't reach a critical mass of high-quality active users, but I've started warming up to this idea. Just yesterday I was talking to some friends who basically described how they pack-hunted to debunk right-wing political commentary on Instagram and news sites. And these are Brazilian diaspora normies in their 40s, highly educated but not the highly motivated teenage nerd persona that I would normally envision as an active contributor in this kind of thing. So I think if we find a way to help people like this, who already see collaborative moderation as an important public duty, by increasing and making more visible the material impact from their contributions, we can achieve critical mass and at least initially overcome the deluge of noise that characterizes online commentary.
Maybe just point to the relevant paper? https://arxiv.org/abs/2312.00752
Excellent post, a great starting point, but we must go deeper :) For instance:
- Voting is a very low-bandwidth signaling scheme. There's room for arbitrary expressions of preferences and plans.
- Most implementations of voting are also cast as irreversible. We'd want room for dynamic discovery of the aggregate preference by the individuals.
- The "collective" won't always have a coherent preference; for instance, if the individuals are locked into a zero-sum game. (Let alone if the individuals' preferences are incoherent to start with!) I'd like a theory that would output "there is no coherent collective here, you should just go your own ways and agree on a transactional relationship instead".
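As a toy illustration of the last point, here is a sketch of a procedure that either returns a coherent collective ranking or reports that none exists. It uses pairwise majority voting over ranked ballots, where the classic failure is a majority cycle (the Condorcet paradox). That is a different failure mode from the zero-sum case mentioned above, and the function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import List, Optional, Sequence

def majority_prefers(ballots: Sequence[Sequence[str]], a: str, b: str) -> bool:
    """True if a strict majority of ballots rank option a above option b."""
    wins = sum(1 for ranking in ballots if ranking.index(a) < ranking.index(b))
    return wins > len(ballots) / 2

def coherent_collective_ranking(
    ballots: Sequence[Sequence[str]]
) -> Optional[List[str]]:
    """Return a ranking that agrees with every pairwise majority, or None
    when the majorities form a cycle and no such ranking exists.
    Brute force over permutations, so only suitable for a few options."""
    options = ballots[0]
    for candidate in permutations(options):
        # combinations() yields (earlier, later) pairs in candidate order
        if all(majority_prefers(ballots, a, b)
               for a, b in combinations(candidate, 2)):
            return list(candidate)
    return None

# Cyclic majorities: A beats B, B beats C, C beats A
cyclic = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
# Mostly aligned ballots
aligned = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]

no_collective = coherent_collective_ranking(cyclic)
collective = coherent_collective_ranking(aligned)
```

A `None` result is the "there is no coherent collective here" output: the aggregation rule itself signals that the group should fall back to some other arrangement rather than forcing a ranking.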