Posts

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models 2024-02-06T10:18:40.420Z
AI alignment as a translation problem 2024-02-05T14:14:15.060Z
Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? 2024-01-26T09:49:30.836Z
Institutional economics through the lens of scale-free regulative development, morphogenesis, and cognitive science 2024-01-23T19:42:31.739Z
Gaia Network: An Illustrated Primer 2024-01-18T18:23:25.295Z
Worrisome misunderstanding of the core issues with AI transition 2024-01-18T10:05:30.088Z
AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them 2023-12-27T14:51:37.713Z
Gaia Network: a practical, incremental pathway to Open Agency Architecture 2023-12-20T17:11:43.843Z
SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research 2023-12-19T16:49:51.966Z
Assessment of AI safety agendas: think about the downside risk 2023-12-19T09:00:48.278Z
Refinement of Active Inference agency ontology 2023-12-15T09:31:21.514Z
Proposal for improving the global online discourse through personalised comment ordering on all websites 2023-12-06T18:51:37.645Z
Open Agency model can solve the AI regulation dilemma 2023-11-08T20:00:56.395Z
Any research in "probe-tuning" of LLMs? 2023-08-15T21:01:32.838Z
AI romantic partners will harm society if they go unregulated 2023-08-01T09:32:13.417Z
Philosophical Cyborg (Part 1) 2023-06-14T16:20:40.317Z
An LLM-based “exemplary actor” 2023-05-29T11:12:50.762Z
Aligning an H-JEPA agent via training on the outputs of an LLM-based "exemplary actor" 2023-05-29T11:08:36.289Z
AI interpretability could be harmful? 2023-05-10T20:43:04.042Z
H-JEPA might be technically alignable in a modified form 2023-05-08T23:04:20.951Z
Annotated reply to Bengio's "AI Scientists: Safe and Useful AI?" 2023-05-08T21:26:11.374Z
For alignment, we should simultaneously use multiple theories of cognition and value 2023-04-24T10:37:14.757Z
An open letter to SERI MATS program organisers 2023-04-20T16:34:10.041Z
Scientism vs. people 2023-04-18T17:28:29.406Z
Goal alignment without alignment on epistemology, ethics, and science is futile 2023-04-07T08:22:24.647Z
Yoshua Bengio: "Slowing down development of AI systems passing the Turing test" 2023-04-06T03:31:39.120Z
Emergent Analogical Reasoning in Large Language Models 2023-03-22T05:18:50.548Z
Will people be motivated to learn difficult disciplines and skills without economic incentive? 2023-03-20T09:26:19.996Z
A reply to Byrnes on the Free Energy Principle 2023-03-03T13:03:48.990Z
Joscha Bach on Synthetic Intelligence [annotated] 2023-03-02T11:02:09.009Z
Powerful mesa-optimisation is already here 2023-02-17T04:59:59.794Z
The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial 2023-02-14T06:57:58.036Z
Morphological intelligence, superhuman empathy, and ethical arbitration 2023-02-13T10:25:17.267Z
A multi-disciplinary view on AI safety research 2023-02-08T16:50:31.894Z
Temporally Layered Architecture for Adaptive, Distributed and Continuous Control 2023-02-02T06:29:21.137Z
Has private AGI research made independent safety research ineffective already? What should we do about this? 2023-01-23T07:36:48.124Z
Critique of some recent philosophy of LLMs’ minds 2023-01-20T12:53:38.477Z
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning 2023-01-12T16:43:42.357Z
AI psychology should ground the theories of AI consciousness and inform human-AI ethical interaction design 2023-01-08T06:37:54.090Z
How evolutionary lineages of LLMs can plan their own future and act on these plans 2022-12-25T18:11:18.754Z
Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development 2022-12-20T17:13:00.669Z
The two conceptions of Active Inference: an intelligence architecture and a theory of agency 2022-11-16T09:30:47.484Z
What is our current best infohazard policy for AGI (safety) research? 2022-11-15T22:33:34.768Z
The circular problem of epistemic irresponsibility 2022-10-31T17:23:50.719Z
The problem with the media presentation of “believing in AI” 2022-09-14T21:05:10.234Z
Roman Leventov's Shortform 2022-08-19T21:01:28.692Z
Are language models close to the superhuman level in philosophy? 2022-08-19T04:43:07.504Z
AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical 2022-07-30T20:56:54.532Z
Active Inference as a formalisation of instrumental convergence 2022-07-26T17:55:58.309Z

Comments

Comment by Roman Leventov on On attunement · 2024-03-30T13:08:11.454Z · LW · GW

John Vervaeke calls attunement "relevance realization".

Comment by Roman Leventov on Modern Transformers are AGI, and Human-Level · 2024-03-27T03:16:32.926Z · LW · GW

Cf. DeepMind's "Levels of AGI" paper (https://arxiv.org/abs/2311.02462), calling modern transformers "emerging AGI" there, but also defining "expert", "virtuoso", and "superhuman" AGI.

Comment by Roman Leventov on AI Alignment Metastrategy · 2024-03-24T13:24:12.463Z · LW · GW

Humane/acc, https://twitter.com/AndrewCritchPhD

Comment by Roman Leventov on Value learning in the absence of ground truth · 2024-02-05T22:11:41.339Z · LW · GW

Well, yes, it also includes learning weak agents' models more generally, not just their "values". But I think the point stands. It's elaborated better in the linked post. Since AIs will receive most of the same information that humans receive through always-on wearable sensors, there won't be much for AIs to learn from humans. Rather, it's humans that will need to do their homework, to increase the quality of their value judgements.

Comment by Roman Leventov on Value learning in the absence of ground truth · 2024-02-05T21:30:09.977Z · LW · GW

I agree with the core problem statement and most assumptions of the Pursuit of Happiness/Conventions Approach, but suggest a different solution: https://www.lesswrong.com/posts/rZWNxrzuHyKK2pE65/ai-alignment-as-a-translation-problem

I agree with OpenAI folks that generalisation is the key concept for understanding the alignment process. But I think that with their weak-to-strong generalisation agenda, they (as well as almost everyone else) apply it in the reverse direction: learning the values of weak agents (humans) doesn't make sense. Rather, weak agents should learn the causal models that strong agents employ, so that they can express an informed value judgement. This is the way to circumvent the "absence of the ground truth for values" problem: instead, agents try to generalise their respective world models so that they sufficiently overlap, and then choose actions that seem net beneficial to both sides, without knowing how this value judgement was made by the other side.

In order to be able to generalise to shared world models with AIs, we must also engineer AIs to have human inductive biases from the beginning. Otherwise, this won't be feasible. This observation makes "brain-like AGI" one of the most important alignment agendas in my view.

Comment by Roman Leventov on AI alignment as a translation problem · 2024-02-05T18:44:25.611Z · LW · GW

If I understand correctly, by "discreteness" you mean that it simply says that one agent can know neither the meaning of the symbols used by another agent nor the "degree" of grokking that meaning. One just cannot say anything.

This is correct, but the underlying reason why this is correct is the same as why solipsism or the simulation hypothesis cannot be disproven (or proven!).

So yeah, I think there is no tangible relationship to the alignment problem, except that it corroborates that we couldn't have 100% (literally, probability=1) certainty of alignment or safety of whatever we create, but it was obvious even without this philosophical argument.

So, I removed that paragraph about Quine's argument from the post.

Comment by Roman Leventov on Making every researcher seek grants is a broken model · 2024-01-27T15:55:25.019Z · LW · GW

That also was, naturally, the model in the Soviet Union, with orgs called "scientific research institutes". https://www.jstor.org/stable/284836

Comment by Roman Leventov on Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? · 2024-01-26T23:24:54.899Z · LW · GW

See a discussion of this point here with Marius Hobbhahn and others.

Comment by Roman Leventov on This might be the last AI Safety Camp · 2024-01-26T09:54:24.493Z · LW · GW

This post has led me to this idea:  Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects?

Comment by Roman Leventov on Gaia Network: An Illustrated Primer · 2024-01-23T20:18:01.429Z · LW · GW

Collusion detection and prevention and trust modelling don't trivially follow from the basic architecture of the system described at the level of this article. Specific mechanisms would need to be implemented in the Protocol to support collusion detection and trust modelling. We don't actually have these mechanisms developed yet, but we think they should be doable (though this is still a research bet, not a 100% certainty), because the Gaia Network directly embodies (or is amenable to) all six general principles for anti-collusion mechanism design (agency architecture) proposed by Eric Drexler. These principles themselves should be further validated via formalisation and proving theorems about the collusion properties of systems of distributed intelligence.

Of course, there should also be (at least initially, but practically for a very long time, if not forever) "traditional" governance mechanisms of the Gaia Network, nodes, model and data ownership, etc. So, there are a lot of open questions about interfacing GN with existing codes of law, judicial and law enforcement practice, intellectual property, political and governance processes, etc. Some of these interfaces and connections with existing institutions should in practice deal with bad actors and certain types of malicious behaviour on GN.

Comment by Roman Leventov on Worrisome misunderstanding of the core issues with AI transition · 2024-01-18T10:49:06.295Z · LW · GW

Fair, I edited the post.

Comment by Roman Leventov on AI doing philosophy = AI generating hands? · 2024-01-16T02:18:07.778Z · LW · GW

Apart from the view on philosophy as "cohesive stories that bind together and infuse meaning into scientific models", which I discussed with you earlier and which you were not very satisfied with, another interpretation of philosophy (natural phil, phil of science, phil of mathematics, and metaphil, at least) is "apex generalisation/abstraction". Think Bengio's "AI scientist", but with an even deeper generative model: first sample a plausible "philosophy of science" given all the observations about the world up to the moment; then sample a plausible scientific theory, given that philosophy and all observations up to the moment, at a specific level of coarse-graining/scientific abstraction (quantum, chemical, bio, physio, psycho, socio, etc.); then sample a mechanistic model that describes the situation/system of interest at hand (e.g., the morphology of a particular organism, given the laws of biology, or the morphology of a particular society), given the observations of that system up to the moment; and then finally sample plausible variable values that describe the particular situation at a particular point in time, given all the above.
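
To make this nested ("apex") sampling order concrete, here is a minimal illustrative sketch; every sample_* function is a hypothetical stub standing in for a generative model that does not exist yet:

```python
import random

# Toy stand-ins for the nested generative model described above; only the
# ordering of the sampling steps matters here, not the stub implementations.

def sample_philosophy(world_obs):
    return random.choice(["empiricism-flavoured", "pragmatism-flavoured"])

def sample_theory(philosophy, world_obs, level):
    return f"{level}-level theory under {philosophy} priors"

def sample_mechanistic_model(theory, system_obs):
    return f"mechanistic model of this system, constrained by: {theory}"

def sample_state(mechanism, system_obs, t):
    return {"model": mechanism, "t": t, "variables": {"x": random.random()}}

def do_philosophy(world_obs, system_obs, t):
    philosophy = sample_philosophy(world_obs)                    # deepest, slowest-changing level
    theory = sample_theory(philosophy, world_obs, level="bio")   # pick a coarse-graining level
    mechanism = sample_mechanistic_model(theory, system_obs)     # the specific system at hand
    return sample_state(mechanism, system_obs, t)                # the concrete situation now

print(do_philosophy(world_obs=[], system_obs=[], t=0))
```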

If this interpretation is correct, then doing philosophy well and not deluding ourselves is far off. And there is a huge risk in thinking we can do it well before we actually can.

Comment by Roman Leventov on An even deeper atheism · 2024-01-14T11:48:20.365Z · LW · GW

Extrapolated volition is a nonsensical concept altogether, as demonstrated in the OP. There is no extrapolated volition apart from volition unfolding in real life in a specific context, which affects the trajectory of values/volition in a specific way. And which context that will be is unknown and unknowable (maybe aliens will visit Earth tomorrow, maybe not).

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:50:31.366Z · LW · GW

Related, the consciousness frame: where is its boundary? Is our brain conscious, or the whole nervous system, or the whole human, or the whole human + the entire microbiome populating them, or human + robotic prosthetic limbs, or human + web search + chat AI + personal note-taking app, or the whole human group (collective consciousness), etc.?

Some computational theories of consciousness attempt to give a specific, mathematically formalised answer to this question.

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:45:32.698Z · LW · GW

Psychology may not be "technical enough" because an adequate mathematical science or process theory has not been developed for it yet, but it's ultimately very important, perhaps critically important: see the last paragraph of https://www.lesswrong.com/posts/AKBkDNeFLZxaMqjQG/gaia-network-a-practical-incremental-pathway-to-open-agency. Davidad apparently thinks that it can be captured with an Infra-Bayesian model of a person/human.

Also on psychology: what is the boundary of personality? Where does just a "role" (spouse, worker, etc.) turn into multiple-personality disorder?

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:36:30.868Z · LW · GW

In the most recent episode of his podcast show, Jim Rutt (former president of SFI) and his guest talk about membranes a lot; the word appears 30 times on the transcript page: https://www.jimruttshow.com/cody-moser/

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T19:32:23.664Z · LW · GW

Related, quantum information theory:

Comment by Roman Leventov on AI Alignment Metastrategy · 2024-01-03T17:34:08.269Z · LW · GW

I think this metastrategy classification is oversimplified, to the degree that I'm not sure it is net helpful. I don't see how Hendrycks' "Leviathan safety", Drexler's Open Agency Model, Davidad's OAA, Bengio's "AI pure scientist" and governance proposals (see https://slideslive.com/39014230/towards-quantitative-safety-guarantees-and-alignment), Kaufmann and Leventov's Gaia Network, AI Objectives Institute's agenda (and the related Collective Intelligence Project's), Conjecture's CoEms, OpenAI's "AI alignment scientist" agenda, and Critch's h/acc (and the related Cyborgism of janus et al.) straightforwardly map onto this classification, at least not without losing some important nuance.

Furthermore, there is also the missing dimension of [technical strategy, organisational strategy, governance and political strategy] that could perhaps recombine to some degree.

Finally, in the spirit of the "passing the ideological Turing test" and "describing, not persuading" norms, it would be nice, I think, to include criticism of the "conservative strategy" at the same level of fidelity with which the other metastrategies are criticised here, even if you or others have discussed it in other posts.

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2024-01-03T07:16:28.624Z · LW · GW

Announcement

I think SociaLLM has a good chance of getting OpenAI's "Research into Agentic AI Systems" grant because it addresses both the challenge of the legibility of AI agents' behaviour (by making the agent's behaviour more "human-like" thanks to the weight sharing and regularisation techniques/inductive biases described in the post) and automatic monitoring: detecting duplicity or deception in an AI agent's behaviour by comparing the agent's ToMs "in the eyes" of different interlocutors, building on the work "Collective Intelligence in Human-AI Teams".

I am looking for co-investigators for this (up to $100k, up to 8 months long) project with hands-on academic or practical experience in DL training (preferably), ML, Bayesian statistics, or NLP. The deadline for the grant application itself is the 20th of January, so I need to find a co-investigator by the 15th of January.

Another requirement for the co-investigator is that they preferably should be in academia, non-profit, or independent at the moment.

I plan to be hands-on during the project in data preparation (cleansing, generation by other LLMs, etc.) and training, too. However, I don’t have any prior experience with DL training, so if I apply for the project alone, this is a significant risk and a likely rejection.

If the project is successful, it could later be extended for further grants or turned into a startup.

If the project is not a good fit for you but you know someone who may be interested, I’d appreciate it a lot if you shared this with them or within your academic network!

Please reach out to me in DMs or at leventov.ru@gmail.com.

Comment by Roman Leventov on A hermeneutic net for agency · 2024-01-02T07:56:24.420Z · LW · GW

A lot of the examples of the concepts that you list already belong to established scientific fields: math, logic, probability, causal inference, ontology, semantics, physics, information theory, computer science, learning theory, and so on. These concepts don't need philosophical re-definition. Respecting the field boundaries, and the ways that fields are connected to each other via other fields (e.g., math and ontology to information theory/CS/learning theory via semantics), is also, I think, a good practice on net: it's better to focus attention on the fields that are actually most proto-scientific and philosophically confusing: intelligence, sentience, psychology, consciousness, agency, decision making, boundaries, safety, utility, value (axiology), and ethics[1].

Then, to make the overall idea solid, I think it's necessary to do a couple of extra things (you may have already mentioned these in the post, but I semi-skimmed it and may have missed them).

  • First, specify the concepts in this fuzzy proto-scientific area of intelligence, agency, and ethics not in terms of each other, but in terms of (or in a clearly specified connection with) those other scientific fields/ontologies that are already established, enumerated above. For example, a theory of agency should be compatible or connected with (or, specified in terms of) causal inference and learning theories. Theory of boundaries and ethics should be based on physics, information theory, semantics, and learning theory, among other things (cf. scale-free axiology and ethics).
  • Second, establish feedback loops that test these "proposed" theories of agency (psychology, ethics, decision-making) both in simulated environments (e.g., with LLM-based agents embodying these proposed theories acting in Minecraft- or Sims-like worlds) and in (constrained) real-life settings or environments. Note that the obligatory connection to physics, information theory, causal inference, and learning theory will ensure that these tests themselves can be counted as scientific.

The good news is that there are now sufficient (or almost sufficient) affordances to build AI agents that embody sufficiently realistic and rich versions of these theories, both in realistic simulated environments and in real life. And I think an actual R&D agenda proposal should be written about this and submitted for a Superalignment grant.

There's an instinct to "ground" or "found" concepts. But there's no globally privileged direction of "more grounded" in the space of possible concepts. We have to settle for a reductholistic pluralism -- or better, learn to think rightly, which will, as a side effect, make reductholism not seem like settling.

I disagree with the last sentence: "reductholism" should be the settling, as I argue in "For alignment, we should simultaneously use multiple theories of cognition and value". (Note that this view itself is based largely on quantum information theory: see "Information flow in context-dependent hierarchical Bayesian inference".)

 

  1. ^

    A counterargument could be made here that although logic, causal inference, ontology, semantics, physics, information theory, CS, learning theory, and so on are fairly established and all have SoTA, mature theories that look solid, these are probably not the final theories in all or many of these fields, and philosophical poking could highlight the problems with these theories, and perhaps this will actually be the key to "solving alignment". I agree that this is in principle a possible chain of events, but it looks like quite low expected impact to me from the "hermeneutic nets" perspective, so this agenda is still better off focusing on the "core confusing" fields (intelligence, agency, ethics, etc.) and treating the established fields and the concepts therein "as given".

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-28T14:25:55.150Z · LW · GW

I agree with everything you said. It seems that we should distinguish between a sort of "cooperative" and a sort of "adversarial" safety approach (cf. the comment above). I wrote the entire post as an extended reply to Marc Carauleanu upon his mixed feedback to my idea of adding "selective SSM blocks for theory of mind" to increase the Self-Other Overlap in AI architecture as a pathway to improve safety. Under the view that both Transformer and Selective SSM blocks will survive up until AGI (if it is going to be created at all, of course), and even with the addition of your qualifications (that AutoML will try to stack these and other types of blocks in some quickly evolving ways), the approach seems solid to me, but only if we also make some basic assumptions about the good faith and cooperativeness of the AutoML / auto-takeoff process. If we don't make such assumptions, all bets are off, of course: these "blocks for safety" could just be purged from the architecture.

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-28T14:16:41.896Z · LW · GW

I agree that training data governance is not robust to non-cooperative actors. But I think there is a much better chance to achieve a very broad industrial, academic, international, and legal consensus about it being a good way to jigsaw capabilities without sacrificing the raw reasoning ability, which the opponents of compute governance hold as purely counter-productive ("intelligence just makes things better"). That's why I titled my post "Open Agency model can solve the AI regulation dilemma" (emphasis on the last word).

This could even be seen not just as a "safety" measure, but as a genuinely good regularisation measure for the collective civilisational intelligence: to make intelligence more robust to distributional shifts and paradigm shifts, it's better to compartmentalise it and make communication between the compartments go through a relatively narrow, classical informational channel, namely human language or specific protocols, rather than raw DNN activation dynamics.
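
As a toy illustration of that "narrow channel" idea (hypothetical stubs, not a real system): the compartments below exchange only short text messages, never raw activations.

```python
from dataclasses import dataclass

@dataclass
class Compartment:
    """Stand-in for a specialised model; in reality this would wrap a separate DNN."""
    name: str

    def answer(self, message: str) -> str:
        return f"[{self.name}] assessment of: {message}"

def narrow_channel(sender: Compartment, receiver: Compartment, query: str) -> str:
    summary = sender.answer(query)      # only a text/protocol summary crosses the boundary
    return receiver.answer(summary)     # the receiver never sees the sender's internal states

print(narrow_channel(Compartment("crop-model"), Compartment("logistics-model"),
                     "expected wheat yield next quarter"))
```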

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-27T20:21:55.757Z · LW · GW

BTW, this particular example sounds just like Numer.ai Signals, but Gaia Network is supposed to be more general and not to revolve around the stock market alone. E.g., the same nutritional data could be bought by food companies themselves, logistics companies, public health agencies, etc.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-27T20:16:21.408Z · LW · GW

Thanks for the suggestions.

An actual anecdote may look something like this: "We are a startup that creates a nutrition assistant and family menu helper app. We collect anonymised data from the users and ensure differential privacy, yada-yada. We want to sell this data to hedge funds that trade food company stocks (so that we can offer the app for free to our users), but we need to negotiate the terms of these agreements in an ad-hoc way with each hedge fund individually, and we don't have a principled way to come up with a fair price for the data. We would benefit from something like a 'platform' on which we can just publish the API spec of our data, and then the platform (i.e., the Gaia Network) takes care of finding buyers for our data and paying us a fair price for it, etc."

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-27T20:11:28.150Z · LW · GW

The fact that hybridisation works better than pure architectures (architectures consisting of a single core type of block, we shall say) is exactly the point that Nathan Labenz makes in the podcast and that I repeat at the beginning of the post.

(Ah, I actually forgot to repeat this point, apart from noting that Doyle predicted this in his architecture theory.)

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-27T14:56:08.812Z · LW · GW

This conversation has prompted me to write "AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them".

Comment by Roman Leventov on On plans for a functional society · 2023-12-25T13:08:30.670Z · LW · GW

we're lacking all 4. We're lacking a coherent map of the polycrisis (if anyone wants to do and/or fund a version of aisafety.world for the polycrisis, I'm interested in contributing)

Joshua Williams created an initial version of a metacrisis map, and a couple of days ago I suggested to him that he make the development of such a resource more open, e.g., by turning it into a GitHub repository.

I think there's a ton of funding available in this space, specifically I think speculating on the markets informed by the kind of worldview that allows one to perceive the polycrisis has significant alpha. I think we can make much better predictions about the next 5-10 years than the market, and I don't think most of the market is even trying to make good predictions on those timescales.

Do you mean that it's possible to earn by betting long against the current market sentiment? I think this is wrong for multiple reasons, but perhaps most importantly because the market specifically doesn't measure how well we are faring on a lot of components of the polycrisis -- e.g., the market would be doing great even if all people were turned into addicted zombies. Secondly, people don't even try to make predictions in the stock market anymore -- it's turned into a completely irrational valve of liquidity that is moved by Elon Musk's tweets, narratives, and memes more than by objective factors.

Comment by Roman Leventov on On plans for a functional society · 2023-12-25T12:52:48.086Z · LW · GW

1.) Clearly state the problems that need to be worked on, and provide reasonable guidance as to where and how they might be worked on
2.) Notice what work is already being done on the problems, and who is doing it (avoid reinventing the wheel/not invented here syndrome; EA is especially guilty of this)
3.) Actively develop useful connections between 2.)
4.) Measure engagement (resource flows) and progress

I posted some parts of my current visions of 1) and 2) here and here. I think these, along with the Gaia Network design that we proposed recently (the Gaia Network is not "A Plan" in its entirety, but a significant portion of it), address @Vaniver's and @kave's points about realism and sociological/psychological viability.

The platform for generating the plan would need to be more-open-than-not, and should be fairly bleeding edge - incorporating prediction markets, consensus seeking (polis), eigenkarma etc

I think it is a mistake to import "democracy" at the vision level. A vision is essentially a very high-level plan, a creative engineering task. These are not decided by averaging opinions. "If you want to kill any idea in the world, get a committee working on it." Also, Deutsch wrote about this in "The Beginning of Infinity", in the chapter about democracy.

We should aggregate desiderata and preferences (see "Preference Aggregation as Bayesian Inference"), but not decisions (plans, engineering designs, visions). These should be created by a coherent creative entity. The same idea is evident in the design of Open Agency Architecture.
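
To make "aggregate preferences, not decisions" concrete, here is a minimal numerical sketch (my illustration of one simple pooling rule, not the specific method from the linked paper):

```python
import numpy as np

# Three stakeholders give preference distributions over four candidate desiderata.
# Aggregation happens at the preference level (a weighted log-linear pool);
# a single coherent planning entity then designs against the pooled target.
prefs = np.array([
    [0.50, 0.30, 0.15, 0.05],   # stakeholder A
    [0.10, 0.40, 0.40, 0.10],   # stakeholder B
    [0.25, 0.25, 0.25, 0.25],   # stakeholder C (indifferent)
])
weights = np.array([0.4, 0.4, 0.2])           # e.g., how strongly each party is affected

pooled = np.exp((weights[:, None] * np.log(prefs)).sum(axis=0))
pooled /= pooled.sum()
print(pooled.round(3))                        # pooled desiderata weights handed to the planner
```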

we're lacking meaningful 3rd party measurement

If I understand correctly what you are gesturing at here, I think that some high-level agents in the Gaia Network should become a trusted gauge for the "planetary health metrics" we care about.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-24T14:43:27.679Z · LW · GW

Right now, if the Gaia Network already existed but there were few models and agents on it, there would be little or no advantage (e.g., leveraging the tooling/infra built for the Gaia Network) in joining the network.

This is why I personally think that the bottom-up approach -- building these apps and scaling them (thus building up QRFs) first -- is a somewhat more promising path than the top-down approach, whose ultimate version is the OAA itself (the research agenda of building the Gaia Network is a somewhat milder, but still top-down-ish, version). That's why in the comment I already linked to above, the implication is that these disparate apps/models/"agents" are first built completely independently (mostly as startups), without conforming to any shared protocol (like the Gaia protocol); only once they grow large and sharing information across domains becomes evidently valuable to these startups will the conversation about a shared protocol find more traction.

Then, why a shared protocol, still? Two reasons:

  • Practical: it will reduce transaction costs for all the models across the domains to start communicating to improve each other's predictions. Without a shared protocol, this requires ad-hoc integration for every prospective direction of information sharing. This is the practicality of any platform, from the Internet itself to Airbnb to SWIFT (bank wires), and the Gaia Network should be of this kind, too.
  • AI and catastrophic risk safety: to ensure some safety against rogue actors (AI or hybrid human-AI teams or whatever) through transparency and built-in mechanisms, we would want as much economic activity to be "on the network" as possible.
    • You may say that it would be a tough political challenge to convince everybody to conform to the network in the name of AI safety, but surely this would still be a smaller challenge than abolishing much of the current economic system altogether, as (apparently) implied by Davidad's "vanilla" OAA, and as we discuss throughout the article. In fact, this is one of the core points of the article.

Then, even though I advocate for a bottom-up approach above, there is still room, and even a need, for parallel top-down activity (given the AGI timelines), so that these two streams of activity meet somewhere in the middle. This is why we are debating all these blue-sky AI safety plans on LessWrong at all; this is why OAA was proposed, and this is why we are now proposing the Gaia Network.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-24T13:03:04.052Z · LW · GW

One completely realistic example of an agent is given in the appendix (an agent that recommends actions to improve soil health or carbon sequestration). Some more examples are given in this comment:

  • An info agent that recommends me info resources (news, papers, posts, op-eds, books, videos) to consume, based on my current preferences and demands (and info from other agents, such as those listed below, or this agent that predicts the personalised information value of the comments on the web)
    • Scaling to the group/coordination: optimise informational intake of a team, an organisation, or a family
  • Learning agent that recommends materials based on the previous learning trajectory and preferences, a-la liirn.space
    • Scaling to the group/coordination: coordinate learning experiences and lessons between individual learning agents based on who is on what learning level, availability, etc.
  • Financial agent that recommends spending based on my mid- and long-term goals
    • Equivalent of this for an organisation: "business development agent", recommends an org to optimise strategic investments based on the current market situation, goals of the company, other domain-specific models provided (i.e., in the limit, communicated by other Gaia agents responsible for these models), etc.
  • Investment agent recommends investment strategy based on my current financial situation, financial goals, and other investment goals (such as ESG)
    • Scaling to the group/coordination: optimise joint investment strategy for income and investment pools a-la pandopooling.com
  • Energy agent: decides when to accumulate energy and when to spend it based on the spot energy prices, weather forecast for renewables, and the current and future predicted demands for power
    • Scale this to microgrids, industrial/manufacturing sites, etc.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-23T04:42:22.563Z · LW · GW

I absolutely agree that the future TAI may look nothing like the current architectures. Cf. this tweet by Kenneth Stanley, with whom I agree 100%. At the same time, I think it's a methodological mistake to therefore conclude that we should only work on approaches and techniques that are applicable to any AI, in a black-box manner. It's like tying our hands behind our backs. We can and should affect the designs of future TAIs through our research, by demonstrating promise (or inherent limitations) of this or that alignment technique, so that these techniques get or lose traction and are included or excluded from the TAI design. So, we are not just making "assumptions" about the internals of the future TAIs; we are shaping these internals.

We can and should think about the proliferation risks[1] (i.e., the risks that some TAI will be created by downright rogue actors), but IMO most of that thinking should be on the governance side, not the technical side. We agree with Davidad here that a good technical AI safety plan should be accompanied by a good governance (including compute monitoring) plan.

  1. ^

    In our own plan (Gaia Network), we do this in the penultimate paragraph here.

Comment by Roman Leventov on On the future of language models · 2023-12-21T17:19:41.288Z · LW · GW

I think you tied yourself too much to the strict binary classification that you invented (finetuning/scaffolding). You overgeneralise, and your classification obscures the truth more than it clarifies things.

All the different things that can be done with LLMs -- tool use, scaffolded reasoning aka LM agents, RAG, fine-tuning, semantic knowledge graph mining, reasoning with a semantic knowledge graph, finetuning for following "virtue" (persona, character, role, style, etc.), finetuning for model checking, finetuning for heuristics for theorem proving, finetuning for generating causal models, (what else?) -- just don't fit easily into two simple categories with properties that are consistent within each category.

But I don't understand the sense in which you think finetuning in this context has completely different properties.

In the summary (note: I actually didn't read the rest of the post, only the summary), you write something that implies that finetuning is obscure or uninterpretable:

From a safety perspective, language model agents whose agency comes from scaffolding look greatly superior than ones whose agency comes from finetuning

  • Because you can get an extremely high degree of transparency by construction

But this totally doesn't apply to the other variants of finetuning that I mentioned. If the LLM is a heuristic engine that generates mathematical proofs which are later verified with Lean, it just stops making sense to discuss how interpretable or transparent this theorem-proving or model-checking LLM-based heuristic engine is.
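
A minimal sketch of the generate-and-verify pattern I have in mind (both llm_propose_proof and lean_accepts are hypothetical stubs, not real APIs):

```python
import random
from typing import Optional

def llm_propose_proof(theorem: str) -> str:
    # Stand-in for sampling a candidate proof from a finetuned LLM.
    return f"candidate proof #{random.randint(0, 999)} of {theorem}"

def lean_accepts(proof: str) -> bool:
    # Stand-in for running the Lean checker; pretend ~10% of candidates type-check.
    return random.random() < 0.1

def prove(theorem: str, budget: int = 50) -> Optional[str]:
    # Interpretability of the proposer matters little here: any accepted proof
    # has already been independently verified by the checker.
    for _ in range(budget):
        proof = llm_propose_proof(theorem)
        if lean_accepts(proof):
            return proof
    return None

print(prove("example theorem"))
```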

Comment by Roman Leventov on On the future of language models · 2023-12-20T22:36:18.792Z · LW · GW

Also, I would say, retrieval-augmented generation (RAG) is not just a mundane way to industrialise language models, but an important concept whose properties should be studied separately from scaffolding, fine-tuning, or the other techniques that I listed in the comment above.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-20T22:26:58.832Z · LW · GW

On (1), cf. this report: "The current portfolio of work on AI risk is over-indexed on work which treats “transformative AI” as a black box and tries to plan around that. I think that we can and should be peering inside that box (and this may involve plans targeted at more specific risks)."

On (2), I'm surprised to read this from you, since you suggested engineering Self-Other Overlap into LLMs in your AI Safety Camp proposal, if I understood and remember correctly. Do you actually see a line (or a way) of increasing the overlap without furthering ToM and therefore "social capabilities"? (Which ties back to "almost all applied/empirical AI safety work is simultaneously capabilities work".)

Comment by Roman Leventov on On the future of language models · 2023-12-20T22:16:35.266Z · LW · GW

Notable techniques for getting value out of language models that are not mentioned:

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2023-12-20T05:54:25.446Z · LW · GW

In another thread, Marc Carauleanu wrote:

The main worry that I have with regards to your approach is how competitive SociaLLM would be with regards to SOTA foundation models given both (1) the different architecture you plan to use, and (2) practical constraints on collecting the requisite structured data. While it is certainly interesting that your architecture lends itself nicely to inducing self-other overlap, if it is not likely to be competitive at the frontier, then the methods uniquely designed to induce self-other overlap on SociaLLM are likely to not scale/transfer well to frontier models that do pose existential risks. (Proactively ensuring transferability is the reason we focus on an additional training objective and make minimal assumptions about the architecture in the self-other overlap agenda.)

I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the user's own history tracking really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (the same applies to the interlocutor-tracking blocks in private 1-1 or small group chats, which can be arbitrarily long, but there is probably no such data available for training). In the public data (such as forums, public chat rooms, Diplomacy and other text games), the interlocutor's history traces would 99% of the time easily fit into 100k symbols, but for symmetry with the user's own state (same weights!) and to have the same representation structure, it should mirror the user's own SSM blocks, of course.

With such an approach, the SSM hierarchies could start very small, with only a few blocks or even just a single SSM block (i.e., two blocks in total: one for the user's own state and one for the interlocutor's state), and attach to the middle of the Transformer hierarchy to select from it. However, I think this approach couldn't just be slapped onto a pre-trained Llama or another large Transformer LLM. I suspect the Transformer should be co-trained with the SSM blocks to induce the Transformer to make the corresponding representations useful for the SSM blocks. "Pretraining Language Models with Human Preferences" is my intuition pump here.
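
For concreteness, here is a minimal, illustrative PyTorch sketch of the shape of this hybrid (my own toy rendering, not the actual SociaLLM design; SimpleSSMBlock is a plain recurrence standing in for a Selective SSM block):

```python
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Toy recurrent state tracker standing in for a Selective SSM block."""
    def __init__(self, d):
        super().__init__()
        self.A = nn.Linear(d, d, bias=False)
        self.B = nn.Linear(d, d, bias=False)

    def forward(self, x):                                   # x: (batch, seq, d)
        h = torch.zeros(x.size(0), x.size(2), device=x.device)
        for t in range(x.size(1)):
            h = torch.tanh(self.A(h) + self.B(x[:, t]))
        return h                                            # final state summarises the history

class HybridSketch(nn.Module):
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.local_lower = nn.TransformerEncoder(layer, num_layers=2)   # classic Transformer blocks
        self.local_upper = nn.TransformerEncoder(layer, num_layers=2)   # for the local context
        self.state_tracker = SimpleSSMBlock(d)   # *shared weights* for user and interlocutor state
        self.mix = nn.Linear(3 * d, d)
        self.head = nn.Linear(d, vocab)

    def forward(self, local_ctx, user_history, interlocutor_history):
        x = self.local_lower(self.embed(local_ctx))
        u = self.state_tracker(self.embed(user_history))            # user's own (long) history
        v = self.state_tracker(self.embed(interlocutor_history))    # same block for the interlocutor
        uv = torch.cat([u, v], dim=-1).unsqueeze(1).expand(-1, x.size(1), -1)
        x = self.local_upper(self.mix(torch.cat([x, uv], dim=-1)))  # states injected mid-hierarchy
        return self.head(x)

# smoke test with random token ids
model = HybridSketch()
logits = model(torch.randint(0, 1000, (2, 16)),
               torch.randint(0, 1000, (2, 128)),
               torch.randint(0, 1000, (2, 128)))
print(logits.shape)   # torch.Size([2, 16, 1000])
```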

Regarding the sufficiency and quality of training data, the Transformer hierarchy itself could still be trained on arbitrary texts, just like current LLMs. And we can adjust the size of the SSM hierarchies to the amount of high-quality dialogue and forum data that we are able to obtain. I think it is a no-brainer that this design would improve the frontier quality in LLM apps that value personalisation and attunement to the user's current state (psychological, emotional, levels of knowledge, etc.), relative to whatever "base" Transformer model we would take (such as Llama, or any other).

One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models, and so it is unclear if investing in the unique data and architecture setup is worth it in comparison to the counterfactual of just scaling up current methods.

With this I disagree: I think it's critical for the user state tracking to be energy-based. I don't think there are ways to recapitulate this with auto-regressive Transformer language models (cf. any of LeCun's presentations from the last year). There are potential ways to recapitulate this with other language modelling architectures (non-Transformer and non-SSM), but they currently don't hold any stronger promise than SSMs, so I don't see any reason to pick them.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-20T05:52:34.860Z · LW · GW

Thanks for feedback. I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the user's own history tracking really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (the same applies to the interlocutor-tracking blocks in private 1-1 chats, which can also be arbitrarily long, but there is probably no such data available for training). In the public data (such as forums, public chat room logs, Diplomacy and other text game logs), the interlocutor's history traces would 99% of the time easily be less than 100k symbols, but for symmetry with the user's own state (same weights!) and to have the same representation structure, it should mirror the user's own SSM blocks, of course.

With such an approach, the SSM hierarchies could start very small, with only a few blocks or even just a single SSM block (i.e., two blocks in total: one for the user's own state and one for the interlocutor's state), and attach to the middle of the Transformer hierarchy to select from it. However, I think this approach couldn't just be slapped onto a pre-trained Llama or another large Transformer LLM. I suspect the Transformer should be co-trained with the SSM blocks to induce the Transformer to make the corresponding representations useful for the SSM blocks. "Pretraining Language Models with Human Preferences" is my intuition pump here.

Regarding the sufficiency and quality of training data, the Transformer hierarchy itself could still be trained on arbitrary texts, just like current LLMs. And we can adjust the size of the SSM hierarchies to the amount of high-quality dialogue and forum data that we are able to obtain. I think it is a no-brainer that this design would improve the frontier quality in LLM apps that value personalisation and attunement to the user's current state (psychological, emotional, levels of knowledge, etc.), relative to whatever "base" Transformer model we would take (such as Llama, or any other).

One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models

With this I disagree: I think it's critical for the user state tracking to be energy-based. I don't think there are ways to recapitulate this with auto-regressive Transformer language models (cf. any of LeCun's presentations from the last year). There are potential ways to recapitulate this with other language modelling architectures (non-Transformer and non-SSM), but they currently don't hold any stronger promise than SSMs, so I don't see any reason to pick them.

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2023-12-19T18:19:46.518Z · LW · GW

Clarity check: this model has not been trained yet at this time, correct?

Yes, I've changed the title of the post and added a footnote on "is a foundation model".

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T17:41:29.513Z · LW · GW

More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by giant tech companies of the world, largely given short-term profit-related incentives—which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity without worrying if we profit from this work.

I can push back on this somewhat by noting that most risks from BCI may lie outside the scope of control of any company that builds it and "plugs people in", residing instead in the wider economy and social ecosystem. The only things that may matter are the bandwidth and the noisiness of the information channel between the brain and the digital sphere, and these seem agnostic to whether a profit-maximising, risk-ambivalent, or risk-conscious company is building the BCI.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T17:33:46.141Z · LW · GW

We think we have some potentially promising hypotheses. But because we know you do, too, we are actively soliciting input from the alignment community. We will be more formally pursuing this initiative in the near future, awarding some small prizes to the most promising expert-reviewed suggestions. Please submit any[3] agenda idea that you think is both plausible and neglected (even if you don’t have the bandwidth right now to pursue the idea! This is a contest for ideas, not for implementation). 

This is related to what @Kabir Kumar is doing with ai-plans.com; he just hosted a critique-a-thon a couple of days ago. So maybe you will find his platform useful or find other ways to collaborate.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T16:53:00.369Z · LW · GW

Reverse-engineering prosociality

Here's my idea on this topic: "SociaLLM: a language model design for personalised apps, social science, and AI safety research". Though it's more about engineering pro-sociality (including Self-Other Overlap) using architecture and inductive biases directly than reverse-engineering prosociality.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T09:30:27.340Z · LW · GW

You choose phrases like "help to solve alignment", in general mostly mention "alignment" and not "safety" (except in the sections where you discuss indirect agendas, such as "7. Facilitate the development of explicitly-safety-focused businesses"), and write "if/when we live in a world with superintelligent AI whose behavior is—likely by definition—outside our direct control" (implying that 'control' of AI would be desirable?).

Is this a deliberate choice to narrow your direct, object-level technical work to alignment (because you think this is where the predispositions of your team lie?), or a disagreement with more systemic views on "what we should work on to reduce the AI risks", such as:

(1) Davidad's "AI Neorealism: a threat model & success criterion for existential safety":

For me the core question of existential safety is this: “Under these conditions, what would be the best strategy for building an AI system that helps us ethically end the acute risk period without creating its own catastrophic risks that would be worse than the status quo?”

It is not, for example, "how can we build an AI that is aligned with human values, including all that is good and beautiful?" or "how can we build an AI that optimises the world for whatever the operators actually specified?" Those could be useful subproblems, but they are not the top-level problem about AI risk (and, in my opinion, given current timelines and a quasi-worst-case assumption, they are probably not on the critical path at all).

(2) Leventov's "Beyond alignment theories":

Note that in this post, only a relatively narrow aspect of the multi-disciplinary view on AI safety is considered, namely the aspect of poly-theoretical approach to the technical alignment of humans to AIs. This mainly speaks to theories of cognition (intelligence, alignment) and ethics. But on a larger view, there are more theories and approaches that should be deployed in order to engineer our civilisational intelligence such that it “goes well”. These theories are not necessarily quite about “alignment”. Examples are control theory (we may be “aligned” with AIs but collectively “zombified” by powerful memetic viruses and walk towards a civilisational cliff), game theory (we may have good theories of alignment but our governance systems cannot deal with multi-polar traps so we cannot deploy these theories effectively), information security considerations, mechanistic anomaly detection and deep deceptiveness, etc. All these perspectives further demonstrate that no single compact theory can “save” us.

(3) Drexler's "Open Agency Model";

(4) Hendrycks' "Pragmatic AI Safety";

(5) Critch's "What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)".

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T09:07:25.738Z · LW · GW

Post-response: Assessment of AI safety agendas: think about the downside risk

You evidently follow a variant of 80000hours' framework for comparing (solving) particular problems in terms of expected impact: Neglectedness x Scale (potential upside) x Solvability.

I think for assessing AI safety ideas, agendas, and problems to solve, we should augment the assessment with another factor: the potential for a Waluigi turn, or more prosaically, the uncertainty about the sign of the impact (scale) and, therefore, the risks of solving the given problem or advancing far on the given agenda.

This reminds me of Taleb's mantra that to survive, we need to make many bets, but also limit the downside potential of each bet, i.e., the "ruin potential". See "The Logic of Risk Taking".
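
As a rough, illustrative way to write the augmented assessment down (my own sketch, not the 80000 hours formula), with Taleb's constraint that the downside of any single bet must stay far from "ruin":

```latex
E[\text{impact}] \;\approx\; \text{Neglectedness} \times \text{Solvability} \times
\bigl( p_{\text{benefit}} \cdot \text{Scale}_{\text{upside}}
     - p_{\text{Waluigi}} \cdot \text{Scale}_{\text{downside}} \bigr)
```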

Of the approaches that you listed, some sound risky to me in this respect. In particular, "4. ‘Reinforcement Learning from Neural Feedback’ (RLNF)" sounds like a direct invitation for wireheading to me. More generally, scaling BCI in any form without falling into a dystopia at some stage is akin to walking a tightrope (at least at the current stage of civilisational maturity, I would say). This speaks to agendas #2 and #3 on your list.

There are also similar qualms about AI interpretability: at least four posts on LW warn of its potential risks:

This speaks to the agenda "9. Neuroscience x mechanistic interpretability" on your list.

Related earlier posts

Comment by Roman Leventov on OpenAI: Preparedness framework · 2023-12-18T20:54:36.157Z · LW · GW

Everyone on Twitter has criticised the label "Responsible Scaling Policy", but the author of this post seems not to respect what looks like a gentle attempt by OpenAI to move past this label.

If we were a bit more serious about this, we would perhaps immediately rename the "Responsible Scaling Policies" tag on LessWrong to "Preparedness Frameworks", with a note on the tag page: "Anthropic calls their PF an 'RSP', but we think this is a bad label".

Comment by Roman Leventov on QNR prospects are important for AI alignment research · 2023-12-15T12:28:23.126Z · LW · GW

Looks like OpenCog Hyperon (Goertzel et al., 2023) is similar to the QNR paradigm for learning and intelligence. Some of the ideas about not cramming intelligence into a unitary "agent" that Eric expressed in these comments and later posts are also taken up by Goertzel.

I didn't find references to Goertzel in the original QNR report from 2021; I wonder if there are references in the reverse direction and what Eric thinks of OpenCog Hyperon.

Comment by Roman Leventov on Some for-profit AI alignment org ideas · 2023-12-15T07:39:54.072Z · LW · GW

The strategy that I described above is also highly aligned with Earth Systems Predictability vision ("a roadmap for a planetary nervous system") by Trillium Tech, which is also a quasi-for-profit org.

Comment by Roman Leventov on Some for-profit AI alignment org ideas · 2023-12-14T20:37:44.164Z · LW · GW

An important factor that should go into this calculation (not just for you or your org but for anyone) is the following: given that AI safety is currently quite severely funding-constrained (just look at the examples of projects that are not getting funded right now), I think people should assess their own scientific calibre relative to the other people in technical AI safety who will be seeking funding.

It's not a black-and-white choice between doing technical AI safety research, or AI governance/policy/advocacy, or not contributing to reducing the AI risk at all. The relevant 80000 hours page perpetuates this view and therefore is not serving the cause well in this regard.

For people with more engineering, product, and business dispositions, I believe there are many ways to help reduce the AI risk, many of which I referred to in other comments on this page, and here. And we should do a better job of laying out these paths for people, a la "Work on Climate for AI risks".

Comment by Roman Leventov on Some for-profit AI alignment org ideas · 2023-12-14T19:57:11.806Z · LW · GW

If you actually believe that the LM paradigm's path towards ubiquitous agency in the economy and society is flawed (as I do), then pursuing alternative AI paradigms, even if you think your chances of global success are small, would save you some "dignity points". And this is the stance that Verses.ai, Digital Gaia, Gaia Consortium, and Bioform Labs are taking, advocating for and developing the paradigm of Bayesian agents. Though, the key argument for this paradigm (vs. language modelling) is not interpretability or "local controllability/robustness", but rather that "mixing" Bayesian reference frames into a single bundle (an LLM) loses information necessary for reliable cooperation, credit assignment, and "global" controllability/robustness[1]. This perhaps sounds cryptic, sorry. It deserves a much longer discussion, and hopefully we will publish something about this soon.

Just to develop on this in the context of this post (how can we make something for-profit that advances AI safety?), I want to highlight a direction of thought that I didn't notice in your post: creating economic value by developing mechanisms for multi-agent coordination and cooperation. This falls under the "understanding cooperation" and "understanding agency" categories in this agenda list, although I'd replace "understanding" with "building" there.

Solving practical problems is a great way to keep the research grounded in reality, and also to battle-test it.

There are plenty of economically valuable and neglected opportunities for improving coordination:

  • Energy and transportation: Enterprises plan their production and logistics to optimise for energy use (and grid stability, due to variable generation) and the efficiency of logistics systems. (Verses is tackling logistics.)
  • Agriculture: Farmers coordinate on who plants what (and what fertilisers and chemicals they apply, how much water they use, etc.) to optimise food production on the regional, national, and international levels from the perspectives of food security (robustness to extreme weather events and pests) and demand. Digital Gaia is tackling this.
  • Finance: Collectives of people (maybe extended families) pool their resources to unlock certain investment and financial instruments (like private debt) and reduce the associated management overhead. Coordination is required to negotiate between their diverse investment goals, appetites for risk, ethical and other investment constraints, and to create the best "balanced" investment strategy.
  • Networking: An app that recommends people to meet, optimising cumulative results without overloading certain people "in high demand", like billionaires.
  • Attention economy: People optimise attention to comments, i.e., effectively a collaborative rating system that keeps the amount of "work" for everyone manageable: this is what I'm alluding to here.
  • Team/org learning: The "info agents" that I mentioned in the other comment could be coordinated within teams to optimise team learning: to avoid both the situation where "everyone reads the same stuff" and the one where "everyone reads their own stuff, no intersection". My understanding is that something like that was implemented at Google/DeepMind.
  • Medical treatment research: it's well known that the current paradigm for testing drugs and treatments is pseudo-scientific: measuring the average effect of the drug on the population doesn't predict that the drug will help a particular patient. Judea Pearl is a famous champion of this claim, which I agree with. Then, clinical trials could be coordinated between participants so that they actually infer causal models of the treatment effect on the individual rather than "frequentist public average", requiring the minimum number of people and the minimum duration of the trial.
  • Nutrition: Assuming that personal causal models of people's responses to this or that food are built, coordinate family menus to optimise for everyone's health while not requiring too much cooking and maximising meal sharing for social reasons.

Apart from creating better mechanisms and algorithms for coordination by building businesses in all these diverse verticals (and hoping that these coordination algorithms will be transferable to some abstract "AI coordination" or "human-AI coordination"), there is a macro-strategy of sharing information between all these domain-specific models, thus creating a loosely coupled, multi-way mega-model of the world as a whole. Our bet at Gaia Consortium is that this "world model merge" is very important in ameliorating the multi-polar risks that @Andrew_Critch has written about here and that Dan Hendrycks generally refers to as "AI Race" risks.

Comment by Roman Leventov on Some for-profit AI alignment org ideas · 2023-12-14T19:04:10.880Z · LW · GW

One way to advance the state of AI safety research is to build a company focused on automating work (such as a recruiter phone screen or talk therapy) and building an organization with safety at its core like Anthropic. This only works if it’s critical to do safety research to advance this organization’s capabilities. For example, automating a recruiter phone screen would likely require a high degree of explainability / interpretability (especially with respect to bias) in automating a decision, and automating talk therapy would require scalable oversight research to make sure the therapist is reaching the right conclusions.

I think Inflection is sort of like this ("talk therapy" and "creating a best friend and companion" are very similar things). And Mustafa Suleyman seems to me a safety-conscious person.

Comment by Roman Leventov on Some for-profit AI alignment org ideas · 2023-12-14T18:54:48.432Z · LW · GW

Cybersecurity approaches

I think you are missing a few more important directions that you would call "security approaches" and I call "digital trust infrastructure": "decentralised identity, secure communication (see Layers 1 and 2 in the Trust Over IP Stack), proof-of-humanness, proof of AI (such as a proof that such-and-such artifact was created with such-and-such agent, e.g., one provided by OpenAI -- watermarking failed, so we need new robust solutions with zero-knowledge proofs)."

Stretching this even further, the reputation and "Web of Trust" systems that @mako yass and I discuss in this thread are also probably important for creating the "stable equilibrium" of civilisation on which AGI can land, and there are business opportunities there, such as combating spam on media platforms.

Further still from "AI safety" and towards differential technology development (newly branded as d/acc by Vitalik Buterin) and trying to create the aforementioned "stable equilibrium", we can talk about what Jim Rutt keeps calling a trillion-dollar opportunity, namely "info agents" that manage people's information intake. My Proposal for improving the global online discourse through personalised comment ordering on all websites is related to that, too.