Posts

The AI Revolution in Biology 2024-05-26T09:30:37.997Z
The two-tiered society 2024-05-13T07:53:25.438Z
From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models 2024-02-06T10:18:40.420Z
AI alignment as a translation problem 2024-02-05T14:14:15.060Z
Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? 2024-01-26T09:49:30.836Z
Institutional economics through the lens of scale-free regulative development, morphogenesis, and cognitive science 2024-01-23T19:42:31.739Z
Gaia Network: An Illustrated Primer 2024-01-18T18:23:25.295Z
Worrisome misunderstanding of the core issues with AI transition 2024-01-18T10:05:30.088Z
AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them 2023-12-27T14:51:37.713Z
Gaia Network: a practical, incremental pathway to Open Agency Architecture 2023-12-20T17:11:43.843Z
SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research 2023-12-19T16:49:51.966Z
Assessment of AI safety agendas: think about the downside risk 2023-12-19T09:00:48.278Z
Refinement of Active Inference agency ontology 2023-12-15T09:31:21.514Z
Proposal for improving the global online discourse through personalised comment ordering on all websites 2023-12-06T18:51:37.645Z
Open Agency model can solve the AI regulation dilemma 2023-11-08T20:00:56.395Z
Any research in "probe-tuning" of LLMs? 2023-08-15T21:01:32.838Z
AI romantic partners will harm society if they go unregulated 2023-08-01T09:32:13.417Z
Philosophical Cyborg (Part 1) 2023-06-14T16:20:40.317Z
An LLM-based “exemplary actor” 2023-05-29T11:12:50.762Z
Aligning an H-JEPA agent via training on the outputs of an LLM-based "exemplary actor" 2023-05-29T11:08:36.289Z
AI interpretability could be harmful? 2023-05-10T20:43:04.042Z
H-JEPA might be technically alignable in a modified form 2023-05-08T23:04:20.951Z
Annotated reply to Bengio's "AI Scientists: Safe and Useful AI?" 2023-05-08T21:26:11.374Z
For alignment, we should simultaneously use multiple theories of cognition and value 2023-04-24T10:37:14.757Z
An open letter to SERI MATS program organisers 2023-04-20T16:34:10.041Z
Scientism vs. people 2023-04-18T17:28:29.406Z
Goal alignment without alignment on epistemology, ethics, and science is futile 2023-04-07T08:22:24.647Z
Yoshua Bengio: "Slowing down development of AI systems passing the Turing test" 2023-04-06T03:31:39.120Z
Emergent Analogical Reasoning in Large Language Models 2023-03-22T05:18:50.548Z
Will people be motivated to learn difficult disciplines and skills without economic incentive? 2023-03-20T09:26:19.996Z
A reply to Byrnes on the Free Energy Principle 2023-03-03T13:03:48.990Z
Joscha Bach on Synthetic Intelligence [annotated] 2023-03-02T11:02:09.009Z
Powerful mesa-optimisation is already here 2023-02-17T04:59:59.794Z
The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial 2023-02-14T06:57:58.036Z
Morphological intelligence, superhuman empathy, and ethical arbitration 2023-02-13T10:25:17.267Z
A multi-disciplinary view on AI safety research 2023-02-08T16:50:31.894Z
Temporally Layered Architecture for Adaptive, Distributed and Continuous Control 2023-02-02T06:29:21.137Z
Has private AGI research made independent safety research ineffective already? What should we do about this? 2023-01-23T07:36:48.124Z
Critique of some recent philosophy of LLMs’ minds 2023-01-20T12:53:38.477Z
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning 2023-01-12T16:43:42.357Z
AI psychology should ground the theories of AI consciousness and inform human-AI ethical interaction design 2023-01-08T06:37:54.090Z
How evolutionary lineages of LLMs can plan their own future and act on these plans 2022-12-25T18:11:18.754Z
Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development 2022-12-20T17:13:00.669Z
The two conceptions of Active Inference: an intelligence architecture and a theory of agency 2022-11-16T09:30:47.484Z
What is our current best infohazard policy for AGI (safety) research? 2022-11-15T22:33:34.768Z
The circular problem of epistemic irresponsibility 2022-10-31T17:23:50.719Z
The problem with the media presentation of “believing in AI” 2022-09-14T21:05:10.234Z
Roman Leventov's Shortform 2022-08-19T21:01:28.692Z
Are language models close to the superhuman level in philosophy? 2022-08-19T04:43:07.504Z
AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical 2022-07-30T20:56:54.532Z

Comments

Comment by Roman Leventov on There Should Be More Alignment-Driven Startups · 2024-06-01T15:47:53.008Z · LW · GW

I think the commercial R&D lab model would often suit alignment work better than a "classical" startup company; Conjecture and AE Studio come to mind. Answer.AI, founded by Jeremy Howard (of Fast.ai and Kaggle) and Eric Ries (Lean Startup), elaborates on this business and organisational model here: https://www.answer.ai/posts/2023-12-12-launch.html.

Comment by Roman Leventov on The two-tiered society · 2024-05-14T05:50:54.298Z · LW · GW

But I should add, I agree that 1-3 pose challenging political and coordination problems. Nobody assumes it will be easy, including Acemoglu. It's just another in the series of hard political challenges posed by AI, along with questions like "aligned with whom?", how to account for people's voices past dysfunctional governments and political elites in general, etc.

Comment by Roman Leventov on The two-tiered society · 2024-05-14T05:41:33.009Z · LW · GW

Separately, I at least spontaneously wonder: How would one even want to go about differentiating what is the 'bad automation' to be discouraged, from legit automation without which no modern economy could competitively run anyway? For a random example, say if Excel wouldn't yet exist (or, for its next update..), we'd have to say: Sorry, cannot do such software, as any given spreadsheet has the risk of removing thousands of hours of work...?! Or at least: Please, Excel, ask the human to manually confirm each cell's calculation...?? So I don't know how we'd in practice enforce non-automation. Just 'it uses a large LLM' feels weirdly arbitrary condition - though, ok, I could see how, due to a lack of alternatives, one might use something like that as an ad-hoc criterion, with all the problems it brings. But again, I think points 1. & 2. mean this is unrealistic or unsuccessful anyway.

Clearly, specific rule-based regulation is a dumb strategy. Acemoglu's suggestions: tax incentives to keep employment and "labour voice" to let people decide in the context of specific company and job how they want to work with AI. I like this self-governing strategy. Basically, the idea is that people will want to keep influencing things and will resist "job bullshittification" done to them, if they have the political power ("labour voice"). But they should also have alternative choice of technology and work arrangement/method that doesn't turn their work into rubber-stamping bullshit, but also alleviates the burden ("machine usefulness"). Because if they only have the choice between rubber-stamping bullshit job and burdensome job without AI, they may choose rubber-stamping.

Comment by Roman Leventov on The two-tiered society · 2024-05-14T05:33:07.862Z · LW · GW

If you'd really be able to coordinate globally to enable 1. or 2. globally - extremely unlikely in the current environment and given the huge incentives for individual countries to remain weak in enforcement - then it seems you might as well try to impose directly the economic first best solution w.r.t. robots vs. labor: high global tax rates and redistribution.

If anything, this problem seems more pernicious wrt. climate change mitigation and environmental damage, because it's much more distributed: not only the US and China, but Russia and India are also big emitters; there is big leverage in Brazil, Congo, and Indonesia with their forests; overfishing and ocean pollution are everywhere; etc.

With AI, it's basically a question of regulating US and UK companies: the EU is always eager to over-regulate relative to the US, and China is already successfully and closely regulating its AI for a variety of reasons (which Acemoglu points out). The big problem of the Chinese economy is weak internal demand, and automating jobs, thereby increasing inequality and decreasing local purchasing power, is the last thing that China wants.

Comment by Roman Leventov on The two-tiered society · 2024-05-13T17:20:58.158Z · LW · GW

The level of automation AI provides, and the rate at which it arrives, is precisely what he suggests influencing directly (specifically, slowing down) through economic and political measures. So it's not fair to list that as an assumption.

Comment by Roman Leventov on The two-tiered society · 2024-05-13T17:09:37.448Z · LW · GW

It would depend on exact details, but if a machine can do something as well or better than a human, then the machine should do it.

It's a question of how to design work. A machine can cultivate a monoculture mega-farm better than a human can, but not (at least, yet) a small permaculture garden. Is a monoculture mega-farm more "effective"? Maybe, if we take the pre-AI opportunity cost of human labour, but maybe not with the post-AI opportunity cost of human labour. And this is before factoring in the "economic value" of the better psychological and physical health of people who work on small farms, versus those who do nothing and eat processed food on their couches, made from the crops grown on monoculture mega-farms.

As I understand it, Acemoglu roughly suggests looking for ways to apply this logic in other domains of the economy, including the knowledge economy. Yes, it's not guaranteed that such arrangements will stay economical for a long time (but it's also not beyond my imagination, especially if we factor in the economic value of physical and psychological health), but they may set the economy and society on a different trajectory, with higher chances of eventualities that we would consider "not doom".

What does "foster labour voice" even mean?

Unions 2.0, or something like holacracy?

Especially in companies where everything is automated.

Not yet. Clearly, what he suggests could only remain effective for a limited time.

You can give more power to current employees of current companies, but soon there will be new startups with zero employees (or where, for tax reasons, owners will formally employ their friends or family members).

Not that soon at all, if we speak about the real economy. In the IT sector, I suspect that Big Tech will win big in the AI race because only it has deep enough pockets (you already see Inflection AI quasi-acquired by MS, Stability essentially bust, etc.). And Big Tech still has huge workforces, and it won't be just Nadella or just Pichai anytime soon. Many other knowledge sectors (banks, law) are regulated and also won't shed employees that fast.

Human-complementary AI technologies again sounds like a bullshit job, only mostly did by a machine, where a human is involved somewhere in the loop, but the machine could still do his part better, too.

In my gardening example, a human may wear AI goggles that tell them which plant or animal species they are looking at or what disease a plant has.

Tax on media platforms -- solves a completely different problem. Yes, it is important to care about public mental health. But that is separate from the problem of technological unemployment. (You could have technological unemployment even in the universe where all social media are banned.)

Tax on media platforms is just a concrete example of how "reforming business models" could be done in practice, maybe not the best one (but it's not my example). I will carry on with my gardening example and suggest a "tax on fertiliser": make it so huge that mega-farms (which require a lot of fertiliser) become less economical than permaculture gardens. Without such a push, permaculture gardens won't magically materialise. Acemoglu underscores this point multiple times: switching to a different socioeconomic trajectory is not merely a matter of inventing a technology and applying it in a laissez-faire market. Inventing AI goggles for gardening (or any other technology that makes permaculture gardening arbitrarily convenient) won't make the economy switch away from monoculture mega-farms without an extra push.

Perhaps Acemoglu also has something in mind about the attention/creator economy and the automation that may happen to it (AI influencers replacing human influencers) when he talks about a "digital ad tax", but I don't see it.

Comment by Roman Leventov on On attunement · 2024-03-30T13:08:11.454Z · LW · GW

John Vervaeke calls attunement "relevance realization".

Comment by Roman Leventov on Modern Transformers are AGI, and Human-Level · 2024-03-27T03:16:32.926Z · LW · GW

Cf. DeepMind's "Levels of AGI" paper (https://arxiv.org/abs/2311.02462), which calls modern transformers "emerging AGI" but also defines "expert", "virtuoso", and "superhuman" AGI.

Comment by Roman Leventov on AI Alignment Metastrategy · 2024-03-24T13:24:12.463Z · LW · GW

Humane/acc, https://twitter.com/AndrewCritchPhD

Comment by Roman Leventov on Value learning in the absence of ground truth · 2024-02-05T22:11:41.339Z · LW · GW

Well, yes, it also includes learning weak agents' models more generally, not just their "values". But I think the point stands. It's elaborated better in the linked post. As AIs will receive most of the same information that humans receive, through always-on wearable sensors, there won't be much for AIs to learn from humans. Rather, it's humans who will need to do their homework to increase the quality of their value judgements.

Comment by Roman Leventov on Value learning in the absence of ground truth · 2024-02-05T21:30:09.977Z · LW · GW

I agree with the core problem statement and most assumptions of the Pursuit of Happiness/Conventions Approach, but suggest a different solution: https://www.lesswrong.com/posts/rZWNxrzuHyKK2pE65/ai-alignment-as-a-translation-problem

I agree with the OpenAI folks that generalisation is the key concept for understanding the alignment process. But I think that with their weak-to-strong generalisation agenda, they (as well as almost everyone else) apply it in the reverse direction: learning the values of weak agents (humans) doesn't make sense. Rather, weak agents should learn the causal models that strong agents employ, so that they are able to express an informed value judgement. This is the way to circumvent the "absence of the ground truth for values" problem: instead, agents try to generalise their respective world models so that they sufficiently overlap, and then choose actions that seem net beneficial to both sides, without knowing how this value judgement was made by the other side.

In order to be able to generalise to shared world models with AIs, we must also engineer AIs to have human inductive biases from the beginning. Otherwise, this won't be feasible. This observation makes "brain-like AGI" one of the most important alignment agendas in my view.

Comment by Roman Leventov on AI alignment as a translation problem · 2024-02-05T18:44:25.611Z · LW · GW

If I understand correctly, by "discreteness" you mean that it simply says that one agent can know neither the meaning of the symbols used by another agent nor the "degree" of grokking the meaning; one just cannot say anything.

This is correct, but the underlying reason why this is correct is the same as why solipsism or the simulation hypothesis cannot be disproven (or proven!).

So yeah, I think there is no tangible relationship to the alignment problem, except that it corroborates that we couldn't have 100% (literally, probability = 1) certainty of the alignment or safety of whatever we create, but that was obvious even without this philosophical argument.

So, I removed that paragraph about Quine's argument from the post.

Comment by Roman Leventov on Making every researcher seek grants is a broken model · 2024-01-27T15:55:25.019Z · LW · GW

That also was, naturally, the model in the Soviet Union, with orgs called "scientific research institutes". https://www.jstor.org/stable/284836

Comment by Roman Leventov on Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? · 2024-01-26T23:24:54.899Z · LW · GW

See a discussion of this point here with Marius Hobbhahn and others.

Comment by Roman Leventov on This might be the last AI Safety Camp · 2024-01-26T09:54:24.493Z · LW · GW

This post has led me to this idea:  Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects?

Comment by Roman Leventov on Gaia Network: An Illustrated Primer · 2024-01-23T20:18:01.429Z · LW · GW

Collusion detection and prevention and trust modelling don't trivially follow from the basic architecture of the system described at the level of this article. Specific mechanisms would have to be implemented in the Protocol to get collusion detection and trust modelling. We haven't actually developed these mechanisms yet, but we think they should be doable (though this is still a research bet, not a 100% certainty), because the Gaia Network directly embodies (or is amenable to) all six general principles for anti-collusion mechanism design (agency architecture) proposed by Eric Drexler. These principles themselves should be further validated by formalising them and proving theorems about the collusion properties of systems of distributed intelligence.

Of course, there should also be (at least initially, but practically for a very long time, if not forever) "traditional" governance mechanisms of the Gaia Network, nodes, model and data ownership, etc. So, there are a lot of open questions about interfacing GN with existing codes of law, judicial and law enforcement practice, intellectual property, political and governance processes, etc. Some of these interfaces and connections with existing institutions should in practice deal with bad actors and certain types of malicious behaviour on GN.

Comment by Roman Leventov on Worrisome misunderstanding of the core issues with AI transition · 2024-01-18T10:49:06.295Z · LW · GW

Fair, I edited the post.

Comment by Roman Leventov on AI doing philosophy = AI generating hands? · 2024-01-16T02:18:07.778Z · LW · GW

Apart from the view of philosophy as "cohesive stories that bind together and infuse meaning into scientific models", which I discussed with you earlier and you were not very satisfied with, another interpretation of philosophy (natural philosophy, philosophy of science, philosophy of mathematics, and metaphilosophy, at least) is "apex generalisation/abstraction". Think Bengio's "AI scientist", but the generative model should be even deeper: first sample a plausible "philosophy of science" given all the observations about the world up to the moment; then sample a plausible scientific theory at a specific level of coarse-graining/scientific abstraction (quantum, chemical, bio, physio, psycho, socio, etc.), given the philosophy and all observations up to the moment; then sample a mechanistic model that describes the situation/system of interest at hand (e.g., the morphology of a particular organism, given the laws of biology, or the morphology of a particular society), given the observations of the system up to the moment; and then finally sample plausible variable values that describe the particular situation at a particular point in time, given all the above.
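In symbols, the nesting I have in mind is roughly the following sampling chain (my own hedged shorthand, not anything from Bengio's proposal: φ is the sampled philosophy, θ the scientific theory at the chosen level of coarse-graining, m the mechanistic model of the system at hand, x_t the concrete variable values, and o_{≤t} all observations up to the moment):

$$\varphi \sim p(\varphi \mid o_{\le t}), \quad \theta \sim p(\theta \mid \varphi, o_{\le t}), \quad m \sim p(m \mid \theta, o_{\le t}), \quad x_t \sim p(x_t \mid m, \theta, \varphi, o_{\le t})$$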

If this interpretation is correct, then doing philosophy well and not deluding ourselves is far off. And there is a huge risk in thinking we can do it well before we actually can.

Comment by Roman Leventov on An even deeper atheism · 2024-01-14T11:48:20.365Z · LW · GW

Extrapolated volition is a nonsensical concept altogether, as demonstrated in the OP. There is no extrapolated volition outside of its unfolding in real life in a specific context, which affects the trajectory of values/volition in a specific way. And what this context will be is unknown and unknowable (maybe aliens will visit Earth tomorrow, maybe not).

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:50:31.366Z · LW · GW

A related frame is consciousness: where is its boundary? Is it our brain that is conscious, or the whole nervous system, or the whole human, or the human plus the entire microbiome populating them, or the human plus robotic prosthetic limbs, or the human plus web search plus chat AI plus a personal note-taking app, or a whole human group (collective consciousness), etc.?

Some computational theories of consciousness attempt to give a specific, mathematically formalised answer to this question.

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:45:32.698Z · LW · GW

Psychology may not be "technical enough" because an adequate mathematical science or process theory has not yet been developed for it, but it's ultimately very important, perhaps critically important: see the last paragraph of https://www.lesswrong.com/posts/AKBkDNeFLZxaMqjQG/gaia-network-a-practical-incremental-pathway-to-open-agency. Davidad apparently thinks that it can be captured with an Infra-Bayesian model of a person/human.

Also on psychology: what is the boundary of personality, and where does just a "role" (spouse, worker, etc.) turn into multiple-personality disorder?

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:36:30.868Z · LW · GW

In the most recent episode of his podcast show, Jim Rutt (former president of SFI) and his guest talk about membranes a lot; the word appears 30 times on the transcript page: https://www.jimruttshow.com/cody-moser/

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T19:32:23.664Z · LW · GW

Related, quantum information theory:

Comment by Roman Leventov on AI Alignment Metastrategy · 2024-01-03T17:34:08.269Z · LW · GW

I think this metastrategy classification is oversimplified to the degree that I'm not sure it is net helpful. I don't see how Hendrycks' "Leviathan safety", Drexler's Open Agency Model, Davidad's OAA, Bengio's "AI pure scientist" and governance proposals (see https://slideslive.com/39014230/towards-quantitative-safety-guarantees-and-alignment), Kaufmann and Leventov's Gaia Network, AI Objectives Institute's agenda (and the related Collective Intelligence Project's), Conjecture's CoEms, OpenAI's "AI alignment scientist" agenda, and Critch's h/acc (and the related Cyborgism of janus et al.) map straightforwardly onto this classification, at least not without losing some important nuance.

Furthermore, there is also the missing dimension of [technical strategy, organisational strategy, governance and political strategy] that could perhaps recombine to some degree.

Finally, in the spirit of the "passing the ideological Turing test" and "describing, not persuading" norms, it would be nice, I think, to include criticism of the "conservative strategy" at the same level of fidelity at which the other metastrategies are criticised here, even if you or others have discussed that in other posts.

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2024-01-03T07:16:28.624Z · LW · GW

Announcement

I think SociaLLM has a good chance of getting OpenAI's "Research into Agentic AI Systems" grant because it addresses both the challenge of the legibility of AI agents' behaviour, by making the agent's behaviour more "human-like" thanks to the weight sharing and regularisation techniques/inductive biases described in the post, and automatic monitoring: detection of duplicity or deception in an AI agent's behaviour by comparing the agent's ToMs "in the eyes" of different interlocutors, building on the work "Collective Intelligence in Human-AI Teams".

I am looking for co-investigators for this (up to $100k, up to 8 months long) project with hands-on academic or practical experience in DL training (preferably), ML, Bayesian statistics, or NLP. The deadline for the grant application itself is the 20th of January, so I need to find a co-investigator by the 15th of January.

Another requirement is that the co-investigator should preferably be in academia, at a non-profit, or independent at the moment.

I plan to be hands-on during the project in data preparation (cleansing, generation by other LLMs, etc.) and training, too. However, I don’t have any prior experience with DL training, so if I apply for the project alone, this is a significant risk and a likely rejection.

If the project is successful, it could later be extended for further grants or turned into a startup.

If the project is not a good fit for you but you know someone who may be interested, I’d appreciate it a lot if you shared this with them or within your academic network!

Please reach out to me in DMs or at leventov.ru@gmail.com.

Comment by Roman Leventov on A hermeneutic net for agency · 2024-01-02T07:56:24.420Z · LW · GW

A lot of the examples of the concepts that you list already belong to established scientific fields: math, logic, probability, causal inference, ontology, semantics, physics, information theory, computer science, learning theory, and so on. These concepts don't need philosophical re-definition. Respecting the field boundaries, and the ways that fields are connected to each other via other fields (e.g., math and ontology to information theory/CS/learning theory via semantics), is also, I think, a good practice on net: it's better to focus attention on the fields that are actually most proto-scientific and philosophically confusing: intelligence, sentience, psychology, consciousness, agency, decision making, boundaries, safety, utility, value (axiology), and ethics[1].

Then, to make the overall idea solid, I think it's necessary to do a couple of extra things (you may have already mentioned these in the post, but I semi-skimmed it and may have missed them).

  • First, specify the concepts in this fuzzy proto-scientific area of intelligence, agency, and ethics not in terms of each other, but in terms of (or in a clearly specified connection with) the other scientific fields/ontologies that are already established, enumerated above. For example, a theory of agency should be compatible or connected with (or specified in terms of) causal inference and learning theories. A theory of boundaries and ethics should be based on physics, information theory, semantics, and learning theory, among other things (cf. scale-free axiology and ethics).
  • Second, establish feedback loops that test these "proposed" theories of agency (psychology, ethics, decision-making) both in simulated environments (e.g., with LLM-based agents embodying these proposed theories acting in Minecraft- or Sims-like worlds) and in (constrained) real-life settings or environments. Note that the obligatory connection to physics, information theory, causal inference, and learning theory will ensure that these tests themselves can be counted as scientific.

The good news is that there are now sufficient (or almost sufficient) affordances to build AI agents that can embody sufficiently realistic and rich versions of these theories, both in realistic simulated environments and in real life. And I think an actual R&D agenda proposal should be written about this and submitted for a Superalignment grant.

There's an instinct to "ground" or "found" concepts. But there's no globally privileged direction of "more grounded" in the space of possible concepts. We have to settle for a reductholistic pluralism—or better, learn to think rightly, which will, as a side effect, make reductholism not seem like settling.

I disagree with the last sentence: "reductholism" should be the settling, as I argue in "For alignment, we should simultaneously use multiple theories of cognition and value". (Note that this view itself is based largely on quantum information theory: see "Information flow in context-dependent hierarchical Bayesian inference".)

 

  1. ^

    A counterargument could be made here that although logic, causal inference, ontology, semantics, physics, information theory, CS, learning theory, and so on are fairly established and all have SoTA, mature theories that look solid, these are probably not the final theories in all or many of these fields, and philosophical poking could highlight the problems with these theories; perhaps this will actually be the key to "solving alignment". I agree that this is a possible chain of events in principle, but it looks like quite low expected impact to me from the "hermeneutic nets" perspective, so this agenda is still better off focusing on the "core confusing" fields (intelligence, agency, ethics, etc.) and treating the established fields and the concepts therein as given.

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-28T14:25:55.150Z · LW · GW

I agree with everything you said. It seems that we should distinguish between "cooperative" and "adversarial" safety approaches (cf. the comment above). I wrote the entire post as an extended reply to Marc Carauleanu, after his mixed feedback on my idea of adding "selective SSM blocks for theory of mind" to increase the Self-Other Overlap in an AI architecture as a pathway to improve safety. Under the view that both Transformer and Selective SSM blocks will survive up until AGI (if it is going to be created at all, of course), and even with the addition of your qualifications (that AutoML will try to stack these and other types of blocks in some quickly evolving ways), the approach seems solid to me, but only if we also make some basic assumptions about the good faith and cooperativeness of the AutoML / auto-takeoff process. If we don't make such assumptions, of course, all bets are off: these "blocks for safety" could just be purged from the architecture.

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-28T14:16:41.896Z · LW · GW

I agree that training data governance is not robust to non-cooperative actors. But I think there is a much better chance of achieving a very broad industrial, academic, international, and legal consensus that it is a good way to jigsaw capabilities without sacrificing raw reasoning ability, which the opponents of compute governance hold to be purely counter-productive ("intelligence just makes things better"). That's why I titled my post "Open Agency model can solve the AI regulation dilemma" (emphasis on the last word).

This could even be seen not just as a "safety" measure but as a genuinely good regularisation measure for the collective civilisational intelligence: to make intelligence more robust to distributional shifts and paradigm shifts, it's better to compartmentalise it and route communication between the compartments through a relatively narrow, classical informational channel, namely human language or specific protocols, rather than raw DNN activation dynamics.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-27T20:21:55.757Z · LW · GW

BTW, this particular example sounds just like Numer.ai Signals, but Gaia Network is supposed to be more general and not to revolve around the stock market alone. E.g., the same nutritional data could be bought by food companies themselves, logistics companies, public health agencies, etc.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-27T20:16:21.408Z · LW · GW

Thanks for the suggestions.

An actual anecdote may look something like this: "We are a startup that creates a nutrition assistant and family menu helper app. We collect anonymised data from our users and ensure differential privacy, yada-yada. We want to sell this data to hedge funds that trade food company stocks (so that we can offer the app for free to our users), but we need to negotiate the terms of these agreements in an ad-hoc way with each hedge fund individually, and we don't have a principled way to come up with a fair price for the data. We would benefit from something like a 'platform' on which we can just publish the API spec of our data and then the platform (i.e., the Gaia Network) takes care of finding buyers for our data, paying us a fair price for it, etc."

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-27T20:11:28.150Z · LW · GW

The fact that hybridisation works better than pure architectures (architectures consisting of a single core type of block, shall we say) is exactly the point that Nathan Labenz makes in the podcast and that I repeat at the beginning of the post.

(Ah, I actually forgot to repeat this point, apart from noting that Doyle predicted this in his architecture theory.)

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-27T14:56:08.812Z · LW · GW

This conversation has prompted me to write "AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them".

Comment by Roman Leventov on On plans for a functional society · 2023-12-25T13:08:30.670Z · LW · GW

we're lacking all 4. We're lacking a coherent map of the polycrisis (if anyone wants to do and/or fund a version of aisafety.world for the polycrisis, I'm interested in contributing)

Joshua Williams created an initial version of a metacrisis map, and a couple of days ago I suggested to him that he make the development of such a resource more open, e.g., by turning it into a GitHub repository.

I think there's a ton of funding available in this space, specifically I think speculating on the markets informed by the kind of worldview that allows one to perceive the polycrisis has significant alpha. I think we can make much better predictions about the next 5-10 years than the market, and I don't think most of the market is even trying to make good predictions on those timescales.

Do you mean that it's possible to earn by betting long against the current market sentiment? I think this is wrong for multiple reasons, but perhaps most importantly because the market specifically doesn't measure how well we are faring on a lot of components of the polycrisis -- e.g., the market would be doing great even if all people were turned into addicted zombies. Secondly, people don't even try to make predictions in the stock market anymore -- it's turned into a completely irrational valve of liquidity that is moved by Elon Musk's tweets, narratives, and memes more than by objective factors.

Comment by Roman Leventov on On plans for a functional society · 2023-12-25T12:52:48.086Z · LW · GW

1.) Clearly state the problems that need to be worked on, and provide reasonable guidance as to where and how they might be worked on
2.) Notice what work is already being done on the problems, and who is doing it (avoid reinventing the wheel/not invented here syndrome; EA is especially guilty of this)
3.) Actively develop useful connections between 2.)
4.) Measure engagement (resource flows) and progress

I posted some parts of my current visions of 1) and 2) here and here. I think these, along with the Gaia Network design that we proposed recently (the Gaia Network is not "A Plan" in its entirety, but a significant portion of it), address @Vaniver's and @kave's points about realism and sociological/psychological viability.

The platform for generating the plan would need to be more-open-than-not, and should be fairly bleeding edge - incorporating prediction markets, consensus seeking (polis), eigenkarma etc

I think it is a mistake to import "democracy" at the vision level. A vision is essentially a very high-level plan, a creative engineering task. These are not decided by averaging opinions. "If you want to kill any idea in the world, get a committee working on it." Deutsch also wrote about this in "The Beginning of Infinity", in the chapter about democracy.

We should aggregate desiderata and preferences (see "Preference Aggregation as Bayesian Inference"), but not decisions (plans, engineering designs, visions). These should be created by a coherent creative entity. The same idea is evident in the design of Open Agency Architecture.

we're lacking meaningful 3rd party measurement

If I understand correctly what you are gesturing at here, I think that some high-level agents in the Gaia Network should become a trusted gauge for the "planetary health metrics" we care about.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-24T14:43:27.679Z · LW · GW

Right now, if the Gaia Network already existed but there were few models and agents on it, there would be little or no advantage (e.g., leveraging the tooling/infra built for the Gaia Network) in joining the network.

This is why I personally think that the bottom-up approach (building these apps and scaling them first, thus building up QRFs) is a somewhat more promising path than the top-down approach, whose ultimate version is the OAA itself; the research agenda of building the Gaia Network is a somewhat milder, but still top-down-ish, version. That's why in the comment I already linked above, the implication is that these disparate apps/models/"agents" are first built completely independently (mostly as startups), without conforming to any shared protocol (like the Gaia protocol); only once they grow large and sharing information across domains becomes evidently valuable to these startups will the conversation about a shared protocol find more traction.

Then, why a shared protocol, still? Two reasons:

  • Practical: it will reduce the transaction costs for all the models across the domains to start communicating and improving each other's predictions. Without a shared protocol, every prospective direction of information sharing requires ad-hoc integration work. This is the practicality of any platform, from the Internet itself to Airbnb to SWIFT (bank wires), and the Gaia Network should be of this kind, too.
  • AI and catastrophic risk safety: to ensure some safety against rogue actors (AI or hybrid human-AI teams or whatever) through transparency and built-in mechanisms, we would want as much economic activity to be "on the network" as possible.
    • You may say that it would be a tough political challenge to convince everybody to conform to the network in the name of some AI safety, but surely this would still be a smaller challenge than abolishing much of the current economic system altogether, as (apparently) implied by Davidad's "vanilla" OAA, and as we discuss throughout the article. In fact, this is one of the core points of the article.

Then, even though I advocate for a bottom-up approach above, there is still room, and even a need, for parallel top-down activity (given the AGI timelines), so that these two streams of activity meet somewhere in the middle. This is why we are debating all these blue-sky AI safety plans on LessWrong at all; this is why OAA was proposed; and this is why we are now proposing the Gaia Network.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-24T13:03:04.052Z · LW · GW

One completely realistic example of an agent is given in the appendix (an agent that recommends actions to improve soil health or carbon sequestration). Some more examples are given in this comment:

  • An info agent that recommends info resources (news, papers, posts, op-eds, books, videos) for me to consume, based on my current preferences and demands (and info from other agents, such as those listed below, or this agent that predicts the personalised information value of comments on the web)
    • Scaling to the group/coordination: optimise informational intake of a team, an organisation, or a family
  • Learning agent that recommends materials based on the previous learning trajectory and preferences, a-la liirn.space
    • Scaling to the group/coordination: coordinate learning experiences and lessons between individual learning agents based on who is on what learning level, availability, etc.
  • Financial agent that recommends spending based on my mid- and long-term goals
    • Equivalent of this for an organisation: "business development agent", recommends an org to optimise strategic investments based on the current market situation, goals of the company, other domain-specific models provided (i.e., in the limit, communicated by other Gaia agents responsible for these models), etc.
  • Investment agent recommends investment strategy based on my current financial situation, financial goals, and other investment goals (such as ESG)
    • Scaling to the group/coordination: optimise joint investment strategy for income and investment pools a-la pandopooling.com
  • Energy agent: decides when to accumulate energy and when to spend it, based on spot energy prices, the weather forecast for renewables, and current and predicted future power demand (a toy sketch of such a decision rule follows after this list)
    • Scale this to microgrids, industrial/manufacturing sites, etc.
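
For concreteness, here is a toy sketch of the kind of decision rule the energy agent above might implement (purely illustrative and entirely mine: the thresholds, data structures, and names are made up, and an actual Gaia Network agent would do proper probabilistic forecasting and planning rather than this simple heuristic):

```python
from dataclasses import dataclass

@dataclass
class HourForecast:
    spot_price: float        # expected $/kWh for the hour
    renewables_kwh: float    # expected output from local renewables
    demand_kwh: float        # expected consumption

def dispatch(window: list, charge_kwh: float, capacity_kwh: float) -> str:
    """Toy policy: charge when the current price is in the cheapest third of the
    forecast window, discharge when it is in the most expensive third and local
    demand exceeds renewable output; otherwise hold."""
    now = window[0]
    prices = sorted(h.spot_price for h in window)
    cheap, expensive = prices[len(prices) // 3], prices[2 * len(prices) // 3]
    if now.spot_price <= cheap and charge_kwh < capacity_kwh:
        return "charge"
    if now.spot_price >= expensive and now.demand_kwh > now.renewables_kwh and charge_kwh > 0:
        return "discharge"
    return "hold"

# Example: a 3-hour window with rising prices -> charge now while energy is cheap.
window = [HourForecast(0.10, 5.0, 3.0), HourForecast(0.25, 2.0, 4.0), HourForecast(0.40, 1.0, 5.0)]
print(dispatch(window, charge_kwh=2.0, capacity_kwh=10.0))  # "charge"
```
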
Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-23T04:42:22.563Z · LW · GW

I absolutely agree that the future TAI may look nothing like the current architectures. Cf. this tweet by Kenneth Stanley, with whom I agree 100%. At the same time, I think it's a methodological mistake to therefore conclude that we should only work on approaches and techniques that are applicable to any AI, in a black-box manner. It's like tying our hands behind our backs. We can and should affect the designs of future TAIs through our research, by demonstrating promise (or inherent limitations) of this or that alignment technique, so that these techniques get or lose traction and are included or excluded from the TAI design. So, we are not just making "assumptions" about the internals of the future TAIs; we are shaping these internals.

We can and should think about proliferation risks[1] (i.e., the risks that some TAI will be created by downright rogue actors), but IMO most of that thinking should be on the governance side, not the technical side. We agree with Davidad here that a good technical AI safety plan should be accompanied by a good governance (including compute monitoring) plan.

  1. ^

    In our own plan (Gaia Network), we do this in the penultimate paragraph here.

Comment by Roman Leventov on On the future of language models · 2023-12-21T17:19:41.288Z · LW · GW

I think you have tied yourself too much to the strict binary classification that you invented (finetuning/scaffolding). You overgeneralise, and your classification obscures the truth more than it clarifies things.

All the different things that can be done with LLMs -- tool use, scaffolded reasoning aka LM agents, RAG, fine-tuning, semantic knowledge graph mining, reasoning with a semantic knowledge graph, finetuning for following a "virtue" (persona, character, role, style, etc.), finetuning for model checking, finetuning for theorem-proving heuristics, finetuning for generating causal models, and what else -- just don't fit easily into two simple categories with properties that are consistent within each category.

But I don't understand the sense in which you think finetuning in this context has completely different properties.

In the summary (note: I didn't actually read the rest of the post, only the summary), you write something that implies that finetuning is obscure or un-interpretable:

From a safety perspective, language model agents whose agency comes from scaffolding look greatly superior than ones whose agency comes from finetuning

  • Because you can get an extremely high degree of transparency by construction

But this totally doesn't apply to the other variants of finetuning that I mentioned. If the LLM is a heuristic engine that generates mathematical proofs which are later verified with Lean, it just stops making sense to discuss how interpretable or transparent this theorem-proving or model-checking LLM-based heuristic engine is.

Comment by Roman Leventov on On the future of language models · 2023-12-20T22:36:18.792Z · LW · GW

Also, I would say that retrieval-augmented generation (RAG) is not just a mundane way to industrialise language models, but an important concept whose properties should be studied separately from scaffolding, fine-tuning, and the other techniques that I listed in the comment above.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-20T22:26:58.832Z · LW · GW

On (1), cf. this report: "The current portfolio of work on AI risk is over-indexed on work which treats “transformative AI” as a black box and tries to plan around that. I think that we can and should be peering inside that box (and this may involve plans targeted at more specific risks)."

On (2), I'm surprised to read this from you, since you suggested engineering Self-Other Overlap into LLMs in your AI Safety Camp proposal, if I understood and remember correctly. Do you actually see a line (or a way) of increasing the overlap without furthering ToM and therefore "social capabilities"? (Which ties back to "almost all applied/empirical AI safety work is simultaneously capabilities work".)

Comment by Roman Leventov on On the future of language models · 2023-12-20T22:16:35.266Z · LW · GW

Notable techniques for getting value out of language models that are not mentioned:

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2023-12-20T05:54:25.446Z · LW · GW

In another thread, Marc Carauleanu wrote:

The main worry that I have with regards to your approach is how competitive SociaLLM would be with regards to SOTA foundation models given both (1) the different architecture you plan to use, and (2) practical constraints on collecting the requisite structured data. While it is certainly interesting that your architecture lends itself nicely to inducing self-other overlap, if it is not likely to be competitive at the frontier, then the methods uniquely designed to induce self-other overlap on SociaLLM are likely to not scale/transfer well to frontier models that do pose existential risks. (Proactively ensuring transferability is the reason we focus on an additional training objective and make minimal assumptions about the architecture in the self-other overlap agenda.)

I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the tracking of the user's own history really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (so do the interlocutor-tracking blocks in private 1-1 or small group chats, which can be arbitrarily long, but there is probably no such data available for training). In the public data (such as forums, public chat rooms, Diplomacy and other text games), the interlocutor's history traces would 99% of the time easily fit into 100k symbols, but for symmetry with the user's own state (same weights!) and to have the same representation structure, they should mirror the user's own SSM blocks, of course.

With such an approach, the SSM hierarchies could start very small, with only a few blocks or even just a single SSM block (i.e., two blocks in total: one for the user's own state and one for the interlocutor's state), and attach to the middle of the Transformer hierarchy to select from it. However, I think this approach couldn't just be slapped onto pre-trained Llama or another large Transformer LLM. I suspect the Transformer should be co-trained with the SSM blocks to induce the Transformer to make the corresponding representations useful for the SSM blocks. "Pretraining Language Models with Human Preferences" is my intuition pump here.
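
To make this concrete, here is a minimal PyTorch-style sketch of the wiring I have in mind (my own illustration only, not an actual SociaLLM implementation: DiagonalSSMBlock is a crude linear stand-in for a selective SSM / Mamba-style block, all module and parameter names are made up, and causal masking, positional encodings, and the full block hierarchies are omitted for brevity):

```python
import torch
import torch.nn as nn


class DiagonalSSMBlock(nn.Module):
    """Crude diagonal state-space block, standing in for a selective SSM (Mamba-style) block.
    It carries a recurrent state over the whole history, so it scales linearly in sequence length."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.rand(d_model, d_state) * -1.0)  # log-decays, kept negative for stability
        self.B = nn.Parameter(torch.randn(d_model, d_state) * 0.1)  # input-to-state projection
        self.C = nn.Parameter(torch.randn(d_model, d_state) * 0.1)  # state read-out
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        batch, seq, d_model = x.shape
        state = torch.zeros(batch, d_model, self.A.shape[1], device=x.device)
        decay = torch.exp(self.A)                         # per-channel decay factors in (0, 1]
        outputs = []
        for t in range(seq):
            u = x[:, t, :].unsqueeze(-1)                  # (batch, d_model, 1)
            state = state * decay + u * self.B            # recurrent state update
            outputs.append((state * self.C).sum(-1))      # read-out: (batch, d_model)
        return self.proj(torch.stack(outputs, dim=1))     # (batch, seq, d_model)


class SociaLLMSketch(nn.Module):
    """Transformer blocks handle the local context; a single weight-shared SSM block tracks the
    user's and the interlocutor's long histories; their summary states are injected into the
    middle of the Transformer stack via cross-attention, so the mid-stack representations can
    select from the long-range state."""

    def __init__(self, vocab_size: int = 32000, d_model: int = 512, n_layers: int = 8, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lower = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers // 2)])
        self.upper = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers // 2)])
        self.state_ssm = DiagonalSSMBlock(d_model)  # same weights for the user's own and the interlocutor's history
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, context_ids, user_history_ids, interlocutor_history_ids):
        h = self.embed(context_ids)
        for block in self.lower:                                              # local-context blocks (lower half)
            h = block(h)
        user_state = self.state_ssm(self.embed(user_history_ids))            # (batch, hist_len, d_model)
        other_state = self.state_ssm(self.embed(interlocutor_history_ids))   # shared weights
        summaries = torch.cat([user_state[:, -1:, :], other_state[:, -1:, :]], dim=1)  # final states only
        h = h + self.cross_attn(h, summaries, summaries, need_weights=False)[0]        # mid-stack attachment
        for block in self.upper:                                              # blocks above the attachment point
            h = block(h)
        return self.lm_head(h)                                                # next-token logits


# Tiny smoke test with random token ids.
model = SociaLLMSketch()
ctx, user_hist, other_hist = (torch.randint(0, 32000, (2, n)) for n in (64, 256, 256))
print(model(ctx, user_hist, other_hist).shape)  # torch.Size([2, 64, 32000])
```

The key point this sketch tries to capture is the weight sharing: the same state_ssm parameters encode both participants' histories, which is exactly the symmetry ("same weights!") that the Self-Other Overlap idea leans on.
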

Regarding the sufficiency and quality of training data, the Transformer hierarchy itself could still be trained on arbitrary texts, just like current LLMs. And we can adjust the size of the SSM hierarchies to the amount of high-quality dialogue and forum data that we are able to obtain. I think it is a no-brainer that this design would improve the frontier quality in LLM apps that value personalisation and attunement to the user's current state (psychological, emotional, level of knowledge, etc.), relative to whatever "base" Transformer model we would take (such as Llama, or any other).

One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models, and so it is unclear if investing in the unique data and architecture setup is worth it in comparison to the counterfactual of just scaling up current methods.

With this I disagree: I think it's critical for the user state tracking to be energy-based. I don't think there are ways to recapitulate this with auto-regressive Transformer language models (cf. any of LeCun's presentations from the last year). There are potential ways to recapitulate this with other language modelling architectures (non-Transformer and non-SSM), but they currently don't hold any stronger promise than SSMs, so I don't see any reason to pick them.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-20T05:52:34.860Z · LW · GW

Thanks for the feedback. I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the tracking of the user's own history really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (so do the interlocutor-tracking blocks in private 1-1 chats, which can also be arbitrarily long, but there is probably no such data available for training). In the public data (such as forums, public chat room logs, Diplomacy and other text game logs), the interlocutor's history traces would 99% of the time easily be less than 100k symbols, but for symmetry with the user's own state (same weights!) and to have the same representation structure, they should mirror the user's own SSM blocks, of course.

With such an approach, the SSM hierarchies could start very small, with only a few blocks or even just a single SSM block (i.e., two blocks in total: one for the user's own state and one for the interlocutor's state), and attach to the middle of the Transformer hierarchy to select from it. However, I think this approach couldn't just be slapped onto pre-trained Llama or another large Transformer LLM. I suspect the Transformer should be co-trained with the SSM blocks to induce the Transformer to make the corresponding representations useful for the SSM blocks. "Pretraining Language Models with Human Preferences" is my intuition pump here.

Regarding the sufficiency and quality of training data, the Transformer hierarchy itself could still be trained on arbitrary texts, just like current LLMs. And we can adjust the size of the SSM hierarchies to the amount of high-quality dialogue and forum data that we are able to obtain. I think it is a no-brainer that this design would improve the frontier quality in LLM apps that value personalisation and attunement to the user's current state (psychological, emotional, level of knowledge, etc.), relative to whatever "base" Transformer model we would take (such as Llama, or any other).

One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models

With this I disagree: I think it's critical for the user state tracking to be energy-based. I don't think there are ways to recapitulate this with auto-regressive Transformer language models (cf. any of LeCun's presentations from the last year). There are potential ways to recapitulate this with other language modelling architectures (non-Transformer and non-SSM), but they currently don't hold any stronger promise than SSMs, so I don't see any reason to pick them.

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2023-12-19T18:19:46.518Z · LW · GW

Clarity check: this model has not been trained yet at this time, correct?

Yes, I've changed the title of the post and added a footnote on "is a foundation model".

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T17:41:29.513Z · LW · GW

More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by giant tech companies of the world, largely given short-term profit-related incentives—which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity without worrying if we profit from this work.

I can push back on this somewhat by noting that most risks from BCI may lie outside the scope of control of any company that builds it and "plugs people in"; rather, they lie in the wider economy and social ecosystem. The only thing that may matter is the bandwidth and noisiness of the information channel between the brain and the digital sphere, and that seems agnostic to whether a profit-maximising, risk-ambivalent, or risk-conscious company is building the BCI.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T17:33:46.141Z · LW · GW

We think we have some potentially promising hypotheses. But because we know you do, too, we are actively soliciting input from the alignment community. We will be more formally pursuing this initiative in the near future, awarding some small prizes to the most promising expert-reviewed suggestions. Please submit any[3] agenda idea that you think is both plausible and neglected (even if you don’t have the bandwidth right now to pursue the idea! This is a contest for ideas, not for implementation). 

This is related to what @Kabir Kumar, who hosted a critique-a-thon just a couple of days ago, is doing with ai-plans.com. So maybe you will find his platform useful, or find other ways to collaborate.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T16:53:00.369Z · LW · GW

Reverse-engineering prosociality

Here's my idea on this topic: "SociaLLM: a language model design for personalised apps, social science, and AI safety research". Though it's more about engineering pro-sociality (including Self-Other Overlap) using architecture and inductive biases directly than reverse-engineering prosociality.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T09:30:27.340Z · LW · GW

You choose phrases like "help to solve alignment", mostly mention "alignment" rather than "safety" (except in the sections where you discuss indirect agendas, such as "7. Facilitate the development of explicitly-safety-focused businesses"), and write "if/when we live in a world with superintelligent AI whose behavior is—likely by definition—outside our direct control" (implying that 'control' of AI would be desirable?).

Is this a deliberate choice to narrow your direct, object-level technical work to alignment (because you think this is where your team's predispositions lie?), or a disagreement with more systemic views on "what we should work on to reduce the AI risks", such as:

(1) Davidad's "AI Neorealism: a threat model & success criterion for existential safety":

For me the core question of existential safety is this: “Under these conditions, what would be the best strategy for building an AI system that helps us ethically end the acute risk period without creating its own catastrophic risks that would be worse than the status quo?”

It is not, for example, "how can we build an AI that is aligned with human values, including all that is good and beautiful?" or "how can we build an AI that optimises the world for whatever the operators actually specified?" Those could be useful subproblems, but they are not the top-level problem about AI risk (and, in my opinion, given current timelines and a quasi-worst-case assumption, they are probably not on the critical path at all).

(2) Leventov's "Beyond alignment theories":

Note that in this post, only a relatively narrow aspect of the multi-disciplinary view on AI safety is considered, namely the aspect of poly-theoretical approach to the technical alignment of humans to AIs. This mainly speaks to theories of cognition (intelligence, alignment) and ethics. But on a larger view, there are more theories and approaches that should be deployed in order to engineer our civilisational intelligence such that it “goes well”. These theories are not necessarily quite about “alignment”. Examples are control theory (we may be “aligned” with AIs but collectively “zombified” by powerful memetic viruses and walk towards a civilisational cliff), game theory (we may have good theories of alignment but our governance systems cannot deal with multi-polar traps so we cannot deploy these theories effectively), information security considerations, mechanistic anomaly detection and deep deceptiveness, etc. All these perspectives further demonstrate that no single compact theory can “save” us.

(3) Drexler's "Open Agency Model";

(4) Hendrycks' "Pragmatic AI Safety";

(5) Critch's "What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)".

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T09:07:25.738Z · LW · GW

Post-response: Assessment of AI safety agendas: think about the downside risk

You evidently follow a variant of 80000hours' framework for comparing (solving) particular problems in terms of expected impact: Neglectedness x Scale (potential upside) x Solvability.

I think for assessing AI safety ideas, agendas, and problems to solve, we should augment the assessment with another factor: the potential for a Waluigi turn, or more prosaically, the uncertainty about the sign of the impact (scale) and, therefore, the risks of solving the given problem or advancing far on the given agenda.

This reminds me of Taleb's mantra that to survive, we need to make many bets, but also limit the downside potential of each bet, i.e., the "ruin potential". See "The Logic of Risk Taking".

Of the approaches that you listed, some sound risky to me in this respect. Particularly "4. 'Reinforcement Learning from Neural Feedback' (RLNF)" -- it sounds like a direct invitation for wireheading to me. More generally, scaling BCI in any form and not falling into a dystopia at some stage is akin to walking a tightrope (at least at the current stage of civilisational maturity, I would say). This speaks to agendas #2 and #3 on your list.

There are also similar qualms about AI interpretability: there are at least four posts on LW warning of the potential risks of interpretability:

This speaks to the agenda "9. Neuroscience x mechanistic interpretability" on your list.

Related earlier posts

Comment by Roman Leventov on OpenAI: Preparedness framework · 2023-12-18T20:54:36.157Z · LW · GW

Everyone on Twitter has criticised the label "Responsible Scaling Policy", but the author of this post does not seem to respect what looks like a gentle attempt by OpenAI to move past this label.

If we were a bit more serious about this, we would perhaps immediately rename the "Responsible Scaling Policies" tag on LessWrong to "Preparedness Frameworks", with a note on the tag page: "Anthropic calls their PF 'RSP', but we think this is a bad label".