Posts

The AI Revolution in Biology 2024-05-26T09:30:37.997Z
The two-tiered society 2024-05-13T07:53:25.438Z
From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models 2024-02-06T10:18:40.420Z
AI alignment as a translation problem 2024-02-05T14:14:15.060Z
Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? 2024-01-26T09:49:30.836Z
Institutional economics through the lens of scale-free regulative development, morphogenesis, and cognitive science 2024-01-23T19:42:31.739Z
Gaia Network: An Illustrated Primer 2024-01-18T18:23:25.295Z
Worrisome misunderstanding of the core issues with AI transition 2024-01-18T10:05:30.088Z
AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them 2023-12-27T14:51:37.713Z
Gaia Network: a practical, incremental pathway to Open Agency Architecture 2023-12-20T17:11:43.843Z
SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research 2023-12-19T16:49:51.966Z
Assessment of AI safety agendas: think about the downside risk 2023-12-19T09:00:48.278Z
Refinement of Active Inference agency ontology 2023-12-15T09:31:21.514Z
Proposal for improving the global online discourse through personalised comment ordering on all websites 2023-12-06T18:51:37.645Z
Open Agency model can solve the AI regulation dilemma 2023-11-08T20:00:56.395Z
Any research in "probe-tuning" of LLMs? 2023-08-15T21:01:32.838Z
AI romantic partners will harm society if they go unregulated 2023-08-01T09:32:13.417Z
Philosophical Cyborg (Part 1) 2023-06-14T16:20:40.317Z
An LLM-based “exemplary actor” 2023-05-29T11:12:50.762Z
Aligning an H-JEPA agent via training on the outputs of an LLM-based "exemplary actor" 2023-05-29T11:08:36.289Z
AI interpretability could be harmful? 2023-05-10T20:43:04.042Z
H-JEPA might be technically alignable in a modified form 2023-05-08T23:04:20.951Z
Annotated reply to Bengio's "AI Scientists: Safe and Useful AI?" 2023-05-08T21:26:11.374Z
For alignment, we should simultaneously use multiple theories of cognition and value 2023-04-24T10:37:14.757Z
An open letter to SERI MATS program organisers 2023-04-20T16:34:10.041Z
Scientism vs. people 2023-04-18T17:28:29.406Z
Goal alignment without alignment on epistemology, ethics, and science is futile 2023-04-07T08:22:24.647Z
Yoshua Bengio: "Slowing down development of AI systems passing the Turing test" 2023-04-06T03:31:39.120Z
Emergent Analogical Reasoning in Large Language Models 2023-03-22T05:18:50.548Z
Will people be motivated to learn difficult disciplines and skills without economic incentive? 2023-03-20T09:26:19.996Z
A reply to Byrnes on the Free Energy Principle 2023-03-03T13:03:48.990Z
Joscha Bach on Synthetic Intelligence [annotated] 2023-03-02T11:02:09.009Z
Powerful mesa-optimisation is already here 2023-02-17T04:59:59.794Z
The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial 2023-02-14T06:57:58.036Z
Morphological intelligence, superhuman empathy, and ethical arbitration 2023-02-13T10:25:17.267Z
A multi-disciplinary view on AI safety research 2023-02-08T16:50:31.894Z
Temporally Layered Architecture for Adaptive, Distributed and Continuous Control 2023-02-02T06:29:21.137Z
Has private AGI research made independent safety research ineffective already? What should we do about this? 2023-01-23T07:36:48.124Z
Critique of some recent philosophy of LLMs’ minds 2023-01-20T12:53:38.477Z
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning 2023-01-12T16:43:42.357Z
AI psychology should ground the theories of AI consciousness and inform human-AI ethical interaction design 2023-01-08T06:37:54.090Z
How evolutionary lineages of LLMs can plan their own future and act on these plans 2022-12-25T18:11:18.754Z
Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development 2022-12-20T17:13:00.669Z
The two conceptions of Active Inference: an intelligence architecture and a theory of agency 2022-11-16T09:30:47.484Z
What is our current best infohazard policy for AGI (safety) research? 2022-11-15T22:33:34.768Z
The circular problem of epistemic irresponsibility 2022-10-31T17:23:50.719Z
The problem with the media presentation of “believing in AI” 2022-09-14T21:05:10.234Z
Roman Leventov's Shortform 2022-08-19T21:01:28.692Z
Are language models close to the superhuman level in philosophy? 2022-08-19T04:43:07.504Z
AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical 2022-07-30T20:56:54.532Z

Comments

Comment by Roman Leventov on There Should Be More Alignment-Driven Startups · 2024-06-01T15:47:53.008Z · LW · GW

I think the commercial R&D lab model would often suit alignment work better than a "classical" startup company; Conjecture and AE Studio come to mind. Answer.AI, founded by Jeremy Howard (of Fast.ai and Kaggle) and Eric Ries (Lean Startup), elaborates on this business and organisational model here: https://www.answer.ai/posts/2023-12-12-launch.html.

Comment by Roman Leventov on The two-tiered society · 2024-05-14T05:50:54.298Z · LW · GW

But I should add, I agree that 1-3 pose challenging political and coordination problems. Nobody assumes it will be easy, including Acemoglu. It's just another in the series of hard political challenges posed by AI, along with questions like "aligned with whom?", how to account for people's voices past dysfunctional governments and political elites in general, etc.

Comment by Roman Leventov on The two-tiered society · 2024-05-14T05:41:33.009Z · LW · GW

Separately, I at least spontaneously wonder: How would one even want to go about differentiating what is the 'bad automation' to be discouraged, from legit automation without which no modern economy could competitively run anyway? For a random example, say if Excel wouldn't yet exist (or, for its next update..), we'd have to say: Sorry, cannot do such software, as any given spreadsheet has the risk of removing thousands of hours of work...?! Or at least: Please, Excel, ask the human to manually confirm each cell's calculation...?? So I don't know how we'd in practice enforce non-automation. Just 'it uses a large LLM' feels weirdly arbitrary condition - though, ok, I could see how, due to a lack of alternatives, one might use something like that as an ad-hoc criterion, with all the problems it brings. But again, I think points 1. & 2. mean this is unrealistic or unsuccessful anyway.

Clearly, specific rule-based regulation is a dumb strategy. Acemoglu's suggestions: tax incentives to keep employment and "labour voice" to let people decide in the context of specific company and job how they want to work with AI. I like this self-governing strategy. Basically, the idea is that people will want to keep influencing things and will resist "job bullshittification" done to them, if they have the political power ("labour voice"). But they should also have alternative choice of technology and work arrangement/method that doesn't turn their work into rubber-stamping bullshit, but also alleviates the burden ("machine usefulness"). Because if they only have the choice between rubber-stamping bullshit job and burdensome job without AI, they may choose rubber-stamping.

Comment by Roman Leventov on The two-tiered society · 2024-05-14T05:33:07.862Z · LW · GW

If you'd really be able to coordinate globally to enable 1. or 2. globally - extremely unlikely in the current environment and given the huge incentives for individual countries to remain weak in enforcement - then it seems you might as well try to impose directly the economic first best solution w.r.t. robots vs. labor: high global tax rates and redistribution.

If anything, this problem seems more pernicious wrt. climate change mitigation and environmental damage, because it's much more distributed: not only the US and China, but Russia and India are also big emitters; there is big leverage in Brazil, Congo, and Indonesia with their forests; overfishing and ocean pollution are everywhere; etc.

With AI, it's basically a question of regulating US and UK companies: the EU is always eager to over-regulate relative to the US, and China is already successfully and closely regulating its AI for a variety of reasons (which Acemoglu points out). The big problem of the Chinese economy is weak internal demand, and automating jobs, thereby increasing inequality and decreasing local purchasing power, is the last thing that China wants.

Comment by Roman Leventov on The two-tiered society · 2024-05-13T17:20:58.158Z · LW · GW

The level of automation AI provides, and the rate at which it arrives, is precisely what he suggests influencing directly (specifically, slowing down) through economic and political measures. So it's not fair to list that as an assumption.

Comment by Roman Leventov on The two-tiered society · 2024-05-13T17:09:37.448Z · LW · GW

It would depend on exact details, but if a machine can do something as well or better than a human, then the machine should do it.

It's a question of how to design work. A machine can cultivate a monoculture mega-farm better than a human can, but not (at least, yet) a small permaculture garden. Is a monoculture mega-farm more "effective"? Maybe, if we take the pre-AI opportunity cost of human labour, but maybe not with the post-AI opportunity cost of human labour. And this is before factoring in the "economic value" of the better psychological and physical health of people who work on small farms, versus those who do nothing and eat processed food on their couches, made from the crops grown on monoculture mega-farms.

As I understand it, Acemoglu roughly suggests looking for ways to apply this logic in other domains of the economy, including the knowledge economy. Yes, it's not guaranteed that such arrangements will stay economical for a long time (but it's also not beyond my imagination, especially if we factor in the economic value of physical and psychological health), but they may set the economy and society on a different trajectory, with higher chances of eventualities that we would consider "not doom".

What does "foster labour voice" even mean?

Unions 2.0, or something like holacracy?

Especially in companies where everything is automated.

Not yet. Clearly, what he suggests could only remain effective for a limited time.

You can give more power to current employees of current companies, but soon there will be new startups with zero employees (or where, for tax reasons, owners will formally employ their friends or family members).

Not that soon at all, if we speak about the real economy. In the IT sector, I suspect that Big Tech will win big in the AI race because only it has deep enough pockets (you already see Inflection AI quasi-acquired by MS, Stability essentially bust, etc.). And Big Tech still has huge workforces, and it won't be just Nadella or just Pichai anytime soon. Many other knowledge sectors (banks, law) are regulated and also won't shed employees that fast.

Human-complementary AI technologies again sounds like a bullshit job, only mostly did by a machine, where a human is involved somewhere in the loop, but the machine could still do his part better, too.

In my gardening example, a human may wear AI goggles that tell them which plant or animal species they are looking at or what disease a plant has.

Tax on media platforms -- solves a completely different problem. Yes, it is important to care about public mental health. But that is separate from the problem of technological unemployment. (You could have technological unemployment even in the universe where all social media are banned.)

Tax on media platforms is just a concrete example of how "reforming business models" could be done in practice, maybe not the best one (but it's not my example). I will carry on with my gardening example and suggest a "tax on fertiliser": make it so huge that mega-farms (which require a lot of fertiliser) become less economical than permaculture gardens. Without such a push, permaculture gardens won't magically materialise. Acemoglu underscores this point multiple times: switching to a different socioeconomic trajectory is not merely a matter of inventing a technology and applying it in a laissez-faire market. Inventing AI goggles for gardening (or any other technology that makes permaculture gardening arbitrarily convenient) won't make the economy switch away from monoculture mega-farms without an extra push.

Perhaps Acemoglu also has something in mind about the attention/creator economy and the automation that may happen to it (AI influencers replacing human influencers) when he talks about a "digital ad tax", but I don't see it.

Comment by Roman Leventov on On attunement · 2024-03-30T13:08:11.454Z · LW · GW

John Vervaeke calls attunement "relevance realization".

Comment by Roman Leventov on Modern Transformers are AGI, and Human-Level · 2024-03-27T03:16:32.926Z · LW · GW

Cf. DeepMind's "Levels of AGI" paper (https://arxiv.org/abs/2311.02462), which calls modern transformers "emerging AGI" but also defines "expert", "virtuoso", and "superhuman" AGI.

Comment by Roman Leventov on AI Alignment Metastrategy · 2024-03-24T13:24:12.463Z · LW · GW

Humane/acc, https://twitter.com/AndrewCritchPhD

Comment by Roman Leventov on Value learning in the absence of ground truth · 2024-02-05T22:11:41.339Z · LW · GW

Well, yes, it also includes learning weak agents' models more generally, not just their "values". But I think the point stands. It's elaborated better in the linked post. As AIs will receive most of the same information that humans receive, through always-on wearable sensors, there won't be much for AIs to learn from humans. Rather, it's humans who will need to do their homework to increase the quality of their value judgements.

Comment by Roman Leventov on Value learning in the absence of ground truth · 2024-02-05T21:30:09.977Z · LW · GW

I agree with the core problem statement and most assumptions of the Pursuit of Happiness/Conventions Approach, but suggest a different solution: https://www.lesswrong.com/posts/rZWNxrzuHyKK2pE65/ai-alignment-as-a-translation-problem

I agree with the OpenAI folks that generalisation is the key concept for understanding the alignment process. But I think that with their weak-to-strong generalisation agenda, they (as well as almost everyone else) apply it in the reverse direction: learning the values of weak agents (humans) doesn't make sense. Rather, weak agents should learn the causal models that strong agents employ, so that they are able to express an informed value judgement. This is the way to circumvent the "absence of the ground truth for values" problem: instead, agents try to generalise their respective world models so that they sufficiently overlap, and then choose actions that seem net beneficial to both sides, without knowing how this value judgement was made by the other side.

In order to be able to generalise to shared world models with AIs, we must also engineer AIs to have human inductive biases from the beginning. Otherwise, this won't be feasible. This observation makes "brain-like AGI" one of the most important alignment agendas in my view.

Comment by Roman Leventov on AI alignment as a translation problem · 2024-02-05T18:44:25.611Z · LW · GW

If I understand correctly, by "discreteness" you mean that it simply says that one agent can know neither the meaning of the symbols used by another agent nor the "degree" of grokking the meaning; one just cannot say anything.

This is correct, but the underlying reason why this is correct is the same as why solipsism or the simulation hypothesis cannot be disproven (or proven!).

So yeah, I think there is no tangible relationship to the alignment problem, except that it corroborates that we couldn't have 100% (literally, probability = 1) certainty of the alignment or safety of whatever we create, but that was obvious even without this philosophical argument.

So, I removed that paragraph about Quine's argument from the post.

Comment by Roman Leventov on Making every researcher seek grants is a broken model · 2024-01-27T15:55:25.019Z · LW · GW

That also was, naturally, the model in the Soviet Union, with orgs called "scientific research institutes". https://www.jstor.org/stable/284836

Comment by Roman Leventov on Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects? · 2024-01-26T23:24:54.899Z · LW · GW

See a discussion of this point here with Marius Hobbhahn and others.

Comment by Roman Leventov on This might be the last AI Safety Camp · 2024-01-26T09:54:24.493Z · LW · GW

This post has led me to this idea:  Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects?

Comment by Roman Leventov on Gaia Network: An Illustrated Primer · 2024-01-23T20:18:01.429Z · LW · GW

Collusion detection and prevention and trust modelling don't trivially follow from the basic architecture of the system described at the level of this article. Specific mechanisms would have to be implemented in the Protocol to get collusion detection and trust modelling. We haven't actually developed these mechanisms yet, but we think they should be doable (though this is still a research bet, not a 100% certainty), because the Gaia Network directly embodies (or is amenable to) all six general principles for anti-collusion mechanism design (agency architecture) proposed by Eric Drexler. These principles themselves should be further validated by formalising them and proving theorems about the collusion properties of systems of distributed intelligence.

Of course, there should also be (at least initially, but practically for a very long time, if not forever) "traditional" governance mechanisms of the Gaia Network, nodes, model and data ownership, etc. So, there are a lot of open questions about interfacing GN with existing codes of law, judicial and law enforcement practice, intellectual property, political and governance processes, etc. Some of these interfaces and connections with existing institutions should in practice deal with bad actors and certain types of malicious behaviour on GN.

Comment by Roman Leventov on Worrisome misunderstanding of the core issues with AI transition · 2024-01-18T10:49:06.295Z · LW · GW

Fair, I edited the post.

Comment by Roman Leventov on AI doing philosophy = AI generating hands? · 2024-01-16T02:18:07.778Z · LW · GW

Apart from the view of philosophy as "cohesive stories that bind together and infuse meaning into scientific models", which I discussed with you earlier and you were not very satisfied with, another interpretation of philosophy (natural philosophy, philosophy of science, philosophy of mathematics, and metaphilosophy, at least) is "apex generalisation/abstraction". Think Bengio's "AI scientist", but the generative model should be even deeper: first sample a plausible "philosophy of science" given all the observations about the world up to the moment; then sample a plausible scientific theory at a specific level of coarse-graining/scientific abstraction (quantum, chemical, bio, physio, psycho, socio, etc.), given the philosophy and all observations up to the moment; then sample a mechanistic model that describes the situation/system of interest at hand (e.g., the morphology of a particular organism, given the laws of biology, or the morphology of a particular society), given the observations of the system up to the moment; and then finally sample plausible variable values that describe the particular situation at a particular point in time, given all the above.
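In symbols, the nesting I have in mind is roughly the following sampling chain (my own hedged shorthand, not anything from Bengio's proposal: φ is the sampled philosophy, θ the scientific theory at the chosen level of coarse-graining, m the mechanistic model of the system at hand, x_t the concrete variable values, and o_{≤t} all observations up to the moment):

$$\varphi \sim p(\varphi \mid o_{\le t}), \quad \theta \sim p(\theta \mid \varphi, o_{\le t}), \quad m \sim p(m \mid \theta, o_{\le t}), \quad x_t \sim p(x_t \mid m, \theta, \varphi, o_{\le t})$$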

If this interpretation is correct, then doing philosophy well and not deluding ourselves is far off. And there is a huge risk in thinking we can do it well before we actually can.

Comment by Roman Leventov on An even deeper atheism · 2024-01-14T11:48:20.365Z · LW · GW

Extrapolated volition is a nonsensical concept altogether, as demonstrated in the OP. There is no extrapolated volition outside of its unfolding in real life in a specific context, which affects the trajectory of values/volition in a specific way. And what this context will be is unknown and unknowable (maybe aliens will visit Earth tomorrow, maybe not).

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:50:31.366Z · LW · GW

A related frame is consciousness: where is its boundary? Is it our brain that is conscious, or the whole nervous system, or the whole human, or the human plus the entire microbiome populating them, or the human plus robotic prosthetic limbs, or the human plus web search plus chat AI plus a personal note-taking app, or a whole human group (collective consciousness), etc.?

Some computational theories of consciousness attempt to give a specific, mathematically formalised answer to this question.

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:45:32.698Z · LW · GW

Psychology may not be "technical enough" because an adequate mathematical science or process theory has not yet been developed for it, but it's ultimately very important, perhaps critically important: see the last paragraph of https://www.lesswrong.com/posts/AKBkDNeFLZxaMqjQG/gaia-network-a-practical-incremental-pathway-to-open-agency. Davidad apparently thinks that it can be captured with an Infra-Bayesian model of a person/human.

Also on psychology: what is the boundary of personality, and where does just a "role" (spouse, worker, etc.) turn into multiple-personality disorder?

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T20:36:30.868Z · LW · GW

In the most recent episode of his podcast show, Jim Rutt (former president of SFI) and his guest talk about membranes a lot; the word appears 30 times on the transcript page: https://www.jimruttshow.com/cody-moser/

Comment by Roman Leventov on What technical topics could help with boundaries/membranes? · 2024-01-05T19:32:23.664Z · LW · GW

Related, quantum information theory:

Comment by Roman Leventov on AI Alignment Metastrategy · 2024-01-03T17:34:08.269Z · LW · GW

I think this metastrategy classification is oversimplified to the degree that I'm not sure it is net helpful. I don't see how Hendrycks' "Leviathan safety", Drexler's Open Agency Model, Davidad's OAA, Bengio's "AI pure scientist" and governance proposals (see https://slideslive.com/39014230/towards-quantitative-safety-guarantees-and-alignment), Kaufmann and Leventov's Gaia Network, AI Objectives Institute's agenda (and the related Collective Intelligence Project's), Conjecture's CoEms, OpenAI's "AI alignment scientist" agenda, and Critch's h/acc (and the related Cyborgism of janus et al.) map straightforwardly onto this classification, at least not without losing some important nuance.

Furthermore, there is also the missing dimension of [technical strategy, organisational strategy, governance and political strategy] that could perhaps recombine to some degree.

Finally, in the spirit of the "passing the ideological Turing test" and "describing, not persuading" norms, it would be nice, I think, to include criticism of the "conservative strategy" at the same level of fidelity at which the other metastrategies are criticised here, even if you or others have discussed that in other posts.

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2024-01-03T07:16:28.624Z · LW · GW

Announcement

I think SociaLLM has a good chance of getting OpenAI's "Research into Agentic AI Systems" grant because it addresses both the challenge of the legibility of AI agents' behaviour, by making the agent's behaviour more "human-like" thanks to the weight sharing and regularisation techniques/inductive biases described in the post, and automatic monitoring: detection of duplicity or deception in an AI agent's behaviour by comparing the agent's ToMs "in the eyes" of different interlocutors, building on the work "Collective Intelligence in Human-AI Teams".

I am looking for co-investigators for this (up to $100k, up to 8 months long) project with hands-on academic or practical experience in DL training (preferably), ML, Bayesian statistics, or NLP. The deadline for the grant application itself is the 20th of January, so I need to find a co-investigator by the 15th of January.

Another requirement is that the co-investigator should preferably be in academia, at a non-profit, or independent at the moment.

I plan to be hands-on during the project in data preparation (cleansing, generation by other LLMs, etc.) and training, too. However, I don’t have any prior experience with DL training, so if I apply for the project alone, this is a significant risk and a likely rejection.

If the project is successful, it could later be extended for further grants or turned into a startup.

If the project is not a good fit for you but you know someone who may be interested, I’d appreciate it a lot if you shared this with them or within your academic network!

Please reach out to me in DMs or at leventov.ru@gmail.com.

Comment by Roman Leventov on A hermeneutic net for agency · 2024-01-02T07:56:24.420Z · LW · GW

A lot of the examples of the concepts that you list already belong to established scientific fields: math, logic, probability, causal inference, ontology, semantics, physics, information theory, computer science, learning theory, and so on. These concepts don't need philosophical re-definition. Respecting the field boundaries, and the ways that fields are connected to each other via other fields (e.g., math and ontology to information theory/CS/learning theory via semantics), is also, I think, a good practice on net: it's better to focus attention on the fields that are actually most proto-scientific and philosophically confusing: intelligence, sentience, psychology, consciousness, agency, decision making, boundaries, safety, utility, value (axiology), and ethics[1].

Then, to make the overall idea solid, I think it's necessary to do a couple of extra things (you may have already mentioned these in the post, but I semi-skimmed it and may have missed them).

  • First, specify the concepts in this fuzzy proto-scientific area of intelligence, agency, and ethics not in terms of each other, but in terms of (or in a clearly specified connection with) the other scientific fields/ontologies that are already established, enumerated above. For example, a theory of agency should be compatible or connected with (or specified in terms of) causal inference and learning theories. A theory of boundaries and ethics should be based on physics, information theory, semantics, and learning theory, among other things (cf. scale-free axiology and ethics).
  • Second, establish feedback loops that test these "proposed" theories of agency (psychology, ethics, decision-making) both in simulated environments (e.g., with LLM-based agents embodying these proposed theories acting in Minecraft- or Sims-like worlds) and in (constrained) real-life settings or environments. Note that the obligatory connection to physics, information theory, causal inference, and learning theory will ensure that these tests themselves can be counted as scientific.

The good news is that there are now sufficient (or almost sufficient) affordances to build AI agents that can embody sufficiently realistic and rich versions of these theories, both in realistic simulated environments and in real life. And I think an actual R&D agenda proposal should be written about this and submitted for a Superalignment grant.

There's an instinct to "ground" or "found" concepts. But there's no globally privileged direction of "more grounded" in the space of possible concepts. We have to settle for a reductholistic pluralism—or better, learn to think rightly, which will, as a side effect, make reductholism not seem like settling.

I disagree with the last sentence: "reductholism" should be the settling, as I argue in "For alignment, we should simultaneously use multiple theories of cognition and value". (Note that this view itself is based largely on quantum information theory: see "Information flow in context-dependent hierarchical Bayesian inference".)

 

  1. ^

    A counterargument could be made here that although logic, causal inference, ontology, semantics, physics, information theory, CS, learning theory, and so on are fairly established and all have SoTA, mature theories that look solid, these are probably not the final theories in all or many of these fields, and philosophical poking could highlight the problems with these theories; perhaps this will actually be the key to "solving alignment". I agree that this is a possible chain of events in principle, but it looks like quite low expected impact to me from the "hermeneutic nets" perspective, so this agenda is still better off focusing on the "core confusing" fields (intelligence, agency, ethics, etc.) and treating the established fields and the concepts therein as given.

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-28T14:25:55.150Z · LW · GW

I agree with everything you said. It seems that we should distinguish between "cooperative" and "adversarial" safety approaches (cf. the comment above). I wrote the entire post as an extended reply to Marc Carauleanu, after his mixed feedback on my idea of adding "selective SSM blocks for theory of mind" to increase the Self-Other Overlap in an AI architecture as a pathway to improve safety. Under the view that both Transformer and Selective SSM blocks will survive up until AGI (if it is going to be created at all, of course), and even with the addition of your qualifications (that AutoML will try to stack these and other types of blocks in some quickly evolving ways), the approach seems solid to me, but only if we also make some basic assumptions about the good faith and cooperativeness of the AutoML / auto-takeoff process. If we don't make such assumptions, of course, all bets are off: these "blocks for safety" could just be purged from the architecture.

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-28T14:16:41.896Z · LW · GW

I agree that training data governance is not robust to non-cooperative actors. But I think there is a much better chance of achieving a very broad industrial, academic, international, and legal consensus that it is a good way to jigsaw capabilities without sacrificing raw reasoning ability, which the opponents of compute governance hold to be purely counter-productive ("intelligence just makes things better"). That's why I titled my post "Open Agency model can solve the AI regulation dilemma" (emphasis on the last word).

This could even be seen not just as a "safety" measure but as a genuinely good regularisation measure for the collective civilisational intelligence: to make intelligence more robust to distributional shifts and paradigm shifts, it's better to compartmentalise it and route communication between the compartments through a relatively narrow, classical informational channel, namely human language or specific protocols, rather than raw DNN activation dynamics.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-27T20:21:55.757Z · LW · GW

BTW, this particular example sounds just like Numer.ai Signals, but Gaia Network is supposed to be more general and not to revolve around the stock market alone. E.g., the same nutritional data could be bought by food companies themselves, logistics companies, public health agencies, etc.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-27T20:16:21.408Z · LW · GW

Thanks for the suggestions.

An actual anecdote may look something like this: "We are a startup that creates a nutrition assistant and family menu helper app. We collect anonymised data from our users and ensure differential privacy, yada-yada. We want to sell this data to hedge funds that trade food company stocks (so that we can offer the app for free to our users), but we need to negotiate the terms of these agreements in an ad-hoc way with each hedge fund individually, and we don't have a principled way to come up with a fair price for the data. We would benefit from something like a 'platform' on which we can just publish the API spec of our data and then the platform (i.e., the Gaia Network) takes care of finding buyers for our data, paying us a fair price for it, etc."

Comment by Roman Leventov on AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them · 2023-12-27T20:11:28.150Z · LW · GW

The fact that hybridisation works better than pure architectures (architectures consisting of a single core type of block, shall we say) is exactly the point that Nathan Labenz makes in the podcast and that I repeat at the beginning of the post.

(Ah, I actually forgot to repeat this point, apart from noting that Doyle predicted this in his architecture theory.)

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-27T14:56:08.812Z · LW · GW

This conversation has prompted me to write "AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them".

Comment by Roman Leventov on On plans for a functional society · 2023-12-25T13:08:30.670Z · LW · GW

we're lacking all 4. We're lacking a coherent map of the polycrisis (if anyone wants to do and/or fund a version of aisafety.world for the polycrisis, I'm interested in contributing)

Joshua Williams created an initial version of a metacrisis map, and a couple of days ago I suggested to him that he make the development of such a resource more open, e.g., by turning it into a GitHub repository.

I think there's a ton of funding available in this space, specifically I think speculating on the markets informed by the kind of worldview that allows one to perceive the polycrisis has significant alpha. I think we can make much better predictions about the next 5-10 years than the market, and I don't think most of the market is even trying to make good predictions on those timescales.

Do you mean that it's possible to earn by betting long against the current market sentiment? I think this is wrong for multiple reasons, but perhaps most importantly because the market specifically doesn't measure how well we are faring on a lot of components of the polycrisis -- e.g., the market would be doing great even if all people were turned into addicted zombies. Secondly, people don't even try to make predictions in the stock market anymore -- it's turned into a completely irrational valve of liquidity that is moved by Elon Musk's tweets, narratives, and memes more than by objective factors.

Comment by Roman Leventov on On plans for a functional society · 2023-12-25T12:52:48.086Z · LW · GW

1.) Clearly state the problems that need to be worked on, and provide reasonable guidance as to where and how they might be worked on
2.) Notice what work is already being done on the problems, and who is doing it (avoid reinventing the wheel/not invented here syndrome; EA is especially guilty of this)
3.) Actively develop useful connections between 2.)
4.) Measure engagement (resource flows) and progress

I posted some parts of my current visions of 1) and 2) here and here. I think these, along with the Gaia Network design that we proposed recently (the Gaia Network is not "A Plan" in its entirety, but a significant portion of it), address @Vaniver's and @kave's points about realism and sociological/psychological viability.

The platform for generating the plan would need to be more-open-than-not, and should be fairly bleeding edge - incorporating prediction markets, consensus seeking (polis), eigenkarma etc

I think it is a mistake to import "democracy" at the vision level. A vision is essentially a very high-level plan, a creative engineering task. These are not decided by averaging opinions. "If you want to kill any idea in the world, get a committee working on it." Deutsch also wrote about this in "The Beginning of Infinity", in the chapter about democracy.

We should aggregate desiderata and preferences (see "Preference Aggregation as Bayesian Inference"), but not decisions (plans, engineering designs, visions). These should be created by a coherent creative entity. The same idea is evident in the design of Open Agency Architecture.

we're lacking meaningful 3rd party measurement

If I understand correctly what you are gesturing at here, I think that some high-level agents in the Gaia Network should become a trusted gauge for the "planetary health metrics" we care about.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-24T14:43:27.679Z · LW · GW

Right now, if the Gaia Network already existed but there were few models and agents on it, there would be little or no advantage (e.g., leveraging the tooling/infra built for the Gaia Network) in joining the network.

This is why I personally think that the bottom-up approach (building these apps and scaling them first, thus building up QRFs) is a somewhat more promising path than the top-down approach, whose ultimate version is the OAA itself; the research agenda of building the Gaia Network is a somewhat milder, but still top-down-ish, version. That's why in the comment I already linked above, the implication is that these disparate apps/models/"agents" are first built completely independently (mostly as startups), without conforming to any shared protocol (like the Gaia protocol); only once they grow large and sharing information across domains becomes evidently valuable to these startups will the conversation about a shared protocol find more traction.

Then, why a shared protocol, still? Two reasons:

  • Practical: it will reduce the transaction costs for all the models across the domains to start communicating and improving each other's predictions. Without a shared protocol, every prospective direction of information sharing requires ad-hoc integration work. This is the practicality of any platform, from the Internet itself to Airbnb to SWIFT (bank wires), and the Gaia Network should be of this kind, too.
  • AI and catastrophic risk safety: to ensure some safety against rogue actors (AI or hybrid human-AI teams or whatever) through transparency and built-in mechanisms, we would want as much economic activity to be "on the network" as possible.
    • You may say that it would be a tough political challenge to convince everybody to conform to the network in the name of some AI safety, but surely this would still be a smaller challenge than abolishing much of the current economic system altogether, as (apparently) implied by Davidad's "vanilla" OAA, and as we discuss throughout the article. In fact, this is one of the core points of the article.

Then, even though I advocate for a bottom-up approach above, there is still room, and even a need, for parallel top-down activity (given the AGI timelines), so that these two streams of activity meet somewhere in the middle. This is why we are debating all these blue-sky AI safety plans on LessWrong at all; this is why OAA was proposed; and this is why we are now proposing the Gaia Network.

Comment by Roman Leventov on Gaia Network: a practical, incremental pathway to Open Agency Architecture · 2023-12-24T13:03:04.052Z · LW · GW

One completely realistic example of an agent is given in the appendix (an agent that recommends actions to improve soil health or carbon sequestration). Some more examples are given in this comment:

  • An info agent that recommends info resources (news, papers, posts, op-eds, books, videos) for me to consume, based on my current preferences and demands (and info from other agents, such as those listed below, or this agent that predicts the personalised information value of comments on the web)
    • Scaling to the group/coordination: optimise informational intake of a team, an organisation, or a family
  • Learning agent that recommends materials based on the previous learning trajectory and preferences, a-la liirn.space
    • Scaling to the group/coordination: coordinate learning experiences and lessons between individual learning agents based on who is on what learning level, availability, etc.
  • Financial agent that recommends spending based on my mid- and long-term goals
    • Equivalent of this for an organisation: "business development agent", recommends an org to optimise strategic investments based on the current market situation, goals of the company, other domain-specific models provided (i.e., in the limit, communicated by other Gaia agents responsible for these models), etc.
  • Investment agent recommends investment strategy based on my current financial situation, financial goals, and other investment goals (such as ESG)
    • Scaling to the group/coordination: optimise joint investment strategy for income and investment pools a-la pandopooling.com
  • Energy agent: decides when to accumulate energy and when to spend it, based on spot energy prices, the weather forecast for renewables, and current and predicted future power demand (a toy sketch of such a decision rule follows after this list)
    • Scale this to microgrids, industrial/manufacturing sites, etc.
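
For concreteness, here is a toy sketch of the kind of decision rule the energy agent above might implement (purely illustrative and entirely mine: the thresholds, data structures, and names are made up, and an actual Gaia Network agent would do proper probabilistic forecasting and planning rather than this simple heuristic):

```python
from dataclasses import dataclass

@dataclass
class HourForecast:
    spot_price: float        # expected $/kWh for the hour
    renewables_kwh: float    # expected output from local renewables
    demand_kwh: float        # expected consumption

def dispatch(window: list, charge_kwh: float, capacity_kwh: float) -> str:
    """Toy policy: charge when the current price is in the cheapest third of the
    forecast window, discharge when it is in the most expensive third and local
    demand exceeds renewable output; otherwise hold."""
    now = window[0]
    prices = sorted(h.spot_price for h in window)
    cheap, expensive = prices[len(prices) // 3], prices[2 * len(prices) // 3]
    if now.spot_price <= cheap and charge_kwh < capacity_kwh:
        return "charge"
    if now.spot_price >= expensive and now.demand_kwh > now.renewables_kwh and charge_kwh > 0:
        return "discharge"
    return "hold"

# Example: a 3-hour window with rising prices -> charge now while energy is cheap.
window = [HourForecast(0.10, 5.0, 3.0), HourForecast(0.25, 2.0, 4.0), HourForecast(0.40, 1.0, 5.0)]
print(dispatch(window, charge_kwh=2.0, capacity_kwh=10.0))  # "charge"
```
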
Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-23T04:42:22.563Z · LW · GW

I absolutely agree that the future TAI may look nothing like the current architectures. Cf. this tweet by Kenneth Stanley, with whom I agree 100%. At the same time, I think it's a methodological mistake to therefore conclude that we should only work on approaches and techniques that are applicable to any AI, in a black-box manner. It's like tying our hands behind our backs. We can and should affect the designs of future TAIs through our research, by demonstrating promise (or inherent limitations) of this or that alignment technique, so that these techniques get or lose traction and are included or excluded from the TAI design. So, we are not just making "assumptions" about the internals of the future TAIs; we are shaping these internals.

We can and should think about proliferation risks[1] (i.e., the risks that some TAI will be created by downright rogue actors), but IMO most of that thinking should be on the governance side, not the technical side. We agree with Davidad here that a good technical AI safety plan should be accompanied by a good governance (including compute monitoring) plan.

  1. ^

    In our own plan (Gaia Network), we do this in the penultimate paragraph here.

Comment by Roman Leventov on On the future of language models · 2023-12-21T17:19:41.288Z · LW · GW

I think you have tied yourself too much to the strict binary classification that you invented (finetuning/scaffolding). You overgeneralise, and your classification obscures the truth more than it clarifies things.

All the different things that can be done with LLMs -- tool use, scaffolded reasoning aka LM agents, RAG, fine-tuning, semantic knowledge graph mining, reasoning with a semantic knowledge graph, finetuning for following a "virtue" (persona, character, role, style, etc.), finetuning for model checking, finetuning for theorem-proving heuristics, finetuning for generating causal models, and what else -- just don't fit easily into two simple categories with properties that are consistent within each category.

But I don't understand the sense in which you think finetuning in this context has completely different properties.

In the summary (note: I didn't actually read the rest of the post, only the summary), you write something that implies that finetuning is obscure or un-interpretable:

From a safety perspective, language model agents whose agency comes from scaffolding look greatly superior than ones whose agency comes from finetuning

  • Because you can get an extremely high degree of transparency by construction

But this totally doesn't apply to the other variants of finetuning that I mentioned. If the LLM is a heuristic engine that generates mathematical proofs which are later verified with Lean, it just stops making sense to discuss how interpretable or transparent this theorem-proving or model-checking LLM-based heuristic engine is.

Comment by Roman Leventov on On the future of language models · 2023-12-20T22:36:18.792Z · LW · GW

Also, I would say that retrieval-augmented generation (RAG) is not just a mundane way to industrialise language models, but an important concept whose properties should be studied separately from scaffolding, fine-tuning, and the other techniques that I listed in the comment above.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-20T22:26:58.832Z · LW · GW

On (1), cf. this report: "The current portfolio of work on AI risk is over-indexed on work which treats “transformative AI” as a black box and tries to plan around that. I think that we can and should be peering inside that box (and this may involve plans targeted at more specific risks)."

On (2), I'm surprised to read this from you, since you suggested engineering Self-Other Overlap into LLMs in your AI Safety Camp proposal, if I understood and remember correctly. Do you actually see a line (or a way) of increasing the overlap without furthering ToM and therefore "social capabilities"? (Which ties back to "almost all applied/empirical AI safety work is simultaneously capabilities work".)

Comment by Roman Leventov on On the future of language models · 2023-12-20T22:16:35.266Z · LW · GW

Notable techniques for getting value out of language models that are not mentioned:

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2023-12-20T05:54:25.446Z · LW · GW

In another thread, Marc Carauleanu wrote:

The main worry that I have with regards to your approach is how competitive SociaLLM would be with regards to SOTA foundation models given both (1) the different architecture you plan to use, and (2) practical constraints on collecting the requisite structured data. While it is certainly interesting that your architecture lends itself nicely to inducing self-other overlap, if it is not likely to be competitive at the frontier, then the methods uniquely designed to induce self-other overlap on SociaLLM are likely to not scale/transfer well to frontier models that do pose existential risks. (Proactively ensuring transferability is the reason we focus on an additional training objective and make minimal assumptions about the architecture in the self-other overlap agenda.)

I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the tracking of the user's own history really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (so do the interlocutor-tracking blocks in private 1-1 or small group chats, which can be arbitrarily long, but there is probably no such data available for training). In the public data (such as forums, public chat rooms, Diplomacy and other text games), the interlocutor's history traces would 99% of the time easily fit into 100k symbols, but for symmetry with the user's own state (same weights!) and to have the same representation structure, they should mirror the user's own SSM blocks, of course.

With such an approach, the SSM hierarchies could start very small, with only a few blocks or even just a single SSM block (i.e., two blocks in total: one for the user's own state and one for the interlocutor's state), and attach to the middle of the Transformer hierarchy to select from it. However, I think this approach couldn't just be slapped onto pre-trained Llama or another large Transformer LLM. I suspect the Transformer should be co-trained with the SSM blocks to induce the Transformer to make the corresponding representations useful for the SSM blocks. "Pretraining Language Models with Human Preferences" is my intuition pump here.
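
To make this concrete, here is a minimal PyTorch-style sketch of the wiring I have in mind (my own illustration only, not an actual SociaLLM implementation: DiagonalSSMBlock is a crude linear stand-in for a selective SSM / Mamba-style block, all module and parameter names are made up, and causal masking, positional encodings, and the full block hierarchies are omitted for brevity):

```python
import torch
import torch.nn as nn


class DiagonalSSMBlock(nn.Module):
    """Crude diagonal state-space block, standing in for a selective SSM (Mamba-style) block.
    It carries a recurrent state over the whole history, so it scales linearly in sequence length."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(torch.rand(d_model, d_state) * -1.0)  # log-decays, kept negative for stability
        self.B = nn.Parameter(torch.randn(d_model, d_state) * 0.1)  # input-to-state projection
        self.C = nn.Parameter(torch.randn(d_model, d_state) * 0.1)  # state read-out
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        batch, seq, d_model = x.shape
        state = torch.zeros(batch, d_model, self.A.shape[1], device=x.device)
        decay = torch.exp(self.A)                         # per-channel decay factors in (0, 1]
        outputs = []
        for t in range(seq):
            u = x[:, t, :].unsqueeze(-1)                  # (batch, d_model, 1)
            state = state * decay + u * self.B            # recurrent state update
            outputs.append((state * self.C).sum(-1))      # read-out: (batch, d_model)
        return self.proj(torch.stack(outputs, dim=1))     # (batch, seq, d_model)


class SociaLLMSketch(nn.Module):
    """Transformer blocks handle the local context; a single weight-shared SSM block tracks the
    user's and the interlocutor's long histories; their summary states are injected into the
    middle of the Transformer stack via cross-attention, so the mid-stack representations can
    select from the long-range state."""

    def __init__(self, vocab_size: int = 32000, d_model: int = 512, n_layers: int = 8, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lower = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers // 2)])
        self.upper = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers // 2)])
        self.state_ssm = DiagonalSSMBlock(d_model)  # same weights for the user's own and the interlocutor's history
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, context_ids, user_history_ids, interlocutor_history_ids):
        h = self.embed(context_ids)
        for block in self.lower:                                              # local-context blocks (lower half)
            h = block(h)
        user_state = self.state_ssm(self.embed(user_history_ids))            # (batch, hist_len, d_model)
        other_state = self.state_ssm(self.embed(interlocutor_history_ids))   # shared weights
        summaries = torch.cat([user_state[:, -1:, :], other_state[:, -1:, :]], dim=1)  # final states only
        h = h + self.cross_attn(h, summaries, summaries, need_weights=False)[0]        # mid-stack attachment
        for block in self.upper:                                              # blocks above the attachment point
            h = block(h)
        return self.lm_head(h)                                                # next-token logits


# Tiny smoke test with random token ids.
model = SociaLLMSketch()
ctx, user_hist, other_hist = (torch.randint(0, 32000, (2, n)) for n in (64, 256, 256))
print(model(ctx, user_hist, other_hist).shape)  # torch.Size([2, 64, 32000])
```

The key point this sketch tries to capture is the weight sharing: the same state_ssm parameters encode both participants' histories, which is exactly the symmetry ("same weights!") that the Self-Other Overlap idea leans on.
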

Regarding the sufficiency and quality of training data, the Transformer hierarchy itself could still be trained on arbitrary texts, just like current LLMs. And we can adjust the size of the SSM hierarchies to the amount of high-quality dialogue and forum data that we are able to obtain. I think it is a no-brainer that this design would improve the frontier quality in LLM apps that value personalisation and attunement to the user's current state (psychological, emotional, level of knowledge, etc.), relative to whatever "base" Transformer model we would take (such as Llama, or any other).

One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models, and so it is unclear if investing in the unique data and architecture setup is worth it in comparison to the counterfactual of just scaling up current methods.

With this I disagree: I think it's critical for the user state tracking to be energy-based. I don't think there are ways to recapitulate this with auto-regressive Transformer language models (cf. any of LeCun's presentations from the last year). There are potential ways to recapitulate this with other language modelling architectures (non-Transformer and non-SSM), but they currently don't hold any stronger promise than SSMs, so I don't see any reason to pick them.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-20T05:52:34.860Z · LW · GW

Thanks for the feedback. I agree with worries (1) and (2). I think there is a way to de-risk this.

The block hierarchy that is responsible for tracking the local context consists of classic Transformer blocks. Only the tracking of the user's own history really needs to be an SSM hierarchy, because it quickly surpasses the scalability limits of self-attention (so do the interlocutor-tracking blocks in private 1-1 chats, which can also be arbitrarily long, but there is probably no such data available for training). In the public data (such as forums, public chat room logs, Diplomacy and other text game logs), the interlocutor's history traces would 99% of the time easily be less than 100k symbols, but for symmetry with the user's own state (same weights!) and to have the same representation structure, they should mirror the user's own SSM blocks, of course.

With such an approach, the SSM hierarchies could start very small, with only a few blocks or even just a single SSM block (i.e., two blocks in total: one for the user's own state and one for the interlocutor's state), and attach to the middle of the Transformer hierarchy to select from it. However, I think this approach couldn't just be slapped onto pre-trained Llama or another large Transformer LLM. I suspect the Transformer should be co-trained with the SSM blocks to induce the Transformer to make the corresponding representations useful for the SSM blocks. "Pretraining Language Models with Human Preferences" is my intuition pump here.

Regarding the sufficiency and quality of training data, the Transformer hierarchy itself could still be trained on arbitrary texts, just like current LLMs. And we can adjust the size of the SSM hierarchies to the amount of high-quality dialogue and forum data that we are able to obtain. I think it is a no-brainer that this design would improve the frontier quality in LLM apps that value personalisation and attunement to the user's current state (psychological, emotional, level of knowledge, etc.), relative to whatever "base" Transformer model we would take (such as Llama, or any other).

One additional worry is that many of the research benefits of SociaLLM may not be out of reach for current foundation models

With this I disagree: I think it's critical for the user state tracking to be energy-based. I don't think there are ways to recapitulate this with auto-regressive Transformer language models (cf. any of LeCun's presentations from the last year). There are potential ways to recapitulate this with other language modelling architectures (non-Transformer and non-SSM), but they currently don't hold any stronger promise than SSMs, so I don't see any reason to pick them.

Comment by Roman Leventov on SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research · 2023-12-19T18:19:46.518Z · LW · GW

Clarity check: this model has not been trained yet at this time, correct?

Yes, I've changed the title of the post and added a footnote on "is a foundation model".

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T17:41:29.513Z · LW · GW

More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by giant tech companies of the world, largely given short-term profit-related incentives—which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity without worrying if we profit from this work.

I can push back on this somewhat by noting that most risks from BCI may lie outside the scope of control of any company that builds it and "plugs people in"; rather, they lie in the wider economy and social ecosystem. The only thing that may matter is the bandwidth and noisiness of the information channel between the brain and the digital sphere, and that seems agnostic to whether a profit-maximising, risk-ambivalent, or risk-conscious company is building the BCI.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T17:33:46.141Z · LW · GW

We think we have some potentially promising hypotheses. But because we know you do, too, we are actively soliciting input from the alignment community. We will be more formally pursuing this initiative in the near future, awarding some small prizes to the most promising expert-reviewed suggestions. Please submit any[3] agenda idea that you think is both plausible and neglected (even if you don’t have the bandwidth right now to pursue the idea! This is a contest for ideas, not for implementation). 

This is related to what @Kabir Kumar, who hosted a critique-a-thon just a couple of days ago, is doing with ai-plans.com. So maybe you will find his platform useful, or find other ways to collaborate.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T16:53:00.369Z · LW · GW

Reverse-engineering prosociality

Here's my idea on this topic: "SociaLLM: a language model design for personalised apps, social science, and AI safety research". Though it's more about engineering pro-sociality (including Self-Other Overlap) using architecture and inductive biases directly than reverse-engineering prosociality.

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T09:30:27.340Z · LW · GW

You choose phrases like "help to solve alignment", mostly mention "alignment" rather than "safety" (except in the sections where you discuss indirect agendas, such as "7. Facilitate the development of explicitly-safety-focused businesses"), and write "if/when we live in a world with superintelligent AI whose behavior is—likely by definition—outside our direct control" (implying that 'control' of AI would be desirable?).

Is this a deliberate choice to narrow your direct, object-level technical work to alignment (because you think this is where your team's predispositions lie?), or a disagreement with more systemic views on "what we should work on to reduce the AI risks", such as:

(1) Davidad's "AI Neorealism: a threat model & success criterion for existential safety":

For me the core question of existential safety is this: “Under these conditions, what would be the best strategy for building an AI system that helps us ethically end the acute risk period without creating its own catastrophic risks that would be worse than the status quo?”

It is not, for example, "how can we build an AI that is aligned with human values, including all that is good and beautiful?" or "how can we build an AI that optimises the world for whatever the operators actually specified?" Those could be useful subproblems, but they are not the top-level problem about AI risk (and, in my opinion, given current timelines and a quasi-worst-case assumption, they are probably not on the critical path at all).

(2) Leventov's "Beyond alignment theories":

Note that in this post, only a relatively narrow aspect of the multi-disciplinary view on AI safety is considered, namely the aspect of poly-theoretical approach to the technical alignment of humans to AIs. This mainly speaks to theories of cognition (intelligence, alignment) and ethics. But on a larger view, there are more theories and approaches that should be deployed in order to engineer our civilisational intelligence such that it “goes well”. These theories are not necessarily quite about “alignment”. Examples are control theory (we may be “aligned” with AIs but collectively “zombified” by powerful memetic viruses and walk towards a civilisational cliff), game theory (we may have good theories of alignment but our governance systems cannot deal with multi-polar traps so we cannot deploy these theories effectively), information security considerations, mechanistic anomaly detection and deep deceptiveness, etc. All these perspectives further demonstrate that no single compact theory can “save” us.

(3) Drexler's "Open Agency Model";

(4) Hendrycks' "Pragmatic AI Safety";

(5) Critch's "What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)".

Comment by Roman Leventov on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-19T09:07:25.738Z · LW · GW

Post-response: Assessment of AI safety agendas: think about the downside risk

You evidently follow a variant of 80000hours' framework for comparing (solving) particular problems in terms of expected impact: Neglectedness x Scale (potential upside) x Solvability.

I think for assessing AI safety ideas, agendas, and problems to solve, we should augment the assessment with another factor: the potential for a Waluigi turn, or more prosaically, the uncertainty about the sign of the impact (scale) and, therefore, the risks of solving the given problem or advancing far on the given agenda.

This reminds me of Taleb's mantra that to survive, we need to make many bets, but also limit the downside potential of each bet, i.e., the "ruin potential". See "The Logic of Risk Taking".

Of the approaches that you listed, some sound risky to me in this respect. Particularly "4. 'Reinforcement Learning from Neural Feedback' (RLNF)" -- it sounds like a direct invitation for wireheading to me. More generally, scaling BCI in any form and not falling into a dystopia at some stage is akin to walking a tightrope (at least at the current stage of civilisational maturity, I would say). This speaks to agendas #2 and #3 on your list.

There are also similar qualms about AI interpretability: there are at least four posts on LW warning of the potential risks of interpretability:

This speaks to the agenda "9. Neuroscience x mechanistic interpretability" on your list.

Related earlier posts

Comment by Roman Leventov on OpenAI: Preparedness framework · 2023-12-18T20:54:36.157Z · LW · GW

Everyone on Twitter has criticised the label "Responsible Scaling Policy", but the author of this post does not seem to respect what looks like a gentle attempt by OpenAI to move past this label.

If we were a bit more serious about this, we would perhaps immediately rename the "Responsible Scaling Policies" tag on LessWrong to "Preparedness Frameworks", with a note on the tag page: "Anthropic calls their PF 'RSP', but we think this is a bad label".