Posts

The Problem With the Word ‘Alignment’ 2024-05-21T03:48:26.983Z

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control 2023-08-28T22:19:11.167Z

Announcing “Key Phenomena in AI Risk” (facilitated reading group) 2023-05-09T00:31:10.294Z

Reflections on the PIBBSS Fellowship 2022 2022-12-11T21:53:19.690Z

particlemania's Shortform 2022-12-08T06:21:47.917Z

The economy as an analogy for advanced AI systems 2022-11-15T11:16:03.750Z

Epistemic Artefacts of (conceptual) AI alignment research 2022-08-19T17:18:47.941Z

[Linkpost] Danger of motivatiogenesis in interdisciplinary work 2021-11-25T00:13:46.733Z

Comments

Comment by particlemania on Recent AI model progress feels mostly like bullshit · 2025-04-09T17:05:37.021Z · LW · GW

I expect it matters to the extent that we care whether the generalization to the new question takes place in the expensive pretraining phase or in the active in-context phase.

Comment by particlemania on Historiographical Compressions: Renaissance as An Example · 2025-03-03T12:29:51.137Z · LW · GW

Not to pick on you specifically, but just as a general comment, I'm getting a bit worried about the rationalist decontextualized content policing. It seems it usually goes like this: someone cultivates an epistemological practice (say, how to extract conceptual insights from diverse practices) -> they decide to cross-post their thoughts on a community blog interested in epistemology -> somebody else, unfamiliar with the former's body of work, comes across it -> interprets it into a pattern they might rightfully have identified as critique-worthy -> dumps the criticism there. So maybe it'd be better if comments were written by people who can click through the author's profile to interpret the post in the right context.

[Epistemic status of this comment: Performative, but not without substance.]

Comment by particlemania on The Problem With the Word ‘Alignment’ · 2024-06-16T12:08:29.640Z · LW · GW

I would agree that it would be good and reasonable to have a term to refer to the family of scientific and philosophical problems spanned by this space. At the same time, as the post says, the issue arises when there is semantic dilution, people talking past each other, and coordination-inhibiting ambiguity.

P3 seems helpful but insufficient for good long term outcomes

Now take a look at something I could check with a simple search: an ICML Workshop that uses the term alignment mostly to mean P3 (task-reliability) https://arlet-workshop.github.io/

One might want to use alignment one way or the other, and be careful of the limited overlap with P3 in our own registers, but by the time the larger AI community has picked up on the use-semantics of 'RLHF is an alignment technique' and associated alignment primarily with task-reliability, you'd need some linguistic interventions and deliberation to clear the air.

Comment by particlemania on The Problem With the Word ‘Alignment’ · 2024-06-16T11:53:17.943Z · LW · GW

First of all, these are all meant to denote very rough attempts at demarcating research tastes.

It seems possible to be aiming to solve P1 without thinking much of P4, if a) you advocate a ~Butlerian pause, or b) you are working on aligned paternalism as the target behavior (where AI(s) are responsible for keeping humans happy, and humans have no residual agency or autonomy remaining).

Also, a lot of people who approach the problem from a P4 perspective tend to focus on the human-AI interface, where most of the relevant technical problems lie, but this might reduce their attention to issues of mesa-optimizers or emergent agency, despite the massive importance of those issues to their project in the long run.

Comment by particlemania on "Concepts of Agency in Biology" (Okasha, 2023) - Brief Paper Summary · 2023-07-08T20:12:16.640Z · LW · GW

Okasha's paper addresses emerging discussions in biology about organisms-as-agents in particular, otherwise called the Return of the Organism turn in philosophy of biology.

In the paper, he adds: "Various concepts have been offered as ways of fleshing out this idea of organismic autonomy, including goal-directedness, functional organization, emergence, self-maintenance, and individuality. Agency is another possible candidate for the job."

This seems like a reasonable stance so far as I can tell, since organisms seem to have some structural integrity, which can make delineated Cartesian boundaries well-defined.

For collectives, a similar discussion may surface additional upsides and downsides to agency concepts that may not apply at the organism level.

Comment by particlemania on Wittgenstein's Language Games and the Critique of the Natural Abstraction Hypothesis · 2023-03-17T06:18:23.395Z · LW · GW

My understanding of Steel Late Wittgenstein's response would be that you could agree that words and concepts are distinct, and that the mapping is not always 1-1, but that which concepts get used is also significantly influenced by which features of the world are useful in some contexts of language (/word) use.

Comment by particlemania on Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning · 2023-01-12T19:08:41.712Z · LW · GW

Rewards and Utilities are different concepts. To reject that reward is necessary to get/build agency is not the same thing as rejecting EU maximization as a basin of idealized agency.

Comment by particlemania on You can still fetch the coffee today if you're dead tomorrow · 2022-12-25T21:20:39.587Z · LW · GW

As an addendum, it seems to me that you may not necessarily need a 'long-term planner' (or 'time-unbounded agent') in the environment. A similar outcome may also be attainable if the environment contains a tiling of time-bound agents who can all trade with each other in ways such that the overall trade network implements long-term power seeking.

Comment by particlemania on particlemania's Shortform · 2022-12-08T06:21:48.507Z · LW · GW

Concept Dictionary.

Concepts that I intend to use or invoke in my later writings, or that are part of my reasoning about AI risk or related complex systems phenomena.
