LessWrong 2.0 Reader
I think we can more easily and generally justify the use of the intentional stance. Intentionality requires only the existence of some process (a subject) that can be said to regard things (objects). We can get this in any system that accepts input and interprets that input to generate a signal that distinguishes between object and not object (or for continuous "objects", more or less object).
For example, almost any sensor in a circuit makes the system intentional. Wire together a thermometer and a light that turns on when the temperature is over 0 degrees, off when below, and we have a system that is intentional about freezing temperatures.
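The thermometer-and-light system above can be sketched in a few lines. This is a toy illustration of the claim, not anything from the original comment; the class and threshold names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class FreezeIndicator:
    """A minimal 'intentional' system in the sense above: it accepts an
    input (a temperature reading) and emits a signal whose state
    distinguishes the object (freezing temperatures) from not-object."""
    threshold_c: float = 0.0

    def light_on(self, temperature_c: float) -> bool:
        # As in the example: light on when the temperature is over
        # 0 degrees, off when below. The on/off state covaries with
        # whether the input is on the freezing side of the threshold.
        return temperature_c > self.threshold_c

sensor = FreezeIndicator()
print(sensor.light_on(12.0))   # True: above freezing, light on
print(sensor.light_on(-5.0))   # False: freezing, light off
```

On this view, nothing more than the input-to-signal mapping is needed for the system to count as "intentional about" freezing temperatures.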
Such a cybernetic argument, to me at least, is more appealing because it gets down to base reality immediately and avoids the need to sort out things people often want to lump in with intentionality, like consciousness.
philippe-chlenski on Transcoders enable fine-grained interpretable circuit analysis for language models
Possibly. But there is no optimization pressure from pre-training on the relationship between MLPs and transcoders. The MLPs are the thing that pre-training optimizes (as the "full-precision" master model), while transcoders only need to be maintained to remain in sync with the MLPs.
I see. I was in fact misunderstanding this detail in your training setup. In this case, only engineering considerations really remain: these boil down to incorporating multiple transcoders simultaneously and modeling shifting MLP behavior with transcoders. These seem like tractable problems, although probably nontrivial ones and, because of the LLM pretraining objective, quite computationally expensive. If transcoders catch on, I hope to see someone with the compute budget for it run this experiment!
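The "keep transcoders in sync with the MLPs" idea can be sketched as a small distillation-style objective. This is a hedged toy illustration with invented weights and dimensions, not the authors' actual setup: a frozen MLP plays the role of the pre-trained layer, and the transcoder is a wider, sparsity-penalized map trained to match the MLP's input-to-output behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden = 16, 64  # toy sizes, chosen for the example

# Stand-in for a base-model MLP layer (the thing pre-training optimizes).
W_in = rng.normal(size=(d_model, 4 * d_model))
W_out = rng.normal(size=(4 * d_model, d_model))

def mlp(x):
    return np.maximum(x @ W_in, 0.0) @ W_out  # ReLU MLP

# Transcoder: approximates the MLP's input->output map through a
# sparse hidden layer (weights here are untrained placeholders).
W_enc = rng.normal(size=(d_model, d_hidden))
W_dec = rng.normal(size=(d_hidden, d_model))

def transcoder(x):
    return np.maximum(x @ W_enc, 0.0) @ W_dec

def sync_loss(x, l1_coef=0.01):
    # The "stay in sync" objective: match the (possibly shifting) MLP
    # output, plus an L1 penalty keeping hidden activations sparse.
    hidden = np.maximum(x @ W_enc, 0.0)
    mse = np.mean((transcoder(x) - mlp(x)) ** 2)
    return mse + l1_coef * np.mean(np.abs(hidden))

x = rng.normal(size=(8, d_model))  # a batch of residual-stream inputs
print(sync_loss(x))                # scalar fidelity-plus-sparsity loss
```

If the MLP weights shift during pre-training, this loss is what would need to be re-minimized to keep the transcoder in sync, which is where the computational expense the comment mentions comes from.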
aysja on The Intentional Stance, LLMs Edition
Secondly, following Dennett, the point of modeling cognitive systems according to the intentional stance is that we evaluate them on a behavioral basis and that is all there is to evaluate.
I am confused on this point. Several people have stated that Dennett believes something like this, e.g., Quintin and Nora argue [LW · GW] that Dennett is a goal "reductionist," by which I think they mean something like "goal is the word we use to refer to certain patterns of behavior, but it's not more fundamental than that."
But I don't think Dennett believes this. He's pretty critical of behaviorism, for instance, and his essay Skinner Skinned does a good job, imo, of showing why this orientation is misguided. Dennett believes, I think, that things like "goals," "beliefs," "desires," etc. do exist, just that we haven't found the mechanistic or scientific explanation of them yet. But he doesn't think that explanations of intention will necessarily bottom out in just outward behavior; he expects such explanations to make reference to internal states as well. Dennett is a materialist, so of course at the end of the day all explanations will be in terms of behavior (inward or outward) on some level, much like any physical explanation is. But that's a pretty different claim from "mental states do not exist."
I'm also not sure if you're making that claim here or not, but curious if you disagree with the above?
That seems fair enough!
kave on LessOnline Festival Updates Thread
(I would agree-react but I can't actually make it)
markvy on The Solution to Sleeping Beauty
Thanks :) the recalibration may take a while… my intuition is still fighting ;)
algon on The Mom Test: Summary and Thoughts
Thank you for this. I'm conducting user interviews right now, and there were some surprising things in your review, as well as obviously good ideas that I would probably have missed. Organizing meetups in the field would not have occurred to me, and is a good idea.
andy-arditi on Refusal in LLMs is mediated by a single direction
Check out LEACE (Belrose et al. 2023) - their "concept erasure" is similar to what we call "feature ablation" here.
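The "feature ablation" the comment refers to can be sketched as projecting activations onto the subspace orthogonal to a single direction. This is a generic illustration of directional ablation under my own assumptions, not the authors' code or LEACE's affine method; the "refusal direction" here is random, purely for shape.

```python
import numpy as np

def ablate_direction(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove each activation's component along `direction` (the
    operation the comment calls 'feature ablation')."""
    d = direction / np.linalg.norm(direction)  # unit vector
    # Subtract the projection onto d from every row of acts.
    return acts - np.outer(acts @ d, d)

rng = np.random.default_rng(1)
acts = rng.normal(size=(5, 8))        # batch of activation vectors
refusal_dir = rng.normal(size=8)      # hypothetical feature direction

ablated = ablate_direction(acts, refusal_dir)
# After ablation, activations carry no component along the direction.
d_unit = refusal_dir / np.linalg.norm(refusal_dir)
print(np.allclose(ablated @ d_unit, 0.0))  # True
```

LEACE generalizes this kind of erasure: it finds the minimal affine edit that removes all linear information about a concept, rather than zeroing out one hand-picked direction.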
andy-arditi on Refusal in LLMs is mediated by a single direction
Second question is great. We've looked into this a bit, and (preliminarily) it seems like it's the latter (base models learn some "harmful feature," and this gets hooked into by the safety fine-tuned model). We'll be doing more diligence on checking this for the paper.
dagon on ChristianKl's Shortform
[note: I suspect we mostly agree on the impropriety of open selling and dissemination of this data. This is a narrow objection to the IMO hyperbolic focus on government assault risks.]
I'm unhappy with the phrasing of "targeted by the Chinese government", which IMO implies violence or other real-world interventions when the major threats are "adversary use of AI-enabled capabilities in disinformation and influence operations." Thanks for mentioning blackmail - that IS a risk I put in the first category, and presumably becomes more possible with phone location data. I don't know how much it matters, but there is probably a margin where it does.
I don't disagree that this purchasable data makes advertising much more effective (in fact, I worked at a company based on this for some time). I only mean to say that "targeting" in the sense of disinformation campaigns is a very different level of threat from "targeting" of individuals for government ops.