LessWrong 2.0 Reader
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-11-29T23:36:43.948Z · comments (281)
I think the status quo around publishing safety research is mostly fine (though being a bit more careful seems good); more confidently, I think going as far as the vibe of this post suggests would be bad.
Some possible cruxes, or reasons the post basically didn't move my view on that:
I think most of these are pretty long-standing disagreements, and I don't think the post really tries to argue its side of them, so my guess is it's not going to convince the main people it would need to convince (who are currently publishing prosaic safety/alignment research). That said, if someone hasn't thought at all about concepts like "differentially advancing safety" or "capabilities externalities," then reading this post would probably be helpful, and I'd endorse thinking about those issues. And I agree that some of the "But ..." objections you list are pretty weak.
quetzal_rainbow on quetzal_rainbow's Shortform
@jessicata [LW · GW] once wrote: "Everyone wants to be a physicalist but no one wants to define physics." I decided to check the SEP article on physicalism and found that, yep, it doesn't have a definition of physics:
Carl Hempel (cf. Hempel 1969, see also Crane and Mellor 1990) provided a classic formulation of this problem: if physicalism is defined via reference to contemporary physics, then it is false — after all, who thinks that contemporary physics is complete? — but if physicalism is defined via reference to a future or ideal physics, then it is trivial — after all, who can predict what a future physics contains? Perhaps, for example, it contains even mental items. The conclusion of the dilemma is that one has no clear concept of a physical property, or at least no concept that is clear enough to do the job that philosophers of mind want the physical to play.
<...>
Perhaps one might appeal here to the fact that we have a number of paradigms of what a physical theory is: common sense physical theory, medieval impetus physics, Cartesian contact mechanics, Newtonian physics, and modern quantum physics. While it seems unlikely that there is any one factor that unifies this class of theories, perhaps there is a cluster of factors — a common or overlapping set of theoretical constructs, for example, or a shared methodology. If so, one might maintain that the notion of a physical theory is a Wittgensteinian family resemblance concept.
This surprised me because I have a definition of a physical theory and assumed that everyone else used the same one.
Perhaps my personal definition of physics is inspired by Engels's "Dialectics of Nature": "Motion is the mode of existence of matter." Assuming "matter is described by physics," we are getting "physics is the science that reduces studied phenomena to motion." Or, to express it in a more analytical manner, "a physicalist theory is a theory that assumes that everything can be explained by reduction to characteristics of space and its evolution in time."
For example, "vacuum" is a part of space with a "zero" value in all characteristics. A "particle" is a localized part of space with some non-zero characteristic. A "wave" is part of space with periodic changes of some characteristic in time and/or space. We can abstract away "part of space" from "particle" and start to talk about a particle as a separate entity, and speed of a particle is actually a derivative of spatial characteristic in time, and force is defined as the cause of acceleration, and mass is a measure of resistance to acceleration given the same force, and such-n-such charge is a cause of such-n-such force, and it all unfolds from the structure of various pure spatial characteristics in time.
The tricky part is: "Sure, we live in space and time, so everything that happens is some motion. How do we separate a physicalist theory from everything else?"
Let's imagine that we have some kind of "vitalist field." This field interacts with C, H, O, N atoms and also with molybdenum; it accelerates certain chemical reactions, and if you prepare an Oparin-Haldane soup and radiate it with vitalist particles, you will soon observe autocatalytic cycles resembling hypothetical primordial life. All living organisms utilize vitalist particles in their metabolic pathways, and if you somehow isolate them from an outside source of particles, they'll die.
Despite having a "vitalist field," such a world would be pretty much physicalist.
An unphysical vitalist world would look like this: if you have glowing rocks and a pile of organic matter, the organic matter is going to transform into mice. Or frogs. Or mosquitoes. Even if the glowing rocks have a constant glow and the composition of the organic matter is the same and the environment in a radius of a hundred miles is the same, nobody can predict from any observables which kind of complex life is going to emerge. It looks like the glowing rocks have their own will, unquantifiable by any kind of measurement.
The difference is that the "vitalist field" in the second case has its own dynamics not reducible to any spatial characteristics of the "vitalist field"; it has an "inner life."
akash-wasil on Questions for labs
@Zach Stein-Perlman [LW · GW], great work on this. I would be interested in you brainstorming some questions that have to do with the labs' stances toward (government) AI policy interventions.
After a quick 5-minute brainstorm, here are some examples of things that seem relevant:
I imagine there's a lot more in this general category of "labs and how they are interacting with governments and how they are contributing to broader AI policy efforts", and I'd be excited to see AI Lab Watch (or just you) dive into this more.
turntrout on Refusal in LLMs is mediated by a single direction
If that were true, I'd expect the reactions to a subsequent Llama 3 weight-orthogonalization jailbreak to be more like "yawn, we already have better stuff" and not "oh cool, this is quite effective!" It seems to me from the reception that this is letting people either do new things or do existing things faster, but maybe you have a concrete counter-consideration here?
yair-halberstadt on An explanation of evil in an organized world
To be honest, this just feels like the Euthyphro Dilemma all over again. "Good" is defined by what God does. God chooses to run the laws of physics. Laws of physics are "Good". Who gives a damn?
Also this is directly contradictory to Christianity, since the core beliefs of Christianity all assume some level of non-natural intervention in the world (e.g. resurrection of Christ). Same for almost all other religions. So who is this even for?
turntrout on Refusal in LLMs is mediated by a single direction
When we then run the model on harmless prompts, we intervene such that the expression of the "refusal direction" is set to the average expression on harmful prompts:

$$a'_{\text{harmless}} \leftarrow a_{\text{harmless}} - (a_{\text{harmless}} \cdot \hat{r})\,\hat{r} + (\text{avg\_proj}_{\text{harmful}})\,\hat{r}$$
Note that the average projection measurement and the intervention are performed only at layer $l$, the layer from which the best "refusal direction" $\hat{r}$ was extracted.
Was it substantially less effective to instead use
$$a'_{\text{harmless}} \leftarrow a_{\text{harmless}} + (\text{avg\_proj}_{\text{harmful}})\,\hat{r}\,?$$
We find this result unsurprising and implied by prior work, but include it for completeness. For example, Zou et al. 2023 showed that adding a harmfulness direction led to an 8 percentage point increase in refusal on harmless prompts in Vicuna 13B.
I do want to note that your boost in refusals seems absolutely huge, well beyond 8%? I am somewhat surprised by how huge your boost is.
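For concreteness, the two interventions being compared above can be sketched in NumPy. This is an illustrative sketch of the math only, not the paper's actual code; the function and argument names are mine.

```python
import numpy as np

def ablate_and_set(a_harmless, r_hat, avg_proj_harmful):
    """The intervention from the quoted passage: remove the activation's
    component along the unit refusal direction r_hat, then add back the
    average projection measured on harmful prompts, so the expression of
    the refusal direction is exactly avg_proj_harmful."""
    return a_harmless - (a_harmless @ r_hat) * r_hat + avg_proj_harmful * r_hat

def add_only(a_harmless, r_hat, avg_proj_harmful):
    """The simpler variant the comment asks about: add the scaled refusal
    direction without first ablating the existing component, so the final
    expression depends on what was already there."""
    return a_harmless + avg_proj_harmful * r_hat
```

The difference is that after `ablate_and_set`, the projection onto the refusal direction is exactly the target value regardless of the input activation, while `add_only` shifts the existing projection by the target amount.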
using this direction to intervene on model activations to steer the model towards or away from the concept (Burns et al. 2022)
Burns et al. do activation engineering? I thought the CCS paper didn't involve that.
turntrout on Refusal in LLMs is mediated by a single direction
Because fine-tuning can be a pain and expensive? But you can probably do this quite quickly and painlessly.
If you want to say fine-tuning is better than this, or (more relevantly) fine-tuning + this, can you provide some evidence?
nicholaskross on Please stop publishing ideas/insights/research about AI
"But my ideas are likely to fail! Can I share failed ideas?": If you share a failed idea, that saves the other person time/effort they would've spent chasing that idea. This, of course, speeds up that person's progress, so don't even share failed ideas/experiments about AI, in the status quo.
"So where do I privately share such research?" — good question! There is currently no infrastructure for this. I suggest keeping your ideas/insights/research to yourself. If you think that's difficult for you to do, then I suggest not thinking about AI, and doing something else with your time, like getting into factorio 2 or something.
"But I'm impatient about the infrastructure coming to exist!": Apply for a possibly-relevant grant and build it! Or build it in your spare time. Or be ready to help out if/when someone develops this infrastructure.
"But I have AI insights and I want to convert them into money/career-capital/personal-gain/status!": With that kind of brainpower/creativity, you can get any/all of those things pretty efficiently without publishing AI research, working at a lab, advancing a given SOTA, or doing basically (or literally) anything that differentially speeds up AI capabilities. This, of course, means "work on the object-level problem, without routing that work through AI capabilities", which is often as straightforward as "do it yourself".
"But I'm wasting my time if I don't get involved in something related to AGI!": "I want to try LSD, but it's only available in another country. I could spend my time traveling to that country, or looking for mushrooms, or even just staying sober. Therefore, I'm wasting my time unless I immediately inject 999999 fentanyl."
yair-halberstadt on Can stealth aircraft be detected optically?
Lens and CCD technology is not trivial at those speeds and insane angular resolution.
But we can easily capture a picture of a fighter jet when it's close. And the further away it is, the higher the angular resolution required, but also the lower the angular speed; so do those cancel out to make it easier, or does it not work like that?
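Under a simplified geometric model (target moving transverse to the line of sight, fixed exposure time), the two effects do cancel: the target's angular size scales as $s/d$ and its angular speed as $v/d$, so the motion blur measured in target widths is $vt/s$, independent of distance. A minimal sketch, with illustrative numbers (a 15 m jet at 300 m/s, 1 ms exposure):

```python
def blur_in_target_widths(distance_m, target_size_m=15.0,
                          speed_m_s=300.0, exposure_s=1e-3):
    """Blur during one exposure, measured in target widths.
    theta = s/d (angular size), omega = v/d (angular speed),
    so (omega * t) / theta = v * t / s, with distance cancelling."""
    theta = target_size_m / distance_m   # angular size in radians
    omega = speed_m_s / distance_m       # angular speed in rad/s
    return (omega * exposure_s) / theta

# Same blur-to-size ratio at 1 km and at 50 km (both about 0.02):
print(blur_in_target_widths(1_000))
print(blur_in_target_widths(50_000))
```

This only addresses motion blur relative to the target's apparent size; the absolute angular resolution needed to resolve the target at all still gets harder with distance.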
tamsin-leake on Please stop publishing ideas/insights/research about AI
One straightforward alternative is to just not do that; I agree it's not very satisfying, but it should still be the action that's pursued if it's the one that has more utility.
I wish I had better alternatives, but I don't. But the null action is an alternative.