LessWrong 2.0 Reader
@Zach Stein-Perlman, great work on this. I'd be interested in you brainstorming some questions about the labs' stances toward (government) AI policy interventions.
After a quick 5-minute brainstorm, here are some examples of things that seem relevant:
If that were true, I'd expect the reactions to a subsequent Llama 3 weight-orthogonalization jailbreak to be more like "yawn, we already have better stuff" and not "oh cool, this is quite effective!" It seems to me from the reception that this is letting people either do new things or do them faster, but maybe you have a concrete counter-consideration here?
yair-halberstadt on An explanation of evil in an organized world
To be honest, this just feels like the Euthyphro Dilemma all over again. "Good" is defined by what God does. God chooses to run the laws of physics. Laws of physics are "Good". Who gives a damn?
Also this is directly contradictory to Christianity, since the core beliefs of Christianity all assume some level of non-natural intervention in the world (e.g. resurrection of Christ). Same for almost all other religions. So who is this even for?
turntrout on Refusal in LLMs is mediated by a single direction
When we then run the model on harmless prompts, we intervene such that the expression of the "refusal direction" is set to the average expression on harmful prompts:
a′_harmless ← a_harmless − (a_harmless · r̂) r̂ + (avg_proj_harmful) r̂
Note that the average projection measurement and the intervention are performed only at layer l, the layer from which the best "refusal direction" r̂ was extracted.
Was it substantially less effective to instead use
a′_harmless ← a_harmless + (avg_proj_harmful) r̂ ?
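For concreteness, the two interventions can be sketched in NumPy. This is a minimal sketch with made-up numbers; the function names `ablate_then_add` and `add_only` are hypothetical, and in the actual setup these would be applied to residual-stream activations at layer l:

```python
import numpy as np

def ablate_then_add(a, r_hat, avg_proj_harmful):
    """The post's intervention: remove the activation's own component
    along the refusal direction, then set it to the average harmful
    projection."""
    return a - (a @ r_hat) * r_hat + avg_proj_harmful * r_hat

def add_only(a, r_hat, avg_proj_harmful):
    """The alternative asked about: add the average harmful projection
    without first ablating the existing component."""
    return a + avg_proj_harmful * r_hat

# Tiny example with a unit-norm refusal direction (made-up numbers)
r_hat = np.array([1.0, 0.0])
a = np.array([0.3, 2.0])   # harmless-prompt activation
avg_proj = 1.5             # average projection on harmful prompts

print(ablate_then_add(a, r_hat, avg_proj) @ r_hat)  # 1.5
print(add_only(a, r_hat, avg_proj) @ r_hat)         # 1.8
```

The difference: the post's version pins the component along r̂ to exactly avg_proj_harmful for every activation, while the add-only version shifts it, so the result still depends on each activation's original projection.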
We find this result unsurprising and implied by prior work, but include it for completeness. For example, Zou et al. 2023 showed that adding a harmfulness direction led to an 8 percentage point increase in refusal on harmless prompts in Vicuna 13B.
I do want to note that your boost in refusals seems huge, well beyond 8 percentage points. I'm somewhat surprised by how large it is.
using this direction to intervene on model activations to steer the model towards or away from the concept (Burns et al. 2022)
Burns et al. do activation engineering? I thought the CCS paper didn't involve that.
turntrout on Refusal in LLMs is mediated by a single direction
Because fine-tuning can be a pain and expensive? But you can probably do this quite quickly and painlessly.
If you want to say finetuning is better than this, or (more relevantly) finetuning + this, can you provide some evidence?
nicholaskross on Please stop publishing ideas/insights/research about AI
"But my ideas are likely to fail! Can I share failed ideas?": If you share a failed idea, that saves the other person time/effort they would've spent chasing that idea. This, of course, speeds up that person's progress, so don't even share failed ideas/experiments about AI, in the status quo.
"So where do I privately share such research?" — good question! There is currently no infrastructure for this. I suggest keeping your ideas/insights/research to yourself. If you think that's difficult for you to do, then I suggest not thinking about AI, and doing something else with your time, like getting into Factorio 2 or something.
"But I'm impatient about the infrastructure coming to exist!": Apply for a possibly-relevant grant and build it! Or build it in your spare time. Or be ready to help out if/when someone develops this infrastructure.
"But I have AI insights and I want to convert them into money/career-capital/personal-gain/status!": With that kind of brainpower/creativity, you can get any/all of those things pretty efficiently without publishing AI research, working at a lab, advancing a given SOTA, or doing basically (or literally) anything that differentially speeds up AI capabilities. This, of course, means "work on the object-level problem, without routing that work through AI capabilities", which is often as straightforward as "do it yourself".
"But I'm wasting my time if I don't get involved in something related to AGI!": "I want to try LSD, but it's only available in another country. I could spend my time traveling to that country, or looking for mushrooms, or even just staying sober. Therefore, I'm wasting my time unless I immediately inject 999999 fentanyl."
yair-halberstadt on Can stealth aircraft be detected optically?
Lens and CCD technology is not trivial at those speeds and insane angular resolutions.
But we can easily capture a picture of a fighter jet when it's close. And the further away it is, the higher the angular resolution required, but also the lower the angular speed. So do those cancel out to make it easier, or does it not work like that?
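A back-of-the-envelope check (hypothetical numbers, not from the thread) suggests that for motion blur, at least, the two effects do cancel: the angular pixel size needed to resolve a fixed physical feature shrinks like 1/d, but the angular speed of a crossing jet also falls like 1/d, so the blur measured in pixels comes out distance-independent:

```python
# Does greater distance make optical tracking easier or harder?
# Assumptions (all made-up numbers): a jet of length L crossing at
# speed v, camera exposure time t_exp, and we want N pixels across it.
L = 15.0      # jet length, m
v = 300.0     # crossing speed, m/s
t_exp = 1e-3  # exposure time, s
N = 10        # desired pixels across the jet

for d in (1e3, 1e4, 1e5):        # distance, m
    pixel_angle = (L / d) / N    # required angular pixel size, rad
    ang_speed = v / d            # angular speed of a crossing jet, rad/s
    blur_px = ang_speed * t_exp / pixel_angle
    print(f"d={d:>8.0f} m  pixel={pixel_angle:.2e} rad  blur={blur_px:.1f} px")

# Symbolically: blur_px = (v/d) * t_exp / (L/(d*N)) = v*t_exp*N/L
# -- the distance d drops out entirely.
```

This only covers motion blur, though; the aperture needed to actually achieve the finer angular resolution at long range still grows with distance (diffraction limit), so distance isn't free.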
tamsin-leake on Please stop publishing ideas/insights/research about AI
One straightforward alternative is to just not do that; I agree it's not very satisfying, but it should still be the action that's pursued if it's the one that has more utility.
I wish I had better alternatives, but I don't. But the null action is an alternative.
jacques-thibodeau on Shane Legg's necessary properties for every AGI Safety plan
I'm going to assume that Shane Legg has thought about this more, and read more of the existing work, than many of us combined. Certainly, there are smart people who haven't thought about it much, but Shane is definitely not one of them. He only had a short 5-minute talk, but I do hope to see a longer treatment of how he expects we will fully solve necessary property 3.
matheus-popst on tlevin's Shortform
I'm not a decel, but the way this stuff is often resolved is that there are crazy people who aren't taken seriously by the managerial class but who are very loud and make obnoxious asks. Think the evangelicals against abortion or the Columbia protestors.
Then there is some elite, part of the managerial class, that makes reasonable policy claims. For abortion, this is Mitch McConnell, being disciplined over a long period of time in choosing the correct judges. For Palestine, this is Blinken and his State Department bureaucracy.
The problem with decels is that they are, theoretically, part of the managerial class themselves. Or at least, they act like they are. They call themselves rationalists, read Eliezer and Scott Alexander, and whatnot. But it's very hard for an uninterested third party to take these bogus Overton-Window claims seriously when they come from people who were supposed to be measured members of the managerial class.
You need to split. There are the crazy ones whom people don't take seriously, but who will move the managerial class. And there are the serious people whom EA money will send to D.C. to work at Blumenthal's office. This person needs to make small policy requests that will sabotage AI without looking like it. And slowly, you get policy wins and you can sabotage the whole effort.