LessWrong 2.0 Reader

Recent comments

t3t on What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?

reasonably publicly accessible by an ordinary person from sources other than a covered model or covered model derivative

Seems like it'd pretty obviously cover information generated by non-covered models that are routinely used by many ordinary people (as open source image models currently are).

As a sidenote, I think the law is unfortunately one of those pretty cursed domains where it's hard to be very confident of anything as a layman without doing a lot of your own research, and you can't even look at experts speaking publicly on the subject since they're often performing advocacy, rather than making unbiased predictions about outcomes. You could try to hire a lawyer for such advice, but it seems to be pretty hard to find lawyers who are comfortable giving their clients quantitative (probabilistic) and conditional estimates. Maybe this is better once you're hiring for e.g. general counsel of a large org, or maybe large tech company CEOs have to deal with the same headaches that we do. Often your best option is to just get a basic understanding of how relevant parts of the legal system work, and then do a lot of research into e.g. relevant case law, and then sanity-check your reasoning and conclusions with an actual lawyer specialized in that domain.

peterbarnett on AI #83: The Mask Comes Off

Joshua Achiam is Head of Mission Alignment ("working across the company to ensure that we get all pieces (and culture) right to be in a place to succeed at the mission"), this is not technical AI alignment.

abramdemski on Why is o1 so deceptive?

I want to draw a clear distinction between the hypothesis I mention in the OP (your 'causal' explanation) and the 'acausal' explanation you mention here. You were blending the two together during our conversation, but I think there are important differences.

In particular, I am interested in the question: does o1 display any "backwards causation" from answers to chain-of-thought? Would its training result in chain-of-thought that is optimized to justify a sort of answer it tends to give (eg, hallucinated URLs)?

This depends on details of the training which we may not have enough information on (plus potentially complex reasoning about the consequences of such reasoning).

zach-stein-perlman on [Completed] The 2024 Petrov Day Scenario

Update with two new responses:

I think this is 10 generals, 1 petrov, and one other person (either the other petrov or a citizen, not sure, wasn't super rigorous)

erich_grunewald on Bing Chat is blatantly, aggressively misaligned

Yeah that makes sense. I think I underestimated the extent to which "warning shots" are largely defined post-hoc, and events in my category ("non-catastrophic, recoverable accident") don't really have shared features (or at least features in common that aren't also there in many events that don't lead to change).

sahil-1 on Why is o1 so deceptive?

Great! I'd love to have included a remark that one, as a human, might anticipate forward-chainy/rational reasoning in these systems because we're often taking the "thought" metaphor seriously/literally in the label "chain-of-thought", rather than backwardy/rationalization "reasoning".

But since it is at least somewhat intelligent/predictive, it can make the move of "acausal collusion" with its own tendency to hallucinate, in generating its "chain"-of-"thought". That is, the optimization to have chain-of-thought in correspondence with its output can work in the backwards direction, cohering with bad output instead of leading to better output, a la partial agency.

(Admittedly human thoughts do a lot of rationalization as well. So maybe the mistake is in taking directionality implied by "chain" too seriously?)

Maybe this is obvious, but it could become increasingly reckless to not notice when you're drawing the face of "thoughts" or "chains" on CoT shoggoth-movements. You can be misled into thinking that the shoggoth is less able to deceive than it actually is.

Less obvious but important: in the reverse direction, drawing "hacker faces" on the shoggoth, as in the case of the Docker hack (section 4.2.1), can mislead one into thinking that the shoggoth "wants" to or tends to hack/undermine/power-seek more than it actually, independently does. It seems at least somewhat relevant that the Docker vulnerability was exploited for a challenge that was explicitly about exploiting vulnerabilities. Even though it was an impressive meta-hack, one must wonder how much this is cued by the prompt and therefore is zero evidence for an autonomy telos---which is crucial for the deceptive optimizer story---even though mechanistically possible.

(The word "independently" above is important: if it takes human "misuse"/participation to trigger its undermining personas, we also might have more of a continuous shot at pausing/shutdown or even corrigibility.)

I was going to post this as a comment, but there's also an answer here: I'd say calling o1 "deceptive" could be as misleading as calling it aligned if it outputs loving words.

It has unsteady referentiality, at least from the POV of the meanings of us life-forms. Even though it has some closeness to our meanings and referentiality, the quantity of the unsteadiness of that referentiality can be qualitative. Distinguishing "deceptively aligned mesa-optimizer" from "the tentacles of the shoggoth I find it useful to call 'words' don't work like 'words' in some annoying ways" is important, in order to protect some of that (quantitatively-)qualitative difference. Both for not dismissing risks and for not hallucinating them.

wei-dai on AI #83: The Mask Comes Off

Apparently the current funding round hasn't closed yet and might be in some trouble, and it seems much better for the world if the round was to fail or be done at a significantly lower valuation (in part to send a message to other CEOs not to imitate SamA's recent behavior). Zvi saying that $150B greatly undervalues OpenAI at this time seems like a big unforced error, which I wonder if he could still correct in some way.

sodium on Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Also from WSJ

abstractapplic on D&D.Sci: Whom Shall You Call? [Evaluation and Ruleset]

Thanks for a good one

I'm glad you feel that way about this scenario. I wish I did . . .

(For future reference, on the off-chance you haven't seen it: there's a compilation of all the past scenarios here, handily rated by quality and steamrollability.)

One thing that perhaps would make it easier was if the web interactive could tell whether or not your selection was the optimal one directly, and possibly how much higher your expected price was than the optimal price (I first plugged mine in, then had to double check with your table out here)

. . . huh. I feel conflicted about this on aesthetic grounds - like, Reality doesn't come with big flashing signs saying "EV-maxxing solution reached!" when you reach an EV-maxxing solution - but it does sound both convenient to have and easy to set up. Might try adding this functionality to the interactive for the next one; would be curious to hear what anyone else who happens to be reading this comment thinks.

Anyway, greetings, and looking forward to seeing the next one.

Good to have you on board!

abramdemski on Wei Dai's Shortform

yyyep
