Comments

Comment by Ben123 on Llama We Doing This Again? · 2023-07-26T13:43:16.525Z · LW · GW

The other examples given at other safety levels are also bad, but it is worth noting that GPT-4 and Claude-2’s responses to this were if anything worse, since they flat out refuse to play along and instead say ‘I am a large language model.’ In GPT-4’s case, this was despite an explicit system instruction I have put in to never say that.

I tried with GPT-4 several times, and it played along correctly, though one response started with "As a New Yorker-based AI..."

Comment by Ben123 on Inequality Penalty: Morality in Many Worlds · 2023-02-12T04:25:33.652Z · LW · GW

To clarify, those links are about the ethical implications of MWI in general. I don't think I've seen the inequality argument before!

Comment by Ben123 on Inequality Penalty: Morality in Many Worlds · 2023-02-12T04:18:14.005Z · LW · GW

Related LessWrong discussions: "Ethics in many worlds" (2020), "Living in Many Worlds" (2008), and some others. MWI ethics are also covered in this 80,000 Hours podcast episode. Mind-bending stuff.

Comment by Ben123 on Sacred Cash · 2023-02-11T15:15:27.618Z · LW · GW

They had to give you a toaster instead

Looks like this link is broken

Comment by Ben123 on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-08-24T16:30:15.114Z · LW · GW

Does the inner / outer distinction complicate the claim that all current ML systems are utility maximizers? The gradient descent algorithm performs a simple kind of optimization in the training phase. But once the model is trained and in production, it doesn't seem obvious that the "utility maximizer" lens is always helpful in understanding its behavior.

Comment by Ben123 on Morality is Scary · 2021-12-03T03:20:44.030Z · LW · GW

You could read the status game argument the opposite way: Maybe status seeking causes moral beliefs without justifying them, in the same way that it can distort our factual beliefs about the world. If we can debunk moral beliefs by finding them to be only status-motivated, the status explanation can actually assist rational reflection on morality.

Also the quote from The Status Game conflates purely moral beliefs and factual beliefs in a way that IMO weakens its argument. It's not clear that many of the examples of crazy value systems would survive full logical and empirical information.

Comment by Ben123 on EDT with updating double counts · 2021-10-14T10:23:04.680Z · LW · GW

I think the agent should take the bet, and the double counting is actually justified. Epistemic status: Sleep deprived.

The number of clones that end up betting along with the agent is an additional effect of its decision, and EDT-with-updating is correctly accounting for it. Since "calculator says X" is evidence that "X = true", selecting only clones that saw "calc says X" gives you better odds. What seems like a superfluous second update is really an essential step: computing the number of clones in each branch.

Consider this modification: All N clones bet iff you do, using their own calculator to decide whether to bet on X or ¬X.

This reformulation is just the basic 0-clones problem repeated, and it recommends no bet.

With N = 1000 clones:

if X, EVT = -100 =
      0.99 × N winners × $10
    - 0.01 × N losers × $1000
if ¬X, EVT = -100 =
      0.99 × N winners × $10
    - 0.01 × N losers × $1000
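A quick arithmetic check of this modified problem. It is a minimal sketch under assumed parameters: N = 1000 clones (the value that makes these totals come out to -100), each calculator independently 99% accurate, and a correct bet winning $10 while a wrong one loses $1000. The function names are illustrative only.

```python
# Assumed parameters (illustrative): N = 1000 clones, 99%-accurate calculators,
# a correct bet wins $10 and a wrong bet loses $1000.
N = 1000
ACCURACY = 0.99
WIN = 10
LOSS = 1000


def single_agent_ev(p: float = ACCURACY) -> float:
    """Expected payoff for one agent (no clones) betting on its calculator's answer."""
    return p * WIN - (1 - p) * LOSS


def modified_problem_ev(n: int = N, p: float = ACCURACY) -> float:
    """Expected total payoff when all n clones bet, each following its own calculator.

    The total is the same whether X or not-X is true: in either world a
    fraction p of the clones bet correctly and the rest bet wrongly.
    """
    winners = p * n
    losers = (1 - p) * n
    return winners * WIN - losers * LOSS


if __name__ == "__main__":
    print(round(single_agent_ev(), 6))      # -0.1 per agent
    print(round(modified_problem_ev(), 6))  # -100.0 in total, i.e. N * -0.1
```

Because each clone's bet is an independent copy of the single-agent bet, the total is just N times the per-agent expectation, which is why this version recommends against betting.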

Now recall the "double count" calculation for the original problem.

if X, EVT = 9900 = 0.99 × N winners × $10
if ¬X, EVT = -10000 = -0.01 × N losers × $1000

Notice what's missing: The winners when ¬X and, crucially, the losers when X. This is a real improvement in value -- if you're one of the clones when X is true, there's no longer any risk of losing money. 
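For contrast, here is a sketch that computes each world's expected total straight from the expressions above for both setups, along with the terms that drop out of the original setup relative to the everyone-bets version. It uses the same assumed parameters (N = 1000, 99%-accurate calculators, +$10 / -$1000 payoffs), and the function names are hypothetical.

```python
# Assumed parameters (illustrative): N = 1000 clones, 99%-accurate calculators,
# a correct bet wins $10 and a wrong bet loses $1000.
N = 1000
ACCURACY = 0.99
WIN = 10
LOSS = 1000


def modified_totals(n: int = N, p: float = ACCURACY) -> dict:
    """Every clone bets, on X or not-X according to its own calculator."""
    per_world = p * n * WIN - (1 - p) * n * LOSS
    return {"X": per_world, "not X": per_world}


def original_totals(n: int = N, p: float = ACCURACY) -> dict:
    """Only clones whose calculator says X bet, and they all bet on X."""
    return {
        "X": p * n * WIN,              # calculators saying X are right: all winners
        "not X": -(1 - p) * n * LOSS,  # calculators saying X are wrong: all losers
    }


if __name__ == "__main__":
    modified = modified_totals()
    original = original_totals()
    for world in ("X", "not X"):
        # Positive difference: a loss term was dropped; negative: a gain term was dropped.
        dropped = original[world] - modified[world]
        print(f"{world}: modified {modified[world]:,.0f}, "
              f"original {original[world]:,.0f}, difference {dropped:+,.0f}")
```

In the X world the difference is +10,000 because the losers' $10,000 disappears. In the ¬X world it is -9,900 because the winners' $9,900 disappears.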


Comment by Ben123 on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2012-06-07T17:52:41.701Z · LW · GW

Is that a general solution? What about this: "Give me five dollars or I will perform an action, the disutility of which will be equal to twice that of you giving me five dollars, multiplied by the reciprocal of the probability of this statement being true."
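A sketch of why the statement is troublesome. It treats p as a fixed credence p > 0 that the statement is true, and c as the disutility of handing over five dollars. Both symbols are introduced here for illustration, and the self-reference in "the probability of this statement being true" is glossed over:

```latex
\begin{aligned}
c &= \text{disutility of paying \$5} \\
D &= \frac{2c}{p} \qquad \text{(threatened disutility)} \\
\mathbb{E}[\text{disutility of refusing}] &= p \cdot D = p \cdot \frac{2c}{p} = 2c > c
\end{aligned}
```

Since p cancels, discounting the threat by its improbability never makes refusing look better than paying.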