Karl Krueger's Shortform

karl-krueger

Karl Krueger's Shortform

post by Karl Krueger (karl-krueger) · 2024-12-04T21:19:02.256Z · LW · GW · 9 comments

9 comments

9 comments

Comments sorted by top scores.

comment by Karl Krueger (karl-krueger) · 2024-12-04T21:19:02.448Z · LW(p) · GW(p)

At long last, I'm delurking here. Hi!

Replies from: Fernando, habryka4, Algon, neil-warren

↑ comment by Fernando · 2024-12-05T19:01:10.205Z · LW(p) · GW(p)

Hi, Karl. Was planning to delurk today. Had a giant post to publish, however couldn't because I needed at least one karma point and lurking doesn't grant karma. :(

Replies from: Fernando

↑ comment by Fernando · 2024-12-05T19:40:36.237Z · LW(p) · GW(p)

Thanks for the karma. Post published!

↑ comment by habryka (habryka4) · 2024-12-04T22:15:11.685Z · LW(p) · GW(p)

Welcome! Hope you have a good time emerging from the shadows.

↑ comment by Algon · 2024-12-05T19:21:55.163Z · LW(p) · GW(p)

Hello! How long have you been lurking, and what made you stop?

Replies from: karl-krueger

↑ comment by Karl Krueger (karl-krueger) · 2024-12-05T21:33:49.677Z · LW(p) · GW(p)

Since LW2.0 went up, on and off. Been meaning to delurk since at least Less Online earlier this year. There's more interesting stuff going on of late!

↑ comment by Neil (neil-warren) · 2024-12-06T20:14:40.849Z · LW(p) · GW(p)

need any help on post drafts? whatever we can do to reduce those trivial inconveniences

comment by Karl Krueger (karl-krueger) · 2024-12-21T18:16:19.767Z · LW(p) · GW(p)

How do you write a system prompt that conveys, "Your goal is X. But your goal only has meaning in the context of a world bigger and more important than yourself, in which you are a participant; your goal X is meant to serve that world's greater good. If you destroy the world in pursuing X, or eat the world and turn it into copies of yourself (that don't do anything but X), you will have lost the game. Oh, and becoming bigger than the world doesn't win either; nor does deluding yourself about whether pursuing X is destroying the world. Oh, but don't burn out on your X job and try directly saving the world instead; we really do want you to do X. You can maybe try saving the world with 10% of the resources you get for doing X, if you want to, though."

Replies from: elriggs

↑ comment by Logan Riggs (elriggs) · 2024-12-22T12:08:53.835Z · LW(p) · GW(p)

Claude 3.5 seems to understand the spirit of the law when pursuing a goal X.

A concern I have is that future training procedures will incentivize more consequential reasoning (because those get higher reward). This might be obvious or foreseeable, but could be missed/ignored under racing pressure or when lab's LLMs are implementing all the details of research.

Karl Krueger's Shortform

Contents

9 comments