LessWrong 2.0 Reader

Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI
jsteinhardt · 2023-10-31T05:10:02.581Z · comments (0)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (59)

[link] ARC Evals: Responsible Scaling Policies
Zach Stein-Perlman · 2023-09-28T04:30:37.140Z · comments (9)

[link] "What if we could redesign society from scratch? The promise of charter cities." [Rational Animations video]
Jackson Wagner · 2024-02-18T00:57:50.444Z · comments (7)

[question] Does AI governance needs a "Federalist papers" debate?
azsantosk · 2023-10-18T21:08:26.098Z · answers+comments (4)

Jobs, Relationships, and Other Cults
Ruby · 2024-03-13T05:58:45.043Z · comments (9)

The Serendipity of Density
jefftk (jkaufman) · 2023-12-17T03:50:04.824Z · comments (4)

[link] Forecasting: the way I think about it
Molly (hickman-santini) · 2024-05-09T00:49:01.768Z · comments (4)

Box inversion revisited
Jan_Kulveit · 2023-11-07T11:09:36.557Z · comments (3)

Quantopian contest, but for food intake and weight
Lucent · 2023-11-08T05:41:35.050Z · comments (9)

[link] AI Regulation is Unsafe
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-22T16:37:55.431Z · comments (41)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (29)

[link] What's important in "AI for epistemics"?
Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z · comments (0)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.
Jessica Rumbelow (jessica-cooper) · 2024-08-03T12:07:46.302Z · comments (2)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

[link] Adverse Selection by Life-Saving Charities
vaishnav92 · 2024-08-14T20:46:23.662Z · comments (14)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

[link] Eight Magic Lamps
Richard_Ngo (ricraz) · 2023-10-14T04:10:02.040Z · comments (0)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (6)

Prepsgiving, A Convergently Instrumental Human Practice
JenniferRM · 2023-11-23T17:24:56.784Z · comments (0)

[link] Legalize butanol?
bhauth · 2023-12-20T14:24:33.849Z · comments (20)

Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

Logical Line-Of-Sight Makes Games Sequential or Loopy
StrivingForLegibility · 2024-01-19T04:05:44.782Z · comments (0)

Individually incentivized safe Pareto improvements in open-source bargaining
Nicolas Macé (NicolasMace) · 2024-07-17T18:26:43.619Z · comments (2)

Nitric oxide for covid and other viral infections
Elizabeth (pktechgirl) · 2024-02-07T21:30:03.774Z · comments (6)

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

[link] [Paper] Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee (bruce-lee) · 2024-02-22T18:52:32.237Z · comments (23)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

Medical Roundup #3
Zvi · 2024-07-09T13:10:06.862Z · comments (4)

[link] Conflict in Posthuman Literature
Martín Soto (martinsq) · 2024-04-06T22:26:04.051Z · comments (1)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality
Steven Byrnes (steve2152) · 2023-12-18T15:26:29.970Z · comments (12)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

When Are Results from Computational Complexity Not Too Coarse?
Dalcy (Darcy) · 2024-07-03T19:06:44.953Z · comments (7)

Whiteboard Pen Magazines are Useful
Johannes C. Mayer (johannes-c-mayer) · 2024-07-12T17:15:33.200Z · comments (6)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

[link] Jailbreak steering generalization
Sarah Ball · 2024-06-20T17:25:24.110Z · comments (2)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

Recent comments

jblack on Doing Nothing Utility Function

Ah, that does make it almost impossible then. Such a utility function when paused must have constant value for all outcomes, or it will have incentive to do something. Then in the non-paused state the otherwise reachable utility is either greater than that (in which case it has incentive to prevent being paused) or less than or equal (in which case its best outcome is to make itself paused).

raemon on 2024 Petrov Day Retrospective

I definitely erred explicitly in the direction of the Opt In button looking scary (Ben specifically argued against this but it felt right to me). I have heard from a few people that they didn't even consider pressing it because "c'mon, it's Petrov Day, you don't go clicking big red buttons." I'm not sure if it was the right call. In any case, if we do a similar thing in the future, my guess is we'll make the opt-in less scary looking.

huera on 2024 Petrov Day Retrospective

Were it not for the big red button™, I probably would've opted in before reading what this year's Petrov Day was about (I didn't take part, since that would risk too much precious karma). I wonder whether it would be more fitting to make the opt-in option look maximally scary or nonthreatening.

nathan-helm-burger on shminux's Shortform

It would be a message customized deliberately for each human, and worked on gradually over years of subtle convincing arguments. That's how I understand the hypothetical.

I think that an AI competent enough to manage this would have faster, easier ways to accomplish the same effect, but I do agree that this would quite likely work.

maxime-riche on COT Scaling implies slower takeoff speeds

If it takes a human 1 month to solve a difficult problem, it seems unlikely that a less capable human who can't solve it within 20 years of effort can still succeed in 40 years.

Since the scaling is logarithmic, your example seems to be a strawman.

The real claim being debated is something more like:

"If it takes a human 1 month to solve a difficult problem, it seems unlikely that a less capable human who can't solve it within 100 months of effort can still succeed in 10 000 months."

And this formulation doesn't seem obviously true.

raemon on 2024 Petrov Day Retrospective

Yes. (That wasn’t meant to be a secret, sorry!)

nisan on 2024 Petrov Day Retrospective

So was the launch code really 000000?

raemon on 2024 Petrov Day Retrospective

I feel satisfied with Ben’s articulation of ‘taking responsibility’ as the primary Petrov virtue. It feels more like a real virtue than the overly consequentialist ‘don’t take actions that would destroy the world’, but it naturally lends itself to the other virtues on our poll from last year, when appropriate.

raemon on 2024 Petrov Day Retrospective

Yeah I find the ‘you want to keep the message consistent for Science’ argument convincing (but think it’s good to still stick with the most reasonable interpretation of what our word was that we can, unless we have a specific reason not to that a reasonable number of nonteammates agree makes sense.)

raemon on "Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"

Fwiw I feel fine thinking of both slow/fast and smooth/sharp as a continuum. Takeoffs and timelines can be slower or faster and compared on that axis.

I agree if you are just treating those as booleans you're gonna get confused, but the words seem about as scalar a shorthand as one could hope for without literally switching entirely to more explicit quantification.