LessWrong 2.0 Reader


(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (214)
OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)
Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)
[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (30)
The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (30)
The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (43)
You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)
Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)
What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (11)
Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)
Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)
[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (32)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (34)
Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (24)
o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)
Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)
[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (57)
[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)
"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)
o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (148)
[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)
Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)
Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (16)
[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)
"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)
[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)
Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (51)
What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (13)
[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)
Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (10)
The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)
The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)
Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (31)
You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)
DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (26)
Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)
[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)
Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (2)
Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)
[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)