LessWrong 2.0 Reader

Truthseeking is the ground in which other principles grow
Elizabeth (pktechgirl) · 2024-05-27T01:09:20.796Z · comments (16)

AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)

Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (34)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (38)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (117)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (70)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (132)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (18)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (56)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (139)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)

[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (16)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (12)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (39)

Recent comments

kvmanthinking on Chapter 27: Empathy

Harry's brain tried to calculate the ramifications and implications of this and ran out of swap space.

this is very relatable

ryankidd44 on Ryan Kidd's Shortform

I'm not sure!

yair-halberstadt on Could orcas be (trained to be) smarter than humans? 

Douglas Adams answered this long ago of course:

For instance, on the planet Earth, man had always assumed that he was more intelligent than dolphins because he had achieved so much—the wheel, New York, wars and so on—whilst all the dolphins had ever done was muck about in the water having a good time. But conversely, the dolphins had always believed that they were far more intelligent than man—for precisely the same reasons.

petermccluskey on Investing for a World Transformed by AI

No, I don't recall any ethical concerns. Just basic concerns such as the difficulty of finding a boss that I'm comfortable with, having control over my hours, etc.

steve2152 on Complete Feedback

This is a confusing post from my perspective, because I think of LI as being about beliefs and corrigibility being about desires.

If I want my AGI to believe that the sky is green, I guess it’s good if it’s possible to do that. But it’s kinda weird, and not a central example of corrigibility.

Admittedly, one can try to squish beliefs and desires into the same framework. The Active Inference people do that. Does LI do that too? If so, well, I’m generally very skeptical of attempts to do that kind of thing. See here, especially Section 7. In the case of humans, it’s perfectly possible for a plan to seem desirable but not plausible, or for a plan to seem plausible but not desirable. I think there are very good reasons that our brains are set up that way.

gordon-seidoh-worley on What if muscle tension is sometimes signal jamming?

I don't know, but I can say that after a lot of hours of Alexander lessons my posture and movement improved in ways that would be described as "having less muscle tension" and this having less tension happened in conjunction with various sorts of opening and being more awake and moving closer to PNSE.

mitchell_porter on We can survive

If I understand you correctly, you want to create an unprecedentedly efficient and coordinated network, made out of intelligent people with goodwill, that will solve humanity's problems in theory and in practice?

steve2152 on What's a good book for a technically-minded 11-year old?

My 9yo has recently enjoyed Ender’s Game, Harry Potter, Hitchhiker’s Guide to the Galaxy, and What If. He recently asked to borrow my The Vital Question (it came up in conversation about abiogenesis) and he’s mostly following it so far but has occasional questions for me, we’ll see how far he gets or if he loses steam.

For non-books, he wanted to do Khan Academy cosmology / astronomy, I think he did one big unit of Khan Academy math before losing interest, he likes Eureka crates (little kits to build your own soap dispenser, rivet press, ukulele, whatever, they come once a month, good gift), lotsa video games, and he was doing DuoLingo Spanish every night (he has a streak, he’s a total sucker for gamification) but to my dismay decided to switch to the rather less practical DuoLingo Klingon. ¯\_(ツ)_/¯

crissman on Join a LessWrong Team for the Unaging System Challenge

Streamlined the registration page, and added a field to note that you want to join a Less Wrong team: https://www.unaging.com/unaging-system-2/

rhollerith_dot_com on Is OpenAI net negative for AI Safety?

Yes, humanity would be more likely to survive if OpenAI never existed or if it closed tomorrow.

I wonder why no one else has answered. Am I stepping on some landmine I don't know about?

What would drastically increase P(survival) is stopping all large training runs. If that turns out to be impossible, then some labs might be "better" than the average lab in that if their model is the first one to become capable of extincting humanity, it might choose not to do that because of technical details in how the lab made the model. (I don't put a lot of hope in this possibility.)

To my knowledge, no one who is not employed at OpenAI and who is not an investor in OpenAI believes that OpenAI is one of these "better" labs. OK, that is not literally true, but it is unlikely that anyone who is not employed at OpenAI and who is not an investor in OpenAI and who believes OpenAI is one of the "better" labs can provide an argument longer than 50 words that you or I would consider rational and coherent.

But even if OpenAI is shut down tomorrow and everyone working there is permanently prevented from working in AI (a pleasant thought!) the AI enterprise would still be the thorniest danger facing humanity.