LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)
[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)
Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)
How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)
[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)
Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)
LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)
If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)
Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)
AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)
[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)
[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)
AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)
[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)
Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)
AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)
Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)
[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)
"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)
DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)
One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)
Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)
One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)
UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)
Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)
LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov (igor-ivanov) · 2024-01-24T15:45:08.795Z · comments (4)
Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery (arjun-panickssery) · 2024-01-15T21:21:03.962Z · comments (0)
[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)
Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)
Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)
[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)
Regrant up to $600,000 to AI safety projects with GiveWiki
Dawn Drescher (Telofy) · 2023-10-28T19:56:06.676Z · comments (1)
An illustrative model of backfire risks from pausing AI research
Maxime Riché (maxime-riche) · 2023-11-06T14:30:58.615Z · comments (3)
AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)
5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)
Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)
[link] Provably Safe AI
PeterMcCluskey · 2023-10-05T22:18:26.013Z · comments (15)
[link] Fake Deeply
Zack_M_Davis · 2023-10-26T19:55:22.340Z · comments (7)
Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)
[link] Genocide isn't Decolonization
robotelvis · 2023-10-20T04:14:07.716Z · comments (19)
2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)
[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)
More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)
[link] The Hidden Complexity of Wishes - The Animation
Writer · 2023-09-27T17:59:37.188Z · comments (0)
We have promising alignment plans with low taxes
Seth Herd · 2023-11-10T18:51:38.604Z · comments (9)
[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)
The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)
Conditionals All The Way Down
lunatic_at_large · 2023-09-30T21:06:14.609Z · comments (2)
Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)
Machine Unlearning Evaluations as Interpretability Benchmarks
NickyP (Nicky) · 2023-10-23T16:33:04.878Z · comments (2)