LessWrong 2.0 Reader

What AI Safety Materials Do ML Researchers Find Compelling?
Vael Gates · 2022-12-28T02:03:31.894Z · comments (34)

Decision Theory with the Magic Parts Highlighted
moridinamael · 2023-05-16T17:39:55.038Z · comments (24)

Can you control the past?
Joe Carlsmith (joekc) · 2021-08-27T19:39:29.993Z · comments (90)

[link] [Linkpost] Introducing Superalignment
beren · 2023-07-05T18:23:18.419Z · comments (69)

Motive Ambiguity
Zvi · 2020-12-15T18:10:01.372Z · comments (58)

AGI ruin scenarios are likely (and disjunctive)
So8res · 2022-07-27T03:21:57.615Z · comments (38)

Gears-Level Models are Capital Investments
johnswentworth · 2019-11-22T22:41:52.943Z · comments (29)

WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (23)

Book Launch: The Engines of Cognition
Ben Pace (Benito) · 2021-12-21T07:24:45.170Z · comments (56)

Specializing in Problems We Don't Understand
johnswentworth · 2021-04-10T22:40:40.690Z · comments (29)

Critical review of Christiano's disagreements with Yudkowsky
Vanessa Kosoy (vanessa-kosoy) · 2023-12-27T16:02:50.499Z · comments (40)

Alignment By Default
johnswentworth · 2020-08-12T18:54:00.751Z · comments (96)

Finite Factored Sets in Pictures
Magdalena Wache · 2022-12-11T18:49:00.000Z · comments (35)

The inordinately slow spread of good AGI conversations in ML
Rob Bensinger (RobbBB) · 2022-06-21T16:09:57.859Z · comments (62)

I'm from a parallel Earth with much higher coordination: AMA
Ben Pace (Benito) · 2021-04-05T22:09:24.033Z · comments (35)

Russia has Invaded Ukraine
lsusr · 2022-02-24T07:52:44.533Z · comments (268)

Geometric Rationality is Not VNM Rational
Scott Garrabrant · 2022-11-27T19:36:00.939Z · comments (27)

[link] Thiel on Progress and Stagnation
Richard_Ngo (ricraz) · 2020-07-20T20:27:59.112Z · comments (32)

Reneging Prosocially
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2021-11-30T19:01:12.441Z · comments (15)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

What's Up With Confusingly Pervasive Goal Directedness?
Raemon · 2022-01-20T19:22:37.515Z · comments (89)

[link] Anthropic's Core Views on AI Safety
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-03-09T16:55:15.311Z · comments (39)

Why Are Bacteria So Simple?
aysja · 2023-02-06T03:00:31.837Z · comments (33)

[link] Parametrically retargetable decision-makers tend to seek power
TurnTrout · 2023-02-18T18:41:38.740Z · comments (10)

Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)

Defunding My Mistake
ymeskhout · 2023-09-04T14:43:14.274Z · comments (41)

Thomas Kwa's MIRI research experience
Thomas Kwa (thomas-kwa) · 2023-10-02T16:42:37.886Z · comments (53)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (69)

Decision theory does not imply that we get to have nice things
So8res · 2022-10-18T03:04:48.682Z · comments (73)

'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (90)

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)

[link] President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
Tristan Williams (tristan-williams) · 2023-10-30T11:15:38.422Z · comments (39)

Swiss Political System: More than You ever Wanted to Know (I.)
Martin Sustrik (sustrik) · 2020-07-19T01:11:54.756Z · comments (39)

Announcing the Inverse Scaling Prize ($250k Prize Pool)
Ethan Perez (ethan-perez) · 2022-06-27T15:58:19.135Z · comments (14)

AI #1: Sydney and Bing
Zvi · 2023-02-21T14:00:00.480Z · comments (45)

What I mean by "alignment is in large part about making cognition aimable at all"
So8res · 2023-01-30T15:22:09.294Z · comments (25)

Do bamboos set themselves on fire?
Malmesbury (Elmer of Malmesbury) · 2022-09-19T15:34:13.574Z · comments (14)

[link] Will the growing deer prion epidemic spread to humans? Why not?
eukaryote · 2023-06-25T04:31:56.824Z · comments (33)

Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong · 2022-12-06T19:54:54.854Z · comments (85)

Transcripts of interviews with AI researchers
Vael Gates · 2022-05-09T05:57:15.872Z · comments (9)

Architects of Our Own Demise: We Should Stop Developing AI Carelessly
Roko · 2023-10-26T00:36:05.126Z · comments (75)

AI Could Defeat All Of Us Combined
HoldenKarnofsky · 2022-06-09T15:50:12.952Z · comments (42)

“Sharp Left Turn” discourse: An opinionated review
Steven Byrnes (steve2152) · 2025-01-28T18:47:04.395Z · comments (21)

Preprint is out! 100,000 lumens to treat seasonal affective disorder
Fabienne · 2021-11-12T17:59:20.077Z · comments (10)

[link] [Link] Still Alive - Astral Codex Ten
jimrandomh · 2021-01-21T23:20:03.782Z · comments (10)

The rationalist community's location problem
mingyuan · 2020-09-23T18:39:26.278Z · comments (142)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)

[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)

Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (21)

Conflict vs. mistake in non-zero-sum games
Nisan · 2020-04-05T22:22:41.374Z · comments (40)

Recent comments

mateusz-baginski on Anti-Slop Interventions?

I think Abram is saying the following:

Currently, AIs are lacking capabilities that would meaningfully speed up AI Safety research.
At some point, they are gonna get those capabilities.
However, by default, they are gonna get those AI Safety-helpful capabilities roughly at the same time as other, dangerous capabilities (or at least, not meaningfully earlier).
- In which case, we're not going to have much time to use the AI Safety-helpful capabilities to speed up AI Safety research sufficiently for us to be ready for those dangerous capabilities.
Therefore, it makes sense to speed up the development of AIS-helpful capabilities now. Even if it means that the AIs will acquire dangerous capabilities sooner, it gives us more time to use AI Safety-helpful capabilities to prepare for dangerous capabilities.

geoffrey-irving on Eliciting bad contexts

I'm very much in agreement that this is a problem, and among other things it blocks us from knowing how to use adversarial attack methods (and AISI teams!) to help here. Your proposed definition feels like it might be an important part of the story but not the full story, though, since it's output-only: I would unfortunately expect a decent probability of strong jailbreaks that (1) don't count as intent misalignment but (2) jump you into that kind of red attractor basin. Certainly ending up in that kind of basin could cause a catastrophe, and I would like to avoid it, but I think there is a meaningful notion of "the AI is unlikely to end up in that basin of its own accord, under nonadversarial distributions of inputs".

Have you seen good attempts at input-side definitions along those lines? Perhaps an ideal story here would be a combination of an input-side definition and the kind of output-side definition you're pointing at.

jonas-hallgren on The Risk of Gradual Disempowerment from AI

Also, the solution is obviously to "Friendship is Optimal" the system in which humans and AI coordinate: create an opt-in, secure system that grants more resources to those who cooperate, and you will be able to outperform those silly defectors.

jonas-hallgren on The Risk of Gradual Disempowerment from AI

When it comes to solutions, I think the humans-versus-AI axis doesn't make sense for the systems we're in; it is rather about desirable system properties such as participation, exploration, and caring for the participants in the system.

If we can foster a democratic, caring, open-ended decision making process where humans and AI can converge towards optimal solutions then I think our work is done.

Human disempowerment is okay as long as it is replaced by a better and smarter system, so whilst I think the solutions point in the right direction, the main axis of validation should be system properties rather than power distribution.

Good summary though; it is great that we finally have a solid paper to point to for these problems.

viliam on We Fell For It

Well, the government could take all your property away at a whim; millions of people have experienced that. Plus there are things like eminent domain.

Also, the government can tax your property (and punish you if you don't pay, e.g. by taking that thing away), which is kinda weird philosophically: on one hand, something already belongs to you; on the other hand, you need to keep paying in order to keep it, but that is how renting is supposed to work, not owning.

A more obvious example is intellectual property. There are no Schelling points: twenty years of copyright versus a hundred years, no number seems obviously correct.

And if after the Singularity the superintelligent AIs decide that they won't respect human property claims (but may respect each other's property), then... that's just the new reality. Just as a human ignores how the ants in his garden have distributed the territory among themselves.

So the fact that you have property at all is downstream from "there is a consensus among those with power that this definition of 'property' is generally a good thing to have".

This is the way these things are generally done in Western civilization, but if you asked people living in tribes, they would probably have wildly different intuitions about what a person can own.

I can only own something if I am strong enough to defend it, or if there is a consensus among the others that they should let me keep it.

mateusz-baginski on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I interpreted it as: not by "usual means", but rather something like suicide or murder.

ape-in-the-coat on Perry Cai's Shortform

Logic simply preserves truth. You can arrive at a valid conclusion that one should act altruistically if you start from some specific premises, and can't if you start from some other premises.

What are the premises you start from?

jonas-hallgren on C'mon guys, Deliberate Practice is Real

First and foremost, I totally agree with your point that this sort of thing is instrumentally useful, but I'm still having trouble seeing how to apply it to my own life. I'm curious about two aspects of deliberate practice that seem interconnected:

On OODA loops: I currently maintain yearly, quarterly, weekly, and daily review cycles where I plan and reflect on progress. However, I wonder if there are specific micro-skills you're pointing to beyond this - perhaps noticing subtle emotional tells when encountering uncomfortable topics, or developing finer-grained feedback mechanisms. How does this type of systematic review practice fit into your framework for deliberate practice? Are there particular refinements or additional elements you'd recommend? Is it noticing when I'm not doing OODA?
On unlearning: While your post focuses extensively on learning practices, I'm interested in your thoughts on "unlearning" - the process of identifying and releasing ineffective patterns or beliefs. In my experience with meditation, there seems to be a distinction between intellectual understanding and emotional understanding, where sometimes what holds us back isn't insufficient practice but rather old patterns that need to be examined and released. How do you see the relationship between building new skills and creating space for new patterns through deliberate unlearning? One saying I've heard is that "meditation is the process of taking intellectual understanding and turning it into emotional understanding", which I find quite interesting.

daniel-tan on Daniel Tan's Shortform

Yeah, I don’t think this phenomenon requires any deliberate strategic intent to deceive / collude. It’s just born of having a subtle preference for how things should be said. As you say, humans probably also have these preferences.

viliam on We Fell For It

You could interpret that as "rationalists are moving to the left... expect more leftism in the future", or you could interpret that as "rationalists are choosing the parts that seem correct to them (and rejecting the parts that don't)... expect more cherry-picking from all directions in the future".