LessWrong 2.0 Reader

AGI Ruin: A List of Lethalities
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-06-05T22:05:52.224Z · comments (704)

Where I agree and disagree with Eliezer
paulfchristiano · 2022-06-19T19:15:55.698Z · comments (223)

SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow (jessica-cooper) · 2023-02-05T22:02:35.854Z · comments (206)

What an actually pessimistic containment strategy looks like
lc · 2022-04-05T00:19:50.212Z · comments (138)

The Waluigi Effect (mega-post)
Cleo Nardo (strawberry calm) · 2023-03-03T03:22:08.619Z · comments (188)

[link] Simulators
janus · 2022-09-02T12:45:33.723Z · comments (168)

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (263)

Rationalism before the Sequences
Eric Raymond (eric-raymond) · 2021-03-30T14:04:15.254Z · comments (83)

Making Vaccine
johnswentworth · 2021-02-03T20:24:18.756Z · comments (249)

LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (179)

[link] Pain is not the unit of Effort
alkjash · 2020-11-24T20:00:19.584Z · comments (90)

Let’s think about slowing down AI
KatjaGrace · 2022-12-22T17:40:04.787Z · comments (182)

What 2026 looks like
Daniel Kokotajlo (daniel-kokotajlo) · 2021-08-06T16:14:49.772Z · comments (156)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

The Talk: a brief explanation of sexual dimorphism
Malmesbury (Elmer of Malmesbury) · 2023-09-18T16:23:56.073Z · comments (75)

The Redaction Machine
Ben (ben-lang) · 2022-09-20T22:03:15.309Z · comments (48)

[link] How much do you believe your results?
Eric Neyman (UnexpectedValues) · 2023-05-06T20:31:31.277Z · comments (18)

[link] Luck based medicine: my resentful story of becoming a medical miracle
Elizabeth (pktechgirl) · 2022-10-16T17:40:03.702Z · comments (121)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (74)

Losing the root for the tree
Adam Zerner (adamzerner) · 2022-09-20T04:53:53.435Z · comments (31)

How To Write Quickly While Maintaining Epistemic Rigor
johnswentworth · 2021-08-28T17:52:21.692Z · comments (38)

100 Tips for a Better Life
Ideopunk · 2020-12-22T14:30:12.756Z · comments (130)

[link] The ants and the grasshopper
Richard_Ngo (ricraz) · 2023-06-04T22:00:04.577Z · comments (40)

Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible
GeneSmith · 2023-12-12T18:14:51.438Z · comments (205)

I would have shit in that alley, too
Declan Molony (declan-molony) · 2024-06-18T04:41:06.545Z · comments (135)

Counter-theses on Sleep
Natália (Natália Mendonça) · 2022-03-21T23:21:07.943Z · comments (135)

It’s Probably Not Lithium
Natália (Natália Mendonça) · 2022-06-28T21:24:10.246Z · comments (187)

Focus on the places where you feel shocked everyone's dropping the ball
So8res · 2023-02-02T00:27:55.687Z · comments (63)

Steering GPT-2-XL by adding an activation vector
TurnTrout · 2023-05-13T18:42:41.321Z · comments (98)

[link] Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?
gwern · 2023-07-03T00:48:47.131Z · comments (54)

chinchilla's wild implications
nostalgebraist · 2022-07-31T01:18:28.254Z · comments (128)

Bets, Bonds, and Kindergarteners
jefftk (jkaufman) · 2021-01-03T21:20:03.563Z · comments (35)

[link] Things I Learned by Spending Five Thousand Hours In Non-EA Charities
jenn (pixx) · 2023-06-01T20:48:03.940Z · comments (35)

(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen (thomas-larsen) · 2022-08-29T01:23:58.073Z · comments (90)

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (100)

The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (156)

Failures in Kindness
silentbob · 2024-03-26T21:30:11.052Z · comments (60)

GPTs are Predictors, not Imitators
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-04-08T19:59:13.601Z · comments (99)

[link] It Looks Like You're Trying To Take Over The World
gwern · 2022-03-09T16:35:35.326Z · comments (120)

You Are Not Measuring What You Think You Are Measuring
johnswentworth · 2022-09-20T20:04:22.899Z · comments (44)

Bing Chat is blatantly, aggressively misaligned
evhub · 2023-02-15T05:29:45.262Z · comments (181)

DeepMind alignment team opinions on AGI ruin arguments
Vika · 2022-08-12T21:06:40.582Z · comments (37)

Reliable Sources: The Story of David Gerard
TracingWoodgrains (tracingwoodgrains) · 2024-07-10T19:50:21.191Z · comments (54)

How I got 4.2M YouTube views without making a single video
Closed Limelike Curves · 2024-09-03T03:52:33.025Z · comments (36)

[link] Reflections on six months of fatherhood
jasoncrawford · 2022-01-31T05:28:09.154Z · comments (24)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (89)

Lies Told To Children
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-04-14T11:25:10.282Z · comments (94)

Reward is not the optimization target
TurnTrout · 2022-07-25T00:03:18.307Z · comments (123)

Anti-Aging: State of the Art
JackH · 2020-12-31T19:07:03.430Z · comments (176)

[link] A Mechanistic Interpretability Analysis of Grokking
Neel Nanda (neel-nanda-1) · 2022-08-15T02:41:36.245Z · comments (47)

Recent comments

tsvibt on evhub's Shortform

Isn't this what the "coherent" part is about? (I forget.)

jacques-thibodeau on jacquesthibs's Shortform

I keep hearing about dual-use risk concerns when I mention automated AI safety work. Here’s a simple solution that could even work in a startup setting:

Keep all of the infrastructure internally and only share with vetted partners/researchers.

You can hit two birds with one stone:

It does not turn into a mass-market product that leads to dual-use risks.
It builds a moat: you have complex internal infrastructure that is not shared, and only the product of that system is shared. Investors love moats; you just have to convince them that this is the way to go for a product like this these days.

You don’t market the product to the mass market; you just find partners and use the system to spin out products and businesses that have nothing to do with frontier models. So you can repurpose the system for specific application areas without releasing the platform and process, which would be copied in a day in the age of AI anyway.

nancylebovitz on A sense of logic

I think there's a strong motivation to believe in hell for other people. The wicked flourish like the green bay tree, and where is justice?

Alternatively, belief in hell for other people is mere spitefulness.

Also, I believe inventing the tortures of hell comes from much the same drive that causes people to write horror fiction, though I have no idea why they do it, or why I like horror fiction.

nancylebovitz on A sense of logic

It's evidence that God loves complexity even more than He loves beetles.

tailcalled on evhub's Shortform

Is there really some particular human whose volition you'd like to coherently extrapolate over eternity but where you refrain because you're worried it will generate infighting? Or is it more like, you can't think of anybody you'd pick, so you want a decision procedure to pick for you?

If there is some particular human, who is it?

daniel-herrmann on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Thanks for raising this important point. When modeling these situations carefully, we need to give terms like "today" a precise semantics that's well-defined for the agent. With proper semantics established, we can examine what credences make sense under different ways of handling indexicals. Matthias Hild's paper "Auto-epistemology and updating" demonstrates how to carefully construct time-indexed probability updates. We could then add centered worlds or other approaches for self-locating probabilities.

Some cases might lead to puzzles, particularly where epistemic fixed points don't exist. This might push us toward modeling credences differently or finding other solutions. But once we properly formalize "today" as an event, we can work on satisfying richness conditions. Whether this leads to inconsistent attitudes depends on what constraints we place on those attitudes - something that reasonable people might disagree about, as debates over sleeping beauty suggest.

christiankl on Thread for Sense-Making on Recent Murders and How to Sanely Respond

Pasek did couchsurf at my place in the days after a LessWrong Community Weekend in Berlin. That was before he went to the Bay Area, so probably 8 or 9 years ago, and before he seemed to make contact with Ziz, which happened after Pasek left the Bay Area and moved to live with other rationalists in a group house in Gran Canaria. Pasek's contact with Ziz seemed to be mostly online while he was living in Gran Canaria.

If you read Pasek's post where he thinks about committing suicide, there's plenty of TDT-thinking in it. It matched my idea of how Pasek thought even before engaging with Ziz.

Pasek was a TDT-ish vegan.

Pasek did some QS tracking on paper of how he spent every waking hour of the day, and he seemed not to suffer from akrasia in guiding his actions.

If I remember right, he said that stealing is okay in cases where the TDT calculation would be in favor of stealing even though traditional morality would say stealing is bad. I don't think that resulted in Pasek actually stealing things, but I think we talked about some case where he thought it was justified to steal, which surprised me at the time. My memory here is very fuzzy.

nancylebovitz on A sense of logic

I dislike raw oysters quite a bit, but they're okay cooked.

Speaking of logical fallacies, the fact that one person loves a thing does not make it strongly likely that other people will even tolerate it. I don't know that people even have an obligation to try things other people love.

And yet, the temptation to think that other people do or should love what one loves is very strong. "I think this is great!" just doesn't feel as true as "This is great!".

ape-in-the-coat on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

I'm not sure if I fully understand why this is supposed to pose a problem, but maybe it helps to say that by "meaningfully consider" we mean something like, is actually part of the agent's theory of the world. In your situation, since the agent is considering which envelope to take, I would guess that to satisfy richness she should have a credence in the proposition.

Okay, then I believe you definitely have a problem with this example, and I would be glad to show you where exactly.

I think (maybe?) what makes this case tricky or counterintuitive is that the agent seems to lack any basis for forming beliefs about which envelope contains the money - their memory is erased each time and the location depends on their previous (now forgotten) choice.
However, this doesn't mean they can't or don't have credences about the envelope contents. From the agent's subjective perspective upon waking, they might assign 0.5 credence to each envelope containing the money, reasoning that they have no information to favor either envelope.

Let's suppose that the agent does exactly that. Suppose they believe that on every awakening there is a 50% chance that the money is in envelope 1. Then picking envelope 1 every time would, in expectation, win $350 per experiment.

But this is clearly false. The experiment is specifically designed in such a manner that the agent can win money only on the first awakening. On every other day (6 times out of 7) the money would be in envelope 2.

So should the agent believe that there is only a 1/7 chance that the money is in envelope 1, then? Also no. I suppose you can see why. As soon as they try to act on such a belief, it will turn out that 6 times out of 7 the money is in envelope 1.

In fact, we can notice that there is no coherent credence for the statement "Today the money is in envelope 1" that would not lead the agent to irrational behavior. This is because the term "Today" is not well-defined in the setting of such an experiment.

By which I mean that, within the same iteration of the experiment, propositions including "Today" may not have a unique truth value. On the first day of the experiment the statement "Today the money is in envelope 1" may be true, while on the second day it may be false, so in a single iteration of the experiment that lasts 7 days the statement is simultaneously true and false!

Which means that "Today the money is in envelope 1" isn't actually an event from the event space of the experiment and therefore doesn't have a probability value, as the probability function's domain is the event space.

But this is a nuance of formal probability theory that most people do not notice, or even try to ignore outright. Our intuitions are accustomed to situations where statements about "Today" can be represented as well-defined events from the event space, and therefore we assume that they can always be "meaningfully considered".

And so if you try to base your decision theory framework on what feels meaningful to an agent instead of what is formalizable mathematically, you will end up with a bunch of paradoxical situations, like the one I've just described.
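
To make the envelope argument above concrete, here is a rough simulation sketch. The setup is a reconstruction and several details are assumptions rather than anything stated in this thread: a $100 payout per correct pick (inferred from the $350 figure), the day-one location chosen by a fair coin, and on every later day the money placed in the envelope the agent did not pick the day before. The names `run_experiment` and `average_winnings` are hypothetical.

```python
import random

DAYS = 7      # awakenings per experiment (from the comment above)
PRIZE = 100   # ASSUMED payout per correct pick, in dollars


def run_experiment(pick, rng):
    """Run one experiment for an agent that always picks envelope `pick`.

    The agent's memory is erased each day, so its choice cannot depend
    on history; a fixed pick models that.
    Returns (total winnings, number of days the money was in envelope 1).
    """
    money = rng.choice([1, 2])  # ASSUMED: day-one location is a fair coin flip
    winnings = 0
    days_in_1 = 0
    for _ in range(DAYS):
        if money == 1:
            days_in_1 += 1
        if pick == money:
            winnings += PRIZE
        # ASSUMED rule: tomorrow the money goes in the envelope not picked today
        money = 2 if pick == 1 else 1
    return winnings, days_in_1


def average_winnings(pick, trials=10_000, seed=0):
    """Average winnings per experiment over many simulated runs."""
    rng = random.Random(seed)
    total = sum(run_experiment(pick, rng)[0] for _ in range(trials))
    return total / trials


# What a constant 0.5 credence in "today the money is in envelope 1" predicts
naive_expectation = 0.5 * DAYS * PRIZE
```

Under these assumptions, an agent that always picks envelope 1 can win at most once per experiment, so `average_winnings(1)` stays far below `naive_expectation`, while an agent that always picks envelope 2 finds the money in envelope 1 on at least 6 of the 7 days.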

daniel-herrmann on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Ah, so not like, A is strongly preferred to B and B is strongly preferred to A, but more of a violation of transitivity. Then I still think that the Broome paper is a place I'd look at, since you get that exact kind of structure in preference aggregation.

The Bradley paper assumes everything is transitive throughout, so I don't think you get the kind of structure you want there. I'm not immediately aware of any work on that kind of inconsistency in JB that isn't in the social choice context, but there might be some. I'll take a look.

There are ways to think about degrees and measures of incoherence, and how that connects up to decision making. I'm thinking mainly of this paper by Schervish, Seidenfeld, and Kadane, "Measures of Incoherence: How Not to Gamble if You Must". There might be a JB-style version of that kind of work, and if there isn't, I think it would be good to have one.

But as to your core goal of weakening the preference axioms to more realistic standards, you can definitely do that in JB while still keeping the background objects of preference as propositions in a single algebra. I think this would still preserve many of what I consider the naturalistic advantages of the JB system. For modifying the preference axioms, I would guess that descriptively you might want something like prospect theory, or something else along those broad lines. It also depends on what kinds of agents we want to describe.