LessWrong 2.0 Reader

[link] No, *You* Need to Write Clearer
Nicholas / Heather Kross (NicholasKross) · 2023-04-29T05:04:01.559Z · comments (65)
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"
AnnaSalamon · 2022-06-09T02:12:35.151Z · comments (63)
larger language models may disappoint you [or, an eternally unfinished draft]
nostalgebraist · 2021-11-26T23:08:56.221Z · comments (31)
The Plan
johnswentworth · 2021-12-10T23:41:39.417Z · comments (78)
Omicron Variant Post #1: We’re F***ed, It’s Never Over
Zvi · 2021-11-26T19:00:00.988Z · comments (95)
Safetywashing
Adam Scholl (adam_scholl) · 2022-07-01T11:56:33.495Z · comments (20)
UFO Betting: Put Up or Shut Up
RatsWrongAboutUAP · 2023-06-13T04:05:32.652Z · comments (216)
[link] My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman (UnexpectedValues) · 2024-03-16T22:56:59.283Z · comments (14)
So, geez there's a lot of AI content these days
Raemon · 2022-10-06T21:32:20.833Z · comments (140)
Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (55)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (49)
Sexual Abuse attitudes might be infohazardous
Pseudonymous Otter · 2022-07-19T18:06:43.956Z · comments (72)
[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (58)
My Model Of EA Burnout
LoganStrohl (BrienneYudkowsky) · 2023-01-25T17:52:42.770Z · comments (50)
Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
Chris Scammell (chris-scammell) · 2023-05-10T19:04:21.138Z · comments (54)
AI alignment is distinct from its near-term applications
paulfchristiano · 2022-12-13T07:10:04.407Z · comments (21)
The shard theory of human values
Quintin Pope (quintin-pope) · 2022-09-04T04:28:11.752Z · comments (67)
Your Dog is Even Smarter Than You Think
StyleOfDog · 2021-05-01T05:16:09.821Z · comments (108)
What cognitive biases feel like from the inside
chaosmage · 2020-01-03T14:24:22.265Z · comments (32)
CFAR Participant Handbook now available to all
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2020-01-03T15:43:44.618Z · comments (40)
Ngo and Yudkowsky on alignment difficulty
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-15T20:31:34.135Z · comments (151)
[link] Strong Evidence is Common
Mark Xu (mark-xu) · 2021-03-13T22:04:40.538Z · comments (50)
Omicron: My Current Model
Zvi · 2021-12-28T17:10:00.629Z · comments (72)
Notes from "Don't Shoot the Dog"
juliawise · 2021-04-02T16:34:46.170Z · comments (12)
Coordination as a Scarce Resource
johnswentworth · 2020-01-25T23:32:36.309Z · comments (22)
Thoughts on the impact of RLHF research
paulfchristiano · 2023-01-25T17:23:16.402Z · comments (102)
Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (36)
My Assessment of the Chinese AI Safety Community
Lao Mein (derpherpize) · 2023-04-25T04:21:19.274Z · comments (94)
My views on “doom”
paulfchristiano · 2023-04-27T17:50:01.415Z · comments (37)
Visible Thoughts Project and Bounty Announcement
So8res · 2021-11-30T00:19:08.408Z · comments (106)
The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (15)
The ground of optimization
Alex Flint (alexflint) · 2020-06-20T00:38:15.521Z · comments (80)
Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)
On AutoGPT
Zvi · 2023-04-13T12:30:01.059Z · comments (47)
[link] I hired 5 people to sit behind me and make me productive for a month
Simon Berens (sberens) · 2023-02-05T01:19:39.182Z · comments (83)
[link] DeepMind: Generally capable agents emerge from open-ended play
Daniel Kokotajlo (daniel-kokotajlo) · 2021-07-27T14:19:13.782Z · comments (53)
Truthseeking is the ground in which other principles grow
Elizabeth (pktechgirl) · 2024-05-27T01:09:20.796Z · comments (16)
the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (42)
You Don't Exist, Duncan
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2023-02-02T08:37:01.049Z · comments (107)
The LessWrong Team is now Lightcone Infrastructure, come work with us!
habryka (habryka4) · 2021-10-01T01:20:33.411Z · comments (71)
The Feeling of Idea Scarcity
johnswentworth · 2022-12-31T17:34:04.306Z · comments (22)
Working With Monsters
johnswentworth · 2021-07-20T15:23:20.762Z · comments (54)
New Scaling Laws for Large Language Models
1a3orn · 2022-04-01T20:41:17.665Z · comments (22)
Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Zach Stein-Perlman · 2024-05-15T00:45:02.436Z · comments (95)
Principles for the AGI Race
William_S · 2024-08-30T14:29:41.074Z · comments (13)
Another (outer) alignment failure story
paulfchristiano · 2021-04-07T20:12:32.043Z · comments (38)
Deep Deceptiveness
So8res · 2023-03-21T02:51:52.794Z · comments (60)
Lessons On How To Get Things Right On The First Try
johnswentworth · 2023-06-19T23:58:09.605Z · comments (57)
Munk AI debate: confusions and possible cruxes
Steven Byrnes (steve2152) · 2023-06-27T14:18:47.694Z · comments (21)
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin (collin-burns) · 2022-12-15T18:22:40.109Z · comments (39)