LessWrong 2.0 Reader

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

Spaciousness In Partner Dance: A Naturalism Demo
LoganStrohl (BrienneYudkowsky) · 2023-11-19T07:00:19.555Z · comments (5)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (36)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

Send us example gnarly bugs
Beth Barnes (beth-barnes) · 2023-12-10T05:23:00.773Z · comments (10)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

MATS Summer 2023 Retrospective
utilistrutil · 2023-12-01T23:29:47.958Z · comments (34)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (42)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)

OpenAI: Leaks Confirm the Story
Zvi · 2023-12-12T14:00:04.812Z · comments (9)

The Parable Of The Fallen Pendulum - Part 2
johnswentworth · 2024-03-12T21:41:30.180Z · comments (8)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (9)

Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)

On Claude 3.0
Zvi · 2024-03-06T18:50:04.766Z · comments (5)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

[link] Are language models good at making predictions?
dynomight · 2023-11-06T13:10:36.379Z · comments (14)

[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)

Lying Alignment Chart
Zack_M_Davis · 2023-11-29T16:15:28.102Z · comments (17)

Grief is a fire sale
Nathan Young · 2024-03-04T01:11:06.882Z · comments (1)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (21)

[question] What could a policy banning AGI look like?
TsviBT · 2024-03-13T14:19:07.783Z · answers+comments (23)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (59)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (13)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

Analogies between scaling labs and misaligned superintelligent AI
scasper · 2024-02-21T19:29:39.033Z · comments (5)

[link] Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
mattmacdermott · 2024-02-29T13:59:34.959Z · comments (19)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (16)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (10)

[link] Claude 3.5 Sonnet
Zach Stein-Perlman · 2024-06-20T18:00:35.443Z · comments (41)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (21)

[Valence series] 3. Valence & Beliefs
Steven Byrnes (steve2152) · 2023-12-11T20:21:30.570Z · comments (11)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

[link] The Offense-Defense Balance Rarely Changes
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-09T15:21:23.340Z · comments (23)

My guess at Conjecture's vision: triggering a narrative bifurcation
Alexandre Variengien (alexandre-variengien) · 2024-02-06T19:10:42.690Z · comments (12)

Vote on Anthropic Topics to Discuss
Ben Pace (Benito) · 2024-03-06T19:43:47.194Z · comments (55)

[link] The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 (sharmake-farah) · 2023-12-22T16:13:54.822Z · comments (43)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

Recent comments

danielfilan on 2024 Unofficial LW Community Census, Request for Comments

You say "higher numbers for polyamorous relationships" which is contrary to "If you're polyamorous, but happen to have one partner, you would also put 1 for this question."

tag on Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion

However, I find myself appealing to basic logical principles like the law of non-contradiction.

The law of non-contradiction isn't true in all "universes", either. Specifically, it's not true in paraconsistent logic.

thane-ruthenis on The Compendium, A full argument about extinction risk from AGI

I agree that inventing new arguments for X that sound kind-of plausible to you on the surface level, and which you imagine would work well on a given demographic, is not a recipe for good communication. Such arguments are "artificial", they're not native citizens of someone's internally consistent world-model, and it's going to show and lead to unconvincing messages that fall apart under minimal scrutiny.

That's not what I'm arguing for. The case for the AGI risk is overdetermined: there are enough true arguments for it that you can remove a subset of them and still end up with an internally consistent world-model in which the AGI risk is real. Arguably, there's even a set of correct arguments that convinces a Creationist, without making them not-a-Creationist in the process.

Convincing messaging towards Creationists involves instantiating a world-model in which only the subset of arguments Creationists would believe exist, and then (earnestly) arguing from within that world-model.

Edit: Like, here's a sanity-check: suppose you must convince a specific Creationist that the AGI Risk is real. Do you need to argue them out of Creationism in order to do so?

screwtape on 2024 Unofficial LW Community Census, Request for Comments

Yeah, this would either need options for many countries or one schema for many countries.

Asking whether they voted or not in a national election is straightforward enough, and there have been past questions like that.

"Voting Did you vote in your country's last major national election?"

screwtape on 2024 Unofficial LW Community Census, Request for Comments

Hrm. I parse this as part of an example: if you are partnered and monogamous (and faithful!) then you should put down 1. If you're polyamorous, but happen to have one partner, you would also put 1 for this question. There's a Relationship Styles question that gets at what people prefer.

Do you think this example will confuse people?

ryankidd44 on Ryan Kidd's Shortform

I don't agree with the following claims (which might misrepresent you):

"Skill levels" are domain agnostic.
Frontier oversight, control, evals, and non-"science of DL" interp research are strictly easier in practice than frontier agent foundations and "science of DL" interp research.
The main reason there is more funding/interest in the former category than the latter is due to skill issues, rather than worldview differences and clarity of scope.
MATS has mid researchers relative to other programs.

habryka4 on The Compendium, A full argument about extinction risk from AGI

No, I think this kind of very naive calculation does predictably result in worse arguments propagating, people rightfully dismissing those bad arguments (because they are not entangled with the real reasons why any of the people who have thought about the problem have formed beliefs on an issue themselves), and then ultimately the comms problem getting much harder.

I am in favor of people thinking hard about these issues, but I think exactly this kind of naive argument is in an uncanny valley where your comms get substantially worse.

thane-ruthenis on The Compendium, A full argument about extinction risk from AGI

Perhaps. Admittedly, I don't have a solid model of whether a median American claiming to be a Creationist in surveys would instantly dismiss a message if it starts making arguments from evolution.

Still, I think the general point applies:

A convincing case for the AGI Omnicide Risk doesn't have to include arguments from human evolution.
Arguments from human evolution may trigger some people to instinctively dismiss the entire message.
If the fraction of such people is large enough, it makes sense to have public AI-Risk messages that avoid evolution-based arguments when making their case.

ryankidd44 on Ryan Kidd's Shortform

I don't think it makes sense to compare Google intern salary with AIS program stipends this way, as AIS programs are nonprofits (with the associated salary cut) and are generally trying to select against people motivated principally by money. It seems like good mechanism design to pay less than tech internships, even if the technical bar is higher, given that value alignment is best selected by looking for "costly signals" like salary sacrifice.

I don't think the correlation for competence among AIS programs is as you describe.

danielfilan on 2024 Unofficial LW Community Census, Request for Comments

If you've been waiting for an excuse to be done, this is probably the point where twenty percent of the effort has gotten eighty percent of the effect.

Should be "eighty percent of the benefit" or similar.