LessWrong 2.0 Reader

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth · 2022-08-08T18:05:11.982Z · comments (12)
Covid 1/21: Turning the Corner
Zvi · 2021-01-21T16:40:00.941Z · comments (41)
EA orgs' legal structure inhibits risk taking and information sharing on the margin
Elizabeth (pktechgirl) · 2023-11-05T19:13:56.135Z · comments (17)
[Completed] The 2024 Petrov Day Scenario
Ben Pace (Benito) · 2024-09-26T08:08:32.495Z · comments (114)
A mechanistic model of meditation
Kaj_Sotala · 2019-11-06T21:37:03.819Z · comments (11)
"Rationalist Discourse" Is Like "Physicist Motors"
Zack_M_Davis · 2023-02-26T05:58:29.249Z · comments (153)
Read the Roon
Zvi · 2024-03-05T13:50:04.967Z · comments (6)
[question] LessWrong Coronavirus Agenda
Elizabeth (pktechgirl) · 2020-03-18T04:48:56.769Z · answers+comments (65)
Contra EY: Can AGI destroy us without trial & error?
Nikita Sokolsky (nikita-sokolsky) · 2022-06-13T18:26:09.460Z · comments (72)
Four mindset disagreements behind existential risk disagreements in ML
Rob Bensinger (RobbBB) · 2023-04-11T04:53:48.427Z · comments (12)
[link] Ten Thousand Years of Solitude
agp (antonio-papa) · 2023-08-15T17:45:34.556Z · comments (19)
[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)
Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose
moridinamael · 2022-07-06T14:20:14.847Z · comments (16)
[link] Neuronpedia
Johnny Lin (hijohnnylin) · 2023-07-26T16:29:28.884Z · comments (51)
Five Ways To Prioritize Better
lynettebye · 2020-06-27T18:40:26.600Z · comments (7)
Debate update: Obfuscated arguments problem
Beth Barnes (beth-barnes) · 2020-12-23T03:24:38.191Z · comments (24)
LessWrong Now Has Dark Mode
jimrandomh · 2022-05-10T01:21:44.065Z · comments (31)
On Bounded Distrust
Zvi · 2022-02-03T14:50:00.883Z · comments (19)
Don't Dismiss Simple Alignment Approaches
Chris_Leong · 2023-10-07T00:35:26.789Z · comments (9)
Monitoring for deceptive alignment
evhub · 2022-09-08T23:07:03.327Z · comments (8)
Possible takeaways from the coronavirus pandemic for slow AI takeoff
Vika · 2020-05-31T17:51:26.437Z · comments (36)
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau · 2022-10-27T01:32:44.750Z · comments (14)
The 99% principle for personal problems
Kaj_Sotala · 2023-10-02T08:20:07.379Z · comments (20)
2018 Review: Voting Results!
Ben Pace (Benito) · 2020-01-24T02:00:34.656Z · comments (59)
Loving a world you don’t trust
Joe Carlsmith (joekc) · 2024-06-18T19:31:36.581Z · comments (13)
Integrity in AI Governance and Advocacy
habryka (habryka4) · 2023-11-03T19:52:33.180Z · comments (57)
Message Length
Zack_M_Davis · 2020-10-20T05:52:56.277Z · comments (25)
Limitations on Formal Verification for AI Safety
Andrew Dickson · 2024-08-19T23:03:52.706Z · comments (60)
Mechanistic anomaly detection and ELK
paulfchristiano · 2022-11-25T18:50:04.447Z · comments (22)
A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble
Kaj_Sotala · 2020-05-05T19:09:44.484Z · comments (40)
[link] How to slow down scientific progress, according to Leo Szilard
jasoncrawford · 2023-01-05T18:26:12.121Z · comments (18)
How LLMs are and are not myopic
janus · 2023-07-25T02:19:44.949Z · comments (16)
You have a place to stay in Sweden, should you need it.
Dojan · 2022-02-27T01:21:19.552Z · comments (3)
Brainstorm of things that could force an AI team to burn their lead
So8res · 2022-07-24T23:58:16.988Z · comments (8)
Wolf Incident Postmortem
jefftk (jkaufman) · 2023-01-09T03:20:03.723Z · comments (13)
[question] Will COVID-19 survivors suffer lasting disability at a high rate?
jimrandomh · 2020-02-11T20:23:50.664Z · answers+comments (11)
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda (neel-nanda-1) · 2024-07-07T17:39:35.064Z · comments (16)
How to evaluate (50%) predictions
Rafael Harth (sil-ver) · 2020-04-10T17:12:02.867Z · comments (50)
[question] Forecasting Thread: AI Timelines
Amandango · 2020-08-22T02:33:09.431Z · answers+comments (98)
Pretraining Language Models with Human Preferences
Tomek Korbak (tomek-korbak) · 2023-02-21T17:57:09.774Z · comments (20)
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak · 2022-11-22T18:57:29.604Z · comments (97)
Apocalypse insurance, and the hardline libertarian take on AI risk
So8res · 2023-11-28T02:09:52.400Z · comments (40)
Modal Fixpoint Cooperation without Löb's Theorem
Andrew_Critch · 2023-02-05T00:58:40.975Z · comments (34)
Invulnerable Incomplete Preferences: A Formal Statement
SCP (sami-petersen) · 2023-08-30T21:59:36.186Z · comments (38)
The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (47)
How it All Went Down: The Puzzle Hunt that took us way, way Less Online
A* (agendra) · 2024-06-02T08:01:40.109Z · comments (5)
On saying "Thank you" instead of "I'm Sorry"
Michael Cohn (michael-cohn) · 2024-07-08T03:13:50.663Z · comments (16)
LessWrong is paying $500 for Book Reviews
Ruby · 2021-09-14T00:24:23.507Z · comments (25)
Nuclear war is unlikely to cause human extinction
Jeffrey Ladish (jeff-ladish) · 2020-11-07T05:42:24.380Z · comments (48)
Demand offsetting
paulfchristiano · 2021-03-21T18:20:05.090Z · comments (41)