LessWrong 2.0 Reader

Patient Observation
LoganStrohl (BrienneYudkowsky) · 2022-02-23T19:31:45.062Z · comments (4)
High Status Eschews Quantification of Performance
niplav · 2023-03-19T22:14:16.523Z · comments (36)
Long covid: probably worth avoiding—some considerations
KatjaGrace · 2022-01-16T11:46:52.087Z · comments (88)
Limerence Messes Up Your Rationality Real Bad, Yo
Raemon · 2022-07-01T16:53:10.914Z · comments (42)
Clarifying AI X-risk
zac_kenton (zkenton) · 2022-11-01T11:03:01.144Z · comments (24)
On the Diplomacy AI
Zvi · 2022-11-28T13:20:00.884Z · comments (29)
I left Russia on March 8
avturchin · 2022-03-10T20:05:59.650Z · comments (16)
"Pivotal Acts" means something specific
Raemon · 2022-06-07T21:56:00.574Z · comments (23)
Community Notes by X
NicholasKees (nick_kees) · 2024-03-18T17:13:33.195Z · comments (15)
Selection Theorems: A Program For Understanding Agents
johnswentworth · 2021-09-28T05:03:19.316Z · comments (28)
[link] Parkinson's Law and the Ideology of Statistics
Benquo · 2025-01-04T15:49:21.247Z · comments (6)
Re-Examining LayerNorm
Eric Winsor (EricWinsor) · 2022-12-01T22:20:23.542Z · comments (12)
Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think
Zack_M_Davis · 2019-12-27T05:09:22.546Z · comments (43)
My Overview of the AI Alignment Landscape: A Bird's Eye View
Neel Nanda (neel-nanda-1) · 2021-12-15T23:44:31.873Z · comments (9)
Building AI Research Fleets
Ben Goldhaber (bgold) · 2025-01-12T18:23:09.682Z · comments (11)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (5)
One-layer transformers aren’t equivalent to a set of skip-trigrams
Buck · 2023-02-17T17:26:13.819Z · comments (11)
Goodhart's Law in Reinforcement Learning
jacek (jacek-karwowski) · 2023-10-16T00:54:11.669Z · comments (22)
[link] FLI open letter: Pause giant AI experiments
Zach Stein-Perlman · 2023-03-29T04:04:23.333Z · comments (123)
Warning Shots Probably Wouldn't Change The Picture Much
So8res · 2022-10-06T05:15:39.391Z · comments (42)
Shared reality: a key driver of human behavior
kdbscott · 2022-12-24T19:35:51.126Z · comments (25)
Pantheon Interface
NicholasKees (nick_kees) · 2024-07-08T19:03:51.681Z · comments (22)
[link] The Hubinger lectures on AGI safety: an introductory lecture series
evhub · 2023-06-22T00:59:27.820Z · comments (0)
ARC is hiring theoretical researchers
paulfchristiano · 2023-06-12T18:50:08.232Z · comments (12)
AI Alignment 2018-19 Review
Rohin Shah (rohinmshah) · 2020-01-28T02:19:52.782Z · comments (6)
Real-Life Examples of Prediction Systems Interfering with the Real World (Predict-O-Matic Problems)
NunoSempere (Radamantis) · 2020-12-03T22:00:26.889Z · comments (28)
A Longlist of Theories of Impact for Interpretability
Neel Nanda (neel-nanda-1) · 2022-03-11T14:55:35.356Z · comments (41)
The case for becoming a black-box investigator of language models
Buck · 2022-05-06T14:35:24.630Z · comments (20)
Some background for reasoning about dual-use alignment research
Charlie Steiner · 2023-05-18T14:50:54.401Z · comments (22)
Insights from Euclid's 'Elements'
TurnTrout · 2020-05-04T15:45:30.711Z · comments (17)
A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)
Baking is Not a Ritual
Sisi Cheng (sisi-cheng) · 2020-05-25T18:08:24.836Z · comments (28)
Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems
Vaniver · 2023-02-17T20:11:39.255Z · comments (12)
From fear to excitement
Richard_Ngo (ricraz) · 2023-05-15T06:23:18.656Z · comments (9)
An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)
AI Safety "Success Stories"
Wei Dai (Wei_Dai) · 2019-09-07T02:54:15.003Z · comments (27)
One Minute Every Moment
abramdemski · 2023-09-01T20:23:56.391Z · comments (23)
[link] Gene drives: why the wait?
Metacelsus · 2022-09-19T23:37:17.595Z · comments (50)
There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
Taran · 2023-02-19T12:25:52.212Z · comments (34)
Induction heads - illustrated
CallumMcDougall (TheMcDouglas) · 2023-01-02T15:35:20.550Z · comments (10)
Transcript: "You Should Read HPMOR"
TurnTrout · 2021-11-02T18:20:53.161Z · comments (12)
[link] Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!”
eukaryote · 2022-08-04T20:37:59.388Z · comments (15)
Deconfusing Direct vs Amortised Optimization
beren · 2022-12-02T11:30:46.754Z · comments (19)
Explaining the Twitter Postrat Scene
Jacob Falkovich (Jacobian) · 2022-04-05T22:23:27.125Z · comments (28)
"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)
The Wicked Problem Experience
HoldenKarnofsky · 2022-03-02T17:50:18.621Z · comments (6)
[link] Bayesian Injustice
Kevin Dorst · 2023-12-14T15:44:08.664Z · comments (10)
Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)
[link] When discussing AI risks, talk about capabilities, not intelligence
Vika · 2023-08-11T13:38:48.844Z · comments (7)
[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)