LessWrong 2.0 Reader


[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (37)

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (92)

[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)

Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)

Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)

Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)

"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)

Contra papers claiming superhuman AI forecasting
nikos (followtheargument) · 2024-09-12T18:10:50.582Z · comments (16)

Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (18)

re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)

[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)

[question] Why is o1 so deceptive?
abramdemski · 2024-09-27T17:27:35.439Z · answers+comments (24)

Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)

WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (23)

Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (38)

Effective Aspersions: How the Nonlinear Investigation Went Wrong
TracingWoodgrains (tracingwoodgrains) · 2023-12-19T12:00:23.529Z · comments (170)

This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)

Critical review of Christiano's disagreements with Yudkowsky
Vanessa Kosoy (vanessa-kosoy) · 2023-12-27T16:02:50.499Z · comments (40)

Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)

'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (90)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)

[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (21)

Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)

How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)

Is being sexy for your homies?
Valentine · 2023-12-13T20:37:02.043Z · comments (92)

[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (69)

Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)

The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)

You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)

[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)

[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)

DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)

[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)

[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)

Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)

[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)

What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 2024-05-21T20:15:36.502Z · comments (16)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)


Recent comments

cole-wyeth on A few questions about recent developments in EA

My intuition is kind of the opposite - I think EA has a less coherent purpose. It's actually kind of a large tent for animal welfare, longtermism, and global poverty. I think some of the divergence in priorities between EAs is about impact assessment / fact finding, and a lot of ink is spilled on this, but some is probably about values too. I think of EA as very outward-facing, coalitional, and ideally a little pragmatic, so I don't think it's a good basis for an organized totalizing worldview.

The study of human rationality is a more universal project. It makes sense to have a monastic class that (at least for some years of their life) sets aside politics and refines the craft, perhaps functioning as an impersonal interface when they go out into the world - almost like Bene Gesserit advisors (or a Confessor).

I have thought about building it. The physical building itself would be quite expensive, since the monastery would need to meet many psychological requirements - it would have to be both isolated and starkly beautiful. Also, well-provisioned. So this part would be expensive, and it's an expense that EA organizations probably couldn't justify (that is, larger and more extravagant than buying a castle). Of course, most of the difficulty would be in creating the culture - but I think that building the monastery properly would go a long way (if you build it, they will come).

azergante on Ideas for an action coordination website

I really like the idea of milestones. I think seeing the result of each milestone will help create trust in the group, confidence that the end action will succeed, and a realization of the real impact the group has. Each CA should probably start with small milestones (posting something on social media) and ramp things up until the end goal is reached. Seeing actual impact early will definitely keep people engaged and might make the group more cohesive and ambitious.

marcus-williams on Habryka's Shortform Feed

I suppose you could use models trained before vulnerabilities happen?

marcus-williams on Have we seen any "ReLU instead of sigmoid-type improvements" recently

"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." -SwiGLU paper.

I think it varies. A few of these are trying "random" things, but mostly they are educated guesses which are then validated empirically. Often there is a specific problem we want to solve, e.g. exploding gradients or O(n^2) attention, and then authors try things which may or may not solve/mitigate the problem.

azergante on If a "Kickstarter for Inadequate Equlibria" was built, do you have a concrete inadequate equilibrium to fix?

Ditch old software tools or programming languages for better, new ones.

sharmake-farah on quetzal_rainbow's Shortform

It's not surprising that a lot of people don't want to define physics while believing in physicalism, because properly explaining the equations that describe the physical world would take quite a long time, let alone describing what's actually going on in physics, and it would require a textbook at minimum to make this work.

zvi on AI #91: Deep Thinking

I am not a software engineer, and I've encountered cases where it seems plausible that an engineer has basically stopped putting in work. It can be tough to know for sure for a while even when you notice. But yeah, it shouldn't be able to last for THAT long, but if no one is paying attention?

I've also had jobs where I've had periods with radically different hours worked, and where it would have been very difficult for others to tell which it was for a while if I was trying to hide it, which I wasn't.

zvi on Zvi’s Thoughts on His 2nd Round of SFF

I think twice as much time actually spent would have improved decisions substantially, but that is tough - everyone is very busy these days, so it would require both a longer working window and probably higher compensation for recommenders. At minimum, it would allow a lot more investigations, especially of non-connected outsider proposals.

lc on Shortform

Well that's at least a completely different kind of regulatory failure than the one that was proposed on Twitter. But this is probably motivated reasoning on Microsoft's part. Kernel access is only necessary for IDS because of Microsoft's design choices. If Microsoft wanted, they could also have exported a user API for IDS services, which is a project they are working on now. MacOS already has this! And Microsoft would never ever have done as good a job on their own if they hadn't faced competition from other companies, which is why everyone uses CrowdStrike in the first place.

oumuamua on AI #91: Deep Thinking

If future more capable models are indeed actively resisting their alignment training, and this is happening consistently, that seems like an important update to be making?

Could someone explain to me what this resisting behavior during alignment training looked like in practice?

Did the model outright say "I don't want to do this"? Did it produce nonsensical results, did it become deceptive, did it just ... not work?

This claim seems very interesting if true. Is there any further information on this?