LessWrong 2.0 Reader


Attitudes about Applied Rationality
Camille Berger · 2024-02-03T14:42:22.770Z · comments (18)
The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (25)
The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (22)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (43)
[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)
A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)
New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (63)
Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)
2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (34)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)
Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)
[link] A Chess-GPT Linear Emergent World Representation
karvonenadam · 2024-02-08T04:25:15.222Z · comments (14)
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)
A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (10)
Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (177)
[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)
General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)
Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)
Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)
[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (53)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (25)
Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)
[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)
[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)
[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
Simple versus Short: Higher-order degeneracy and error-correction
Daniel Murfet (dmurfet) · 2024-03-11T07:52:46.307Z · comments (5)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)
Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)
[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)
Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)
OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)
Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (58)
On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)
OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)
How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (10)
[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)
Everything Wrong with Roko's Claims about an Engineered Pandemic
EZ97 · 2024-02-22T15:59:08.439Z · comments (10)
[link] New report: Safety Cases for AI
joshc (joshua-clymer) · 2024-03-20T16:45:27.984Z · comments (13)
[link] Introducing METR's Autonomy Evaluation Resources
Megan Kinniment (megan-kinniment) · 2024-03-15T23:16:59.696Z · comments (0)
[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (23)
Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)
Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)
story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)
SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (15)
On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)