LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)

Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (34)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (37)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (3)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (116)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (7)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (70)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (130)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (17)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (56)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
1a3orn · 2023-11-02T18:20:29.569Z · comments (79)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (139)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)

[link] Sam Altman fired from OpenAI
LawrenceC (LawChan) · 2023-11-17T20:42:30.759Z · comments (75)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (12)

Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

denkenberger on Is the Power Grid Sustainable?

It would be helpful to see a calculation with your rates, the installed cost of batteries, cost of the space taken up, losses in the batteries, any cost of maintenance, lifetime of batteries, and cost (or benefit) of disposal.

nathan-helm-burger on johnswentworth's Shortform

I like it. I do work that it, and The Narrow Path, are both missing how hard it will be to govern and restrict AI.

aprilsr on Alexander Gietelink Oldenziel's Shortform

I think it's pretty good to keep it in mind that heliocentrism is literally speaking just a change in what coordinate system you use, but it is legitimately a much more convenient coordinate system.

raemon on JargonBot Beta Test

Mmm, that does seem reasonable.

johnswentworth on Trading Candy

To be clear, I don't really think of myself as libertarian these days, though I guess it'd probably look that way if you just gave me a political alignment quiz.

To answer your question: I'm two years older than my brother, who is two years older than my sister.

zy on Trading Candy

I think that is probably not a good reason to be libertarian in my opinion? Could you also share maybe how much older were your than you siblings? If you are not that far apart, you and your siblings came from the same starting line, distributing is not going to happen in real life economically nor socially even if not libertarian (in real life, where we need equity is when the starting line is not the same and is not able to be changed by choice. A more similar analogy might be some kids are born with large ears, and large ears are favored by the society, and the large eared kids always get more candy). If you are ages apart with you being a lot older, it may make some limited sense to for your parents to re-distribute.

quila on JargonBot Beta Test

Let us know what you think!

the grey text feels disruptive to normal reading flow but idk why green link text wouldn't also be, maybe i'm just not used to it. e.g., in this post's "Curating technical posts" where 'Curating' is grey, my mind sees "<Curating | distinct term> technical posts" instead of [normal meaning inference not overfocused on individual words]

Is this useful, as a reader?

if the authors make sure they agree with all the definitions they allow into the glossary, yes. author-written definitions would be even more useful because how things are worded can implicitly convey things like, the underlying intuition, ontology, or related views they may be using wording to rule in or out.

Whenever an author with 100+ karma saves a draft of a post, our database queries a language model to:

i would prefer this be optional too, for drafts which are meant to be private (e.g. shared with a few other users, e.g. may contain possible capability-infohazards), where the author doesn't trust LM companies

johnswentworth on Ryan Kidd's Shortform

Y'know @Ryan [LW · GW], MATS should try to hire the PIBBSS folks to help with recruiting. IMO they tend to have the strongest participants of the programs on this chart which I'm familiar with (though high variance).

johnswentworth on Ryan Kidd's Shortform

... WOW that is not an efficient market.

johnswentworth on Trading Candy

My two siblings and I always used to trade candy after Halloween and Easter. We'd each lay out our candy on table, like a little booth, and then haggle a lot.

My memories are fuzzy, but apparently the way this most often went was that I tended to prioritize quantity moreso than my siblings, wanting to make sure that I had a stock of good candy which would last a while. So naturally, a few days later, my siblings had consumed their tastiest treats and complained that I had all the best candy. My mother then stepped in and redistributed the candy.

And that was how I became a libertarian at a very young age.

Only years later did we find out that my mother would also steal the best-looking candies after we went to bed and try to get us to blame each other, which... is altogether too on-the-nose for this particular analogy.