LessWrong 2.0 Reader

What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (49)
The Case Against AI Control Research
johnswentworth · 2025-01-21T16:03:10.143Z · comments (80)
[link] The Gentle Romance
Richard_Ngo (ricraz) · 2025-01-19T18:29:18.469Z · comments (46)
“Sharp Left Turn” discourse: An opinionated review
Steven Byrnes (steve2152) · 2025-01-28T18:47:04.395Z · comments (26)
Mechanisms too simple for humans to design
Malmesbury (Elmer of Malmesbury) · 2025-01-22T16:54:37.601Z · comments (45)
What Is The Alignment Problem?
johnswentworth · 2025-01-16T01:20:16.826Z · comments (50)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (61)
How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (20)
[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (52)
Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (10)
OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (7)
[link] Quotes from the Stargate press conference
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-22T00:50:14.793Z · comments (7)
Don’t ignore bad vibes you get from people
Kaj_Sotala · 2025-01-18T09:20:17.397Z · comments (50)
Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (18)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (5)
Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (32)
Applying traditional economic thinking to AGI: a trilemma
Steven Byrnes (steve2152) · 2025-01-13T01:23:00.397Z · comments (32)
Planning for Extreme AI Risks
joshc (joshua-clymer) · 2025-01-29T18:33:14.844Z · comments (4)
What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (57)
[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty
tandem · 2025-01-07T19:11:21.238Z · comments (5)
Ten people on the inside
Buck · 2025-01-28T16:41:22.990Z · comments (28)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (2)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (14)
Human takeover might be worse than AI takeover
Tom Davidson (tom-davidson-1) · 2025-01-10T16:53:27.043Z · comments (54)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (5)
Building AI Research Fleets
Ben Goldhaber (bgold) · 2025-01-12T18:23:09.682Z · comments (11)
[link] Parkinson's Law and the Ideology of Statistics
Benquo · 2025-01-04T15:49:21.247Z · comments (7)
[link] The Intelligence Curse
lukedrago · 2025-01-03T19:07:43.493Z · comments (26)
2024 in AI predictions
jessicata (jessica.liu.taylor) · 2025-01-01T20:29:49.132Z · comments (3)
The Game Board has been Flipped: Now is a good time to rethink what you’re doing
LintzA (alex-lintz) · 2025-01-28T23:36:18.106Z · comments (30)
[link] Aristocracy and Hostage Capital
Arjun Panickssery (arjun-panickssery) · 2025-01-08T19:38:47.104Z · comments (7)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (21)
My supervillain origin story
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27T12:20:46.101Z · comments (1)
Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (11)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (25)
Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (33)
The purposeful drunkard
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-12T12:27:51.952Z · comments (13)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
We probably won't just play status games with each other after AGI
Matthew Barnett (matthew-barnett) · 2025-01-15T04:56:38.330Z · comments (21)
[link] On Eating the Sun
jessicata (jessica.liu.taylor) · 2025-01-08T04:57:20.457Z · comments (96)
The subset parity learning problem: much more than you wanted to know
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-03T09:13:59.245Z · comments (18)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (15)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (55)
Thoughts on the conservative assumptions in AI control
Buck · 2025-01-17T19:23:38.575Z · comments (5)
Tips On Empirical Research Slides
James Chua (james-chua) · 2025-01-08T05:06:44.942Z · comments (4)
Implications of the inference scaling paradigm for AI safety
Ryan Kidd (ryankidd44) · 2025-01-14T02:14:53.562Z · comments (70)
Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (14)
[link] Five Recent AI Tutoring Studies
Arjun Panickssery (arjun-panickssery) · 2025-01-19T03:53:47.714Z · comments (0)
Agent Foundations 2025 at CMU
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-19T23:48:22.569Z · comments (10)