LessWrong 2.0 Reader

How to Make Superbabies
GeneSmith · 2025-02-19T20:39:38.971Z · comments (331)
[link] How AI Takeover Might Happen in 2 Years
joshc (joshua-clymer) · 2025-02-07T17:10:10.530Z · comments (136)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley (jan-betley) · 2025-02-25T17:39:31.059Z · comments (91)
Murder plots are infohazards
Chris Monteiro (chris-topher) · 2025-02-13T19:15:09.749Z · comments (44)
So You Want To Make Marginal Progress...
johnswentworth · 2025-02-07T23:22:19.825Z · comments (42)
Arbital has been imported to LessWrong
RobertM (T3t) · 2025-02-20T00:47:33.983Z · comments (31)
[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (41)
[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)
Eliezer's Lost Alignment Articles / The Arbital Sequence
Ruby · 2025-02-20T00:48:10.338Z · comments (9)
[question] Have LLMs Generated Novel Insights?
abramdemski · 2025-02-23T18:22:12.763Z · answers+comments (36)
It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (3)
The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
Thane Ruthenis · 2025-02-21T20:15:11.545Z · comments (51)
Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (7)
[link] Power Lies Trembling: a three-book review
Richard_Ngo (ricraz) · 2025-02-22T22:57:59.720Z · comments (7)
[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (38)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (21)
Gradual Disempowerment, Shell Games and Flinches
Jan_Kulveit · 2025-02-02T14:47:53.404Z · comments (36)
[link] Research directions Open Phil wants to fund in technical AI safety
jake_mendel · 2025-02-08T01:40:00.968Z · comments (21)
The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (18)
[link] Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas
jake_mendel · 2025-02-06T18:58:53.076Z · comments (0)
You can just wear a suit
lsusr · 2025-02-26T14:57:57.260Z · comments (48)
Two hemispheres - I do not think it means what you think it means
Viliam · 2025-02-09T15:33:53.391Z · comments (19)
Reviewing LessWrong: Screwtape's Basic Answer
Screwtape · 2025-02-05T04:30:34.347Z · comments (18)
Judgements: Merging Prediction & Evidence
abramdemski · 2025-02-23T19:35:51.488Z · comments (5)
[link] Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-02-06T15:46:53.024Z · comments (9)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (19)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (49)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
C'mon guys, Deliberate Practice is Real
Raemon · 2025-02-05T22:33:59.069Z · comments (25)
Timaeus in 2024
Jesse Hoogland (jhoogland) · 2025-02-20T23:54:56.939Z · comments (1)
Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (11)
[link] Wired on: "DOGE personnel with admin access to Federal Payment System"
Raemon · 2025-02-05T21:32:11.205Z · comments (45)
[link] Anthropic releases Claude 3.7 Sonnet with extended thinking mode
LawrenceC (LawChan) · 2025-02-24T19:32:43.947Z · comments (8)
Voting Results for the 2023 Review
Raemon · 2025-02-06T08:00:37.461Z · comments (3)
The Risk of Gradual Disempowerment from AI
Zvi · 2025-02-05T22:10:06.979Z · comments (15)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (58)
Microplastics: Much Less Than You Wanted To Know
jenn (pixx) · 2025-02-15T19:08:14.561Z · comments (8)
[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik (lucy.fa) · 2025-02-26T12:50:04.204Z · comments (8)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (21)
Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (27)
Evaluating “What 2026 Looks Like” So Far
Jonny Spicer (jonnyspicer) · 2025-02-24T18:55:27.373Z · comments (4)
Anti-Slop Interventions?
abramdemski · 2025-02-04T19:50:29.127Z · comments (33)
The Simplest Good
Jesse Hoogland (jhoogland) · 2025-02-02T19:51:14.155Z · comments (6)
MATS Applications + Research Directions I'm Currently Excited About
Neel Nanda (neel-nanda-1) · 2025-02-06T11:03:40.093Z · comments (7)
Osaka
lsusr · 2025-02-26T13:50:24.102Z · comments (11)
Language Models Use Trigonometry to Do Addition
Subhash Kantamneni (subhashk) · 2025-02-05T13:50:08.243Z · comments (1)
[link] Thermodynamic entropy = Kolmogorov complexity
Aram Ebtekar (EbTech) · 2025-02-17T05:56:06.960Z · comments (12)
Alignment can be the ‘clean energy’ of AI
Cameron Berg (cameron-berg) · 2025-02-22T00:08:30.391Z · comments (8)