LessWrong 2.0 Reader

A Bear Case: My Predictions Regarding AI Progress
Thane Ruthenis · 2025-03-05T16:41:37.639Z · comments (150)
[link] Will Jesus Christ return in an election year?
Eric Neyman (UnexpectedValues) · 2025-03-24T16:50:53.019Z · comments (44)
Policy for LLM Writing on LessWrong
jimrandomh · 2025-03-24T21:41:30.965Z · comments (59)
[link] Recent AI model progress feels mostly like bullshit
lc · 2025-03-24T19:28:43.450Z · comments (76)
[link] Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda (neel-nanda-1) · 2025-03-22T10:13:38.257Z · comments (27)
[link] METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman · 2025-03-19T16:00:54.874Z · comments (92)
[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)
[link] Trojan Sky
Richard_Ngo (ricraz) · 2025-03-11T03:14:00.681Z · comments (39)
Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas (nonagon) · 2025-03-16T18:54:48.078Z · comments (34)
Intention to Treat
Alicorn · 2025-03-20T20:01:19.456Z · comments (4)
[link] OpenAI: Detecting misbehavior in frontier reasoning models
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-11T02:17:21.026Z · comments (25)
Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-03-17T19:11:00.813Z · comments (7)
So how well is Claude playing Pokémon?
Julian Bradshaw · 2025-03-07T05:54:45.357Z · comments (74)
[link] On the Rationality of Deterring ASI
Dan H (dan-hendrycks) · 2025-03-05T16:11:37.855Z · comments (34)
I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · 2025-03-16T16:52:42.177Z · comments (25)
Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-13T19:09:43.620Z · comments (40)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (26)
[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (25)
[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (15)
Methods for strong human germline engineering
TsviBT · 2025-03-03T08:13:49.414Z · comments (28)
The Most Forbidden Technique
Zvi · 2025-03-12T13:20:04.732Z · comments (9)
Auditing language models for hidden objectives
Sam Marks (samuel-marks) · 2025-03-13T19:18:32.638Z · comments (15)
[link] The Hidden Cost of Our Lies to AI
Nicholas Andresen (nicholas-andresen) · 2025-03-06T05:03:47.239Z · comments (17)
The Milton Friedman Model of Policy Change
JohnofCharleston · 2025-03-04T00:38:56.778Z · comments (17)
[question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Thane Ruthenis · 2025-03-04T16:23:39.296Z · answers+comments (51)
Anthropic, and taking "technical philosophy" more seriously
Raemon · 2025-03-13T01:48:54.184Z · comments (29)
The Pando Problem: Rethinking AI Individuality
Jan_Kulveit · 2025-03-28T21:03:28.374Z · comments (13)
[question] when will LLMs become human-level bloggers?
nostalgebraist · 2025-03-09T21:10:08.837Z · answers+comments (34)
[link] Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases
Fabien Roger (Fabien) · 2025-03-11T11:52:38.994Z · comments (19)
Do models say what they learn?
Andy Arditi (andy-arditi) · 2025-03-22T15:19:18.800Z · comments (12)
How I've run major projects
benkuhn · 2025-03-16T18:40:04.223Z · comments (10)
2024 Unofficial LessWrong Survey Results
Screwtape · 2025-03-14T22:29:00.045Z · comments (28)
[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (5)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)
How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)
[link] Towards a scale-free theory of intelligent agency
Richard_Ngo (ricraz) · 2025-03-21T01:39:42.251Z · comments (21)
[link] Elite Coordination via the Consensus of Power
Richard_Ngo (ricraz) · 2025-03-19T06:56:44.825Z · comments (15)
We should start looking for scheming "in the wild"
Marius Hobbhahn (marius-hobbhahn) · 2025-03-06T13:49:39.739Z · comments (4)
How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)
Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (25)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-03T20:08:31.539Z · comments (19)
[link] Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith (lsgos) · 2025-03-26T19:07:48.710Z · comments (12)
OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)
Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)
Elon Musk May Be Transitioning to Bipolar Type I
Cyborg25 · 2025-03-11T17:45:06.599Z · comments (22)
[link] Preparing for the Intelligence Explosion
fin · 2025-03-11T15:38:29.524Z · comments (17)
[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)