LessWrong 2.0 Reader

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (68)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (12)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (34)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (58)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (148)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (13)

Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (31)

[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (2)

The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)

[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (68)

A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (2)

AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)

A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (3)

🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (6)

When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (12)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (16)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (28)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (19)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (3)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (7)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (8)

[link] Moderately Skeptical of "Risks of Mirror Biology"
Davidmanheim · 2024-12-20T12:57:31.824Z · comments (3)

[link] Announcing the Q1 2025 Long-Term Future Fund grant round
Linch · 2024-12-20T02:20:22.448Z · comments (0)

[link] What I expected from this site: A LessWrong review
Nathan Young · 2024-12-20T11:27:39.683Z · comments (5)

[link] A progress policy agenda
jasoncrawford · 2024-12-19T18:42:37.327Z · comments (1)

You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)

People aren't properly calibrated on FrontierMath
cakubilo · 2024-12-23T19:35:44.467Z · comments (4)

Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)

Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

Good Reasons for Alts
jefftk (jkaufman) · 2024-12-21T01:30:03.113Z · comments (2)

[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)

[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (0)

[link] PCR retrospective
bhauth · 2024-12-26T21:20:56.484Z · comments (0)

Elon Musk and Solar Futurism
transhumanist_atom_understander · 2024-12-21T02:55:28.554Z · comments (27)

If all trade is voluntary, then what is "exploitation?"
Darmani · 2024-12-27T11:21:30.036Z · comments (19)

Acknowledging Background Information with P(Q|I)
JenniferRM · 2024-12-24T18:50:25.323Z · comments (4)

Non-Obvious Benefits of Insurance
jefftk (jkaufman) · 2024-12-23T03:40:02.184Z · comments (5)

Monthly Roundup #25: December 2024
Zvi · 2024-12-23T14:20:04.682Z · comments (3)


Recent comments

tom-davidson on When Is Insurance Worth It?

Yep I'm saying you're wrong about this. If money compounds but you don't have utility=log($) then you shouldn't Kelly bet
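
To make the claim above concrete, here is a minimal sketch of why the Kelly fraction is tied to log utility. It assumes a single binary bet won with probability p at net odds b; the function names and the grid search are illustrative assumptions, not anything taken from the thread.

```python
import math


def kelly_fraction(p, b):
    """Kelly fraction for a bet won with probability p at net odds b
    (a win pays b per 1 staked, a loss forfeits the stake)."""
    return p - (1.0 - p) / b


def expected_utility(f, p, b, utility):
    """Expected utility of final wealth after staking fraction f of a
    starting wealth of 1.0 on one bet."""
    return p * utility(1.0 + f * b) + (1.0 - p) * utility(1.0 - f)


def best_fraction(p, b, utility, steps=1000):
    """Grid search over f in [0, 0.999] for the stake that maximizes
    expected utility. The cap below 1.0 keeps log(1 - f) finite."""
    grid = [0.999 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: expected_utility(f, p, b, utility))


if __name__ == "__main__":
    p, b = 0.6, 1.0  # hypothetical bet: 60% chance to win at even odds
    print("Kelly fraction:", kelly_fraction(p, b))
    print("best stake, log utility:", best_fraction(p, b, math.log))
    print("best stake, sqrt utility:", best_fraction(p, b, math.sqrt))
    print("best stake, linear utility:", best_fraction(p, b, lambda w: w))
```

Under these assumptions, the log-utility optimum lands on the Kelly fraction, while a less risk-averse utility such as the square root or a linear one favors a larger stake. This is the sense in which Kelly betting depends on having utility = log($).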

dmitry-vaintrob on Review: Planecrash

Thank you for writing this! This is my favorite thing on this site in a while.

cstinesublime on ChristianKl's Shortform

This was a shocking revelation to me. I only discovered it a few months ago, when I was wondering why one USB-C cable transferred data between my laptop and an external SSD so much more slowly than another.
What is astounding, at least in brick-and-mortar retail, is the price differential between cables of different capabilities. It is sometimes so high that it doesn't even seem like a good deal: "this cable costs three times as much as that one, but only charges 30% faster, and only with the one device I have that is capable of that charging speed."

error on Acknowledging Background Information with P(Q|I)

Memento Errata

I love this phrase. It could practically be a LW motto, or a title for some adjacent project, or something like that. It's even self-referencing -- or at least, Claude tells me it's grammatically incorrect, and that feels appropriate.

mako-yass on If all trade is voluntary, then what is "exploitation?"

I'd just define exploitation to be precisely the opposite of Shapley bargaining: situations where a person is not being compensated in proportion to their bargaining power.

This definition encompasses any situation where a person has grievances and it makes sense for them to complain about them and take a stand, or, where striking could reasonably be expected to lead to a stable bargaining equilibrium with higher net utility (not all strikes fall into this category).

This definition also doesn't fully capture the common sense meaning of exploitation, but I don't think a useful concept can.
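
One way to make "compensated in proportion to their bargaining power" concrete is the Shapley value, which averages each player's marginal contribution over every order in which a coalition could form. The sketch below is an assumption about what such a calculation could look like; the two-player game, the player names, and the payoff of 100 are hypothetical and are not taken from the comment.

```python
from itertools import permutations


def shapley_values(players, value):
    """Return each player's Shapley value.

    `value` maps a frozenset of players to the payoff that coalition
    can secure on its own. The result averages each player's marginal
    contribution over all orderings of the players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def joint_venture(coalition):
    """Hypothetical game: 100 is produced only if both parties cooperate."""
    return 100.0 if coalition == frozenset({"owner", "worker"}) else 0.0


if __name__ == "__main__":
    print(shapley_values(["owner", "worker"], joint_venture))
```

In this toy game neither party can produce anything alone, so the Shapley split is even. Under the definition above, an arrangement that paid the worker much less than that share would count as exploitation.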

cstinesublime on leogao's Shortform

What kind of changes or outcomes would you expect to see if people around these parts instead of publishing their work independently started trying to get it into traditional ML conferences and related publications?

nc-1 on The Field of AI Alignment: A Postmortem, and What To Do About It

I am surprised that you find theoretical physics research less tight funding-wise than AI alignment [is this because the paths to funding in physics are well-worn, rather than better resourced?].

This whole post was a little discouraging. I hope that the research community can find a way forward.

habryka4 on leogao's Shortform

A thing that I often see happening when people talk about "normie-legible status systems" is that they gaslight themselves into believing that some status system that is extraordinarily legible, or that they are part of, is the consensus one.

Academia is the most intense example of this. Most people don't care that much about academic status! This also happens in the other direction. YouTube is a major source of status in much of the world, especially among young people, but is considered low-brow whenever people argue about this, and so people dismiss it.

I also think people tend to do a fallacy of gray thing where if a status system is not maximally legible (like writing popular blogposts, or running a popular podcast, or making popular YouTube videos, or being popular on Twitter), they dismiss the status system as not real and "illegible".

I think modeling the real status and reputation systems that are present in the world is important, but, for example, trying to ascend the academic status hierarchy is a bad use of time and resources. It's extremely competitive, and not actually that influential outside of the academic bubble. It is in some fields better correlated with actual skills and integrity and intelligence, and so I still think it is a reasonable thing to consider, but I think most people are better placed to trade off a bit of legibility against a large amount of net realness in status (this importantly does not mean your LW quick takes will be the thing that causes you to become world-renowned; I am not saying "just say smart things and the world will recognize you", I am saying "don't think that only the most legible status systems, or the ones with the most mobs hunting dissenters from the status system, are the only real ways of gaining recognition in the world").

alexander-gietelink-oldenziel on Cognitive Work and AI Safety: A Thermodynamic Perspective

I agree with you that as stated the analogy is risking dangerous superficiality.

The 'cognitive' work of evolution came from the billion years of evolution in the innumerable forms of life that lived, hunted and reproduced through the eons. Effectively, we could see evolution by natural selection as something like a simple, highly parallel, stochastic, slow algorithm, i.e. a simple many-tape random Turing machine taking a very large number of timesteps.
A way to try and maybe put some (vegan) meat on the bones of this analogy would be to look at conditional KT-complexity. KT-complexity is a version of Kolmogorov complexity that also accounts for the time cost of running the generating program.
- In KT-complexity, pseudorandomness functions just like randomness.
- Algorithms may indeed be copied, and the copy operation is fast and takes very little memory overhead.
- Just as in Kolmogorov complexity, we rejiggle and think in terms of an algorithmic probability.
- A private-public key pair is trivial in a pure Kolmogorov complexity framework but correctly modelled in a KT-complexity framework.

To deepen the analogy with thermodynamics one should probably carefully read John Wentworth's generalized heat engines and Kolmogorov sufficient statistics.

ete on The Field of AI Alignment: A Postmortem, and What To Do About It

If you're mobile (able to be in the UK) and willing to try a different lifestyle, consider going to the EA hotel, aka CEEALAR. They offer free food and accommodation for a bunch of people, including many people working on AI safety. Alternatively, taking a quick look at https://www.aisafety.com/funders, the current best options are maybe LTFF, OpenPhil, CLR, or maybe AE Studios?