LessWrong 2.0 Reader

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (8)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

A Problem to Solve Before Building a Deception Detector
Eleni Angelou (ea-1) · 2025-02-07T19:35:23.307Z · comments (8)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

Alignment can be the ‘clean energy’ of AI
Cameron Berg (cameron-berg) · 2025-02-22T00:08:30.391Z · comments (1)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

[link] Gary Marcus now saying AI can't do things it can already do
Benjamin_Todd · 2025-02-09T12:24:11.954Z · comments (12)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

Recent comments

isabelj on Hauke Hillebrandt's Shortform

taking the dates literally, the first doubling took 19 months and the second doubling took 5 months, which does seem both surprising and increasingly fast.

mr-hire on Anthropic releases Claude 3.7 Sonnet with extended thinking mode

Why didn't they run agentic coding or tool use with their reasoning model?

d0themath on So You Want To Make Marginal Progress...

I think the connection would come from the concept of a Lagrangian dual problem in optimization. See also John's Mazes and Duality.

screwtape on I got dysentery so you don’t have to

You might enjoy knowing this got a brief shoutout during Boston's Secular Solstice. Thank you for your service!

brendan-long on Anthropic releases Claude 3.7 Sonnet with extended thinking mode

During our evaluations we noticed that Claude 3.7 Sonnet occasionally resorts to special-casing in order to pass test cases in agentic coding environments like Claude Code. Most often this takes the form of directly returning expected test values rather than implementing general solutions, but also includes modifying the problematic tests themselves to match the code’s output.

Claude officially passes the junior engineer Turing Test?

david-james on So You Want To Make Marginal Progress...

I find this article confusing. So I find myself returning to fundamentals of computer science algorithms: to greedy algorithms and under what conditions they are optimal. Would anyone care to build a bridge from this terminology to what the author is trying to convey?

holly_elmore on The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Yeah I suspect that these one-shot big protests are drawing on a history of organizing in those or preceding fields. The Women’s March coalition comes together all for one big event but draws on a far deeper history involving small demonstrations and deliberate organizing to make it to that point, is my point. Idk about Free Internet but I would bet it leaned on Free Speech organizing and advocacy.

I sure wish someone would put on a large AI Safety protest if they know a way to do this in one leap. If I got a sponsor for a concert or some other draw then perhaps I could see a larger thing happening quickly in the family of AI Safety protest, but I’d like to keep the brand pretty earnest and message-focused.

I have to note, based on our history, I interpret your posts as attacking, like the subtext is that I’m just not a good organizer and, if you wanted to, you could organize a way bigger movement way faster. If that’s true, I wish you would! I’m trying my best with my understanding of how this can work for me and I wish more people like you were embracing broad messaging like protests.

daniel-kokotajlo on Hauke Hillebrandt's Shortform

Feb '23: 100M[2]
Sep '24: 200M[3] of which 11.5M paid, Enterprise: 1M[4]
Feb '25: 400M[5] of which 15M paid, 15.5M[6] / Enterprise: 2M
One can see:
- Surprisingly, increasingly faster user growth
- While OpenAI converted 11.5M out of the first 200M users, they only got 3.5M users out of the most recent 200M to pay for ChatGPT

This user growth seems neither surprising nor 'increasingly faster' to me. Isn't it just doubling every year?

That said, I agree based on your second bullet point that probably they've got some headwinds incoming and will by default have slower growth in the future. I imagine competition is also part of the story here.
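For readers who want to check the figures quoted in this thread, here is a minimal sketch of the arithmetic. It uses only the user counts and months given above. The day of the month is an assumption (the thread gives only month and year), and all variable names are illustrative.

```python
import math
from datetime import date

# Figures as quoted in the thread, in millions of users.
# Day-of-month is an assumption; the thread only gives month and year.
milestones = [
    (date(2023, 2, 1), 100.0),
    (date(2024, 9, 1), 200.0),
    (date(2025, 2, 1), 400.0),
]

def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# Time taken by each individual doubling.
intervals = [
    months_between(milestones[i][0], milestones[i + 1][0])
    for i in range(len(milestones) - 1)
]

# Average doubling time over the whole span.
total_months = sum(intervals)
doublings = math.log2(milestones[-1][1] / milestones[0][1])
avg_doubling_months = total_months / doublings

# Paid conversion: first 200M users vs. the most recent 200M.
paid_sep_2024 = 11.5
paid_feb_2025 = 15.0
first_conversion = paid_sep_2024 / 200.0
marginal_conversion = (paid_feb_2025 - paid_sep_2024) / (400.0 - 200.0)

print(intervals)
print(avg_doubling_months)
print(f"{first_conversion:.2%} vs {marginal_conversion:.2%}")
```

Under these assumptions the two readings in the thread are compatible: the individual doublings took 19 and 5 months, while the average over the full two years is one doubling per 12 months. The conversion figures work out to 5.75% for the first 200M users and 1.75% for the most recent 200M.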

mattmacdermott on Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Seems mistaken to think that the way you use a model is what determines whether or not it’s an agent. It’s surely determined by how you train it?

(And notably the proposal here isn’t to train the model on the outcomes of experiments it proposes, in case that’s what you’re thinking.)

steve2152 on [Intuitive self-models] 2. Conscious Awareness

Good questions, thanks!

In 2.4.2, you say that things can only get stored in episodic memory if they were in conscious awareness. People can sometimes remember events from their dreams. Does that mean that people have conscious awareness during (at least some of) their dreams?

My answer is basically “yes”, although different people might have different definitions of the concept “conscious awareness”. In other words, in terms of map-territory correspondence, I claim there’s a phenomenon P in the territory (some cortex neurons / concepts / representations are active at any given time, and others are not, as described in the post), and this phenomenon P gets incorporated into everyone’s map, and that’s what I’m talking about in this post. And this phenomenon P is part of the territory during dreaming too.

But it’s not necessarily the case that everyone will define the specific English-language phrase “conscious awareness” to indicate the part of their map whose boundaries are drawn exactly around that phenomenon P. Instead, for example, some people might feel like the proper definition of “conscious awareness” is something closer to “the phenomenon P in the case when I’m awake, and not drugged, etc.”, which is really P along with various additional details and connotations and associations, such as the links to voluntary control and memory. Those people would still be able to conceptualize the phenomenon P, of course, and it would still be a big part of their mental worlds, but to point to it you would need a whole sentence, not just the two words “conscious awareness”.

Is there anything you can say about what unconsciousness is? i.e. Why is there nothing in conscious awareness during this state? - Is the cortex not thinking any (coherent?) thoughts? (I have not studied unconsciousness.)

I think sometimes the cortex isn’t doing much of anything, or at least, not running close-enough-to-normal that neurons representing thoughts can be active.

Alternatively, maybe the cortex is doing its usual thing of activating groups of neurons that represent thoughts and concepts—but it’s neither forming memories (that last beyond a few seconds), nor taking immediate actions. Then you “look unconscious” from the outside, and you also “look unconscious” from the perspective of your future self. There’s no trace of what the cortex was doing, even if it was doing something. Maybe brain scans can distinguish that possibility though.

About the predictive learning algorithm in the human brain…

I think I declare that out-of-scope for this series, from some combination of “I don’t know the complete answer” and “it might be a dangerous capabilities question”. Those are related, of course—when I come upon things that might be dangerous capabilities questions, I often don’t bother trying to answer them :-P