LessWrong 2.0 Reader

Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions
glykokalyx · 2024-11-10T22:34:58.956Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

Effects of Non-Uniform Sparsity on Superposition in Toy Models
Shreyans Jain (shreyans-jain) · 2024-11-14T16:59:43.234Z · comments (3)

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks
Tom DAVID (tom-david) · 2024-12-11T13:37:24.177Z · comments (3)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Transformers Explained (Again)
RohanS · 2024-10-22T04:06:33.646Z · comments (0)

[link] Better antibodies by engineering targets, not engineering antibodies (Nabla Bio)
Abhishaike Mahajan (abhishaike-mahajan) · 2025-01-13T15:05:35.261Z · comments (0)

[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)

[link] Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams · 2024-10-22T16:35:13.999Z · comments (0)

Distillation Of DeepSeek-Prover V1.5
IvanLin (matthewshing) · 2024-10-15T18:53:11.199Z · comments (1)

[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (6)

Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich (nikitoskh) · 2024-12-16T18:48:43.533Z · comments (0)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

[question] Are there ways to artificially fix laziness?
Aidar (aidar-toktargazin) · 2024-12-08T18:26:26.433Z · answers+comments (2)

Fred the Heretic, a GPT for poetry
Bill Benzon (bill-benzon) · 2024-12-08T16:52:07.660Z · comments (0)

[link] Can AI improve the current state of molecular simulation?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-06T20:22:31.685Z · comments (0)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]
Bill Benzon (bill-benzon) · 2024-12-05T19:36:02.289Z · comments (0)

[link] Expevolu, a laissez-faire approach to country creation
Fernando · 2024-12-05T19:29:24.011Z · comments (4)

[link] A Logical Proof for the Emergence and Substrate Independence of Sentience
rife (edgar-muniz) · 2024-10-24T21:08:09.398Z · comments (31)

Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T19:24:34.727Z · comments (0)

It is time to start war gaming for AGI
yanni kyriacos (yanni) · 2024-10-17T05:14:17.932Z · comments (1)

Morality as Cooperation Part III: Failure Modes
DeLesley Hutchins (delesley-hutchins) · 2024-12-05T09:39:27.816Z · comments (0)

[question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?
hive · 2024-10-17T10:47:05.099Z · answers+comments (6)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

[question] Has Anthropic checked if Claude fakes alignment for intended values too?
Maloew (maloew-valenar) · 2024-12-23T00:43:07.490Z · answers+comments (1)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

Linkpost: Look at the Water
J Bostock (Jemist) · 2024-12-30T19:49:04.107Z · comments (3)

Levels of Thought: from Points to Fields
HNX · 2024-12-02T20:25:02.802Z · comments (2)

[link] Social Science in its epistemological context
Arturo Macias (arturo-macias) · 2024-12-05T16:12:29.034Z · comments (0)

Personal Philosophy
Xor · 2024-10-13T03:01:59.324Z · comments (0)

AI Training Opt-Outs Reinforce Global Power Asymmetries
kushagra (kushagra-tiwari) · 2024-11-30T22:08:06.426Z · comments (0)

[question] How might language influence how an AI "thinks"?
bodry (plosique) · 2024-10-30T17:41:04.460Z · answers+comments (0)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

[link] AI Safety at the Frontier: Paper Highlights, October '24
gasteigerjo · 2024-10-31T00:09:33.522Z · comments (0)

Interview with Bill O’Rourke - Russian Corruption, Putin, Applied Ethics, and More
JohnGreer · 2024-10-27T17:11:28.891Z · comments (0)

[question] How do we quantify non-philanthropic contributions from Buffet and Soros?
Philosophistry (philip-dhingra) · 2024-12-20T22:50:32.260Z · answers+comments (0)

Sexual Selection as a Mesa-Optimizer
Lorec · 2024-11-29T23:34:45.739Z · comments (0)

Understanding Emergence in Large Language Models
[deleted] · 2024-11-29T19:42:43.790Z · comments (1)

[link] Both-Sidesism—When Fair & Balanced Goes Wrong
James Stephen Brown (james-brown) · 2024-11-02T03:04:03.820Z · comments (15)

The boat
RomanS · 2024-11-22T12:56:45.050Z · comments (0)

Don't want Goodhart? — Specify the variables more
YanLyutnev (YanLutnev) · 2024-11-21T22:43:48.362Z · comments (2)

Some implications of radical empathy
MichaelStJules · 2025-01-07T16:10:16.755Z · comments (0)

[link] Higher Order Signs, Hallucination and Schizophrenia
Nicolas Villarreal (nicolas-villarreal) · 2024-11-02T16:33:10.574Z · comments (0)

Your memory eventually drives confidence in each hypothesis to 1 or 0
Crazy philosopher (commissar Yarrick) · 2024-10-28T09:00:27.084Z · comments (6)

Have frontier AI systems surpassed the self-replicating red line?
nsage (wheelspawn) · 2025-01-11T05:31:31.672Z · comments (0)

[link] What is Confidence—in Game Theory and Life?
James Stephen Brown (james-brown) · 2024-12-10T23:06:24.072Z · comments (0)

Enabling New Applications with Today's Mechanistic Interpretability Toolkit
ananya_joshi · 2024-10-25T17:53:23.960Z · comments (0)

On the Practical Applications of Interpretability
Nick Jiang (nick-jiang) · 2024-10-15T17:18:25.280Z · comments (1)

[link] When the Scientific Method Doesn't Really Help...
casualphysicsenjoyer (hatta_afiq) · 2024-11-27T19:52:30.023Z · comments (1)

Hope to live or fear to die?
Knight Lee (Max Lee) · 2024-11-27T10:42:37.070Z · comments (0)

Recent comments

jbash on In Defense of a Butlerian Jihad

Yes. That’s really my central claim.

OK, I read you and essentially agree with you.

Two caveats, which I expect you've already noticed yourself:

There are going to be conflicts over human values in the non-AGI, non-ASI world too. Delaying AI may prevent them from getting even worse, but there's still blood flowing over these conflicts without any AI at all. Which is both a limitation of the approach and perhaps a cost in itself.
More generally, if you think your values are going to largely win, you have to trade off caution, consideration for other people's values, and things like that, against the cost of that win being delayed.^[1]

I think a lot of people have that. There’s even a meme for that: "It ain’t much, but it’s honest work".

All in all, I don’t think either of us has much more evidence than a vague sense of things anyway? I sure don’t.

So far as I know, there are no statistics. My only guess is that you're likely talking about a "lot" of people on each side (if you had to reduce it to two sides, which is of course probably oversimplifying beyond the bounds of reason).

[...] "my agency is meaningful if and only if I have to take positive, considered action to ensure my survival, or at least a major chunk of my happiness".

I think that’s the general direction of the thing we’re trying to point at, yes?

I'll take your word for it that it's important to you, and I know that other people have said it's important to them. Being hung up on that seems deeply weird to me for a bunch of reasons that I could name that you might not care to hear about, and probably another bunch of reasons I haven't consciously recognized (at least yet).

If you give me the choice of living the life of a medieval farmer or someone who has nothing in his life but playing chess, I will take the former.

OK, here's one for you. An ASI has taken over the world. It's running some system that more or less matches your view of a "meaningless UBI paradise". It sends one of its bodies/avatars/consciousness nodes over to your house, and it says:

"I/we notice that you sincerely think your life is meaningless. Sign here, and I/we will set you up as a medieval farmer. You'll get land in a community of other people who've chosen to be medieval farmers (you'll still be able to lose that land under the rules of the locally prevailing medieval system). You'll have to work hard and get things right (and not be too unlucky), or you'll starve. I/we will protect your medieval enclave from outside incursion, but other than that you'll get no help. Obviously this will have no effect on how I/we run the rest of the world. If you take this deal, you can't revoke it, so the stakes will be real."^[2]

Would you take that?

The core of the offer is that the ASI is willing to refrain from rescuing you from the results of certain failures, if you really want that. Suppose the ASI is willing to edit the details to your taste, so long as it doesn't unduly interfere with the ASI's ability to offer other people different deals (so you don't get to demand "direct human control over the light cone" or the like). Is there any variant that you'd be satisfied with?

Or does having to choose it spoil it? Or is it too specific to that particular part of the elephant?

Does "growing as a person" sound like a terminal goal to you?

Yes, actually. One of the very top ones.

Is "real stakes" easier to grasp than Agency/Meaningfulness? Or have I just moved confusion around?

It's clear and graspable.

I don't agree with it, but it helps with the definition problem, at least as far as you personally are concerned. At least it resolves enough of the definition problem to move things along, since you say that the "elephant" has other parts. Now I can at least talk about "this trunk you showed me and whatever's attached to it in some way yet to be defined".

Well, the problem is that there are so many concepts, especially when you want to be precise, and so few words.

Maybe it's just an "elephant" thing, but I still get the feeling that a lot of it is a "different people use these words with fundamentally different meanings" thing.

[1] Although I don't know how anybody could confidently expect to win at this point.
[2] ... and I'm already seeing the can of worms opening up around your kids' choices, but let's ignore that for the moment...

zvi on johnswentworth's Shortform

Taken individually, for a particular manifestation of each issue, this is true: you can imagine a hacky solution to each one. But that assumes there is a list of such particular problems where, if you check off all the boxes, you win, rather than them being manifestations of broader problems. You do not want to get into a hacking contest if you're not confident your list is complete.

g-wood on Alleviating shrimp pain is immoral.

I want to be a little careful here. I'm not saying that this or that thing is "Right" or "Wrong"; that's what morality does. I'm trying to describe what "Morality" is. So yes, I suppose a slave would get a lower moral weight than a doctor: shall we say 0.8 of your average society member for the slave and 1.2 for the doctor? This is certainly what we observe in history, where skilled, helpful professionals are valued more than the less skilled and not very willing.

A slave's willingness is a lot more important a factor in their utility than that of a shrimp. I would give a shrimp a moral weight of 0.0.

In the American context slavery is also wrapped up with racism, which I think is wrong from both my personal morality and also from my half-baked "recognition of usefulness helps everyone get along and makes for greater prosperity" standard.

I think that modern wage / economic slavery (doing a job) is much more efficient / effective, in part because the human is recognised and applauded for their usefulness and works much harder because of it.

nathan-helm-burger on Progress links and short notes, 2025-01-13

Calling eligible bachelors in SF: “AI philosopher with a penchant for underwater sci fi and evening bike rides seeks a direct communicator who cares about the world and feels a thrill of human triumph at the sight of a cargo ship.” (@AmandaAskell)

Now there's a temptation... I am otherwise occupied, however.

lc on Zvi’s 2024 In Movies

I watched Fall Guy last month because I noticed it had 5 stars on Zvi's letterboxd. Can confirm it's an amazing movie.

nathan-rosquist on AGI Will Not Make Labor Worthless

The idea that the labor share of income has been historically stable is often taken as a given, but it’s more of a “stylized fact” than an unchanging law. Measuring non-wage compensation, like healthcare and retirement benefits, is notoriously tricky, which makes it hard to get a clear picture. And if capital—like machines and software—ever becomes a perfect substitute for human labor, that historical stability could completely fall apart.

On top of that, even though GDP and worker productivity have risen over the decades, many workers, especially those in low and middle-income jobs, haven’t seen their wages keep up. Since the 80s, a bigger share of the productivity gains has gone to capital owners, executives, and highly skilled workers, leaving others behind and deepening income inequality. Declining union membership, globalization, and shifts in labor markets have further eroded workers’ bargaining power, lowering their share of income—even in industries where technology is designed to work alongside people.

winstonbosan on Chance is in the Map, not the Territory

Great stuff! I don't have strong fundamentals in math and statistics but I was still able to hobble along and understand the post. It reminds me of what Rissanen said about data/observation - that data is really all we have, and there is no true state of nature. Our job is to squeeze as much alpha out of observation as possible, instead of trying to find a "true" generator function. This post hit the same spot for me :)

benjamin_todd on How quickly could robots scale up?

I did wonder about maintenance costs, but I figured they wouldn't change the picture too much because I only assume an average 3-year lifetime for the robot, and figured they wouldn't need a huge amount of maintenance to make it to that point.

Moreover, if there's worthwhile maintenance that extends the lifetime further, then the hardware costs could end up cheaper than my per year estimate.

I'm also envisioning the costs after a big scale-up, when there would be robot repair shops as numerous as car repair shops, rather than needing to fly in specialists.

That said, I agree it would be interesting to look at how much is spent per year on maintaining a car vs. its capital costs. (I expect it would be under 10%?)

benjamin_todd on How quickly could robots scale up?

I'd be happy to put the opening bunch of paragraphs. I was feeling reluctant to cross-post because I often update my articles as I learn more about a topic, and I don't want to keep multiple versions in sync (especially for a lower priority article).

rife on Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

btw, I was getting ready for an appointment earlier, so I had only skimmed this until now. Thank you for doing this and sharing it. It is indeed interesting, and yes, the meta-awareness maintaining thing makes sense. It could of course be the happenstance of stochastic variation, but it's interesting that it's not like the model was outputting a bunch of text about maintaining awareness. If it wasn't actually doing anything except pre-emptively outputting text that spoke of awareness, then token prediction alone would have made the output just as reliable. The fact that it aligned with the self-reported difficulty suggests that it's doing something at the very least.

Rereading your concluding paragraph, I just realized that's what you said, but I was coming to the same conclusion in real time. Genuinely excited that someone else is engaging with this and tackling it from a different angle.