LessWrong 2.0 Reader

Monthly Roundup #28: March 2025
Zvi · 2025-03-17T12:50:03.097Z · comments (8)
[link] Are corporations superintelligent?
Vishakha (vishakha-agrawal) · 2025-03-17T10:36:12.703Z · comments (3)
[link] One pager
samuelshadrach (xpostah) · 2025-03-17T08:12:49.789Z · comments (2)
[link] The Case for AI Optimism
Annapurna (jorge-velez) · 2025-03-17T01:29:22.734Z · comments (1)
Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format
Roland Pihlakas (roland-pihlakas) · 2025-03-16T23:23:30.989Z · comments (6)
Read More News
utilistrutil · 2025-03-16T21:31:28.817Z · comments (2)
What would a post labor economy *actually* look like?
Ansh Juneja (ansh-juneja) · 2025-03-16T20:38:41.788Z · comments (1)
Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas (nonagon) · 2025-03-16T18:54:48.078Z · comments (34)
How I've run major projects
benkuhn · 2025-03-16T18:40:04.223Z · comments (10)
Counting Objections to Housing
jefftk (jkaufman) · 2025-03-16T18:20:06.898Z · comments (7)
I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · 2025-03-16T16:52:42.177Z · comments (25)
Siberian Arctic origins of East Asian psychology
davidsun · 2025-03-16T16:52:23.068Z · comments (0)
[link] AI Model History is Being Lost
Vale · 2025-03-16T12:38:47.907Z · comments (1)
Metacognition Broke My Nail-Biting Habit
Rafka · 2025-03-16T12:36:47.437Z · comments (20)
[question] Can we ever ensure AI alignment if we can only test AI personas?
Karl von Wendt · 2025-03-16T08:06:42.345Z · answers+comments (8)
Can time preferences make AI safe?
TerriLeaf · 2025-03-15T21:41:33.127Z · comments (1)
Help make the orca language experiment happen
Towards_Keeperhood (Simon Skade) · 2025-03-15T21:39:43.276Z · comments (12)
Announcing EXP: Experimental Summer Workshop on Collective Cognition
Jan_Kulveit · 2025-03-15T20:14:47.972Z · comments (2)
AI Self-Correction vs. Self-Reflection: Is There a Fundamental Difference?
Project Solon · 2025-03-15T18:24:50.579Z · comments (0)
The Fork in the Road
testingthewaters · 2025-03-15T17:36:37.503Z · comments (12)
Any-Benefit Mindset and Any-Reason Reasoning
silentbob · 2025-03-15T17:10:14.682Z · comments (9)
The Silent War: AGI-on-AGI Warfare and What It Means For Us
funnyfranco · 2025-03-15T15:24:08.819Z · comments (2)
[link] Paper: Field-building and the epistemic culture of AI safety
peterslattery · 2025-03-15T12:30:14.088Z · comments (3)
Why Billionaires Will Not Survive an AGI Extinction Event
funnyfranco · 2025-03-15T06:08:23.829Z · comments (0)
AI Says It’s Not Conscious. That’s a Bad Answer to the Wrong Question.
JohnMarkNorman · 2025-03-15T01:25:44.019Z · comments (0)
Report & retrospective on the Dovetail fellowship
Alex_Altair · 2025-03-14T23:20:17.940Z · comments (3)
The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making
Cameron Tomé-Moreira · 2025-03-14T23:07:48.446Z · comments (4)
LLMs may enable direct democracy at scale
Davey Morse (davey-morse) · 2025-03-14T22:51:13.384Z · comments (16)
2024 Unofficial LessWrong Survey Results
Screwtape · 2025-03-14T22:29:00.045Z · comments (28)
AI4Science: The Hidden Power of Neural Networks in Scientific Discovery
Max Ma (max-ma) · 2025-03-14T21:18:33.941Z · comments (2)
[link] What are we doing when we do mathematics?
epicurus · 2025-03-14T20:54:31.985Z · comments (1)
[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)
Geometry of Features in Mechanistic Interpretability
Gunnar Carlsson (gunnar-carlsson) · 2025-03-14T19:11:04.287Z · comments (0)
[link] AI Tools for Existential Security
Lizka · 2025-03-14T18:38:06.110Z · comments (4)
Capitalism as the Catalyst for AGI-Induced Human Extinction
funnyfranco · 2025-03-14T18:14:02.375Z · comments (2)
Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)
Rareș Baron · 2025-03-14T15:45:14.365Z · comments (0)
[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (13)
On MAIM and Superintelligence Strategy
Zvi · 2025-03-14T12:30:07.451Z · comments (2)
Whether governments will control AGI is important and neglected
Seth Herd · 2025-03-14T09:48:34.062Z · comments (2)
Something to fight for
RomanS · 2025-03-14T08:27:13.810Z · comments (0)
Interpreting Complexity
Maxwell Adam (intern) · 2025-03-14T04:52:32.103Z · comments (7)
Bike Lights are Cheap Enough to Give Away
jefftk (jkaufman) · 2025-03-14T02:10:02.482Z · comments (0)
Superintelligence's goals are likely to be random
Mikhail Samin (mikhail-samin) · 2025-03-13T22:41:06.325Z · comments (6)
Should AI safety be a mass movement?
mhampton · 2025-03-13T20:36:59.284Z · comments (1)
Auditing language models for hidden objectives
Sam Marks (samuel-marks) · 2025-03-13T19:18:32.638Z · comments (15)
Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-13T19:09:43.620Z · comments (40)
Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (26)
[link] A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
simeon_c (WayZ) · 2025-03-13T18:29:52.776Z · comments (0)
Creating Complex Goals: A Model to Create Autonomous Agents
theraven · 2025-03-13T18:17:58.519Z · comments (1)