LessWrong 2.0 Reader

A collection of approaches to confronting doom, and my thoughts on them
Ruby · 2025-04-06T02:11:31.271Z · comments (18)

Prioritizing threats for AI control
ryan_greenblatt · 2025-03-19T17:09:45.044Z · comments (2)

Vestigial reasoning in RL
Caleb Biddulph (caleb-biddulph) · 2025-04-13T15:40:11.954Z · comments (7)

Reactions to METR task length paper are insane
Cole Wyeth (Amyr) · 2025-04-10T17:13:36.428Z · comments (41)

23andMe potentially for sale for <$50M
lemonhope (lcmgcd) · 2025-03-25T04:34:28.388Z · comments (2)

Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Czynski (JacobKopczynski) · 2025-03-29T02:51:29.786Z · comments (36)

[link] College Advice For People Like Me
henryj · 2025-04-12T14:36:46.643Z · comments (5)

Youth Lockout
Xavi CF (xavi-cf) · 2025-04-11T15:05:54.441Z · comments (6)

Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)

I changed my mind about orca intelligence
Towards_Keeperhood (Simon Skade) · 2025-03-18T10:15:29.860Z · comments (24)

[link] The Russell Conjugation Illuminator
TimmyM (timmym) · 2025-04-17T19:33:06.924Z · comments (14)

Equations Mean Things
abstractapplic · 2025-03-19T08:16:35.312Z · comments (10)

On (Not) Feeling the AGI
Zvi · 2025-03-25T14:30:02.215Z · comments (25)

[link] American College Admissions Doesn't Need to Be So Competitive
Arjun Panickssery (arjun-panickssery) · 2025-04-07T17:35:26.791Z · comments (18)

Silly Time
jefftk (jkaufman) · 2025-03-21T12:30:08.560Z · comments (2)

[question] Why do many people who care about AI Safety not clearly endorse PauseAI?
humnrdble · 2025-03-30T18:06:32.426Z · answers+comments (41)

Tabula Bio: towards a future free of disease (& looking for collaborators)
mpoon (michael-poon) · 2025-03-23T16:30:15.523Z · comments (15)

The first AI war will be in your computer
Viliam · 2025-04-08T09:28:53.191Z · comments (10)

ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs
denkenberger · 2025-04-16T21:47:40.687Z · comments (8)

AI #108: Straight Line on a Graph
Zvi · 2025-03-20T13:50:00.983Z · comments (5)

An Advent of Thought
Kaarel (kh) · 2025-03-17T14:21:08.765Z · comments (8)

Paper
dynomight · 2025-04-11T12:20:04.200Z · comments (12)

[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)

AI #109: Google Fails Marketing Forever
Zvi · 2025-03-27T14:50:01.825Z · comments (12)

[link] Automated Researchers Can Subtly Sandbag
gasteigerjo · 2025-03-26T19:13:26.879Z · comments (0)

Follow me on TikTok
lsusr · 2025-04-01T08:22:29.521Z · comments (8)

A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (42)

Handling schemers if shutdown is not an option
Buck · 2025-04-18T14:39:18.609Z · comments (0)

An overview of control measures
ryan_greenblatt · 2025-03-24T23:16:49.400Z · comments (0)

Analyzing long agent transcripts (Docent)
jsteinhardt · 2025-03-24T20:49:54.472Z · comments (2)

[link] The case for AGI by 2030
Benjamin_Todd · 2025-04-09T20:35:55.167Z · comments (6)

SHIFT relies on token-level features to de-bias Bias in Bios probes
Tim Hua · 2025-03-19T21:29:15.974Z · comments (2)

D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (8)

[link] Map of all 40 copyright suits v. AI in U.S.
Remmelt (remmelt-ellen) · 2025-03-26T07:57:58.976Z · comments (3)

We need (a lot) more rogue agent honeypots
Ozyrus · 2025-03-23T22:24:52.785Z · comments (12)

They Took MY Job?
Zvi · 2025-03-21T13:30:38.507Z · comments (4)

LessOnline 2025: Early Bird Tickets On Sale
Ben Pace (Benito) · 2025-03-18T00:22:02.653Z · comments (4)

Meditation and Reduced Sleep Need
niplav · 2025-04-04T14:42:54.792Z · comments (8)

Scaffolding Skills
Screwtape · 2025-04-18T17:39:25.634Z · comments (1)

[link] Existing Safety Frameworks Imply Unreasonable Confidence
Joe Rogero · 2025-04-10T16:31:50.240Z · comments (1)

[link] Three Types of Intelligence Explosion
rosehadshar · 2025-03-17T14:47:46.696Z · comments (8)

[link] Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]
elifland · 2025-04-10T23:10:23.063Z · comments (0)

The Rise of Hyperpalatability
Jack (jack-3) · 2025-04-02T20:18:04.407Z · comments (10)

Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)

Boots theory and Sybil Ramkin
philh · 2025-03-18T22:10:08.855Z · comments (17)

Call for Collaboration: Renormalization for AI safety
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T21:01:56.500Z · comments (0)

Why Are The Human Sciences Hard? Two New Hypotheses
Aydin Mohseni (aydin-mohseni) · 2025-03-18T15:45:52.239Z · comments (14)

Avoid the Counterargument Collapse
marknm · 2025-03-26T03:19:58.655Z · comments (3)

More Fun With GPT-4o Image Generation
Zvi · 2025-04-03T02:10:02.317Z · comments (3)

Recent comments

tailcalled on johnswentworth's Shortform

After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!

I've grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it's much more powerful than individual intelligence (whether natural or artificial).

Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn't meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).

Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution's information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution, and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance, and then since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normally. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.

(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, ... . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there's also often subniches.)
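The mixture-niche argument above can be made concrete with a minimal sketch. It takes the comment's premise that the speed of evolution is proportional to genetic variance, and additionally assumes a single scalar trait, equal-sized niches, and the same within-niche variance everywhere. The niche means and the variance value are hypothetical numbers chosen for illustration, not data. Under those assumptions, the law of total variance says the mixture's variance is the within-niche variance plus the variance of the niche means.

```python
from statistics import pvariance


def mixture_variance(niche_means, within_variance):
    """Trait variance in an equal-weight mixture of niches.

    Law of total variance: total = within-niche variance
    + variance of the niche means (the between-niche part).
    """
    between = pvariance(niche_means)  # population variance of the means
    return within_variance + between


# Hypothetical illustration values (assumptions, not measurements).
niche_means = [-2.0, 0.0, 2.0]   # mean trait value in each specialised niche
within_variance = 1.0            # trait variance inside any single niche

total = mixture_variance(niche_means, within_variance)

# If speed of evolution is proportional to variance, this ratio is the
# relative speed-up of the mixture niche over a single specialised niche.
speedup = total / within_variance
print(f"mixture variance = {total:.3f}, relative speed-up = {speedup:.3f}")
```

The further apart the specialised niches sit, the larger the between-niche term, which is the sense in which the shared niche could evolve faster than any single niche in this toy model.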

And then obviously beyond these points, individual intelligence and evolution focus on different things - what's happening recently vs what's happened deep in the past. Neither is perfect; society has changed a lot, which renders what's happened deep in the past less relevant than it could have been, but at the same time what's happening recently (I argue) intrinsically struggles with rare, powerful factors.

If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.

Part of the trouble is, if you just study the organism in isolation, you just get some genetic or phenotypic properties. You don't have any good way of knowing which of these are the important ones or not.

You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)

The main theoretical hope is that one could use some clever algorithm to automatically sort of aggregate "small-scale" understanding (like an autoregressive convolutional model to predict next time given previous time) into "large-scale" understanding (being able to understand how a system could act extreme, by learning how it acts normally). But I've studied a bunch of different approaches for that, and ultimately it doesn't really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime that it was originally observed within, and also the methods to aggregate small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)

If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.

First, I want to emphasize that durability and strength are near the furthest towards the easy side because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn't develop.

Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency because profit-maximizing companies don't want money tied up into durability or strength that you're not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent - and as a consequence, those people would then gain more agency.)

Also, I do get the impression you are overestimating the feasibility of "“durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern". I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it's relatively far from falling naturally out of the methods.

One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.

(I should maybe write more but it's past midnight and also I guess I wonder how you'd respond to this.)

ryan_greenblatt on AI 2027: What Superintelligence Looks Like

I wouldn't be surprised if 3-5% of questions were mislabeled or impossible to answer, but 25-50%? You're basically saying that HLE is worthless. I'm curious why.

Various people looked at randomly selected questions and found similar numbers.

(I don't think the dataset is worthless, I think if you filtered down to the best 25-50% of questions it would be a reasonable dataset with acceptable error rate.)

michaeldickens on Why Should I Assume CCP AGI is Worse Than USG AGI?

I strongly suspect that a Trump-controlled AGI would not respect democracy.
I strongly suspect that an Altman-controlled AGI would not respect democracy.
I have my doubts about the other heads of AI companies.

aidan-o-gara on aog's Shortform

Shoutout to Epoch for having its own intellectual culture.

Views on AGI seem suspiciously correlated to me, as if many people's views are more determined by diffusion through social networks and popular writing, rather than independent reasoning. This isn't unique to AGI. Most individual people are not capable of coming up with useful worldviews on their own. Often, the development of interesting, coherent, novel worldviews benefits from an intellectual scene.

What's an intellectual scene? It's not just an idea. Usually it has a set of complementary ideas, each of which makes more sense with the others in place. Often there’s a small number of key thinkers who come up with many new ideas, and a broader group of people who agree with the ideas and follow their implied call to action. Scenes benefit from shared physical and online spaces, though they can also exist in social networks without a central hub. Members of a scene are shielded from pressure to defer to others who do not share their background assumptions, and therefore feel freer to come up with new ideas that would be unusual to outsiders, but make sense within the scene's shared intellectual framework. These conditions seem to raise the likelihood of novel intellectual progress.

There are many examples of intellectual scenes within AI risk, at varying levels of granularity and cohesion. I've been impressed with Davidad recently for putting forth a set of complementary ideas around Safeguarded AI and FlexHEGs, and creating opportunities for people who agree with his ideas to work on them. Perhaps the most influential scenes within AI risk are the MIRI / LessWrong / Conjecture / Control AI / Pause AI cluster, united by high p(doom) and focus on pausing or stopping AI development, and the Constellation / Redwood / METR / Anthropic cluster, focused on prosaic technical safety techniques and working with AI labs to make the best of the current default trajectory. (Though by saying these clusters have some shared ideas / influences / spaces, I don't mean to deny the fact that most people within those clusters disagree on many important questions.) Rationalism and effective altruism are their own scenes, as are the conservative legal movement, social justice, new atheism, progress studies, neoreaction, and neoliberalism.

Epoch has its own scene, with a distinct set of thinkers, beliefs, and implied calls to action. Matthew Barnett has written the most about these ideas publicly, so I'd encourage you to read his posts on these topics, though my understanding is that many of these ideas were developed with Tamay and Ege. Key ideas include long timelines, slow takeoff, optimism about alignment, concerns about overregulation, concerns about hawkishness towards China, advocating the likelihood of AI sentience and desirability of AI rights, debating the desirability of different futures, and so on. These ideas motivate much of Epoch's work, as well as Mechanize. Importantly, the people in this scene don't seem to mind much that many others (including me) disagree with them.

I'd like to see more intellectual scenes that seriously think about AGI and its implications. There are surely holes in our existing frameworks, and it can be hard for people operating within them to spot those holes. Creating new spaces with different sets of shared assumptions seems like it could help.

expertium on AI 2027: What Superintelligence Looks Like

they put substantial probability on the trend being superexponential

I think that's too speculative.

I also think that around 25-50% of the questions are impossible or mislabeled.

I wouldn't be surprised if 3-5% of questions were mislabeled or impossible to answer, but 25-50%? You're basically saying that HLE is worthless. I'm curious why. I mean, I don't know much about the people who had to sift through all of the submissions, but I'd be surprised if they failed that badly. Plus, there was a "bug bounty" aimed at improving the quality of the dataset.

TBC, my median to superhuman coder is more like 2031.

Guess I'm a pessimist then, mine is more like 2034.

gwern on jacquesthibs's Shortform

I think it's a little more concerning that Dwarkesh has invested in this startup:

Mechanize is backed by investments from Nat Friedman and Daniel Gross, Patrick Collison, Dwarkesh Patel, Jeff Dean, Sholto Douglas, and Marcus Abramovitch.

And I do not see any disclosure of this in either the YouTube video or the transcript.

mattj on Why Should I Assume CCP AGI is Worse Than USG AGI?

We don’t want an ASI to be “democratic”. We want it to be “moral”. Many people in the West conflate the two words, thinking that democratic and moral are the same thing, but they are not. Democracy is a certain system of organizing a state. Morality is how people and (in the future) an ASI behave towards one another.

There are no obvious reasons why an autocratic state would care more or less about a future ASI being immoral, but an argument can be made that autocratic states will be more cautious and put more restrictions on the development of an ASI, because autocrats usually fear any kind of opposition and an ASI could be a powerful adversary in itself or in the hands of powerful competitors.

martinkunev on Why Should I Assume CCP AGI is Worse Than USG AGI?

To add to the discussion, my impression is that many people in the US believe they have some moral superiority or know what is good for other people. The whole "we need a manhattan project for AI" discourse is reminiscent of calling for global domination. Also, doing things for the public good is controversial in the US as it can infringe on individual freedom.

This makes me really uncertain as to which AGI would be better (assuming somebody controls it).

aynonymousprsn123 on The Potential Impossibility of Subjective Death

This sounds like another crazy thing that the logic says is right but is probably not right, but I don't know why.

ryan_greenblatt on AI 2027: What Superintelligence Looks Like

Isn't it kinda unreasonable to put 10% on superhuman coder in a year if current AIs have a 15 nanosecond time horizon? TBC, it seems fine IMO if the model just isn't very good at predicting the 10th/90th percentile, especially with extreme hyperparameters.

~~I also don't know how they ran this, I tried looking for model code and I couldn't find it.~~ (Edit: found the code.)