LessWrong 2.0 Reader

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (29)
Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (31)
Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (10)
Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (18)
Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (24)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
Don’t ignore bad vibes you get from people
Kaj_Sotala · 2025-01-18T09:20:17.397Z · comments (50)
[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (14)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (5)
What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)
Applying traditional economic thinking to AGI: a trilemma
Steven Byrnes (steve2152) · 2025-01-13T01:23:00.397Z · comments (32)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (34)
Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (57)
Planning for Extreme AI Risks
joshc (joshua-clymer) · 2025-01-29T18:33:14.844Z · comments (4)
Passages I Highlighted in The Letters of J.R.R. Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (38)
It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (1)
[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (43)
What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (55)
[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty
tandem · 2025-01-07T19:11:21.238Z · comments (5)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (2)
Hire (or Become) a Thinking Assistant
Raemon · 2024-12-23T03:58:42.061Z · comments (47)
Ten people on the inside
Buck · 2025-01-28T16:41:22.990Z · comments (27)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (13)
[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (88)
Building AI Research Fleets
Ben Goldhaber (bgold) · 2025-01-12T18:23:09.682Z · comments (11)
[link] Parkinson's Law and the Ideology of Statistics
Benquo · 2025-01-04T15:49:21.247Z · comments (7)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (5)
[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (26)
[link] The Failed Strategy of Artificial Intelligence Doomers
Ben Pace (Benito) · 2025-01-31T18:56:06.784Z · comments (73)
Gradual Disempowerment, Shell Games and Flinches
Jan_Kulveit · 2025-02-02T14:47:53.404Z · comments (35)
"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)
Human takeover might be worse than AI takeover
Tom Davidson (tom-davidson-1) · 2025-01-10T16:53:27.043Z · comments (54)
[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)
[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)
Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (32)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (20)
The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)
2024 in AI predictions
jessicata (jessica.liu.taylor) · 2025-01-01T20:29:49.132Z · comments (3)
[link] Research directions Open Phil wants to fund in technical AI safety
jake_mendel · 2025-02-08T01:40:00.968Z · comments (21)
The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)
The Plan - 2024 Update
johnswentworth · 2024-12-31T13:29:53.888Z · comments (27)
Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)
You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)
DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (26)
Why I'm Moving from Mechanistic to Prosaic Interpretability
Daniel Tan (dtch1997) · 2024-12-30T06:35:43.417Z · comments (34)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)
The Game Board has been Flipped: Now is a good time to rethink what you’re doing
LintzA (alex-lintz) · 2025-01-28T23:36:18.106Z · comments (30)