LessWrong 2.0 Reader


How to Make Superbabies
GeneSmith · 2025-02-19T20:39:38.971Z · comments (15)
[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (14)
It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (1)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (9)
Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (7)
Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme
Vanessa Kosoy (vanessa-kosoy) · 2025-02-16T16:24:57.654Z · comments (2)
Arbital has been imported to LessWrong
RobertM (T3t) · 2025-02-20T00:47:33.983Z · comments (4)
[link] Thermodynamic entropy = Kolmogorov complexity
Aram Ebtekar (EbTech) · 2025-02-17T05:56:06.960Z · comments (11)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (22)
Celtic Knots on Einstein Lattice
Ben (ben-lang) · 2025-02-16T15:56:06.888Z · comments (11)
Do models know when they are being evaluated?
Govind Pimpale (govind-pimpale) · 2025-02-17T23:13:22.017Z · comments (0)
Eliezer's Lost Alignment Articles / The Arbital Sequence
Ruby · 2025-02-20T00:48:10.338Z · comments (0)
Go Grok Yourself
Zvi · 2025-02-19T20:20:09.371Z · comments (1)
How accurate was my "Altered Traits" book review?
lsusr · 2025-02-18T17:00:55.584Z · comments (3)
[link] SuperBabies podcast with Gene Smith
Eneasz · 2025-02-19T19:36:49.852Z · comments (1)
Monthly Roundup #27: February 2025
Zvi · 2025-02-17T14:10:06.486Z · comments (3)
Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems
Thane Ruthenis · 2025-02-18T18:04:46.717Z · comments (9)
Medical Roundup #4
Zvi · 2025-02-18T13:40:06.574Z · comments (1)
[question] What are the surviving worlds like?
KvmanThinking (avery-liu) · 2025-02-17T00:41:49.810Z · answers+comments (1)
Undergrad AI Safety Conference
JoNeedsSleep (joanna-j-1) · 2025-02-19T03:43:47.969Z · comments (0)
[link] Ascetic hedonism
dkl9 · 2025-02-17T15:56:30.267Z · comments (5)
[link] The Peeperi (unfinished) - By Katja Grace
Nathan Young · 2025-02-17T19:33:29.894Z · comments (0)
Using Prompt Evaluation to Combat Bio-Weapon Research
Stuart_Armstrong · 2025-02-19T12:39:00.491Z · comments (0)
[question] Take over my project: do computable agents plan against the universal distribution pessimistically?
Cole Wyeth (Amyr) · 2025-02-19T20:17:04.813Z · answers+comments (0)
[link] DeepSeek Made it Even Harder for US AI Companies to Ever Reach Profitability
garrison · 2025-02-19T21:02:42.879Z · comments (1)
Literature Review of Text AutoEncoders
NickyP (Nicky) · 2025-02-19T21:54:14.905Z · comments (0)
Talking to laymen about AI development
David Steel · 2025-02-17T18:42:23.289Z · comments (0)
[link] Progress links and short notes, 2025-02-17
jasoncrawford · 2025-02-17T19:18:29.422Z · comments (0)
[link] Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2025-02-18T22:16:14.449Z · comments (2)
[link] When should we worry about AI power-seeking?
Joe Carlsmith (joekc) · 2025-02-19T19:44:25.062Z · comments (0)
[link] Cooperation for AI safety must transcend geopolitical interference
Matrice Jacobine · 2025-02-16T18:18:01.539Z · comments (6)
Call for Applications: XLab Summer Research Fellowship
JoNeedsSleep (joanna-j-1) · 2025-02-18T19:19:20.155Z · comments (0)
THE ARCHIVE
Jason Reid (jason-reid) · 2025-02-17T01:12:41.486Z · comments (0)
There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)
AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)
[link] Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap
Molly (hickman-santini) · 2025-02-19T22:42:39.055Z · comments (0)
Closed-ended questions aren't as hard as you think
electroswing · 2025-02-19T03:53:11.855Z · comments (0)
[link] Won't vs. Can't: Sandbagging-like Behavior from Claude Models
Joe Benton · 2025-02-19T20:47:06.792Z · comments (0)
[link] The Dilemma’s Dilemma
James Stephen Brown (james-brown) · 2025-02-19T23:50:47.485Z · comments (1)
A fable on AI x-risk
bgaesop · 2025-02-18T20:15:24.933Z · comments (0)
Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (0)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
Undesirable Conclusions and Origin Adjustment
Jerdle (daniel-amdurer) · 2025-02-19T18:35:23.732Z · comments (0)
Misaligned actions and what to do with them? - A proposed framework and open problems
Shivam · 2025-02-18T00:06:31.518Z · comments (0)
arch-anarchist reading list
Peter lawless · 2025-02-16T22:47:00.273Z · comments (1)
[question] Why do we have the NATO logo?
KvmanThinking (avery-liu) · 2025-02-19T22:59:41.755Z · answers+comments (2)