LessWrong 2.0 Reader


You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)
DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)
Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)
The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)
[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (0)
Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)
[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)
[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)
[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)
Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)
Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (16)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (42)
Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (11)
Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)
LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)
A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)
Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (4)
Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)
(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)
Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)
5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)
Circling as practice for “just be yourself”
Kaj_Sotala · 2024-12-16T07:40:04.482Z · comments (5)
Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)
[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)
JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (53)
AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)
Remap your caps lock key
bilalchughtai (beelal) · 2024-12-15T14:03:33.623Z · comments (16)
Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)
[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)
[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)
Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)
[link] Best-of-N Jailbreaking
John Hughes (john-hughes) · 2024-12-14T04:58:48.974Z · comments (6)
Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (72)
[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)
Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (62)
LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)