LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (238)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (85)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (36)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (39)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (19)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (152)

What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (37)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (56)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (94)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (42)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (34)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (25)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

My motivation and theory of change for working in AI healthtech
Andrew_Critch · 2024-10-12T00:36:30.925Z · comments (37)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (65)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (31)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (155)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (33)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (16)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (7)

OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (6)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (4)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (55)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)

Recent comments

gwern on Disagreement on AGI Suggests It’s Near

Specifically, as an antichrist, since the Gospels warn that "false messiahs and false prophets will appear and produce great signs and omens", among other things. (And the position that the second coming has already happened, completely rather than merely partially, is hyperpreterism.)

yair-halberstadt on XX by Rian Hughes: Pretentious Bullshit

I did also like the Ascension story. It did a very good job of imitating 1960s sci-fi magazine stories. In a way, it shows off his talent as an author more than the main story does!

charbel-raphael on ryan_greenblatt's Shortform

Yeah, fair enough. I think someone should try to do a more representative experiment and we could then monitor this metric.

BTW, something that bothers me a little about this metric is that a very simple AI that just periodically asks me "Hey, do you endorse what you are doing right now? Are you timeboxing? Are you following your plan?" makes me (I think) significantly more strategic and productive. It's similar to hiring 5 people to sit behind me and keep me productive for a month. But this is maybe off topic.

jessica-liu-taylor on On Eating the Sun

I think it's partly meant to move from an abstract model of intelligence as a scalar variable that increases at some rate (like a curve on an x/y graph) to concrete, material milestones. People can imagine "intelligence goes up rapidly! singularity!" without it being clear what that implies. I'm saying that sufficient levels would imply eating the Sun, which makes it harder to confuse with things like "getting higher scores on math tests".

I suppose a more general category would be: the relevant kind of self-improving intelligence is the sort that can repurpose mass-energy into more computation to run its intelligence, and "eat the Sun" is an obvious target given this background notion of intelligence.

(Note: there is skepticism about feasibility on Twitter/X, which gives some information about how non-singularitarians react.)

ryan_greenblatt on ryan_greenblatt's Shortform

This example seems extremely cherry-picked toward cases where uplift is especially high. (Note that this is in Copilot's interest.) Also, this task could probably be solved autonomously by an AI in about 10 minutes with good scaffolding.

I think you have to consider the full, diverse range of tasks to get a reasonable sense, or at least consider harder tasks. RE-bench seems much closer, but I still expect uplift on RE-bench to probably (but not certainly!) considerably overstate real-world speedup.

raemon on On Eating the Sun

This seemed like a nice explainer post, though it's somewhat confusing who the post is for – if I imagine being someone who didn't really understand any arguments about superintelligence, I think I might bounce off the opening paragraph or title because I'm like "why would I care about eating the sun."

There is something nice and straightforward about the current phrasing, but I suspect there's an opening paragraph that would do a better job of explaining why you might care about this.

(But I'd be curious to hear from people who weren't really sold on any singularity stuff, read it, and can describe how it was for them.)

jacobjacob on AI Safety as a YC Startup

Impact = Magnitude * Direction

Surely one should think of this as a vector in a space with more dimensions than 1.

In your equation, you can just multiply the magnitude by 1,000,000 and it will move in the "positive direction".

In the real world, you can become a billionaire from selling toothbrushes and still be "overtaken" by a guy who wrote one blog post that happened to be real dang good.

I made a drawing, but I don't think LW will let me add it on my phone.

charbel-raphael on ryan_greenblatt's Shortform

I was saying 2x because I've memorised the results from this study. Do we have better numbers today? R&D is harder, so this is an upper bound. However, this was from one year ago, so perhaps the factors cancel each other out?

[Image: Summary of the experiment process and results]

sharmake-farah on Disagreement on AGI Suggests It’s Near

My response to this is to focus on when a Dyson Swarm is being built rather than on AGI, because that term is easier to define uncontroversially.

A large portion of the disagreements here fundamentally revolve around being unable to coordinate on what a given word means. From an epistemic perspective that doesn't matter at all, but it does matter from a utility/coordination perspective, since coordination is required for a lot of human feats.

j-bostock on Turning up the Heat on Deceptively-Misaligned AI

I'm only referring to the reward constraint being satisfied for scenarios that are in the training distribution, since this maths is entirely applied to a decision taking place in training. Therefore I don't think distributional shift applies.