LessWrong 2.0 Reader

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (23)

[link] New report: Safety Cases for AI
joshc (joshua-clymer) · 2024-03-20T16:45:27.984Z · comments (13)

Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)

A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (5)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (15)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (4)

Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)

Stagewise Development in Neural Networks
Jesse Hoogland (jhoogland) · 2024-03-20T19:54:06.181Z · comments (1)

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (35)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (16)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)

When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (62)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (10)

MATS Winter 2023-24 Retrospective
Rocket (utilistrutil) · 2024-05-11T00:09:17.059Z · comments (28)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (7)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (6)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (11)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (8)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (32)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (7)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (12)

AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (10)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (19)

[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)

Generalized Stat Mech: The Boltzmann Approach
David Lorell · 2024-04-12T17:47:31.880Z · comments (7)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)

Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (15)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (12)

Recent comments

arthur-malone on Ilya Sutskever and Jan Leike resign from OpenAI

It also occurs to me that the causality could go the other way: Ilya and Jan may have timed their departure to coincide with the 4o release for a number of reasons. If they go on to launch a new safety org soon, for example, I'd be more inclined to think that the timing of the two events was a result of Ilya/Jan trying to use the moment to their advantage.

richard_kennaway on David Gross's Shortform

Dragging files around in a GUI is a familiar action that does known things with known consequences. Somewhere on the hard disc (or SSD, or somewhere in the cloud, etc.) there is indeed a "file" which has indeed been "moved" into a "folder", and taking off those quotation marks only requires some background knowledge (which in fact I have) of the lower-level things that are going on and which the GUI presents to me through this visual metaphor.

Some explanations work better than others. The idea that there is stuff out there that gives rise to my perceptions, and which I can act on with predictable results, seems to me the obvious explanation that any other contender will have to do a great deal of work to topple from the plinth. The various philosophical arguments over doctrines such as "idealism", "realism", and so on are more like a musical recreation (see my other comment) than anything to take seriously as a search for truth. They are hardly the sort of thing that can be right or wrong, and to the extent that they are, they are all wrong.

Ok, that's my personal view of a lot of philosophy, but I'm not the only one.

niplav on shortplav

Oh damn, the superalignment team has been dissolved.

daniel-samuel on AI: Practical Advice for the Worried

Every time I start to freak out about AI—be it timelines, risks, or whatever—I come back to this post to get down to earth a bit and avoid making foolish life decisions.

sharmake-farah on Catastrophic Goodhart in RL with KL penalty

I have a question about this post, and it has to do with the case where both utility and error are heavy-tailed:

Where does the expected value converge to if both utility and errors are heavy-tailed? Is it 0, infinity, some other number, or does it not converge to any number at all?

zach-stein-perlman on Ilya Sutskever and Jan Leike resign from OpenAI

Added updates to the post:

Superalignment dissolves.

Leike tweets, including:

I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.
I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.
These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there.
Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.
Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.
But over the past years, safety culture and processes have taken a backseat to shiny products.

Daniel Kokotajlo tells Vox:

“I joined with substantial hope that OpenAI would rise to the occasion and behave more responsibly as they got closer to achieving AGI. It slowly became clear to many of us that this would not happen,” Kokotajlo told me. “I gradually lost trust in OpenAI leadership and their ability to responsibly handle AGI, so I quit.”

Kelsey Piper says:

I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.

o-o on Ilya Sutskever and Jan Leike resign from OpenAI

https://x.com/janleike/status/1791498174659715494?s=46&t=lZJAHzXMXI1MgQuyBgEhgA

Leike explains his decisions.

logan-zoellner on Against "argument from overhang risk"

We ran into a hardware shortage during a period of time where there was no pause, which is evidence that the hardware manufacturer was behaving conservatively.

Alternative hypothesis: there are physical limits on how fast you can build things.

Also, NVIDIA currently has a monopoly on "decent AI accelerator you can actually buy". Part of the "shortage" is just the standard economic result that a monopoly produces less of something to increase profits.

This monopoly will not last forever, so in that sense we are currently in hardware "underhang".

This and the rest of your comment seems to have ignored the rest of my post (see: multiple inputs to progress, all of which seem sensitive to "demand").

Nvidia doesn't just make AI accelerators. They are a video game graphics card company.

And even if we pause large training runs, demand for inference of existing models will continue to increase.

If you think my model of how inputs to capabilities progress are sensitive to demand for those inputs from AGI labs is wrong, then please argue so directly, or explain how your proposed scenario is compatible with it.

This is me arguing directly.

The model "all demand for hardware is driven by a handful of labs training cutting edge models" is completely implausible. It doesn't explain how we got the hardware in the first place (video games) and it ignores the fact that there exist uses for AI acceleration hardware other than training cutting-edge models.

unexpectedvalues on Ilya Sutskever and Jan Leike resign from OpenAI

My Manifold market on Collin Burns, lead author of the weak-to-strong generalization paper

trevorone on romeostevensit's Shortform

If they ban it, then this would count as "selling it on the street corners in retaliation". For now, I think Lumina should get the profit for the research they did.

Cases of people doing mouth-to-mouth will also make it easier for the American Dental Association to smear Lumina, which might even shift the Overton window enough to accidentally turn public opinion against similar products in the future (e.g., as happened with cryonics, where literally billions of people died as a direct result of early marketing/political failures).