LessWrong 2.0 Reader


Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)

[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)

[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)

Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (21)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)

I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)

On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)

Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)

[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)

OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)

[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)

[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)

[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)

Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (11)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)

Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

[link] The Soul Key
Richard_Ngo (ricraz) · 2023-11-04T17:51:53.176Z · comments (9)

On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)

Everything Wrong with Roko's Claims about an Engineered Pandemic
WitheringWeights (EZ97) · 2024-02-22T15:59:08.439Z · comments (10)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

Meaning & Agency
abramdemski · 2023-12-19T22:27:32.123Z · comments (17)

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (36)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)

Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)

Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

Review: Conor Moreton's "Civilization & Cooperation"
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-05-26T19:32:43.131Z · comments (8)


Recent comments

habryka4 on The Shallow Bench

Note: I added some spoiler warnings (given the one comment complaining). I don't feel strongly, so feel free to revert

lelapin on CstineSublime's Shortform

Interesting thoughts, ty.

One obstacle to a common understanding here is that you're talking about "good" or "bad" paragraphs in absolute terms without defining either by some objective standard, so you're relying on your own sense of what's good or bad. If you were defining good or bad relatively, you'd collect 100 paragraphs and post the worst 10 as bad. I'd be interested in seeing the worst paragraphs you found, some 50th-percentile ones, and the best; then I could tell you whether I have the same absolute standards as you.

lelapin on The Shallow Bench

Enjoyed this post.

Fyi, from the front page I just hovered this post "The shallow bench" and was immediately spoiled on Project Hail Mary (which I had started listening to, but didn't get far into). Maybe add some spoiler tag or warning directly after the title?

adamshimi on The Compendium, A full argument about extinction risk from AGI

Thanks for the comment!

We have indeed gotten feedback from multiple people that this part didn't feel detailed enough (although we got this much more from very technical readers than from non-technical ones), and are working on improving the arguments.

adamshimi on The Compendium, A full argument about extinction risk from AGI

Thanks for the comment!

We'll correct the typo in the next patch/bug fix.

As for the more direct adversarial tone of the prologue, it is an explicit choice (and is contrasted by the rest of the document). For the moment, we're waiting to get more feedback on the doc to see if it really turns people off or not.

tapatakt on Survival without dignity

I guess the big problem for someone who tries to do it not in small form is that while you write the story it is already getting old. There are writers who can write a novel in a season, but not many. At least if we talk about good writers. Hm-m-m, did rationalists try to hire Stephen King? :)

kajus on Winning isn't enough

I don't fully understand the post. Without a clear definition of "winning," the points you're trying to make — as well as the distinction between pragmatic and non-pragmatic principles (which also aligns with strategies and knowledge formation) — aren't totally clear. For instance, "winning," in some vague sense, probably also includes things like "fitting with evidence," taking advice from others, and so on. You don't necessarily need to turn to non-pragmatic principles or those that don’t derive from the principle of winning. "Winning" is a pretty loose term.

carl-feynman on What's a good book for a technically-minded 11-year old?

Here is a category of book that I really loved at that age: non-embarrassing novels about how adults do stuff. Since, for me, that age was in 1973, the particular books I name might be obsolete. There’s a series of novels by Arthur Hailey, with titles like “Hotel” and “Airport”, that are set inside the titular institutions, and follow people as they deal with problems and interact with each other. And there is no, or at least minimal, sex, so they’re not icky to a kid. They’re not idealized; there is a reasonable degree of fallibility, venality and scheming, but that is also fascinating. And all the motivations, and the way the systems work, are clearly explained, so they can be understood by an unsophisticated reader.

These books were bestsellers back in the day, so you might be able to find a copy in the library. See if he likes it!

Another novel in this vein is “The View from the Fortieth Floor”, which is about a badly managed magazine going bankrupt. Doesn’t sound amazing, I know, but if you’re a kid who’s never seen bad managers blunder into ineluctable financial doom, it’s really neat.

My wife is a middle school librarian. I’ll ask her when I see her for more books like this.

rotatingpaguro on Bogdan Ionut Cirstea's Shortform

current inference scaling methods tend to be tied to CoT and the like, which are quite transparent

Aschenbrenner in Situational Awareness predicts illegible chains of thought are going to prevail because they are more efficient. I know of one developer claiming to do this (https://platonicresearch.com/) but I guess there must be many.

alex-k-chen-parrot on Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging

Can't you theoretically use both CellPainting assays and light-sheet microscopy?

I mean, I did look at CellPainting assays a short while ago and I was still struck by how little control one had over the process, and how it isn't great for many kinds of mechanistic interpretability. I know there's a Brazil team looking at use of CellPainting for sphere-based silver-particle nanoplastics, but there are still many concrete variables, like intrinsic oxidative stress, that you can't necessarily get from CellPainting alone.

CellPainter can be used for toxicological predictions of organophosphate toxicity (predicting that they're more toxic than many other classes of compounds), but the toxicological assays used couldn't capture much nuance, especially the kind that's relevant to the physiological concentrations people are normally exposed to. I remember ketoconazole scored very highly on toxicity, but what does this say about physiological doses that are much smaller than the ones used for CellPainter?

Also, the cell lines were all cancer cell lines (OS osteosarcoma cancer cell lines), which gives little predictive power for neurotoxicity or a compound's ability to disrupt neuronal signalling.

Still, the CellPainter support ecosystem is extremely impressive, even though it doesn't produce the Janelia-standard PB datasets that are used for lightsheet. [cf. https://www.cytodata.org/symposia/2024/ ]

https://markovbio.github.io/biomedical-progress/

FWIW, some of the most impressive near-term work might be whatever the https://www.abugootlab.org/ lab is going to do soon (large-scale perturb-seq combined with optical pooling to do readouts of genetic perturbations...)