LessWrong 2.0 Reader

[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

[link] math terminology as convolution
bhauth · 2023-10-30T01:05:11.823Z · comments (1)

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)

Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy
Joe Rogero · 2024-11-12T23:55:46.770Z · comments (17)

Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)

Basics of Handling Disagreements with People
Camille Berger (Camille Berger) · 2024-11-12T17:55:08.143Z · comments (4)

Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)

[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (5)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)

How to develop a photographic memory 1/3
PhilosophicalSoul (LiamLaw) · 2023-12-28T13:26:36.669Z · comments (6)

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)

[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

CHAI internship applications are open (due Nov 13)
Erik Jenner (ejenner) · 2023-10-26T00:53:49.640Z · comments (0)

Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)

Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)

Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

What AI companies should do: Some rough ideas
Zach Stein-Perlman · 2024-10-21T14:00:10.412Z · comments (10)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

[link] Genocide isn't Decolonization
robotelvis · 2023-10-20T04:14:07.716Z · comments (19)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

[link] College technical AI safety hackathon retrospective - Georgia Tech
yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · comments (2)

Empathy/Systemizing Quotient is a poor/biased model for the autism/sex link
tailcalled · 2024-11-04T21:11:57.788Z · comments (0)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)

[link] Provably Safe AI
PeterMcCluskey · 2023-10-05T22:18:26.013Z · comments (15)

[link] Fake Deeply
Zack_M_Davis · 2023-10-26T19:55:22.340Z · comments (7)

Regrant up to $600,000 to AI safety projects with GiveWiki
Dawn Drescher (Telofy) · 2023-10-28T19:56:06.676Z · comments (1)

Disentangling four motivations for acting in accordance with UDT
Julian Stastny · 2023-11-05T21:26:22.514Z · comments (3)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

We have promising alignment plans with low taxes
Seth Herd · 2023-11-10T18:51:38.604Z · comments (9)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

Recent comments

cousin_it on Social events with plausible deniability

Does the list need to be pre-composed? Couldn't they just ask attendees to write some hot takes and put them in a hat? It might make the party even funnier.

nathan-helm-burger on A Narrow Path: a plan to deal with AI extinction risk

My essay is here: https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy

A further discussion of the primary weakness I see in your plan (that AI algorithmic improvement progress is not easily blocked by regulating and monitoring large data centers) is in my post here: https://www.lesswrong.com/posts/xoMqPzBZ9juEjKGHL/proactive-if-then-safety-cases

daniel-kokotajlo on Will Orion/Gemini 2/Llama-4 outperform o1

Performance on what benchmarks? Do you mean better at practically everything? Or do you just mean 'better in practice for what most people use it for?' or what?

Also what counts as the next frontier model? E.g. if Anthropic releases "Sonnet 3.5 New v1.1" does that count?

Sorry to be nitpicky here.

I expect there to be something better than o1 available within six months. OpenAI has said that they'll have an agentic assistant up in January IIRC; I expect it to be better than o1.

nathan-helm-burger on Theories With Mentalistic Atoms Are As Validly Called Theories As Theories With Only Non-Mentalistic Atoms

Is your question directed at me, or the person I was replying to? I agree with the point "Sun is big, but..." makes. Here's a link to a recent summary of my view on a plausible plan for the world to handle surviving AI. Please feel free to share your thoughts on it. https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy

radford-neal-1 on Social events with plausible deniability

Wouldn't that destroy the whole idea? Anyone could tell that an opinion voiced that's not on the list must have been the person's true opinion.

In fact, I'd hope that several people composed the list, and didn't tell each other what items they added, so no one can say for sure that an opinion expressed wasn't one of the "hot takes".

dagon on Ethical Implications of the Quantum Multiverse

I think all the same arguments that it doesn't change decisions also apply to why it doesn't change virtue evaluations. It still all adds up to normality. It's still unimaginably big. Our actions as well as our beliefs and evaluations are irrelevant at most scales of measurement.

nostalgebraist on interpreting GPT: the logit lens

Because the model has residual connections.

ryan_greenblatt on 5 ways to improve CoT faithfulness

I guess one way of framing it is that I find the shoggoth/face idea great as a science experiment; it gives us useful evidence! However, it doesn't make very much sense to me as a safety method intended for deployment.

Sadly, gathering evidence of misalignment in deployment seems likely to me to be one of the most effective strategies for gathering legible evidence (at least for early systems) given likely constraints. (E.g., because people won't believe results in test beds and because RL might be too expensive to run twice.)

ryan_greenblatt on 5 ways to improve CoT faithfulness

Sure, I mean that I expect that for the critical regime for TAI, we can literally have the Face be 3.5 sonnet.

declan-molony on Social events with plausible deniability

Who generated the hot takes? I'd love to see the full list.