LessWrong 2.0 Reader

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)

Ten Modes of Culture War Discourse
jchan · 2024-01-31T13:58:20.572Z · comments (15)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (5)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] Questions are usually too cheap
Nathan Young · 2024-05-11T13:00:54.302Z · comments (19)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.
Chi Nguyen · 2024-02-23T06:10:05.881Z · comments (18)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)

Monthly Roundup #17: April 2024
Zvi · 2024-04-15T12:10:03.126Z · comments (4)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 2023-12-16T05:39:10.558Z · comments (5)

[link] Theories of Change for AI Auditing
Lee Sharkey (Lee_Sharkey) · 2023-11-13T19:33:43.928Z · comments (0)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

Zvi's Manifold Markets House Rules
Zvi · 2023-11-13T00:28:02.147Z · comments (6)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)

[link] Open Phil releases RFPs on LLM Benchmarks and Forecasting
LawrenceC (LawChan) · 2023-11-11T03:01:09.526Z · comments (0)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

Recent comments

cole-wyeth on Heresies in the Shadow of the Sequences

My impression is that e.g. the Catholic church has a pretty deeply thought out moral philosophy that has persisted across generations. That doesn't mean that every individual Catholic understands and executes it properly.

linda-linsefors on Seven lessons I didn't learn from election day

If people are ashamed to vote for Trump, why would they let their neighbours know?

buck on Untrusted smart models and trusted dumb models

I mostly mean "we are sure that it isn't egregiously unaligned and thus treating us adversarially". So models can be aligned but untrusted (if they're capable enough that they could have been unaligned, but they aren't actually unaligned). There shouldn't be models that are trusted but unaligned.

Everywhere I wrote "unaligned" here, I meant the fairly specific thing of "trying to defeat our safety measures so as to grab power", which is not the only way the word "aligned" is used.

buck on Untrusted smart models and trusted dumb models

I think your short definition should include the part about our epistemic status: "We are happy to assume the AI isn't adversarially trying to cause a bad outcome".

tahp on Thoughts after the Wolfram and Yudkowsky discussion

I think we're both saying the same thing here, except that the thing I'm saying implies that I would bet on Eliezer being pessimistic about this. My point was that I have a lot of pessimism that people would code something wrong even if we knew what we were trying to code, and this is where a lot of my doom comes from. Beyond that, I think we don't know what it is we're trying to code up, and you give some evidence for that. I'm not saying that if we knew how to make good AI, it would still fail if we coded it perfectly. I'm saying we don't know how to make good AI (even though we could in principle figure it out), and also current industry standards for coding things would not get it right the first time even if we knew what we were trying to build. I feel like I basically understand the second thing, but I don't have any gears-level understanding of why it's hard to encode human desires beyond a bunch of intuitions from monkey's-paw things that go wrong if you try to come up with creative disastrous ways to accomplish what seem like laudable goals.

I don't think Eliezer is a DOOM rock, although I think a DOOM rock would be about as useful as Eliezer in practice right now because everyone making capability progress has doomed alignment strategies. My model of Eliezer's doom argument for the current timeline is approximately "programming smart stuff that does anything useful is dangerous, we don't know how to specify smart stuff that avoids that danger, and even if we did we seem to be content to train black-box algorithms until they look smarter without checking what they do before we run them." I don't understand one of the steps in that funnel of doom as well as I would like. I think that in a world where people weren't doing the obvious doomed thing of making black-box algorithms which are smart, he would instead have a last step in the funnel of "even if we knew what we need a safe algorithm to do we don't know how to write programs that do exactly what we want in unexpected situations," because that is my obvious conclusion from looking at the software landscape.

metawrong on Monthly Roundup #23: October 2024

> The news is good, and there are now seven shows in my tier 1

@Zvi Which are the other shows in your tier 1?

olli-jaerviniemi on Untrusted smart models and trusted dumb models

You might be interested in this post of mine, which is more precise about what "trustworthy" means. In short, my definition is "the AI isn't adversarially trying to cause a bad outcome". This includes aligned models, and also unaligned models that are too dumb to realize they should (try to) sabotage. This does not include models that are unaligned, are trying to sabotage, and that we are able to stop from causing bad outcomes (but we might still have use-cases for such models).

cubefox on Seven lessons I didn't learn from election day

> Cities are very heavily Democratic, while rural areas are only moderately Republican.

I think this isn't compatible with both parties getting about equally many votes, because many more Americans live in cities than in rural areas:

> In 2020, about 82.66 percent of the total population in the United States lived in cities and urban areas.

https://www.statista.com/statistics/269967/urbanization-in-the-united-states/
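
The arithmetic behind this objection can be sketched roughly as follows. Only the 82.66% urban figure comes from the quote above; the two partisan shares are hypothetical placeholders for "very heavily Democratic" and "only moderately Republican", and the sketch assumes that votes are cast in proportion to population, which need not hold in practice.

```python
def national_dem_share(urban_pop_share, dem_share_urban, dem_share_rural):
    """Democratic share of all votes, assuming turnout proportional to population."""
    rural_pop_share = 1.0 - urban_pop_share
    return urban_pop_share * dem_share_urban + rural_pop_share * dem_share_rural


def break_even_urban_dem_share(urban_pop_share, dem_share_rural):
    """Urban Democratic share that would give an exact 50/50 national split."""
    rural_pop_share = 1.0 - urban_pop_share
    return (0.5 - rural_pop_share * dem_share_rural) / urban_pop_share


URBAN_POP_SHARE = 0.8266  # from the quoted Statista figure (2020)
DEM_SHARE_URBAN = 0.70    # hypothetical: "very heavily Democratic"
DEM_SHARE_RURAL = 0.40    # hypothetical: "only moderately Republican"

dem_total = national_dem_share(URBAN_POP_SHARE, DEM_SHARE_URBAN, DEM_SHARE_RURAL)
urban_needed = break_even_urban_dem_share(URBAN_POP_SHARE, DEM_SHARE_RURAL)

print(f"Democratic share of all votes: {dem_total:.3f}")
print(f"Urban Democratic share needed for a 50/50 split: {urban_needed:.3f}")
```

Under these assumed numbers the national split is far from even, and an even split would require cities to be only slightly Democratic. Different placeholder shares, or turnout that differs between urban and rural areas, would change the result.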

jbash on Proposing the Conditional AI Safety Treaty (linkpost TIME)

> Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.

So race to the brink and hope you can actually stop when you get there?

> Once the most powerful nations have signed this treaty, it is in their interest to verify each others’ compliance, and to make sure uncontrollable AI is not built elsewhere, either.

How, exactly?

viliam on The Early Christian Strategy

I suspect that to solve this puzzle, we would need more precise data. For example, the thing about martyrdom. Naively, it makes it sound like the early Christians were quite suicidal, which is amazing in itself, and also makes you wonder how they survived as a group.

But let's try to use numbers. What fraction of early Christians was actually willing to die for their faith? I have no idea, so just for the sake of a thought experiment, I propose a number... 1%. (No idea whether it is correct.)

Suddenly the fact that a religion which promises you an awesome afterlife can make 1% of its members die voluntarily does not feel so surprising. There are all kinds of crazy and otherwise vulnerable people out there. With enough peer pressure, you could probably start a cult where 1% of your members commit some kind of suicide even today. Only, the moment you actually did it, the media would describe you as a crazy murderous cult, and you would probably end up in jail. It would be difficult to keep recruiting members. I suppose Rome could have been different; for example, it may not have cared so much about the suicides of slaves. Also, "suicide by a (Roman) cop" is a non-central form of suicide; it does not make your group look like villains. And if you are actively gaining new members, losing 1% does not make much of a difference.

Also, I wonder how hard the Romans actually tried to eliminate Christians. I imagine that if someone had tried the way Hitler tried to get rid of Jews, it would have been game over for Christianity. But if the level of persecution is more like "once in a while, we will take a high-status member, try to make them deny Jesus, and kill them if they refuse", that won't stop a group that meanwhile recruits a hundred new members. Also, this was ancient Rome: life was probably cheap, and you could get killed for many different things or die of many different diseases, so perhaps the chance of being killed for your religion didn't increase the overall risk significantly if you were an average member.