LessWrong 2.0 Reader

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (9)

Genetic fitness is a measure of selection strength, not the selection target
Kaj_Sotala · 2023-11-04T19:02:13.783Z · comments (43)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

Calculating Natural Latents via Resampling
johnswentworth · 2024-06-06T00:37:42.127Z · comments (4)

[link] [Closed] Agent Foundations track in MATS
Vanessa Kosoy (vanessa-kosoy) · 2023-10-31T08:12:50.482Z · comments (1)

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

What if a tech company forced you to move to NYC?
KatjaGrace · 2024-06-09T06:30:03.329Z · comments (22)

Cooperating with aliens and AGIs: An ECL explainer
Chi Nguyen · 2024-02-24T22:58:47.345Z · comments (8)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

Monthly Roundup #17: April 2024
Zvi · 2024-04-15T12:10:03.126Z · comments (4)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.
Chi Nguyen · 2024-02-23T06:10:05.881Z · comments (18)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)

[link] Theories of Change for AI Auditing
Lee Sharkey (Lee_Sharkey) · 2023-11-13T19:33:43.928Z · comments (0)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 2023-12-16T05:39:10.558Z · comments (5)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (18)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

[link] Questions are usually too cheap
Nathan Young · 2024-05-11T13:00:54.302Z · comments (19)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

Ten Modes of Culture War Discourse
jchan · 2024-01-31T13:58:20.572Z · comments (15)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

[link] In Defense of Epistemic Empathy
Kevin Dorst · 2023-12-27T16:27:06.320Z · comments (19)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

Self-Blinded L-Theanine RCT
niplav · 2023-10-31T15:24:57.717Z · comments (12)

[question] Can we get an AI to do our alignment homework for us?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

Human wanting
TsviBT · 2023-10-24T01:05:39.374Z · comments (1)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

Zvi's Manifold Markets House Rules
Zvi · 2023-11-13T00:28:02.147Z · comments (6)

[link] Open Phil releases RFPs on LLM Benchmarks and Forecasting
LawrenceC (LawChan) · 2023-11-11T03:01:09.526Z · comments (0)

AI #37: Moving Too Fast
Zvi · 2023-11-09T17:50:04.324Z · comments (5)

How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)

Recent comments

benjy_forstadt on Another argument against utility-centric alignment paradigms

Basically people tend to value stuff they perceive in the biophysical environment and stuff they learn about through the social environment.

So that reduces the complexity of the problem - it’s not a matter of designing a learning algorithm that both derives and comes to value human abstractions from observations of gas particles or whatever. That’s not what humans do either.

Okay then, why aren’t we star-maximizers or number-of-nation-states maximizers? Obviously it’s not just a matter of learning about the concept. The details of how we get values hooked up to an AGI’s motivations will depend on the particular AGI design, but will probably involve reward, prompting, scaffolding or the like.

signer on ASIs will not leave just a little sunlight for Earth

If that’s your hope—then you should already be alarmed at trends

Would be nice for someone to quantify the trends. Otherwise it may as well be that trends point to easygoing enough and aligned enough future systems.

For some humans, the answer will be yes—they really would do zero things!

Nah, it's impossible for evolution to just randomly stumble upon such complicated and unnatural mind-design. Next you are going to say what, that some people are fine with being controlled?

Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

Aha, so if we do give the option to an entity and it doesn't always kill all humans, then we have evidence it cares, right?

If there is a technical refutation it should simplify back into a nontechnical refutation.

Wait, why would prohibiting successors stop OpenAI from declaring an easygoing system a failure? Ah, right - because there is no technical analysis, just elements of one.

vladimir_nesov on Model evals for dangerous capabilities

Potential for o1-like long horizon reasoning post-training compromises capability evals for open weights models. If it's not applied before/during evals, then evals will significantly underestimate capabilities of the system that can be built out of the open weights later, when the recipe is reproduced.

This likely will be relevant for Llama 4 (as the first 1e26+ FLOPs model with open weights), if they don't manage a good reproduction of o1-like post-training before release, and continue the policy of publishing in open weights if the evals are not too alarming.

tailcalled on Stephen Fowler's Shortform

There's a billion reasonable-seeming impact metrics, but the main challenge of counterfactual-based impact is always how you handle chaos. I'm pretty sure the solution is to go away from counterfactuals as they represent a pathologically computationalist form of agency, and instead learn the causal backbone.

If we view the core of life as increasing rather than decreasing entropy, then entropy-production may be a reasonable candidate for putting quantitative order to the causal backbone. But bounded agency is less about minimizing impact and more about propagating the free energy of the causal backbone into new entropy-producing channels.

review-bot on ASIs will not leave just a little sunlight for Earth

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

sharmake-farah on Another argument against utility-centric alignment paradigms

I feel like the difference between the Alpha and Beta examples and my examples comes down to your examples having basically no control over Beta's data at all, and my examples having far more control over what data is learned by the AI.

I think the key crux is whether we have much more control over AI data sources than evolution.

If I agreed with you that we would have essentially no control on what data the AI has, I'd be a lot more worried, but I don't think this is true, and I expect future AIs, including AGIs, to be a lot more built than grown, and for a lot of their data to be very carefully controlled via synthetic data, for simple capabilities reasons, but this can also be used for alignment strategies.

I think another disagreement is that I basically don't buy the evolution analogy for DL, and I think there are some deep disanalogies (the big one for now is again how much more control we have over data sources than evolution does, and this is only set to increase with synthetic data).

So I basically don't expect this to happen:

I instead strongly expect that the story would just repeat. The training process (or whatever process spits out the AGI) would end up creating some extremely specific conditions in which the AGI is learning the values. Its values would then necessarily be some complicated functions over weird mixes of the abstractions-natural-to-the-dataset-it's-trained-on, with their specifics being highly contingent on some invisible-to-us details of that process.

Pretty much all of your examples rely on the Alpha being unable to control the data learnt by Beta, and if this isn't the case, your examples break down.

antanaclasis on Economics Roundup #3

Somehow I missed that bit.

That makes the situation better, but there’s still some issue. The refund is not earning interest, but your liabilities are.

Take the situation with owing $25 million. Say that there’s a one-year gap between the tax being assessed and your asset going to $0 (at which time you claim the refund). In this time the $25 million loan you took is accruing interest. Let’s say it does so at a 4% rate per year. When you get your $25 million refund, you therefore have $26 million in loans.

So you still end up $1 million in debt due to “gains” that you were never able to realize.
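
The arithmetic above can be checked with a minimal sketch. It assumes annual compounding and that the refund exactly equals the tax paid; the function name `debt_after_refund` is hypothetical and exists only for this illustration.

```python
def debt_after_refund(tax_owed: float, annual_rate: float, years: float) -> float:
    """Debt left over once an interest-free refund repays the loan principal.

    Assumption (not stated in the comment): interest compounds annually.
    """
    # The loan taken to pay the tax grows with interest.
    loan_balance = tax_owed * (1 + annual_rate) ** years
    # The refund earns no interest, so it only covers the original amount.
    refund = tax_owed
    return loan_balance - refund


# Figures from the comment: $25 million owed, 4% per year, one year.
remaining = debt_after_refund(25_000_000, 0.04, 1)
print(f"Remaining debt: ${remaining:,.0f}")
```

With the figures from the comment, the remaining debt comes to roughly $1 million, matching the conclusion above. A longer gap or a higher rate widens it.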

nathan-young on Did Christopher Hitchens change his mind about waterboarding?

@Ben Pace I would like a vote here on what percentage chance we think there is that an omniscient reviewer would say this narrative is true. Then display it on an axis, probably with dots (anonymous) for each person, e.g. like this.

reconstellate on Book Review: On the Edge: The Fundamentals

This whole post you keep referencing "the Hegelian dialectic" as some sort of thing central to the Village, without ever stating what you think it is; could you elaborate?

I'm not an expert in this area by any means but I do not have the impression that the Village has even a plurality of Hegelians, let alone being centered around them, and the way you talk about the Hegelian dialectic does not seem to be remotely the same usage that you see with actual Hegelians. Honestly I expect actual Hegelians to be rather less upset about the River's differences from them, because, to my understanding, Hegel's whole thing is he believes in a sort of teleology of history that guarantees the victory of human freedom?

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

strong disagree. i would be highly surprised if there were multiple essentially different algorithms to achieve general intelligence*.

I also agree with the Daniel Murfet quote. There is a difference between a disjunction before you see the data and a disjunction after you see the data. I agree AI development is disjunctive before you see the data - but in hindsight all the things that work are really minor variants on a single thing that works.

*of course "essentially different" is doing a lot of work here. some of the conceptual foundations of intelligence haven't been worked out enough (or Vanessa has and I don't understand it yet) for me to make a formal statement here.