LessWrong 2.0 Reader

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

Why does generalization work?
Martín Soto (martinsq) · 2024-02-20T17:51:10.424Z · comments (16)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

[question] What's the Right Way to think about Information Theoretic quantities in Neural Networks?
Dalcy (Darcy) · 2025-01-19T08:04:30.236Z · answers+comments (13)

[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (15)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (8)

Zvi’s 2024 In Movies
Zvi · 2025-01-13T13:40:05.488Z · comments (4)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)

AI #101: The Shallow End
Zvi · 2025-01-30T14:50:08.269Z · comments (1)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (11)

Why care about AI personhood?
Francis Rhys Ward (francis-rhys-ward) · 2025-01-26T11:24:45.596Z · comments (6)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (16)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Protocol evaluations: good analogies vs control
Fabien Roger (Fabien) · 2024-02-19T18:00:09.794Z · comments (10)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (32)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (40)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (10)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

Recent comments

joey-kl on Self's Shortform

You mean this substance? https://en.wikipedia.org/wiki/Mesembrine

Do you have a recommended brand, or places to read more about it?

daniellc on AI is Software is AI

Hyperbolic (like 1/x). I feel like you're hinting the answer is exponential, but that implies a constant doubling time, which isn't what we have here.

unnamed on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

I'm voting against including this in the Review, at max level, because I think it too often mischaracterizes the views of the people it quotes. And it seems real bad for a post that is mainly about describing other people's views and then drawing big conclusions from that data to inaccurately describe those views and then draw conclusions from inaccurate data.

I'd be interested in hearing about this from people who favor putting this post in the review. Did you check on the sources for some of Elizabeth's claims and think that she described them well? Did you see some inaccuracies but figure that the post is still good enough? Did you trust Elizabeth's descriptions without checking yourself on what the person said?

I spent a fair amount of time spot checking Elizabeth's first section, on Martin Soto, which got my attention because it seemed like it could be one of her strongest and it was the first. This claim from Elizabeth in that section seems clearly false: "The charitable explanation here is that my post focuses on naive veganism, and Soto thinks that’s a made-up problem". The first few paragraphs quoted in this post are sufficient to falsify this interpretation, and the first comment [LW(p) · GW(p)] that Martin left on Elizabeth's post is too. Other parts of the description of Martin's views which are more central to Elizabeth's argument also seem off, though sorting them out requires getting more in the weeds. e.g. AFAICT he didn't say he opposed talking about the whole topic of vegan nutrition; he did say something along the lines of 'you didn't say anything false, but I don't like the way you presented things because it'll have bad consequences', but that's a pretty normal type of opinion - Elizabeth said something like that [LW · GW] about Will MacAskill in another post in this series.

Other places where this post felt off include Elizabeth's description of what people were trying to claim when they brought up the Adventist study, and the claim that this comment [LW(p) · GW(p)] by Wilkox involved frame control (it doesn't look like Wilkox was trying to force their frame on the conversation; rather, it looks like Elizabeth brought a strong frame to the "Change my mind" post, Wilkox didn't immediately buy into it and was trying to think through the overall frame that Elizabeth brought and the specific concrete claims that Elizabeth made).

There are other examples in the comments, e.g. this comment [LW(p) · GW(p)] by Wilkox (currently at +12 net agree-vote, w/o a vote from me) gives 6 examples where the post's "description of what was said seems to misrepresent the source text", with some overlap with my examples and some I haven't looked into.

Before doing these spot checks I was inclined to vote against this post for the review at -1 because it didn't seem to live up to the title. It was trying to do a hard thing and didn't pull it off -- or at least, I didn't get a particularly clear sense of the nature and extent of epistemic problems within EA vegan advocacy and had just cached the post as 'Elizabeth's upset about EA vegan epistemics'. After digging in to some of it more closely, it looks like it did a worse job than I'd thought, so I've moved my vote downward and written this review.

nicholas-heather-kross on Rationalist Movie Reviews

I said "one of the best movies about", not "one of the best movies showing you how to".

seth-herd on The Capitalist Agent

Right. I actually don't worry much about the likely disastrous recession. I mostly worry that we will all die after a takeover from some sort of misaligned AGI. So I am doing alignment research. I guess preparing to reap the rewards if things go well is a sensible response if you're not going to be able to contribute much to alignment research. I do hope you'll chip in on that effort!

Part of that effort is preventing related disasters like global recession contributing to political instability and resulting nuclear- or AGI-invented-even-worse-weapons wars; see my If we solve alignment, do we die anyway? [LW · GW].

I think preventing a global recession is probably possible and would also up the odds of us surviving. Making some money won't keep you and yours alive if this all goes off the rails, which it very well might on the current trajectory. It's not a matter for optimism or pessimism, it's a matter for understanding and doing something about it before it happens.

seth-herd on Anti-Slop Interventions?

If John Wentworth is correct about that being the biggest danger, making AI produce less slop would be the clear best path. I think it might be a good idea even if the dangers were split between misalignment of the first transformative AI, and it being adequately aligned but helping misalign the next generation.

From my comment on that post:

I'm curious why you think deceptive alignment from transformative AI is not much of a threat. I wonder if you're envisioning purely tool AI, or aligned agentic AGI that's just not smart enough to align better AGI?
I think it's quite implausible that we'll leave foundation models as tools rather than using the prompt "pretend you're an agent and call these tools" to turn them into agents. People want their work done for them, not just advice on how to do their work.
I do think it's quite plausible that we'll have aligned agentic foundation model agents that won't be quite smart enough to solve deeper alignment problems reliably, and sycophantic/clever enough to help researchers fool themselves into thinking they're solved. Since your last post to that effect it's become one of my leading routes to disaster. Thanks, I hate it.
OTOH, if that process is handled slightly better, it seems like we could get the help we need to solve alignment from early aligned LLM agent AGIs. This is valuable work on that risk model that could help steer [AI development] orgs away from likely mistakes and toward better practices.

Following on that logic, I think making our first transformative AI less prone to slop/errors is a good idea. The problem is that most such efforts probably speed up progress to getting there.

I'm starting to feel pretty sure that refusing to speed up progress and hoping we get enough time or a complete stallout is unrealistic. Accepting that we're on a terrifying trajectory and trying to steer it seems like the best response.

I think routes to reducing slop also contribute to aligning the first really competent LLM-based agents. One example is engineering such an agent to review its important decisions to see if they either make important errors or change/violate its central goals. I've written about that here [AF · GW] but I'm publishing an updated and expanded post soon.

So yes, I think this is probably something we should be doing. It's always going to be a judgment call whether you publicize any particular idea. But there are more clever-to-brilliant people working on capabilities every day. Hoping they just won't have the same good ideas seems like a forlorn hope. Sharing the ones that seem to have more alignment relevance seems like it will probably differentially advance alignment over capabilities.

fabien-roger on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

(For the concrete scenario, feel free to stop at the point which is obviously unrecoverable and where AIs have misaligned goals, e.g. it can end with massive AI-run companies maximizing profit (with media control and the ability to ostracize any member of the government because of something that they secretly did 20 years ago that became deeply shameful because of cultural evolution), and whose shareholders are grandmas held in an eternal coma who technically become richer and richer but can never spend their money.)

lblack on Anti-Slop Interventions?

If you de-slopify the models, how do you avoid people then using them to accelerate capabilities research just as much as safety research? Why wouldn't that leave us with the same gap in progress between the two we have right now, or even a worse gap? Except that everything would be moving to the finish line even faster, so Earth would have even less time to react.

Is the idea that it wouldn't help safety go differentially faster at all, but rather just that it may preempt people latching on to false slop-solutions for alignment as an additional source of confidence that racing ahead is fine? If that is the main payoff you envision, I don't think it'd be worth the downside of everything happening even faster. I think time is very precious, and sources of confidence already abound for those who go looking for them.

saidachmiz on Russian Food for Petrov Day

Here’s my grandmother’s borscht recipe and a recipe for “pasta navy style”.

saidachmiz on Russian Food for Petrov Day

Some relevant dessert recipes: