LessWrong 2.0 Reader

Zvi’s 2024 In Movies
Zvi · 2025-01-13T13:40:05.488Z · comments (4)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (2)

Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (11)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (8)

[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)

[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (15)

Evolution and the Low Road to Nash
Aydin Mohseni (aydin-mohseni) · 2025-01-22T07:06:32.305Z · comments (2)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

List your AI X-Risk cruxes!
Aryeh Englander (alenglander) · 2024-04-28T18:26:19.327Z · comments (7)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (16)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)

Weirdness Points
lsusr · 2025-02-28T02:23:56.508Z · comments (1)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (10)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Causal inference for the home gardener
braces · 2024-11-27T17:55:52.629Z · comments (1)

Brainrot
Jesse Hoogland (jhoogland) · 2025-01-26T05:35:35.396Z · comments (0)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (32)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Recent comments

chipmonk on Do clients need years of therapy, or can one conversation resolve the issue?

I don't feel like I learned anything new from the post.

This surprises me! Wait so-

The "How does one-shotting happen?" section didn't have anything interesting for you? (Have you seen stuff like this elsewhere?)
Did you already know one-shotting was possible?

chipmonk on Do clients need years of therapy, or can one conversation resolve the issue?

since your bullet-point list in the beginning isn't detailed enough for anyone to try to replicate the method.

Wait I'm confused- this is not the purpose of the post

Also notable is that you only have positive examples for your method

The purpose of this post is not advertisement. It's to discuss one-shots

Especially, how would you be able to distinguish between your approach convincing your customers they were helped, instead of actually changing their behavior?

See above

benito on Will_Pearson's Shortform

I have used my admin powers to put it into a collapsible section so that people who expand this in recent discussion do not have to scroll for 5 seconds to get past it.

vladimir_nesov on Daniel Kokotajlo's Shortform

my intuitions have been shaped by events like the pretraining slowdown

I don't see it. GPT-4.5 is much better than the original GPT-4, probably at 15x more compute. But it's not 100x more compute. And GPT-4o is an intermediate point, so the change from GPT-4o to GPT-4.5 is even smaller, maybe 4x.

I think a 3x change in compute has an effect at the level of noise from different reasonable choices in constructing a model, and 100K H100s is only 5x more than the 20K H100s of 2023. It's not a slowdown relative to what it should've been. And there are models with 200x more raw compute than went into GPT-4.5 that are probably coming in 2027-2029, much more than the 4x-15x observed since 2022-2023.

niplav on Will_Pearson's Shortform

Please don't post 25k words of unformatted LLM (?) output?

niplav on Do clients need years of therapy, or can one conversation resolve the issue?

I gave your post to Claude and gave it the prompt "Dearest Claude, here's the text for a blogpost I've written for LessWrong. I've been told that "it sounds a lot like an advertisement". Can you give me feedback/suggestions for how to improve it for that particular audience? I don't want to do too much more research, but a bit of editing/stylistic choices."

(All of the following is my rephrasing/rethinking of Claude output plus some personal suggestions.)

Useful things that came out of the answer were explaining more about the method you've used to achieve this, since your bullet-point list in the beginning isn't detailed enough for anyone to try to replicate the method.

Also notable is that you only have positive examples for your method, which activates my filtered evidence detectors. Either make clear that you indeed did only have positive results, or name how many people you coached, for how long, and that they were all happy with what you provided.

Finally, some direct words from Claude that I just directly endorse:

For LessWrong specifically, I'd also recommend:

Adding a section on falsifiability - how would you know if your approach doesn't work?

Discussing potential failure modes of your approach

Including more technical details on your methodology (not just results)

Especially, how would you be able to distinguish between your approach convincing your customers they were helped, instead of actually changing their behavior? That feels like the failure mode of most self-help techniques—they're "self-recommending".

daniel-kokotajlo on Daniel Kokotajlo's Shortform

yes! :D

Relatedly, one of the things that drove me to have short timelines in the first place was reading the literature and finding the best arguments for long timelines. Especially Ajeya Cotra's original bio anchors report, which I considered to be the best; I found that when I went through it bit by bit and made various adjustments to the parameters/variables, fixing what seemed to me to be errors, it all added up to an on-balance significantly shorter timeline.

groblegark on groblegark's Shortform

i don't have time to write any of this down so it's going to come out in the wrong order but here

agentic AI is the means of production for codegen
model access limits and closedness are therefore a threat to Workers
I use and maintain software. I survive by staying 5 feet in front of the steamroller
I am not wealthy, I can't afford to be tripped and squished.
OSS is traditionally the way of protecting myself in this situation
I need to write tons of good code and enable my company to do the same, and I need to do it while washing the dishes (Covid happened).
The industry wants to give me a Commodore 64, but I need a PDP-10

This might be interesting to LessWrong as a personal take, because the Alignment folks are effectively on the side of Capital here. Without fast and parallel access to foundation models, I can't learn my new job, which is auto-codegen-pipeline-maintainer. If some 3rd party brings that bird home to my boss instead of me, I'm going to be unwealthy and unemployed. It's possible I'm too late already... but at any rate some people will be too late. If I were/am them, I would be/am angry.

I think a lot of people realize this already, and some have already ascended. I think the ascended people are being quiet right now because they realize the stuff I put above, and don't mind less competition. What LessWrong thinks about that, I don't care; I'm actually fine with it as long as I get to join the ascended. I suspect that's a common attitude. If you don't hear from me again, it's because I figured it out. If this sounds crazy, I'm interested in hearing why.

daniel-kokotajlo on Daniel Kokotajlo's Shortform

Re: Point 1: I agree it would not necessarily be incorrect. I do actually think that probably the remaining challenges are engineering challenges. Not necessarily, but probably. Can you point to any challenges that seem (a) necessary for speeding up AI R&D by 5x, and (b) not engineering challenges?

Re: Point 2: I don't buy it. Deep neural nets are actually useful now, and increasingly so. Making them more useful seems analogous to selective breeding or animal training, not analogous to trying to time the market.

cole-wyeth on Reflective oracles as a solution to the converse Lawvere problem

Why call it "converse Lawvere" instead of the more standard "utm property" of general recursion theory, e.g. as in Odifreddi? Only because the maps are to [0,1]? That seems like insufficient reason to adopt an unrelated name.