LessWrong 2.0 Reader

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen
Zvi · 2025-01-10T13:50:05.563Z · comments (7)

Zvi’s 2024 In Movies
Zvi · 2025-01-13T13:40:05.488Z · comments (2)

[link] Two interviews with the founder of DeepSeek
Cosmia_Nebula · 2024-11-29T03:18:47.246Z · comments (1)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

Protocol evaluations: good analogies vs control
Fabien Roger (Fabien) · 2024-02-19T18:00:09.794Z · comments (10)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

List your AI X-Risk cruxes!
Aryeh Englander (alenglander) · 2024-04-28T18:26:19.327Z · comments (7)

Goals selected from learned knowledge: an alternative to RL alignment
Seth Herd · 2024-01-15T21:52:06.170Z · comments (18)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (32)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

[link] Surgery Works Well Without The FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-26T13:31:29.968Z · comments (28)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (10)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (16)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (17)

A Teacher vs. Everyone Else
ronak69 · 2024-03-21T17:45:35.714Z · comments (8)

GPT-4o My and Google I/O Day
Zvi · 2024-05-16T17:50:03.040Z · comments (2)

How ARENA course material gets made
CallumMcDougall (TheMcDouglas) · 2024-07-02T18:04:00.209Z · comments (2)

[link] Beyond the Board: Exploring AI Robustness Through Go
AdamGleave · 2024-06-19T16:40:06.594Z · comments (2)

(Approximately) Deterministic Natural Latents
johnswentworth · 2024-07-19T23:02:12.306Z · comments (0)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

[link] College technical AI safety hackathon retrospective - Georgia Tech
yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · comments (2)

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken
Zvi · 2024-04-01T19:10:12.193Z · comments (1)

Recent comments

foyle on Why do futurists care about the culture war?

Agree that most sociological, economic and environmental problems that loom large in the current context will radically shift in importance in the next decade or two, to the point that they are probably no longer worth devoting any significant resources to in the present. The impacts of AI are the only issue worth worrying about. But even assuming utopian outcomes, who gets possession of the Malibu beach houses in a post-scarcity world?

Once significant white-collar job losses start to mount in a year or two, I think it is inevitable that a powerful and electorally dominant anti-AI movement will grow, at least in erstwhile democracies, and likely ban most AGI applications outside of a few fields where fewer workers would stand to lose jobs (health, with near-endless demand, and perhaps cutting-edge tech where the payoff to human net welfare is highest). Butlerian Jihad-lite.

It won't save us, and it carries a substantial risk of ushering in repressive authoritarianism amid the resulting political ruckus, but it will likely delay our demise or (at best) our delivery into powerless pet status by perhaps a decade or two.

fl33tw00d on How do you deal w/ Super Stimuli?

Firefox Focus on iPhone is useful here.

Delete all social media apps from your phone, and hide Safari.

Only access any platforms via Firefox Focus. Browsing sessions are ephemeral, so you need to log in each time. This added friction basically solved it for me.

viliam on The low Information Density of Eliezer Yudkowsky & LessWrong

I'll try to respect your preference for brevity ;)

a shorter version would be very useful -- yes, fully agree
- at least there is readthesequences.com without the comments (10x as much text as the articles)
- there were summaries at LW wiki, but those were too short; we need something medium-sized
there are some good reasons why Eliezer wrote a long text
- there wasn't a rationalist community yet, so lines had to be drawn to separate it from many existing adjacent communities (atheists, skeptics, libertarians, sci-fi fans, self-help, contrarians, academia...)
- emotional, near-mode appeal -- why should we even care about "being rational"?
- popular bad memes/patterns (mysterious answers, applause lights, "trust the science"...)

tl;dr -- writing for an already existing rationalist(-ish) community is different from writing in order to create a rationalist community

davidmanheim on AI Safety as a YC Startup

True, and even more, if optimizing for impact or magnitude has Goodhart effects of various types, then even otherwise good directions are likely to be ruined by pushing on them too hard. (In large part this is because the space we care about is unlikely to have linear divisions into good and bad; there will be much more complex regions, and even when pointed in a direction that is locally better, pushing too far is possible, and very hard to predict from local features even if people try, which they mostly don't.)

davidmanheim on AI Safety as a YC Startup

I think the point wasn't having a unit norm, it was that impact wasn't defined as directional, so we'd need to remove the dimensionality from a multidimensionally defined direction.

So to continue the nitpicking, I'd argue impact = || Magnitude * Direction ||, or better, ||Impact|| = Magnitude * Direction, so that we can talk about size of impact. And that makes my point in a different comment even clearer - because almost by assumption, the vast majority of those with large impact are pointed in net-negative directions, unless you think either a significant proportion of directions are positive, or that people are selecting for it very strongly, which seems not to be the case.

james-chua on Inference-Time-Compute: More Faithful? A Research Note

Thanks! Not sure if you've already read it, but our group has previous work similar to what you described: "Connecting the dots". Models can, e.g., articulate functions that are implicit in the training data. This ability is not perfect; models still have a long way to go.

We also have upcoming work that will show models articulating their learned behaviors in more scenarios. Will be released soon.

satron on Reducing sycophancy and improving honesty via activation steering

A new method for reducing sycophancy. Sycophantic behavior is present in quite a few AI threat models, so it's an important area to work on.

The article not only uses activation steering to reduce sycophancy in AI models but also provides directions for future work.

Overall, this post is a valuable addition to the toolkit of people who wish to build safe advanced AI.

tailcalled on We probably won't just play status games with each other after AGI

I don't think consumers demand authentic AI friends because they already have authentic human friends. Also it's not clear how you imagine the AI companies could train the AIs to be more independent and less superficial; generally training an AI requires a differentiable loss, but human independence does not originate from a differentiable loss and so it's not obvious that one could come up with something functionally similar via gradient descent.

paddyc on Seeing Red: Dissolving Mary's Room and Qualia

It’s easy to imagine visual perception in relation to qualia, i.e., how do I know that the blue I see looks the same to you, beyond us both identifying it (e.g., "the sky is blue")? But I think it’s harder to imagine qualia in relation to sound, that is, how could sound have a subjective essence that is possibly unique to each individual? I think you either hear something or you don’t.

fl33tw00d on Do humans really learn from "little" data?

The linked video says so at 30:45