LessWrong 2.0 Reader

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (8)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

Protocol evaluations: good analogies vs control
Fabien Roger (Fabien) · 2024-02-19T18:00:09.794Z · comments (10)

How toy models of ontology changes can be misleading
Stuart_Armstrong · 2023-10-21T21:13:56.384Z · comments (0)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

Monthly Roundup #11: October 2023
Zvi · 2023-10-03T14:10:01.686Z · comments (12)

Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (60)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality
Steven Byrnes (steve2152) · 2023-12-18T15:26:29.970Z · comments (12)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

Taking responsibility and partial derivatives
Ruby · 2023-12-31T04:33:51.419Z · comments (1)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (31)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

Are humans misaligned with evolution?
TekhneMakre · 2023-10-19T03:14:14.759Z · comments (13)

Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter
Nate Thomas (nate-thomas) · 2023-10-26T03:07:34.118Z · comments (10)

[Linkpost] Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-08-12T22:02:09.895Z · comments (0)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

Trust as a bottleneck to growing teams quickly
benkuhn · 2024-07-13T18:00:04.579Z · comments (3)

Sparse Coding, for Mechanistic Interpretability and Activation Engineering
David Udell · 2023-09-23T19:16:31.772Z · comments (7)

[link] Trading off compute in training and inference (Overview)
Pablo Villalobos (pvs) · 2023-07-31T16:03:46.265Z · comments (2)

[question] Which possible AI systems are relatively safe?
Zach Stein-Perlman · 2023-08-21T17:00:27.582Z · answers+comments (20)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer · 2024-06-07T19:02:06.859Z · comments (16)

Upgrading the AI Safety Community
trevor (TrevorWiesinger) · 2023-12-16T15:34:26.600Z · comments (9)

[link] cold aluminum for medicine
bhauth · 2023-12-16T14:38:03.260Z · comments (4)

Monthly Roundup #9: August 2023
Zvi · 2023-08-07T13:20:03.522Z · comments (25)

[link] What's new at FAR AI
AdamGleave · 2023-12-04T21:18:03.951Z · comments (0)

Book Review: 1948 by Benny Morris
Yair Halberstadt (yair-halberstadt) · 2023-12-03T10:29:16.696Z · comments (9)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

GPT-4o My and Google I/O Day
Zvi · 2024-05-16T17:50:03.040Z · comments (2)

Concrete positive visions for a future without AGI
Max H (Maxc) · 2023-11-08T03:12:42.590Z · comments (28)

[link] What's important in "AI for epistemics"?
Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z · comments (0)

How ARENA course material gets made
CallumMcDougall (TheMcDouglas) · 2024-07-02T18:04:00.209Z · comments (2)

AGI is easier than robotaxis
Daniel Kokotajlo (daniel-kokotajlo) · 2023-08-13T17:00:29.901Z · comments (30)

Recent comments

leogao on Alexander Gietelink Oldenziel's Shortform

There is an obvious utilitarian reason: not getting sick.

khafra on If far-UV is so great, why isn't it everywhere?

I'd be interested to know what the numbers on UV in ductwork look like over the past 5 years. When I had to get a new A/C system installed in 2020, they asked whether I wanted a UVC light installed in the air handler. I had, before then, been using a 70w UVC corn light I bought on Amazon to sterilize the exterior of groceries (back when we thought fomites might be a major transmission vector), and in improvised ductwork with fans and cardboard boxes taped together.
Getting a proper bulb--an optimal wavelength source--seemed like a big upgrade. It's hard to come up with quantitative efficacy numbers, but we did have a friend over for the day who turned out to have been in the early stages of covid, and we didn't get infected. Our first infection was years later, at a music event.

jacob_drori on Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data

This seems very interesting, but I think your post could do with a lot more detail. How were the correlations computed? How strongly do they support PRH? How was the OOD data generated? I'm sure the answers could be pieced together from the notebook, but most people won't click through and read the code.

joey-kl on Alexander Gietelink Oldenziel's Shortform

More reasons: people wear sunglasses when they’re doing fun things outdoors like going to the beach or vacationing so it’s associated with that, and also sometimes just hiding part of a picture can cause your brain to fill it in with a more attractive completion than is likely.

ryan_greenblatt on Sabotage Evaluations for Frontier Models

"My guess would be that they're not saying that well-designed control evaluations become untrustworthy"

It's a bit messy because we have some ability to check whether we should be able to evaluate things.

So, there are really three relevant "failure" states for well-done control:

1. We can't find countermeasures such that our control evaluations indicate any real safety.
2. We no longer think that control evaluations work to evaluate safety, as models greatly exceed our planning capabilities and/or can sandbag sufficiently well.
3. We think control evaluations can work well, but we're wrong and they actually don't.

I think (1) or (2) will likely happen prior to (3) if you do a good job.

We discuss this more here [LW · GW].

(Really, the actual system is more complex because we can vary how conservative control evaluations are, and it is easier to be confident in the results of more conservative evaluations. E.g., we can be conservative about our AIs' planning ability. So, there is really a somewhat more continuous tradeoff than the above list makes it seem.)

lukas-finnveden on Sabotage Evaluations for Frontier Models

There are at least two different senses in which "control" can "fail" for a powerful system:

1. Control evaluations can indicate that there's no way to deploy the system such that you both (i) get a lot of use out of it, and (ii) can get a low probability of catastrophe.
2. Control evaluations are undermined such that humans think that the model can be deployed safely, but actually the humans were misled and there's a high probability of catastrophe.

My impression is that Ryan & Buck typically talk about the first case. (E.g. in the link above.) I.e.: My guess would be that they're not saying that well-designed control evaluations become untrustworthy, just that they'll stop promising you safety.

But to be clear: In this question, you're asking about something more analogous to the second case, right? (Sabotage/sandbagging evaluations being misleading about models' actual capabilities at sabotage & sandbagging?)

My question posed in other words: Would you count "evaluations clearly say that models can sabotage & sandbag" as success or failure?

yoav-ravid on Overcoming Bias Anthology

Typo: It's Prediction Markets "Fail" To *Mooch (not Moloch)

skybluecat on Bitter lessons about lucid dreaming

Don't know if this counts, but I can sort of affect and notice dreams without being really lucid in the sense of clearly knowing it's a dream. It feels more like I somehow believe everything is real but I have superpowers (like becoming a superhero), and I use the powers in ways that make sense in the dream setting, instead of being my waking self and consciously choosing what I want to dream of next. As a kid, I noticed I could often fly when chased by enemies in my dreams, and later I could do more kinds of things in my dreams just by willing it, perhaps as a result of consuming too many sci-fi or fantasy books and games. I also noticed some recurrent patterns in my dreams, like places that don't exist in real life but that dreaming-me believes to be my school or hometown. Sometimes I get a strange sense of "I dreamed of this before" when I somehow feel like I have had the same dream, or a similar one, as the one I'm having now, but without really realizing that I'm dreaming or remembering who I am in waking life. Then I subconsciously know I can do these things, or can focus on seeing and memorizing more of the dream world (if it is interesting) so I can write it down after waking up.

david-johnston on A brief theory of why we think things are good or bad

I think precisely defining "good" and "bad" is a bit beside the point - it's a theory about how people come to believe things are good and bad, and we're perfectly capable of having vague beliefs about goodness and badness. That said, the theory is lacking a precise account of what kind of beliefs it is meant to explain.

The LLM section isn't meant as support for the theory, but speculation about what it would say about the status of "experiences" that language models can have. Compared to my pre-existing notions, the theory seems quite willing to accommodate LLMs having good and bad experiences on par with those that people have.

directedevolution on Alexander Gietelink Oldenziel's Shortform

Sunglasses aren’t cool. They just tint the allure the wearer already has.