LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

What's Hard About The Shutdown Problem
johnswentworth · 2023-10-20T21:13:27.624Z · comments (33)
Hedonic asymmetries
paulfchristiano · 2020-01-26T02:10:01.323Z · comments (22)
On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)
The Social Alignment Problem
irving (judith) · 2023-04-28T14:16:17.825Z · comments (13)
Rant on Problem Factorization for Alignment
johnswentworth · 2022-08-05T19:23:24.262Z · comments (53)
Naive Hypotheses on AI Alignment
Shoshannah Tekofsky (DarkSym) · 2022-07-02T19:03:49.458Z · comments (29)
Recommending Understand, a Game about Discerning the Rules
MondSemmel · 2021-10-28T14:53:16.901Z · comments (55)
Why Artists Study Anatomy
Sisi Cheng (sisi-cheng) · 2020-05-18T18:44:23.109Z · comments (10)
[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)
Reframing Impact
TurnTrout · 2019-09-20T19:03:27.898Z · comments (15)
Connectomics seems great from an AI x-risk perspective
Steven Byrnes (steve2152) · 2023-04-30T14:38:39.738Z · comments (7)
[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)
Salvage Epistemology
jimrandomh · 2022-04-30T02:10:41.996Z · comments (119)
Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments
Jeffrey Ladish (jeff-ladish) · 2022-07-11T19:38:42.468Z · comments (27)
Clem's Memo
abstractapplic · 2022-04-16T11:59:55.704Z · comments (8)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)
Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)
Total horse takeover
KatjaGrace · 2019-11-05T00:10:01.319Z · comments (14)
[link] Why I’m optimistic about OpenAI’s alignment approach
janleike · 2022-12-05T22:51:15.769Z · comments (15)
Trying to Keep the Garden Well
Tobias H (clearthis) · 2022-01-16T05:42:11.851Z · comments (5)
[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (16)
Geoff Hinton Quits Google
Adam Shai (adam-shai) · 2023-05-01T21:03:47.806Z · comments (14)
Truth and Advantage: Response to a draft of "AI safety seems hard to measure"
So8res · 2023-03-22T03:36:02.945Z · comments (10)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)
[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes (steve2152) · 2022-05-17T15:11:12.397Z · comments (10)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
List of resolved confusions about IDA
Wei Dai (Wei_Dai) · 2019-09-30T20:03:10.506Z · comments (18)
Would we even want AI to solve all our problems?
So8res · 2023-04-21T18:04:11.636Z · comments (15)
We have some evidence that masks work
technicalities · 2021-07-11T18:36:46.942Z · comments (13)
Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS)
Scott Emmons · 2023-05-31T17:09:02.288Z · comments (1)
[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)
How To Make Prediction Markets Useful For Alignment Work
johnswentworth · 2022-10-18T19:01:01.292Z · comments (18)
A summary of every "Highlights from the Sequences" post
Akash (akash-wasil) · 2022-07-15T23:01:04.392Z · comments (7)
Desperation hamster wheels
Nicole Ross (nicole-ross) · 2020-10-28T02:32:27.000Z · comments (5)
April 15, 2040
Nisan · 2021-05-04T21:18:08.912Z · comments (25)
[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)
I don’t find the lie detection results that surprising (by an author of the paper)
JanB (JanBrauner) · 2023-10-04T17:10:51.262Z · comments (8)
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (19)
Clarifying “What failure looks like”
Sam Clarke · 2020-09-20T20:40:48.295Z · comments (14)
Tessellating Hills: a toy model for demons in imperfect search
DaemonicSigil · 2020-02-20T00:12:50.125Z · comments (18)
Thinking About Filtered Evidence Is (Very!) Hard
abramdemski · 2020-03-19T23:20:05.562Z · comments (32)
[link] Announcing Epoch: A research organization investigating the road to Transformative AI
Jsevillamol · 2022-06-27T13:55:51.451Z · comments (2)
Good Heart Week: Extending the Experiment
Ben Pace (Benito) · 2022-04-02T07:13:48.353Z · comments (92)
Given the Restrict Act, Don’t Ban TikTok
Zvi · 2023-04-04T14:40:03.162Z · comments (9)
[link] A survey on over 300 works about interpretability in deep networks
scasper · 2022-09-12T19:07:09.156Z · comments (7)
Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)
Paper: Teaching GPT3 to express uncertainty in words
Owain_Evans · 2022-05-31T13:27:17.191Z · comments (7)
Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)
Basic facts about language models during training
beren · 2023-02-21T11:46:12.256Z · comments (15)