LessWrong 2.0 Reader

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (19)

[link] Cost, Not Sacrifice
Joe Rogero · 2024-11-20T21:32:26.281Z · comments (13)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

Implications of the inference scaling paradigm for AI safety
Ryan Kidd (ryankidd44) · 2025-01-14T02:14:53.562Z · comments (40)

When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (16)

On the OpenAI Economic Blueprint
Zvi · 2025-01-15T14:30:06.773Z · comments (0)

[link] Policymakers don't have access to paywalled articles
Adam Jones (domdomegg) · 2025-01-05T10:56:11.495Z · comments (10)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (9)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

I'm offering free math consultations!
Gurkenglas · 2025-01-14T16:30:40.115Z · comments (5)

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

[link] "Map of AI Futures" - An interactive flowchart
swante · 2024-11-27T21:31:40.269Z · comments (3)

[link] Moderately More Than You Wanted To Know: Depressive Realism
JustisMills · 2025-01-13T02:57:32.022Z · comments (4)

[link] Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
TurnTrout · 2025-01-16T02:14:35.098Z · comments (3)

[link] New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters
Jesse Hoogland (jhoogland) · 2024-11-27T22:06:12.914Z · comments (4)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (17)

No one has the ball on 1500 Russian olympiad winners who've received HPMOR
Mikhail Samin (mikhail-samin) · 2025-01-12T11:43:36.560Z · comments (17)

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

Stream Entry
lsusr · 2025-01-07T23:56:13.530Z · comments (7)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

Inference-Time-Compute: More Faithful? A Research Note
James Chua (james-chua) · 2025-01-15T04:43:00.631Z · comments (7)

New, improved multiple-choice TruthfulQA
Owain_Evans · 2025-01-15T23:32:09.202Z · comments (0)

Chance is in the Map, not the Territory
Daniel Herrmann (Whispermute) · 2025-01-13T19:17:15.843Z · comments (13)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

Thoughts on the conservative assumptions in AI control
Buck · 2025-01-17T19:23:38.575Z · comments (0)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (9)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

Numberwang: LLMs Doing Autonomous Research, and a Call for Input
eggsyntax · 2025-01-16T17:20:37.552Z · comments (20)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)

Recent comments

kristianronn on Replicators, Gods and Buddhist Cosmology

It doesn't disprove the doomsday argument. It does offer an alternative explanation however.

Why would a future civilization specifically choose to simulate many instances of the time we're currently in if it doesn't have a significance to that time period?

I don't think they necessarily care about a specific time period. I think they care about whether they can learn how the simulated beings interact with a new technology, in a way that prevents them from repeating our mistakes. And it could be that our particular time is the most efficient one to learn from (i.e., the time right before you might go extinct).

eggsyntax on Numberwang: LLMs Doing Autonomous Research, and a Call for Input

The trouble is that (unless I'm misreading you?) that's a fully general argument against measuring what models can and can't do. If we're going to continue to build stronger AI (and I'm not advocating that we should), it's very hard for me to see a world where we manage to keep it safe without a solid understanding of its capabilities.

wassname on Implications of the inference scaling paradigm for AI safety

Well, we don't know the sizes of the models, but I do get what you're saying and agree. "Distill" usually means big to small. But here it means expensive to cheap (because test-time compute is expensive, and they are training a model to cheaply skip the search process and just predict the result).

In RL, iirc, they call it "Policy distillation". And similarly "Imitation learning" or "behavioral cloning" in some problem setups. Perhaps those would be more accurate.
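The "expensive to cheap" sense of distillation can be illustrated with a toy: an expensive search "teacher" that spends compute at inference time, and a cheap "student" trained to imitate the teacher's outputs so it can skip the search. This is a hypothetical sketch, not how any lab actually does it: `search_teacher`, `distill`, and `toy_reward` are made-up names, and a lookup table stands in for the network that real policy distillation or behavioral cloning would fit with a supervised loss.

```python
from typing import Callable, Dict, List

Reward = Callable[[int, int], float]

def search_teacher(state: int, actions: List[int], reward: Reward) -> int:
    """Expensive: scores every action and returns the best one (argmax search)."""
    return max(actions, key=lambda a: reward(state, a))

def distill(states: List[int], actions: List[int], reward: Reward) -> Dict[int, int]:
    """Behavioral cloning, in miniature: record the teacher's choice per state.

    A lookup table stands in for a trained model; a real setup would fit a
    network to (state, teacher_action) pairs and hope it generalizes.
    """
    return {s: search_teacher(s, actions, reward) for s in states}

def student_policy(table: Dict[int, int], state: int) -> int:
    """Cheap: a single lookup instead of a search over all actions."""
    return table[state]

def toy_reward(state: int, action: int) -> float:
    """Hypothetical reward: the best action is the one closest to state mod 7."""
    return -abs((state % 7) - action)

ACTIONS = list(range(7))
table = distill(list(range(20)), ACTIONS, toy_reward)
print(student_policy(table, 10), search_teacher(10, ACTIONS, toy_reward))
```

The student matches the teacher on the states it was trained on while doing no search at all, which is the "skip the search process and just predict the result" idea in its simplest form.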

I think maybe the most relevant chart from the Jones paper gwern cites is this one:

Oh, interesting. I guess you mean because it shows the gains of TTC vs. model size? So you can imagine bootstrapping from TTC -> model size -> TTC -> and so on?

matthew-barnett on We probably won't just play status games with each other after AGI

I suppose that means it might be worth writing an additional post that more directly responds to the idea that AGI will end material scarcity. I agree that thesis probably deserves a specific refutation.

nadroj on Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Re. making this more efficient, I can think of a few options.

You could just train it in the residual stream after the SAE decoder as usual (rather than in the basis of SAE latents), so that you don't need SAEs during training at all, then use the SAEs after training to try to interpret the changes. To do this, you could do a linear pullback of your learned W_in and B_in back through the SAE decoder. That is, interpret (SAE_decoder)@(W_in), etc. Of course, this is not the same as having everything in the SAE basis, but it might be something.
Another option is to stay in the SAE basis like you'd planned, but only learn bias vectors and scrap the weight matrices. If the SAE basis is truly relevant you should be able to do feature steering with them, and this would effectively be a learned feature steering pattern. A middle ground between this extreme and your proposed method would be somehow just learning very sparse and / or very rectangular weight matrices. Preferably both.
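A minimal sketch of the two options above, in plain Python with toy matrices. It assumes a row-vector convention (activations multiply weights from the left), a linear SAE decoder with a bias term `b_dec`, and a learned layer `x @ W_in + b_in` applied to the reconstructed residual stream; all sizes and values are made up. The first check shows that training in the residual stream and then pulling `W_in` back through the decoder yields weights `W_dec @ W_in` that act directly on SAE latents. The second shows that a bias-only adaptor `s` in the SAE basis is equivalent to adding the fixed steering vector `s @ W_dec` to the residual stream.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_bias(a: Matrix, bias: List[float]) -> Matrix:
    """Add a bias vector to every row of a."""
    return [[x + b for x, b in zip(row, bias)] for row in a]

# Hypothetical toy sizes: d_sae = 3 latents, d_model = 2, d_hidden = 2.
W_dec = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # SAE decoder, d_sae x d_model
b_dec = [0.5, -0.5]                           # decoder bias (assumed present)
W_in = [[2.0, 1.0], [0.0, 3.0]]               # learned weights, d_model x d_hidden
b_in = [1.0, 0.0]
z = [[1.0, 0.0, 2.0]]                         # one row of SAE latent activations

# Option 1: train in the residual stream, then pull back through the decoder.
h_resid = add_bias(matmul(add_bias(matmul(z, W_dec), b_dec), W_in), b_in)
W_pull = matmul(W_dec, W_in)                  # (SAE_decoder) @ (W_in): d_sae x d_hidden
b_pull = add_bias(matmul([b_dec], W_in), b_in)[0]
h_pull = add_bias(matmul(z, W_pull), b_pull)
assert h_resid == h_pull                      # same hidden activations either way

# Option 2: a bias-only adaptor s in the SAE basis is a fixed steering vector.
s = [0.0, 1.0, 0.0]                           # hypothetical learned latent bias
steer = matmul([s], W_dec)[0]                 # its residual-stream direction
x = add_bias(matmul(z, W_dec), b_dec)
x_steered = add_bias(matmul(add_bias(z, s), W_dec), b_dec)
assert x_steered == add_bias(x, steer)
```

Row `i` of `W_pull` is the effect of SAE latent `i` on the hidden layer, which is what makes the pulled-back weights inspectable latent by latent. The identity is exact only for the SAE reconstruction: the real residual stream also contains the SAE's reconstruction error, which the pulled-back weights never see.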

Actually, it might work fine as you've got it, since you could conceivably get away with lower-rank adaptors (more rectangular weight matrices) in the SAE basis than in the residual stream, because the high-dimensional space gives you more expressive power. But my gut says you won't actually be able to get away with a much lower rank than usual, and the thing you really want to exploit in the SAE basis is something like sparsity (as a full-rank bias vector does), not low rank.
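A back-of-envelope parameter count makes the rank vs. sparsity trade-off concrete. The sizes here are assumptions for illustration only (a 768-dim residual stream and an SAE with expansion factor 16), and the helper names are hypothetical.

```python
def low_rank_params(d: int, r: int) -> int:
    """Rank-r adaptor A @ B on a d-dim space: A is d x r, B is r x d."""
    return 2 * d * r

def sparse_params(d: int, k: int) -> int:
    """d x d matrix with k learned nonzeros per row (index storage ignored)."""
    return d * k

def bias_params(d: int) -> int:
    """A full (dense) bias vector on a d-dim space."""
    return d

D_MODEL = 768          # assumed residual-stream width
D_SAE = 16 * D_MODEL   # assumed expansion factor 16 -> 12288 latents

for name, count in [
    ("rank-8 adaptor, residual stream", low_rank_params(D_MODEL, 8)),
    ("rank-8 adaptor, SAE basis", low_rank_params(D_SAE, 8)),
    ("4 nonzeros/row, SAE basis", sparse_params(D_SAE, 4)),
    ("bias only, SAE basis", bias_params(D_SAE)),
]:
    print(f"{name:32s} {count:>8,d}")
```

Because the SAE basis is wider by the expansion factor, each unit of rank costs that factor more parameters there, so in parameter terms a low-rank adaptor only breaks even if the rank drops by roughly that factor. A bias vector or a very sparse matrix scales with the number of latents instead, which is one way to read the intuition that sparsity, not low rank, is the property to exploit.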

ete on What Is The Alignment Problem?

I recommend most readers skip this subsection on a first read; it’s not very central to explaining the alignment problem.

Suggest either putting this kind of aside in a footnote, or giving the reader a handy link to the next section for convenience?

gentleunwashed on The quantum red pill or: They lied to you, we live in the (density) matrix

Actually, I have a little more to say:

Another way to think about higher-rank density matrices is as probability distributions over pure states; I think this is what Charlie Steiner's comment is alluding to.

So the rank-2 matrix from my previous comment can be thought of as $\frac{1}{2} | 0 ⟩ ⟨ 0 | + \frac{1}{2} | 1 ⟩ ⟨ 1 |$, i.e., an equal probability of observing each of $| 0 ⟩$ and $| 1 ⟩$. And because $I_{2} = | x ⟩ ⟨ x | + | y ⟩ ⟨ y |$ for any orthonormal vectors $| x ⟩, | y ⟩$, again there's nothing special about using the standard basis here (this is mathematically equivalent to the argument I made in the above comment about why you can use any basis for your measurement).
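The basis-independence point can be checked numerically. This plain-Python sketch (helper names are illustrative) builds the density matrix $\sum_k p_k | ψ_k ⟩ ⟨ ψ_k |$ for an equal mixture of $| 0 ⟩$ and $| 1 ⟩$, and for an equal mixture of $| + ⟩$ and $| - ⟩$; both come out as $\frac{1}{2} I_{2}$, up to floating-point rounding.

```python
import math
from typing import List

Matrix = List[List[complex]]

def outer(v: List[complex], w: List[complex]) -> Matrix:
    """|v><w|: entry (i, j) is v[i] * conj(w[j])."""
    return [[vi * wj.conjugate() for wj in w] for vi in v]

def mixture(weights: List[float], states: List[List[complex]]) -> Matrix:
    """Density matrix sum_k p_k |psi_k><psi_k| of a probabilistic mixture."""
    n = len(states[0])
    rho = [[0j] * n for _ in range(n)]
    for p, psi in zip(weights, states):
        proj = outer(psi, psi)
        for i in range(n):
            for j in range(n):
                rho[i][j] += p * proj[i][j]
    return rho

s = 1 / math.sqrt(2)
ket0, ket1 = [1 + 0j, 0j], [0j, 1 + 0j]
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]

rho_standard = mixture([0.5, 0.5], [ket0, ket1])
rho_plus_minus = mixture([0.5, 0.5], [plus, minus])
for a, b in zip(rho_standard, rho_plus_minus):
    print([round(x.real, 6) for x in a], [round(x.real, 6) for x in b])
```

The off-diagonal terms of $| + ⟩ ⟨ + |$ and $| - ⟩ ⟨ - |$ have opposite signs and cancel in the equal mixture, which is why the two decompositions are indistinguishable.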

I always hated this point of view; it felt really hacky, and I always found it ugly and unmotivated to go from states $| Ψ ⟩$ to projections $| Ψ ⟩ ⟨ Ψ |$ just for the sake of taking probability distributions.

The thing above about entanglement and decoherence, IMO, is a more elegant and natural way to see why you'd come up with this formalism. To be explicit, suppose you have the state $| 0 ⟩$, and there is an environment state that you don't have access to; say it also begins in state $| 0 ⟩$. Initially everything is unentangled, so we begin in the state $| 00 ⟩$. Then some unitary evolution happens that entangles us; say it takes $| 00 ⟩$ to the Bell state $\frac{| 00 ⟩ + | 11 ⟩}{\sqrt{2}}$.

As we've seen, you should think of your state as being $\frac{1}{2} I_{2}$, and now it's clear why this is the right framework for probabilistic mixtures of quantum states: it's entirely natural to think of your part of the now-entangled system as "an equal chance of $| 0 ⟩$ and $| 1 ⟩$", and this indeed gives us the right density matrix. It also immediately implies that you are forced to allow that it could be represented as "an equal chance of $| + ⟩$ and $| - ⟩$", where $| \pm ⟩ = \frac{| 0 ⟩ \pm | 1 ⟩}{\sqrt{2}}$, and so on.
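The decoherence story can be sketched the same way. The code below assumes one particular entangling unitary (a Hadamard on our qubit followed by a CNOT onto the environment qubit, a standard way to reach the Bell state from $| 00 ⟩$; the comment only says "some unitary evolution"), then traces out the environment. The helper names are illustrative.

```python
import math
from typing import List

State = List[complex]  # amplitudes over |00>, |01>, |10>, |11>; first qubit is ours

S = 1 / math.sqrt(2)

def hadamard_on_ours(psi: State) -> State:
    """Apply H to our (first) qubit and the identity to the environment qubit."""
    h = [[S, S], [S, -S]]
    return [sum(h[a][b] * psi[2 * b + e] for b in range(2))
            for a in range(2) for e in range(2)]

def cnot_ours_to_env(psi: State) -> State:
    """CNOT with our qubit as control: swaps the |10> and |11> amplitudes."""
    return [psi[0], psi[1], psi[3], psi[2]]

def trace_out_env(psi: State) -> List[List[complex]]:
    """Reduced density matrix: rho[i][j] = sum_e psi[i, e] * conj(psi[j, e]),
    where the amplitude psi[a, e] is stored at index 2 * a + e."""
    return [[sum(psi[2 * i + e] * psi[2 * j + e].conjugate() for e in range(2))
             for j in range(2)] for i in range(2)]

start = [1 + 0j, 0j, 0j, 0j]                      # |00>: unentangled
bell = cnot_ours_to_env(hadamard_on_ours(start))  # (|00> + |11>) / sqrt(2)
rho = trace_out_env(bell)
for row in rho:
    print([round(x.real, 6) for x in row])        # prints [0.5, 0.0] then [0.0, 0.5]
```

Nothing in the global state is mixed; the $\frac{1}{2} I_{2}$ appears only because the environment's half of the amplitudes is summed over, which is exactly where the "missing information" goes.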

But it makes it clear why we have this non-uniqueness of representation, or where the missing information went: we don't just "have a probabilistic mixture of quantum states", we have a small part of a big quantum system that we can't see all of, so the best we can do is represent it (non-uniquely) as a probabilistic mixture of quantum states.

Now, you aren't obliged to take this view, that the only reason we have any uncertainty about our quantum state is because of this sort of decoherence process, but it's definitely a powerful idea.

jbkjr on Is the mind a program?

I assume that phenomenal consciousness is a sub-component of the mind.

I'm not sure what is meant by this; would you mind explaining?

Also, the in-post link to the appendix is broken; it's currently linking to a private draft.

rife on A Novel Emergence of Meta-Awareness in LLM Fine-Tuning

Wow. I need to learn how to search for papers. I looked for anything like this, even in general terms, and couldn't find it, let alone something so specific.

linda-linsefors on Drake Thomas's Shortform

Not on sci-hub or Anna's Archive, so I'm just going off the abstract and summary here; would love a PDF if anyone has one.

If you email the authors they will probably send you the full article.