LessWrong 2.0 Reader


larger language models may disappoint you [or, an eternally unfinished draft]
nostalgebraist · 2021-11-26T23:08:56.221Z · comments (31)
Dear Self; we need to talk about ambition
Elizabeth (pktechgirl) · 2023-08-27T23:10:04.720Z · comments (27)
Comment reply: my low-quality thoughts on why CFAR didn't get farther with a "real/efficacious art of rationality"
AnnaSalamon · 2022-06-09T02:12:35.151Z · comments (63)
[link] My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman (UnexpectedValues) · 2024-03-16T22:56:59.283Z · comments (14)
UFO Betting: Put Up or Shut Up
RatsWrongAboutUAP · 2023-06-13T04:05:32.652Z · comments (216)
Is Rationalist Self-Improvement Real?
Jacob Falkovich (Jacobian) · 2019-12-09T17:11:03.337Z · comments (78)
So, geez there's a lot of AI content these days
Raemon · 2022-10-06T21:32:20.833Z · comments (140)
Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (55)
[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
[deleted] · 2024-04-16T16:22:06.937Z · comments (58)
My Model Of EA Burnout
LoganStrohl (BrienneYudkowsky) · 2023-01-25T17:52:42.770Z · comments (50)
Sexual Abuse attitudes might be infohazardous
Pseudonymous Otter · 2022-07-19T18:06:43.956Z · comments (71)
The Plan
johnswentworth · 2021-12-10T23:41:39.417Z · comments (78)
Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
Chris Scammell (chris-scammell) · 2023-05-10T19:04:21.138Z · comments (54)
AI alignment is distinct from its near-term applications
paulfchristiano · 2022-12-13T07:10:04.407Z · comments (21)
The shard theory of human values
Quintin Pope (quintin-pope) · 2022-09-04T04:28:11.752Z · comments (67)
Your Dog is Even Smarter Than You Think
StyleOfDog · 2021-05-01T05:16:09.821Z · comments (108)
What cognitive biases feel like from the inside
chaosmage · 2020-01-03T14:24:22.265Z · comments (32)
Omicron: My Current Model
Zvi · 2021-12-28T17:10:00.629Z · comments (72)
Ngo and Yudkowsky on alignment difficulty
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-15T20:31:34.135Z · comments (151)
[link] Strong Evidence is Common
Mark Xu (mark-xu) · 2021-03-13T22:04:40.538Z · comments (50)
CFAR Participant Handbook now available to all
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2020-01-03T15:43:44.618Z · comments (40)
Notes from "Don't Shoot the Dog"
juliawise · 2021-04-02T16:34:46.170Z · comments (12)
Coordination as a Scarce Resource
johnswentworth · 2020-01-25T23:32:36.309Z · comments (22)
Thoughts on the impact of RLHF research
paulfchristiano · 2023-01-25T17:23:16.402Z · comments (102)
My Assessment of the Chinese AI Safety Community
Lao Mein (derpherpize) · 2023-04-25T04:21:19.274Z · comments (94)
Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (36)
Visible Thoughts Project and Bounty Announcement
So8res · 2021-11-30T00:19:08.408Z · comments (106)
On AutoGPT
Zvi · 2023-04-13T12:30:01.059Z · comments (47)
[link] I hired 5 people to sit behind me and make me productive for a month
Simon Berens (sberens) · 2023-02-05T01:19:39.182Z · comments (83)
The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (15)
Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)
My views on “doom”
paulfchristiano · 2023-04-27T17:50:01.415Z · comments (37)
You Don't Exist, Duncan
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2023-02-02T08:37:01.049Z · comments (107)
The LessWrong Team is now Lightcone Infrastructure, come work with us!
habryka (habryka4) · 2021-10-01T01:20:33.411Z · comments (71)
The ground of optimization
Alex Flint (alexflint) · 2020-06-20T00:38:15.521Z · comments (80)
Truthseeking is the ground in which other principles grow
Elizabeth (pktechgirl) · 2024-05-27T01:09:20.796Z · comments (16)
[link] DeepMind: Generally capable agents emerge from open-ended play
Daniel Kokotajlo (daniel-kokotajlo) · 2021-07-27T14:19:13.782Z · comments (53)
New Scaling Laws for Large Language Models
1a3orn · 2022-04-01T20:41:17.665Z · comments (22)
Working With Monsters
johnswentworth · 2021-07-20T15:23:20.762Z · comments (54)
The Feeling of Idea Scarcity
johnswentworth · 2022-12-31T17:34:04.306Z · comments (22)
Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Zach Stein-Perlman · 2024-05-15T00:45:02.436Z · comments (95)
Principles for the AGI Race
William_S · 2024-08-30T14:29:41.074Z · comments (13)
the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (41)
Deep Deceptiveness
So8res · 2023-03-21T02:51:52.794Z · comments (60)
Lessons On How To Get Things Right On The First Try
johnswentworth · 2023-06-19T23:58:09.605Z · comments (57)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (47)
Munk AI debate: confusions and possible cruxes
Steven Byrnes (steve2152) · 2023-06-27T14:18:47.694Z · comments (21)
Another (outer) alignment failure story
paulfchristiano · 2021-04-07T20:12:32.043Z · comments (38)
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin (collin-burns) · 2022-12-15T18:22:40.109Z · comments (39)
RadVac Commercial Antibody Test Results
johnswentworth · 2021-02-26T18:04:09.171Z · comments (30)