LessWrong 2.0 Reader

Recent comments

shayne-o-neill on Navigating LLM embedding spaces using archetype-based directions

While the use of tarot archetypes is... questionable... it does point at one angle for exploring embedding space: it is a fundamentally semiotic space. It's going in many respects to be structured by the texts that fed it, and human text is richly symbolic.

That said, there's a preexisting set of ideas around this that might be more productive, and that is structuralism, particularly the works of Lévi-Strauss, Roland Barthes, Lacan, and more distantly Foucault and Derrida.

Lévi-Strauss's anthropology in particular is interesting, because it looked at the mythologies of humans and tried to find structuring principles underlying them, particularly the "dialectics", or oppositions, and how these provided a sort of deep structure to mythology that was common across humanity. (For instance, Lévi-Strauss noted "trickster" archetypes across cultures and proposed these formed a way of interrogating blurred oppositions, for instance sickness as a state that has aspects of both life (dead things can't be sick) and death (a sick person is not rhetorically "full of life").)

Essentially what I'm getting at is that this sort of analysis likely works with any symbolic system that has had resonances with human thinking over time. The problem with Tarot is that it specifically applies to a certain European circumstance of meaning production. Astrology probably works just as well. Literary analysis, however, probably works dramatically better. Thus it might be worth looking at the works of literary critics, particularly the structuralists, who were very interested in ontologies of symbolic meaning, and this might provide a better toolkit than tarot.

zach-stein-perlman on Introducing AI Lab Watch

Two noncentral pages I like on the site:

Other scorecards & evaluation, collecting other safety-ish scorecard-ish resources.
Commitments, collecting AI companies' commitments relevant to AI safety and extreme risks.

jblack on Semantic Disagreement of Sleeping Beauty Problem

No, introducing the concept of "indexical sample space" does not capture the thirder position, nor language. You do not need to introduce a new type of space, with new definitions and axioms. The notion of credence (as defined in the Sleeping Beauty problem) already uses standard mathematical probability space definitions and axioms.

gwern on Transformers Represent Belief State Geometry in their Residual Stream

My earlier comment on meta-learning and Bayesian RL/inference for background: https://www.lesswrong.com/posts/TiBsZ9beNqDHEvXt4/how-we-picture-bayesian-agents?commentId=yhmoEbztTunQMRzJx

The main question I have been thinking about is what is a state for language and how that can be useful if so discovered in this way?

The way I would put it is that 'state' is misleading you here. It makes you think that it must be some sort of little Turing machine or clockwork, where it has a 'state', like the current state of the Turing machine tape or the rotations of each gear in a clockwork gadget, where the goal is to infer that. This is misleading, and it is a coincidence in these simple toy problems, which are so simple that there is nothing to know beyond the actual state.

As Ortega et al. highlight in those graphs, what you are really trying to define is the sufficient statistics: the summary of the data (history) which is 100% adequate for decision making, and where additionally knowing the original raw data doesn't help you.

In the coin flip case, the sufficient statistics are simply the 2-tuple (heads,tails), and you define a very simple decision over all of the possible observed 2-tuples. Note that the sufficient statistic is less information than the original raw history, because you throw out the ordering. (A 2-tuple like '(3,1)' is simpler than all of the histories it summarizes, like '[1,1,1,0]', '[0,1,1,1]', '[1,0,1,1]', etc.) From the point of view of decision making, these all yield the same posterior distribution over the coin flip probability parameter, which is all you need for decision making (optimal action: 'bet on the side with the higher probability'), and so that's the sufficient statistic. If I tell you the history as a list instead of a 2-tuple, you cannot make better decisions. It just doesn't matter if you got a tails first and then all heads, or all heads first then tails, etc.
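To make the "ordering doesn't matter" point concrete, here is a minimal sketch. It assumes a uniform Beta(1, 1) prior over the heads probability; that prior and the function names are illustrative choices of mine, not anything taken from Ortega et al. Every ordering of the same flips collapses to the same (heads, tails) tuple, and so to the same posterior and the same bet.

```python
from fractions import Fraction
from itertools import permutations


def sufficient_statistic(history):
    """Collapse a flip history (1 = heads, 0 = tails) to (heads, tails)."""
    heads = sum(history)
    return (heads, len(history) - heads)


def posterior_mean_heads(stat, prior=(1, 1)):
    """Posterior mean of P(heads) under an assumed Beta(a, b) prior."""
    heads, tails = stat
    a, b = prior
    return Fraction(a + heads, a + b + heads + tails)


def best_bet(stat, prior=(1, 1)):
    """Bet on whichever side the posterior considers more likely."""
    p = posterior_mean_heads(stat, prior)
    if p > Fraction(1, 2):
        return "heads"
    if p < Fraction(1, 2):
        return "tails"
    return "either"


# All distinct orderings of three heads and one tails.
histories = sorted(set(permutations([1, 1, 1, 0])))
for history in histories:
    stat = sufficient_statistic(history)
    print(list(history), stat, posterior_mean_heads(stat), best_bet(stat))
```

Each printed row carries the same tuple, posterior mean, and bet; only the raw history column differs, which is exactly the information the sufficient statistic throws away.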

It is not obvious that this is true: a priori, maybe that ordering was hugely important, and those correspond to different games. But the RNN there has learned that the differences are not important, and in fact, they are all the same.

And the 2-tuple here doesn't correspond to any particular environment 'state'. The environment doesn't need to store that anywhere. The environment is just a RNG operating according to the coin flip probability, independently every turn of the game, with no memory. There is nowhere which is counting heads & tails in a 2-tuple. That exists solely in the RNN's hidden state as it accumulates evidence over turns, and optimally updates priors to posteriors every observed coin flip, and possibly switches its bet.
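A toy version of that split, as a sketch only: a small counting class stands in for the RNN's hidden state, and the names `flip`, `CountingAgent`, and `play` are hypothetical. The environment function holds nothing but the fixed probability and a source of randomness; the (heads, tails) tuple exists only inside the agent.

```python
import random


def flip(p_heads, rng):
    """Memoryless environment: each flip is an independent draw.

    Nothing here counts heads or tails; the only parameter is p_heads.
    """
    return 1 if rng.random() < p_heads else 0


class CountingAgent:
    """Stand-in for the RNN: its 'hidden state' is just (heads, tails)."""

    def __init__(self):
        self.heads = 0
        self.tails = 0

    def observe(self, outcome):
        # Update the sufficient statistic with one new observation.
        if outcome == 1:
            self.heads += 1
        else:
            self.tails += 1

    @property
    def state(self):
        return (self.heads, self.tails)

    def bet(self):
        # Bet on the side seen more often so far (ties go to heads).
        return 1 if self.heads >= self.tails else 0


def play(p_heads, n_turns, seed=0):
    """Run the game for n_turns and return the agent afterwards."""
    rng = random.Random(seed)
    agent = CountingAgent()
    for _ in range(n_turns):
        agent.observe(flip(p_heads, rng))
    return agent


agent = play(p_heads=0.7, n_turns=100, seed=1)
print(agent.state, agent.bet())
```

The design choice to keep the counts on the agent rather than in `flip` mirrors the point above: the tuple is a fact about the observer's accumulated evidence, not about the coin.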

So, in language tasks, LLMs are doing the same thing, but on a vastly grander scale, and still highly incompletely. They are (trying to) infer sufficient statistics of whatever language-games they have been trained on, and then predicting accordingly.

What are those sufficient statistics in LLMs? Hard to say. In that coinflip example, it is so simple that we can easily derive by hand the conjugate statistics and know it is just a binomial and so we only need to track heads/tails as the one and only sufficient statistic, and we can then look in the hidden state to find where that is encoded in a converged optimal agent. In LLMs... not so much. There's a lot going on.
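For reference, the by-hand derivation in the coinflip case can be sketched as follows. It assumes a Beta(α, β) prior on the heads probability θ, which is the standard conjugate choice but is my assumption here rather than something stated above. The history enters the posterior only through the counts h and t.

```latex
% Assumed prior: \theta \sim \mathrm{Beta}(\alpha, \beta), with \theta = P(\text{heads}).
% History x_{1:n} \in \{0,1\}^n, with h = \sum_i x_i heads and t = n - h tails.
\begin{align*}
p(x_{1:n} \mid \theta)
  &= \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
   = \theta^{h} (1-\theta)^{t}, \\
p(\theta \mid x_{1:n})
  &\propto \theta^{h} (1-\theta)^{t} \cdot \theta^{\alpha-1} (1-\theta)^{\beta-1}
   = \theta^{\alpha+h-1} (1-\theta)^{\beta+t-1}, \\
\theta \mid x_{1:n}
  &\sim \mathrm{Beta}(\alpha + h,\ \beta + t).
\end{align*}
```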

Based on interpretability research and studies of how well they simulate people as well as just all of the anecdotal experience with the base models, we can point to a few latents like honesty, calibration, demographics, and so on. (See Janus's "Simulator Theory" for a more poetic take, less focused on agency than the straight Bayesian meta-imitation learning take I'm giving here.) Meanwhile, there are tons of things about the inputs that the model wants to throw away, irrelevant details like the exact misspellings of words in the prompt (while recording that there were misspellings, as grist for the inference mill about the environment generating the misspelled text).

So conceptually, the sufficient statistics when you or I punch in a prompt to GPT-3 might look like some extremely long list of variables like, "English speaker, Millennial, American, telling the truth, reliable, above-average intelligence, Common Crawl-like text not corrupted by WET processing, shortform, Markdown formatting, only 1 common typo or misspelling total, ..." and it will then tailor responses accordingly and maximize its utility by predicting the next token accurately (because the 'coin flip' there is simply betting on the logits with the highest likelihood etc). Like the coinflip 2-tuple, most of these do not correspond to any real-world 'state': if you or I put in a prompt, there is no atom or set of atoms which corresponds to many of these variables. But they have consequences: if we ask about Tiananmen Square in English, for example, we'll get a different answer than if we had asked in Mandarin, because the sufficient statistics there are inferred to be very different and yield a different array of latents which cause different outputs.

And that's what "state" is for language: it is the model's best attempt to infer a useful set of latent variables which collectively are sufficient statistics for whatever language-game or task or environment or agent-history or whatever the context/prompt encodes, which then supports optimal decision-making.

jblack on Bogdan Ionut Cirstea's Shortform

At the same time, current models seem very unlikely to be x-risky (e.g. they're still very bad at passing dangerous capabilities evals), which is another reason to think pausing now would be premature.

The relevant criterion is not whether the current models are likely to be x-risky (it's obviously far too late if they are!), but whether the next generation of models has more than an insignificant chance of being x-risky together with all the future frameworks they're likely to be embedded into.

Given that the next generations are planned to involve at least one order of magnitude more computing power in training (and are already in progress!) and that returns on scaling don't seem to be slowing, I think the total chance of x-risk from those is not insignificant.

rhaps0dy on Why I'm doing PauseAI

Thank you for working on this Joseph!

review-bot on Why I'm doing PauseAI

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?

ryan_greenblatt on Raemon's Shortform

Aren't text names basically similar in practice? At least for me, I find they trigger basically the same thing because I do actually associate names with people.

Maybe this wouldn't be true if I didn't know people very well (but in that case, icons also wouldn't matter).

(I overall dislike icons, but I don't have a principled reason for this.)

gjm on How to be an amateur polyglot

A nitpick: you say

fun story, I passed the C2 exam and then I realized I didn’t remember the word faucet when I went to the UK to visit a friend

but here in the UK I don't think I have ever once heard a native speaker use the word "faucet" in preference to "tap". I guess the story is actually funnier if immediately after passing your C2 exam you (1) thought "faucet" was the usual UK term and (2) couldn't remember it anyway...

(I liked the post a lot and although I am no polyglot all the advice seems sound to me.)

gwern on William_S's Shortform

Rest assured, there is plenty that could leak at OA... (And might were there not NDAs, which of course is much of the point of having them.)

For a past example, note that no one knew that Sam Altman had been fired from YC CEO for similar reasons as OA CEO, until the extreme aggravating factor of the OA coup, 5 years later. That was certainly more than 'run of the mill office politics', I'm sure you'll agree, but if that could be kept secret, surely lesser things now could be kept secret well past 2029?
