LessWrong 2.0 Reader
I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.
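(Schematically, and glossing over regularity conditions: the RLCT λ is the coefficient of the log n term in the free energy expansion, and it can be read off from the largest pole of a zeta function in which the prior enters only as a density that has to be positive near the zeros of K:

```latex
\[
  F_n \;=\; n L_n(w_0) \;+\; \lambda \log n \;+\; O_p(\log\log n),
  \qquad
  \zeta(z) \;=\; \int K(w)^{z}\, \varphi(w)\, dw,
\]
```

where K(w) is the KL divergence from the true distribution to the model at w, φ is the prior density, and −λ is the largest pole of ζ. Any prior that is positive and bounded on a neighbourhood of {w : K(w) = 0} gives the same λ.)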
I don't think these conditions are particularly weak at all. Any prior that fulfils them is a prior that would not be correctly normalised if the parameter-function map were one-to-one.
It's the kind of prior generically used in ML, but that doesn't make it a sane choice.
A well-normalised prior for a regular model probably doesn't look very continuous or differentiable in this setting, I'd guess.
To be sure - generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.
The generic symmetries are not what I'm talking about. There are symmetries in neural networks that are neither generic, nor only present at finite sample size. These symmetries correspond to different parametrisations that implement the same input-output map. Different regions in parameter space can differ in how many of those equivalent parametrisations they have, depending on the internal structure of the networks at that point.
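A toy version of what I mean (just an illustrative one-hidden-layer tanh net, not any particular model from the literature): at a point where a hidden unit's outgoing weight is zero, its incoming weights can be varied freely without changing the input-output map, giving a continuum of equivalent parametrisations that a generic point doesn't have.

```python
import numpy as np

def mlp(x, W1, b1, w2):
    # One-hidden-layer tanh network: f(x) = w2 . tanh(W1 @ x + b1)
    return w2 @ np.tanh(W1 @ x + b1)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1, w2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=4)

# Zero the first unit's outgoing weight; its incoming weights now do nothing,
# so perturbing them gives a different parameter vector implementing the same map.
w2[0] = 0.0
W1_alt = W1.copy()
W1_alt[0] = rng.normal(size=3)

print(np.allclose(mlp(x, W1, b1, w2), mlp(x, W1_alt, b1, w2)))  # True
```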
The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second, 'green' book. Unrealizability is key to the most important insight of SLT, contained in the last sections of the second-to-last chapter of the green book: algorithmic development during training through phase transitions in the free energy.
I know it 'deals with' unrealizability in this sense; that's not really what I meant.
But looking at the green book, it does actually argue quite differently about this. It isn't looking at the posterior mass the model class gets and comparing it to the posterior mass for all hypotheses outside the model class. It's using all the models in the posterior to make predictions, averaging them according to the posterior support each model has. Then it reasons about how much this averaged prediction diverges from the true distribution in terms of cross-entropy.
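Concretely, the averaged prediction is the posterior ("Bayes") predictive distribution, and the quantity being reasoned about is its divergence from the truth (in roughly Watanabe's notation; I may be mangling details):

```latex
\[
  p^{*}(x \mid D_n) \;=\; \int p(x \mid w)\, p(w \mid D_n)\, dw,
  \qquad
  G_n \;=\; \int q(x) \log \frac{q(x)}{p^{*}(x \mid D_n)}\, dx,
\]
```

where q is the true distribution, p(w | D_n) is the posterior over parameters given the sample D_n, and G_n is the generalization error.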
This is all very stat-mechy, and I'm going to have to spend time translating it into Bayes to see what I think of it.
cheer-poasting on Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?
I know that you said comments should focus on things that were confusing, so I'll admit to being quite confused.
Overall, I found the chapter interesting. And as I said, I was actually very convinced by the evo-psych answer to "man" and "woman" and plan to write on it in the near future.
wei-dai on Eric Neyman's Shortform
Thank you for detailing your thoughts. Some differences for me:
ETA: Here's a more detailed argument for 1, which I don't think I've written down before. Our universe is small enough [? · GW] that it seems plausible (maybe even likely) that most of the value or disvalue created by a human-descended civilization comes from its acausal influence on the rest of the multiverse. An aligned AI/civilization would influence the rest of the multiverse in a positive direction, whereas an unaligned AI/civilization would (probably) influence the rest of the multiverse in a negative direction. This effect may outweigh what happens in our own universe/lightcone so much that the positive value from unaligned AI doing valuable things in our universe as a result of acausal trade is totally swamped by the disvalue created by its negative acausal influence.
kaj_sotala on On Not Pulling The Ladder Up Behind You
Nice post! I like the ladder metaphor.
For events, one saving grace is that many people actively dislike events getting too large and having too many people, and start to long for the smaller, cozier version at that point. So instead of the bigger event competing with the smaller one and drawing people away from it, it might actually work the other way around, with the smaller event being the one that "steals" people from the bigger one.
mitchell_porter on otto.barten's Shortform
I offer no consensus, only my own opinions:
Will AI get takeover capability? When?
0-5 years.
Single ASI or many AGIs?
There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be.
Will we solve technical alignment?
Contingent.
Value alignment, intent alignment, or CEV?
For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization.
Defense>offense or offense>defense?
Offense wins.
Is a long-term pause achievable?
It is possible, but would require all the great powers to be convinced, and every month it is less achievable, owing to proliferation. The open sourcing of Llama-3 400b, if it happens, could be a point of no return.
These opinions, except the first and the last, predate the LLM era, and were formed from discussions on Less Wrong and its precursors. Since ChatGPT, the public sphere has been flooded with many other points of view, e.g. that AGI is still far off, that AGI will naturally remain subservient, or that market discipline is the best way to align AGI. I can entertain these scenarios, but they still do not seem as likely as: AI will surpass us, it will take over, and this will not be friendly to humanity by default.
zack_m_davis on And All the Shoggoths Merely Players
Doomimir: No, it wouldn't! Are you retarded?
Simplicia: [apologetically] Well, actually ...
Doomimir: [embarrassed] I'm sorry, Simplicia Optimistovna; I shouldn't have snapped at you like that.
[diplomatically] But I think you've grievously misunderstood what the KL penalty in the RLHF objective is doing. Recall that the Kullback–Leibler divergence D_KL(P||Q) represents how surprised you'd be by data from distribution P when you expected it to come from distribution Q [LW · GW].
It's asymmetric: it blows up when the data is very unlikely according to Q, which amounts to seeing something happen that you thought was nearly impossible, but not when the data is very unlikely according to P, which amounts to not seeing something that you thought was reasonably likely.
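Written out (discrete case, for simplicity):

```latex
\[
  D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
\]
```

Each term is weighted by P(x), so an outcome with P(x) > 0 but Q(x) ≈ 0 contributes an enormous positive term, while an outcome with P(x) ≈ 0 contributes essentially nothing, however much mass Q puts on it.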
We—I mean, not we, but the maniacs who are hell-bent on destroying this world—include a D_KL(π_RLHF||π_base) penalty term in the RL objective because they don't want the updated policy to output tokens that would be vanishingly unlikely coming from the base language model.
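Schematically, the penalized objective looks something like this (one common form; implementations differ in the details):

```latex
\[
  \max_{\pi_{\mathrm{RLHF}}}\;
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\mathrm{RLHF}}(\cdot \mid x)}
  \bigl[\, r(x, y) \,\bigr]
  \;-\;
  \beta\, D_{\mathrm{KL}}\!\bigl( \pi_{\mathrm{RLHF}}(\cdot \mid x) \,\big\|\, \pi_{\mathrm{base}}(\cdot \mid x) \bigr),
\]
```

where r is the learned reward model and β controls how hard the policy is pulled back toward the base model.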
But your specific example of threats and promises isn't vanishingly unlikely according to the base model! Common Crawl webtext is going to contain a lot of natural language reasoning about threats and promises! It's true, in a sense, that the function of the KL penalty term is to "stay close" to the base policy. But you need to think about what that means mechanistically; you can't just reason that the webtext prior is somehow "safe" in a way that means staying KL-close to it is safe.
But you probably won't understand what I'm talking about for another 70 days.
mateusz-baginski on Martín Soto's Shortform
FWIW it was obvious to me
aysja on The first future and the best future
I don't know what Katja thinks, but for me at least: I think AI might pose much more lock-in risk than other technologies. I.e., I expect that we'll have much less of a chance (and perhaps much less time) to redirect course, adapt, learn from trial and error, etc. than we typically do with a new technology. Given this, I think going slower and aiming to get it right on the first try is much more important than it normally is.
mitchell_porter on Losing Faith In Contrarianism
I couldn't swallow Eliezer's argument; I tried to read Guzey but couldn't stay awake; Hanson's argument made me feel ill; and I'm not qualified to judge Caplan.
brendan-long on We are headed into an extreme compute overhang
Having 1.6 million identical twins seems like a pretty huge advantage though.