LessWrong 2.0 Reader
I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.
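(Schematically, and glossing over regularity conditions: the RLCT λ is the coefficient of the log n term in the free energy expansion, and it can be read off from the largest pole of a zeta function in which the prior enters only as a density that has to be positive near the zeros of K:

```latex
\[
  F_n \;=\; n L_n(w_0) \;+\; \lambda \log n \;+\; O_p(\log\log n),
  \qquad
  \zeta(z) \;=\; \int K(w)^{z}\, \varphi(w)\, dw,
\]
```

where K(w) is the KL divergence from the true distribution to the model at w, φ is the prior density, and −λ is the largest pole of ζ. Any prior that is positive and bounded on a neighbourhood of {w : K(w) = 0} gives the same λ.)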
I don't think these conditions are particularly weak at all. Any prior that fulfils them is a prior that would not be correctly normalised if the parameter-function map were one-to-one.
It's the kind of prior generically used in ML, but that doesn't make it a sane choice.
A well-normalised prior for a regular model probably doesn't look very continuous or differentiable in this setting, I'd guess.
To be sure - generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.
The generic symmetries are not what I'm talking about. There are symmetries in neural networks that are neither generic, nor only present at finite sample size. These symmetries correspond to different parametrisations that implement the same input-output map. Different regions in parameter space can differ in how many of those equivalent parametrisations they have, depending on the internal structure of the networks at that point.
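A toy version of what I mean (just an illustrative one-hidden-layer tanh net, not any particular model from the literature): at a point where a hidden unit's outgoing weight is zero, its incoming weights can be varied freely without changing the input-output map, giving a continuum of equivalent parametrisations that a generic point doesn't have.

```python
import numpy as np

def mlp(x, W1, b1, w2):
    # One-hidden-layer tanh network: f(x) = w2 . tanh(W1 @ x + b1)
    return w2 @ np.tanh(W1 @ x + b1)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1, w2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=4)

# Zero the first unit's outgoing weight; its incoming weights now do nothing,
# so perturbing them gives a different parameter vector implementing the same map.
w2[0] = 0.0
W1_alt = W1.copy()
W1_alt[0] = rng.normal(size=3)

print(np.allclose(mlp(x, W1, b1, w2), mlp(x, W1_alt, b1, w2)))  # True
```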
The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second, 'green' book. Unrealizability is key to the most important insight of SLT, contained in the last sections of the second-to-last chapter of the green book: algorithmic development during training through phase transitions in the free energy.
I know it 'deals with' unrealizability in this sense; that's not really what I meant.
But looking at the green book, it does actually argue quite differently about this. It isn't looking at the posterior mass the model class gets and comparing it to the posterior mass for all hypotheses outside the model class. It's using all the models in the posterior to make predictions, averaging them according to the posterior support each model has. Then it reasons about how much this averaged prediction diverges from the true distribution in terms of cross-entropy.
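Concretely, the averaged prediction is the posterior ("Bayes") predictive distribution, and the quantity being reasoned about is its divergence from the truth (in roughly Watanabe's notation; I may be mangling details):

```latex
\[
  p^{*}(x \mid D_n) \;=\; \int p(x \mid w)\, p(w \mid D_n)\, dw,
  \qquad
  G_n \;=\; \int q(x) \log \frac{q(x)}{p^{*}(x \mid D_n)}\, dx,
\]
```

where q is the true distribution, p(w | D_n) is the posterior over parameters given the sample D_n, and G_n is the generalization error.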
This is all very stat-mechy, and I'm going to have to spend time translating it into Bayes to see what I think of it.
cheer-poasting on Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?
I know that you said comments should focus on things that were confusing, so I'll admit to being quite confused.
Overall, I found the chapter interesting. And as I said, I was actually very convinced by the evo-psych answer to "man" and "woman" and plan to write on it in the near future.
wei-dai on Eric Neyman's Shortform
Thank you for detailing your thoughts. Some differences for me:
ETA: Here's a more detailed argument for 1, which I don't think I've written down before. Our universe is small enough [? · GW] that it seems plausible (maybe even likely) that most of the value or disvalue created by a human-descended civilization comes from its acausal influence on the rest of the multiverse. An aligned AI/civilization would influence the rest of the multiverse in a positive direction, whereas an unaligned AI/civilization would (probably) influence the rest of the multiverse in a negative direction. This effect may outweigh what happens in our own universe/lightcone so much that the positive value from unaligned AI doing valuable things in our universe as a result of acausal trade is totally swamped by the disvalue created by its negative acausal influence.
kaj_sotala on On Not Pulling The Ladder Up Behind You
Nice post! I like the ladder metaphor.
For events, one saving grace is that many people actively dislike events getting too large and having too many people, and start to long for the smaller, cozier version at that point. So instead of the bigger event competing with the smaller one and drawing people away from it, it might actually work the other way around, with the smaller event being the one that "steals" people from the bigger one.
mitchell_porter on otto.barten's Shortform
I offer no consensus, only my own opinions:
Will AI get takeover capability? When?
0-5 years.
Single ASI or many AGIs?
There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be.
Will we solve technical alignment?
Contingent.
Value alignment, intent alignment, or CEV?
For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization.
Defense>offense or offense>defense?
Offense wins.
Is a long-term pause achievable?
It is possible, but would require all the great powers to be convinced, and every month it is less achievable, owing to proliferation. The open sourcing of Llama-3 400b, if it happens, could be a point of no return.
These opinions, except the first and the last, predate the LLM era, and were formed from discussions on Less Wrong and its precursors. Since ChatGPT, the public sphere has been flooded with many other points of view, e.g. that AGI is still far off, that AGI will naturally remain subservient, or that market discipline is the best way to align AGI. I can entertain these scenarios, but they still do not seem as likely as: AI will surpass us, it will take over, and this will not be friendly to humanity by default.
zack_m_davis on And All the Shoggoths Merely Players
Doomimir: No, it wouldn't! Are you retarded?
Simplicia: [apologetically] Well, actually ...
Doomimir: [embarrassed] I'm sorry, Simplicia Optimistovna; I shouldn't have snapped at you like that.
[diplomatically] But I think you've grievously misunderstood what the KL penalty in the RLHF objective is doing. Recall that the Kullback–Leibler divergence D_KL(P||Q) represents how surprised you'd be by data from distribution P when you expected it to come from distribution Q [LW · GW].
It's asymmetric: it blows up when the data is very unlikely according to Q, which amounts to seeing something happen that you thought was nearly impossible, but not when the data is very unlikely according to P, which amounts to not seeing something that you thought was reasonably likely.
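Written out (discrete case, for simplicity):

```latex
\[
  D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
\]
```

Each term is weighted by P(x), so an outcome with P(x) > 0 but Q(x) ≈ 0 contributes an enormous positive term, while an outcome with P(x) ≈ 0 contributes essentially nothing, however much mass Q puts on it.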
We—I mean, not we, but the maniacs who are hell-bent on destroying this world—include a D_KL(π_RLHF||π_base) penalty term in the RL objective because they don't want the updated policy to output tokens that would be vanishingly unlikely coming from the base language model.
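Schematically, the penalized objective looks something like this (one common form; implementations differ in the details):

```latex
\[
  \max_{\pi_{\mathrm{RLHF}}}\;
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\mathrm{RLHF}}(\cdot \mid x)}
  \bigl[\, r(x, y) \,\bigr]
  \;-\;
  \beta\, D_{\mathrm{KL}}\!\bigl( \pi_{\mathrm{RLHF}}(\cdot \mid x) \,\big\|\, \pi_{\mathrm{base}}(\cdot \mid x) \bigr),
\]
```

where r is the learned reward model and β controls how hard the policy is pulled back toward the base model.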
But your specific example of threats and promises isn't vanishingly unlikely according to the base model! Common Crawl webtext is going to contain a lot of natural language reasoning about threats and promises! It's true, in a sense, that the function of the KL penalty term is to "stay close" to the base policy. But you need to think about what that means mechanistically; you can't just reason that the webtext prior is somehow "safe" in a way that means staying KL-close to it is safe.
But you probably won't understand what I'm talking about for another 70 days.
mateusz-baginski on Martín Soto's Shortform
FWIW it was obvious to me
aysja on The first future and the best future
I don't know what Katja thinks, but for me at least: I think AI might pose much more lock-in risk than other technologies. I.e., I expect that we'll have much less of a chance (and perhaps much less time) to redirect course, adapt, learn from trial and error, etc. than we typically do with a new technology. Given this, I think going slower and aiming to get it right on the first try is much more important than it normally is.
mitchell_porter on Losing Faith In Contrarianism
I couldn't swallow Eliezer's argument; I tried to read Guzey but couldn't stay awake; Hanson's argument made me feel ill; and I'm not qualified to judge Caplan.
brendan-long on We are headed into an extreme compute overhang
Having 1.6 million identical twins seems like a pretty huge advantage though.