LessWrong 2.0 Reader
This book argues (convincingly IMO) that it’s impossible to communicate, or even think, anything whatsoever, without the use of analogies.
Etc. Right?
If you show me an introduction to AI risk for amateurs that you endorse, then I will point out the “rhetorical shortcuts that imply wrong and misleading things” that it contains—in the sense that it will have analogies between powerful AI and things-that-are-not-powerful-AI, and those analogies will be misleading in some ways (when stripped from their context and taken too far). This is impossible to avoid.
Anyway, if someone says:
When it comes to governing technology, there are some areas, like inventing new programming languages, where it’s awesome for millions of hobbyists to be freely messing around; and there are other areas, like inventing new viruses, or inventing new uranium enrichment techniques, where we definitely don’t want millions of hobbyists to be freely messing around, but instead we want to be thinking hard about regulation and secrecy. Let me explain why AI belongs in the latter category…
…then I think that’s a fine thing to say. It’s not a rhetorical shortcut; rather, it’s a way to explain what you’re saying pedagogically, by connecting it to the listener’s existing knowledge and mental models.
zershaaneh-qureshi on Now THIS is forecasting: understanding Epoch’s Direct Approach

The point of the paragraph that the above quote was taken from is, I think, better summarised in its first sentence:
although Epoch takes an approach to forecasting TAI that is quite different to others in this space, its resulting probability distribution is not vastly dissimilar to those produced by other influential models
It is fair to question whether these two forecasts are “not vastly dissimilar” to one another. In some senses, two decades is a big difference between medians: for example, we suspect that a future where TAI arrives in the 2030s looks, from a strategic perspective, pretty different from one where TAI arrives in the 2050s.
But given the vast size of the possible space of AI timelines, and the fact that the two models compared here take two meaningfully different approaches to forecasting them, we think it’s noteworthy that their resulting distributions still fall in a similar ballpark of “TAI will probably arrive in the next few decades”. (In my previous post, Timelines to Transformative AI [LW · GW], I observed that a majority of recent timeline predictions fall in the rough ballpark of 10-40 years from now, and considered what we should make of that finding and how seriously we should take it.) It shows that we can make major changes in our assumptions but still come to the rough conclusion that TAI is a prospect for the relatively near-term future, well within the lifetimes of many people alive today.
Also, I think the results of the Epoch model and the Cotra model are perhaps more similar than this two-decade gap might initially suggest. In the section [LW · GW] where we investigated varying non-empirically-tested inputs to the Epoch model, we found that making (what seemed to be) reasonable adjustments skewed the resulting median a few decades later. (Scott Alexander also tried something similar [LW · GW] with the Cotra model and observed a small degree of variation there.) Given the uncertainty over the Epoch model’s parameters and the scale of variation seen when adjusting them, a two decade gap between the medians from the (default versions of the) Epoch forecast and the Cotra forecast is not as vast a difference as it might at first seem.
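To illustrate the point about medians versus overall shape, here is a toy sketch (entirely made-up lognormal stand-ins, not the actual Epoch or Cotra distributions): even with medians two decades apart, both toy forecasts still place most of their probability mass on TAI arriving within the next few decades.

```python
from scipy.stats import lognorm

# Toy stand-ins for two TAI forecasts: years-until-TAI modeled as
# lognormal with medians of 10 and 30 years and the same spread.
# These numbers are illustrative only, NOT the real model outputs.
forecast_a = lognorm(s=0.8, scale=10)   # median: 10 years out
forecast_b = lognorm(s=0.8, scale=30)   # median: 30 years out

for name, f in [("A", forecast_a), ("B", forecast_b)]:
    # P(TAI within 40 years): ~0.96 for A vs ~0.64 for B
    print(name, round(f.cdf(40), 2))
```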
If this seems like a helpful clarification, we can add a note about this in the article itself. :)
notfnofn on Explaining a Math Magic Trick

Very nice! Notice that if you write $r = j - k$ and $I$ as $D^{-1}$, and play around with binomial coefficients a bit, we can rewrite this as:

$$D^{-k}(fp)=\sum_{r=0}^{\infty}\binom{-k}{r}\left(D^{-k-r}f\right)\left(D^{r}p\right)$$

which holds for $k<0$ as well, in which case it becomes the derivative product rule. This also matches the formal power series expansion of $(x+y)^{-k}$, which one can motivate directly.
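As a quick sanity check (my own worked case, not from the original comment): setting $k=-1$ kills every term with $r\geq 2$, and the formula collapses to the ordinary product rule:

$$D(fp)=\sum_{r=0}^{\infty}\binom{1}{r}\left(D^{1-r}f\right)\left(D^{r}p\right)=(Df)\,p+f\,(Dp),$$

since $\binom{1}{r}=0$ for $r\geq 2$.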
(By the way, how do you spoiler tag?)
steve2152 on Biorisk is an Unhelpful Analogy for AI Risk

This is an 800-word blog post, not 5 words. There’s plenty of room for nuance.
The way it stands right now, the post is supporting conversations like:
Person A: It’s not inconceivable that the world might wildly under-invest in societal resilience against catastrophic risks even after a “warning shot” for AI. Like for example, look at the case of bio-risks—COVID just happened, so the costs of novel pandemics are right now extremely salient to everyone on Earth, and yet, (…etc.).
Person B: You idiot, bio-risks are not at all analogous to AI. Look at this blog post by David Manheim explaining why.
Or:
Person B: All technology is always good, and its consequences are always good, and spreading knowledge is always good. So let’s make open-source ASI asap.
Person A [LW(p) · GW(p)]: If I hypothetically found a recipe that allowed anyone to make a novel pandemic using widely-available equipment, and then I posted it on my blog along with clearly-illustrated step-by-step instructions, and took out a billboard in Times Square directing people to the blog post, would you view my actions as praiseworthy? What would you expect to happen in the months after I did that?
Person B: You idiot, bio-risks are not at all analogous to AI. Look at this blog post by David Manheim explaining why.
Is this what you want? I.e., are you on the side of Person B in both these cases?
lc on Shortform

Robin Hanson has apparently asked the same thing. It seems like such a bizarre question to me:
I think research on what you propose should definitely not be public and I'd recommend against publicly trying to push this alignment agenda.
towards_keeperhood on Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence

(I think) Planck found the formula that matched the empirically observed distribution, but had no explanation for why it should hold. Einstein found the justification for this formula.
filip-sondej on An Interpretability Illusion for Activation Patching of Arbitrary Subspaces

What if we constrain v to be in some subspace that is actually used by the MLP? (We can get it from PCA over activations on many inputs.)
This way v won't have any dormant component, so the MLP output after patching also cannot use that dormant pathway.
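In case it helps, here is a minimal sketch of the kind of constraint I mean (my own illustration in PyTorch; `acts`, `project_to_active_subspace`, and the choice of `k` are all hypothetical, with `acts` standing for a cache of MLP-input activations gathered over many inputs):

```python
import torch

def project_to_active_subspace(v, acts, k=50):
    """Project a patch direction v onto the top-k principal subspace
    of the observed activations, removing any dormant component.

    v:    [d_model] candidate patch direction
    acts: [n_samples, d_model] cached activations from many inputs
    """
    centered = acts - acts.mean(dim=0, keepdim=True)
    # PCA via SVD: rows of Vh are principal directions in activation space.
    _, _, Vh = torch.linalg.svd(centered, full_matrices=False)
    basis = Vh[:k]                 # [k, d_model], orthonormal rows
    coeffs = basis @ v             # coordinates of v in the subspace
    v_active = basis.T @ coeffs    # component of v inside the subspace
    return v_active

# Patching with v_active instead of v keeps the intervention inside the
# empirically-used subspace, so it cannot exploit a dormant pathway.
```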
duschkopf on Beauty and the Bets

“Whether or not your probability model leads to optimal decision making is the test that allows falsifying it.”
Sure, I don’t deny that. What I am saying is that your probability model doesn’t tell you which probability you have to base a certain decision on. If you can derive a probability from your model and provide a good reason to consider this probability relevant to your decision, your model is not falsified as long as you arrive at the right decision. Suppose a simple experiment where the experimenter flips a fair coin and you have to guess Tails or Heads, but you are only rewarded for the correct decision if the coin comes up Tails. Then, of course, you should still entertain unconditional probabilities P(Heads)=P(Tails)=1/2. But this uncertainty is completely irrelevant to your decision. What is relevant, however, is P(Tails|Tails)=1 and P(Heads|Tails)=0, concluding that you should follow the strategy of always guessing Tails. Another way to arrive at this strategy is to calculate expected utilities, setting U(Heads)=0 as you would propose. But this is not the only reasonable solution. It’s just a different route of reasoning to take into account the experimental condition that your decision counts only if the coin lands Tails.
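For concreteness, a minimal Monte Carlo sketch of this coin experiment (my own, assuming a payoff of 1 for a correct guess, paid only when the coin lands Tails):

```python
import random

def expected_reward(guess, trials=100_000):
    """Reward 1 for a correct guess, but only paid if the coin lands Tails."""
    total = 0
    for _ in range(trials):
        coin = random.choice(["Heads", "Tails"])
        if coin == "Tails" and guess == coin:
            total += 1
    return total / trials

print(expected_reward("Heads"))  # ~0.0: correct Heads guesses are never rewarded
print(expected_reward("Tails"))  # ~0.5: wins whenever the coin lands Tails
```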
“The model says that P(Heads|Red) = 1/3 and P(Heads|Blue) = 1/3, but P(Heads|Red or Blue) = 1/2. Which obviously translates into a betting scheme: someone who bets on Tails only when the room is Red wins 2/3 of the time and someone who bets on Tails only when the room is Blue wins 2/3 of the time, while someone who always bets on Tails wins only 1/2 of the time.”
A quick translation of the probabilities is:
P(Heads|Red)=1/3: If your total evidence is Red, then you should entertain probability 1/3 for Heads.
P(Heads|Blue)=1/3: If your total evidence is Blue, then you should entertain probability 1/3 for Heads.
P(Heads|Red or Blue)=1/2: If your total evidence is Red or Blue, which is the case if you know that the room is either red or blue (or both), but not which exactly, then you should entertain probability 1/2 for Heads.
If the optimal betting scheme requires you to rely on P(Heads|Red or Blue)=1/2 when receiving evidence Blue, then the betting scheme demands that you ignore your total evidence. Ignoring total evidence does not necessarily invalidate the probability model, but it certainly needs justification. Otherwise, by strictly following total evidence, your model will also make you run afoul of the Reflection Principle, since you will arrive at probability 1/3 in every single experimental run.
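For what it’s worth, the quoted win frequencies are easy to verify by simulation. A minimal sketch (my own; assuming the Technicolor setup where the two potential awakening days are painted Red and Blue in random order, Beauty wakes on both days if Tails but only on the first if Heads, and each experiment counts at most one bet):

```python
import random

def simulate(scheme, trials=100_000):
    """Fraction of experiments won among those where a bet on Tails is placed."""
    bets = wins = 0
    for _ in range(trials):
        coin = random.choice(["Heads", "Tails"])
        colors = random.sample(["Red", "Blue"], 2)        # day-1, day-2 colors
        seen = colors if coin == "Tails" else colors[:1]  # awake both days only if Tails
        if scheme == "always" or scheme in seen:
            bets += 1
            wins += coin == "Tails"
    return wins / bets

print(simulate("Red"))     # ~2/3: bet Tails only upon seeing a Red room
print(simulate("Blue"))    # ~2/3: bet Tails only upon seeing a Blue room
print(simulate("always"))  # ~1/2: always bet Tails
```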
Going one step back: with my translation of the conditional probabilities above, I have made the implicit assumption that the way the agent learns evidence is not biased towards a certain hypothesis. But this is obviously not true for Beauty: due to the memory loss, she is unable to learn the evidence “Red and Blue” regardless of the coin toss. This, in combination with her sleeping through Tuesday if Heads, means she is going to learn “Red” and “Blue” (but not “Red and Blue”) if Tails, while she is only going to learn either “Red” or “Blue” if Heads, resulting in a bias towards the Tails hypothesis.
I admit that P(Heads|Red)=P(Heads|Blue)=1/3 together with P(Heads|Red or Blue)=1/2 hints at the existence of that information selection bias. However, this is just as little a feature of your model as a flat tire is a feature of your car because it prompts you to fix it. It is not your probability model that guides you to adopt the proper betting strategy by ignoring total evidence. In fact, it is just the other way around: your knowledge about the bias guides you to partially dismiss your model. As mentioned above, this does not necessarily invalidate your model, but it shows that directly applying it in certain decision scenarios does not guarantee optimal decisions and can even lead to bad decisions and violations of the Reflection Principle.
Therefore, as a halfer, I would prefer an updating rule that takes the bias into account and tells me P(Heads|Red)=P(Heads|Blue)=P(Heads|Red or Blue)=1/2, while offering me the possibility of a workaround to arrive at your betting scheme. One possible workaround is that Beauty runs a simulation of another experiment within her original Technicolor experiment, in which she is only awoken in a Red room. She can easily simulate that, and the same updating rule that tells her P(Heads|Red)=1/2 for the original experiment tells her P(Heads|Red)=1/3 for the simulated experiment.
“This leads to a conclusion that observing the event ‘Red’ instead of ‘Red or Blue’ is possible only for someone who has been expecting to observe the event ‘Red’ in particular. Likewise, observing HTHHTTHT is possible for a person who was expecting this particular sequence of coin tosses, instead of any combination with length 8. See Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events”
I have already refuted this way of reasoning in the comments of your post.
steve2152 on Does reducing the amount of RL for a given capability level make AI safer?

Right, and that wouldn’t apply to a model-based RL system that could learn an open-ended model of any aspect of the world and itself, right?
I think your “it is nearly impossible for any computationally tractable optimizer to find any implementation for a sparse/distant reward function” should have some caveat that it only clearly applies to currently-known techniques. In the future there could be better automatic-world-model-builders, and/or future generic techniques to do automatic unsupervised reward-shaping for an arbitrary reward, such that AIs could find out-of-the-box ways to solve hard problems without handholding.
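As one concrete reference point for what reward-shaping can look like in the hand-crafted case (this is the classic potential-based formulation, not anything proposed in the comment; the grid-world potential is a made-up example):

```python
def shaped_reward(reward, potential, s, s_next, gamma=0.99):
    """Potential-based shaping (Ng et al., 1999): F = gamma*phi(s') - phi(s).
    Adding F to the reward densifies a sparse signal without changing
    the optimal policy."""
    return reward + gamma * potential(s_next) - potential(s)

# Hypothetical sparse grid-world: reward is 0 everywhere except the goal.
goal = (10, 10)
def phi(s):
    # Negative Manhattan distance to goal: closer states get higher potential.
    return -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))

# Moving one step toward the goal now yields an immediate positive signal (~1.19).
print(shaped_reward(0.0, phi, s=(0, 0), s_next=(1, 0)))
```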