What if we constrain v to be in some subspace that is actually used by the MLP? (We can get it from PCA over activations on many inputs.)
This way v won't have any dormant component, so the MLP output after patching also cannot use that dormant pathway.
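A minimal sketch of that constraint (shapes and the names `acts`, `v`, `k` are illustrative assumptions; PCA is done via SVD on mean-centered activations):

```python
import numpy as np

# Illustrative shapes: `acts` holds MLP activations collected over many
# inputs, `v` is the candidate patch vector.
acts = np.random.randn(10_000, 512)   # (n_inputs, d_mlp)
v = np.random.randn(512)

# PCA via SVD on mean-centered activations; keep the top-k principal
# directions, i.e. the subspace the MLP actually uses on-distribution.
k = 64
centered = acts - acts.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
basis = Vt[:k]                         # (k, d_mlp), orthonormal rows

# Project v into that subspace; whatever lay outside (the dormant
# component) is removed, so the patched MLP output cannot route
# through a dormant pathway.
v_used = basis.T @ (basis @ v)
```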
duschkopf on Beauty and the Bets

„Whether or not your probability model leads to optimal decision making is the test allowing to falsify it.“
Sure, I don't deny that. What I am saying is that your probability model doesn't tell you which probability you have to base a certain decision on. If you can derive a probability from your model and provide a good reason to consider this probability relevant to your decision, your model is not falsified as long as you arrive at the right decision. Suppose a simple experiment where the experimenter flips a fair coin and you have to guess whether it landed Tails or Heads, but you are only rewarded for the correct guess if the coin comes up Tails. Then, of course, you should still entertain the unconditional probabilities P(Heads)=P(Tails)=1/2. But this uncertainty is completely irrelevant to your decision. What is relevant, however, is P(Tails/Tails)=1 and P(Heads/Tails)=0, from which it follows that you should always guess Tails. Another way to arrive at this strategy is to calculate expected utilities, setting U(Heads)=0 as you would propose. But this is not the only reasonable solution; it is just a different route of reasoning that takes into account the experimental condition that your decision counts only if the coin lands Tails.
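Spelled out, the expected-utility route (assuming a unit reward for a correct, rewarded guess; the payoff values are illustrative):

$$
\begin{aligned}
EU(\text{guess Tails}) &= P(\text{Tails}) \cdot 1 + P(\text{Heads}) \cdot 0 = \tfrac{1}{2} \\
EU(\text{guess Heads}) &= P(\text{Heads}) \cdot U(\text{Heads}) = \tfrac{1}{2} \cdot 0 = 0
\end{aligned}
$$

so guessing Tails dominates.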
„The model says that P(Heads|Red) = 1/3, P(Heads|Blue) = 1/3, but P(Heads|Red or Blue) = 1/2. Which obviously translates into a betting scheme: someone who bets on Tails only when the room is Red wins 2/3 of the time and someone who bets on Tails only when the room is Blue wins 2/3 of the time, while someone who always bets on Tails wins only 1/2 of the time.“
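These quoted frequencies can be checked numerically. A minimal Monte Carlo sketch, assuming (per the setup described below) that Monday's room color is random and Tuesday's is the opposite, and scoring each strategy over the experiments in which it actually places a bet:

```python
import random

def simulate(n=100_000):
    red_bets = red_wins = 0    # A: bet Tails only in a Red room
    blue_bets = blue_wins = 0  # B: bet Tails only in a Blue room
    always_wins = 0            # C: always bet Tails (scored per experiment)
    for _ in range(n):
        tails = random.random() < 0.5
        monday_red = random.random() < 0.5
        # Awakenings: Monday always; Tuesday only if Tails, color switched.
        colors = {'R' if monday_red else 'B'}
        if tails:
            colors.add('B' if monday_red else 'R')
        if 'R' in colors:
            red_bets += 1
            red_wins += tails
        if 'B' in colors:
            blue_bets += 1
            blue_wins += tails
        always_wins += tails   # C wins the experiment iff the coin was Tails
    print(f"A wins {red_wins / red_bets:.3f} of its bets")    # ~2/3
    print(f"B wins {blue_wins / blue_bets:.3f} of its bets")  # ~2/3
    print(f"C wins {always_wins / n:.3f} of experiments")     # ~1/2

simulate()
```

Under these assumptions, strategy A places a bet in 3/4 of experiments and wins in 1/2 of all experiments, hence the 2/3 figure.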
A quick translation of the probabilities is:
P(Heads/Red)=1/3: If your total evidence is Red, then you should entertain probability 1/3 for Heads.
P(Heads/Blue)=1/3: If your total evidence is Blue, then you should entertain probability 1/3 for Heads.
P(Heads/Red or Blue)=1/2: If your total evidence is Red or Blue, which is the case if you know that either Red or Blue (or both) was observed, but not which exactly, you should entertain probability 1/2 for Heads.
If the optimal betting scheme requires you to rely on P(Heads/Red or Blue)=1/2 when receiving evidence Blue, then the betting scheme demands that you ignore your total evidence. Ignoring total evidence does not necessarily invalidate the probability model, but it certainly needs justification. Otherwise, by strictly following total evidence, your model will also run afoul of the Reflection Principle, since you will arrive at probability 1/3 in every single experimental run.
Going one step back: with my translation of the conditional probabilities above, I have made the implicit assumption that the way the agent learns evidence is not biased towards a certain hypothesis. But this is obviously not true for Beauty: due to the memory loss, she is unable to learn the evidence „Red and Blue“ regardless of the coin toss. In combination with her sleeping through Tuesday if Heads, this means she is going to learn „Red“ and „Blue“ (but not „Red and Blue“) if Tails, while she is only going to learn either „Red“ or „Blue“ if Heads, resulting in a bias towards the Tails hypothesis.
I admit that P(Heads/Red)=P(Heads/Blue)=1/3 but P(Heads/Red or Blue)=1/2 hints at the existence of that information selection bias. However, this is just as little a feature of your model as a flat tire is a feature of your car because it prompts you to fix it. It is not your probability model that guides you to adopt the proper betting strategy by ignoring total evidence; in fact, it is just the other way around: your knowledge about the bias guides you to partially dismiss your model. As mentioned above, this does not necessarily invalidate your model, but it shows that directly applying it in certain decision scenarios does not guarantee optimal decisions and can even lead to bad decisions and violations of the Reflection Principle.
Therefore, as a halfer, I would prefer an updating rule that takes the bias into account and tells me P(Heads/Red)=P(Heads/Blue)=P(Heads/Red or Blue)=1/2, while still offering me a workaround to arrive at your betting scheme. One possible workaround is that Beauty runs a simulation of another experiment within her original Technicolor experiment, in which she is only awoken in a Red room. She can easily simulate that, and the same updating rule that tells her P(Heads/Red)=1/2 for the original experiment tells her P(Heads/Red)=1/3 for the simulated one.
„This leads to a conclusion that observing event "Red" instead of "Red or Blue" is possible only for someone who has been expecting to observe event "Red" in particular. Likewise, observing HTHHTTHT is possible for a person who was expecting this particular sequence of coin tosses, instead of any combination with length 8. See Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events“
I have already refuted this way of reasoning in the comments of your post.
steve2152 on Does reducing the amount of RL for a given capability level make AI safer?

Right, and that wouldn’t apply to a model-based RL system that could learn an open-ended model of any aspect of the world and itself, right?
I think your “it is nearly impossible for any computationally tractable optimizer to find any implementation for a sparse/distant reward function” should have some caveat that it only clearly applies to currently-known techniques. In the future there could be better automatic-world-model-builders, and/or future generic techniques to do automatic unsupervised reward-shaping for an arbitrary reward, such that AIs could find out-of-the-box ways to solve hard problems without handholding.
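For concreteness, the closest currently-known technique for that last step is potential-based reward shaping (Ng et al., 1999), which densifies a sparse reward without changing the optimal policy — though today the potential is hand-designed rather than automatic. A minimal sketch; `GOAL` and the potential function are illustrative assumptions:

```python
GAMMA = 0.99
GOAL = 10.0  # illustrative 1-D goal position

def potential(state: float) -> float:
    # Hand-designed guess at progress toward the goal (negative distance).
    # Automating the construction of this function is the open problem.
    return -abs(state - GOAL)

def shaped_reward(state: float, next_state: float, sparse_reward: float) -> float:
    # r' = r + gamma * Phi(s') - Phi(s): provably preserves optimal policies
    # (Ng et al., 1999), while giving dense feedback on every step.
    return sparse_reward + GAMMA * potential(next_state) - potential(state)
```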
algon on Some Experiments I'd Like Someone To Try With An Amnestic

@habryka [LW · GW] this comment has an anomalous amount of karma. It showed up on popular comments, I think, and I'm wondering if people liked the comment when they saw it there, which led to a feedback loop of more eyeballs on the comment, more likes, more eyeballs, etc. If so, is that the intended behaviour of the popular comments feature? It seems like it shouldn't be.
david-james on If You Demand Magic, Magic Won't Help

Let's step back. This thread of the conversation is rooted in this claim: "Let's be honest: all fiction is a form of escapism." Are we snared in the [Disputing Definitions](https://www.lesswrong.com/posts/7X2j8HAkWdmMoS8PE/disputing-definitions) [LW · GW] trap? To quote from that LW article:
> if the issue arises, both sides should switch to describing the event in unambiguous lower-level constituents, like acoustic vibrations or auditory experiences. Or each side could designate a new word, like 'alberzle' and 'bargulum', to use for what they respectively used to call 'sound'; and then both sides could use the new words consistently. That way neither side has to back down or lose face, but they can still communicate. And of course you should try to keep track, at all times, of some testable proposition that the argument is actually about.
I propose that we recognize several lower-level testable claims, framed as questions. How many people read fiction to ...
1. entertain?
2. distract from an unpleasant reality?
3. understand the human condition (including society)?
4. think through alternative scenarios?
Now I will connect the conversation to these four points:
* Luke_A_Somers: "Why would I ever want to escape from my wonderful life to go THERE?" See point 2.
* Which points is thomblake referring to? Consider this quote from [The Philosophy of Horror](https://www.kentuckypress.com/9780813136554/the-philosophy-of-horror/). "Whether serious, kitschy, frightening, or ridiculous, horror not only arouses the senses but also raises profound questions about fear, safety, justice, and suffering. From literature and urban legends to film and television, horror's ability to thrill has made it an integral part of modern entertainment." From this, it seems they are emphasizing points 1 and 3.
* JonInstall pulls out the dictionary in the hopes of "settling" the debate. He refers to point 1.
* To add some texture to the discussion: when I read the [embedded story](https://en.wikipedia.org/wiki/Story_within_a_story) [The Tale of the Omegas](https://www.reddit.com/r/singularity/comments/12dmufv/the_most_important_short_story_everyone_here/) in [Life 3.0](https://www.penguinrandomhouse.com/books/530584/life-30-by-max-tegmark/), include me in point 4.
Does this sound about right?
I’m not quite sure what you mean by “deeply painful process.” There is often a segment of any community that resists any change. That’s not to say that it has to be a fight, but community practices have an inertia to them. Sometimes it’s a shift that happens over time.
For instance, when I was a kid (1980s), “gay” was a common pejorative. While plenty of painful events have happened in the lives of LGBT folk, I don’t think that this was due to some process that is deeply painful, other than people slowly changing their minds over time.
I’ve seen the polyamorous community shift best practices over time. Again though, I don’t think that this is due to some inherently painful process. One could argue that the collective pain that we experience as we’re making mistakes is that process, but I suspect that isn’t what you mean here.
I think that change is generally hard, but it naturally happens over time.
davidmanheim on Biorisk is an Unhelpful Analogy for AI Risk

I agree that we do not have an exact model for anything in immunology, unlike physics, and there is a huge amount of uncertainty. But that's different than saying it's not well-understood; we have clear gold-standard methods for determining answers, even if they are very expensive. This stands in stark contrast to AI, where we don't have the ability to verify that something works or is safe at all without deploying it, and even that isn't much of a check on its later potential for misuse.
But aside from that, I think your position agrees with mine much more than you imply. My understanding is that we have newer predictive models which can give uncertain but fairly accurate answers to many narrow questions. (Older, non-ML methods also exist, but I'm less familiar with them.) In your hypothetical case, I expect that the right experts can absolutely give indicative answers about whether a novel vaccine peptide is likely or unlikely to have cross-reactivity with various immune targets; the biggest problem is that it's socially unacceptable to assert confidence in anything short of a tested and verified case. The models can, in the case of the Zhang et al paper above, give 70% accurate answers, which can help narrow the problem for drug or vaccine discovery, though they do then need to be followed by in vitro tests and trials.
mo-putera on AIs teams will probably be more superintelligent than individual AIs

Amateur hour question, if you don't mind: how does your "future of AI teams" compare/contrast with Drexler's CAIS model [LW · GW]?
ben-lang on What is a community that has changed their behaviour without strife?

One issue is going to be filtering.
Strife and conflict are memorable. So you are searching for the least noteworthy examples, the ones that people are least likely to comment on or remember.
I don't know what qualifies as a "community" really. At work I have seen uncontroversial changes come in a few times.
rom on What is a community that has changed their behaviour without strife?

Probably not super helpful/what you're looking for, but one broad category of groups who go from 'doing violence' to 'doing much less or no violence' (often within a short space of time) are resistance organisations that successfully manage the transition, usually after achieving some level of progress. The ANC in South Africa seems like a good example. Sinn Féin in Ireland (established as the political wing of the IRA) is another.