LessWrong 2.0 Reader
Yeah, that seems to agree with my pessimistic view - that we are selfish animals, except we have culture, and some cultures accidentally contain altruism. So the answer to your question "are humans fundamentally good or evil?" is "humans are fundamentally evil, and only accidentally sometimes good".
abandon on Open Thread Spring 2024
I apologize for not having time to find the sources for this belief, so I could well be wrong, but my recollection from looking up a similar idea is that the process is reversible only in the very earliest stages, when the tooth has weakened but not yet developed a cavity proper.
jan_kulveit on Express interest in an "FHI of the West"
Sorry, but I don't think this should be branded as "FHI of the West".
I don't think you personally or Lightcone share that much of an intellectual taste with FHI or Nick Bostrom - Lightcone seems firmly in the intellectual tradition of Berkeley, shaped by orgs like MIRI and CFAR. This tradition was often close to FHI's thought, but also quite often in tension with it. My hot take is that you in particular miss part of the generators of the taste that made FHI different from Berkeley. I sort of dislike the "FHI" brand being co-opted / blended in this way.
ape-in-the-coat on When is a mind me?
"You should anticipate having both experiences" sounds sort of paradoxical or magical, but I think this stems from a verbal confusion.
You can easily clear up this confusion by rephrasing it as "You should anticipate having any of these experiences". Then it's immediately clear that we are talking about two separate screens, and that our curiosity isn't actually satisfied: the question "which one of these two will actually be the case?" is still very much on the table.
Rob-y feels exactly as though he was just Rob-x, and Rob-z also feels exactly as though he was just Rob-x
Yes, this is obvious. Still, as soon as we have Rob-y and Rob-z, they are not "metaphysically the same person". When Rob-y says "I" he is referring to Rob-y, not Rob-z, and vice versa. More specifically, Rob-y is referring to one causal curve through time and Rob-z is referring to another. These two curves are the same up to some point, but after that they are not.
gunnar_zarncke on Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
Conceptually, we could then sketch out the whole fractal by repeating this process to randomly sample a bunch of points. But it turns out we don’t even need to do that! If we just run the single-point process for a while, each iteration randomly picking one of the three functions to apply, then we’ll “wander around” the fractal, in some sense, and in the long run (pretty fast in practice) we’ll wander around the whole thing.
Not if you run just that code part. It will quickly converge to some very small area of the fractal and not come back. Something must be missing.
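For reference, here is a minimal sketch of the single-point process the quoted passage describes, assuming the standard Sierpinski-triangle setup (the vertex coordinates and iteration counts are illustrative):

```python
import random

# Fixed points of the three contractions: the triangle's vertices.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]

def step(point):
    """One iteration: move the point halfway toward a randomly chosen vertex."""
    vx, vy = random.choice(VERTICES)
    x, y = point
    return ((x + vx) / 2, (y + vy) / 2)

point = (0.3, 0.3)
trail = []
for i in range(10_000):
    point = step(point)
    if i > 20:  # discard a short burn-in before recording visited points
        trail.append(point)
# Plotting `trail` shows where the single wandering point actually ends up.
```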
viliam on Raemon's Shortform
I'm working on this as a full blogpost but figured I would start getting pieces of it out here for now.
Looking forward to specific examples, pretty please.
signer on When is a mind me?
In the case of teleportation, I think teleportation-phobic people are mostly making an implicit error of the form “mistakenly modeling situations as though you are a Cartesian Ghost who is observing experiences from outside the universe”, not making a mistake about what their preferences are per se.
Why not both? I can imagine that someone would be persuaded to accept teleportation/uploading if they stopped believing in a physical Cartesian Ghost. But it's also possible that reminding them that continuity of experience, like a table, is just a description of a physical situation and not a divinely blessed, necessary value would be enough to tip the balance toward valuing carbon or whatever. It's bad to be wrong about Cartesian Ghosts, but it's also bad to think that you don't have a choice about how you value experience.
lao-mein on An examination of GPT-2's boring yet effective glitch
I think a lot of it comes down to training data context - " Leilan" is only present in certain videogame scrapes, " petertodd" is only found in Bitcoin spam, etc. So when you try to use these tokens in a conversational context, the model starts spitting out weird stuff because it doesn't have enough information about what they actually mean. I think GPT-2's guess for " petertodd" is something like "part of a name/email; if you see it, expect more mentions of Bitcoin", and not anything more, since that token doesn't occur much anywhere else. Thus, if you bring it up in a context where Bitcoin spam is very unlikely to occur, like a conversation with an AI assistant, it kind of just acts like a masked token, and you get the glitch token behavior.
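If you want to check how these strings tokenize, here is a quick sketch, assuming the Hugging Face transformers package is installed (the expectation is that each string maps to a single BPE token):

```python
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
for s in [" petertodd", " Leilan"]:
    ids = tok.encode(s)
    # A single id here means the whole string is one token in GPT-2's vocabulary.
    print(repr(s), "->", ids, [tok.decode([i]) for i in ids])
```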
faul_sname on Experiments with an alternative method to promote sparsity in sparse autoencoders
The other baseline would be to compare one L1-trained SAE against another L1-trained SAE -- if you see a similar approximate "1/10 have cossim > 0.9, 1/3 have cossim > 0.8, 1/2 have cossim > 0.7" pattern, that's not definitive proof that both approaches find "the same kind of features", but it would strongly suggest that, at least to me.
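A sketch of that comparison, assuming each SAE exposes its decoder as an (n_features x d_model) matrix (the random matrices below are placeholders for two independently L1-trained SAEs):

```python
import numpy as np

def max_cosine_similarities(dec_a, dec_b):
    """For each decoder direction (row) of dec_a, the cosine similarity
    with its best-matching direction in dec_b."""
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    return (a @ b.T).max(axis=1)

rng = np.random.default_rng(0)
dec_a = rng.normal(size=(512, 64))  # placeholder for SAE #1's decoder
dec_b = rng.normal(size=(512, 64))  # placeholder for SAE #2's decoder

sims = max_cosine_similarities(dec_a, dec_b)
for t in (0.9, 0.8, 0.7):
    print(f"fraction with cossim > {t}: {(sims > t).mean():.2f}")
```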
jobst-heitzig on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong
Take a possible world in which the predictor is perfect (meaning: they were able to make a prediction, and there was no possible extension of that world's trajectory in which what I will actually do deviates from what they have predicted). In that world, by definition, I no longer have a choice. By definition I will do what the predictor has predicted. Whatever has caused what I will do lies in the past of the prediction, hence in the past of the current time point. There is no point in asking myself now what I should do, as I no longer have causal influence on what I will do. I can simply relax and watch myself doing what I was caused to do some time before. I can of course ask myself what might have caused my action and try to predict from that what I will do. If I come to believe that it was myself who decided at some earlier point in time what I will do, then I can ask myself what I should have decided at that earlier point in time. If I believe that at that earlier point in time I already knew the predictor would act in the way it did, and if I believe that I made the decision rationally, then I should conclude that I decided to one-box.
The original version of Newcomb's paradox in Nozick 1969 is, however, not about a perfect predictor. It begins with (1) "Suppose a being in whose power to predict your choices you have enormous confidence.... You know that this being has often correctly predicted your choices in the past (and has never, so far as you know, made an incorrect prediction about your choices), and furthermore you know that this being has often correctly predicted the choices of other people, many of whom are similar to you, in the particular situation to be described below". So the information you are given is explicitly only about things from the past (how could it be otherwise?). It goes on to say (2) "You have a choice between two actions". Information (2) implies that what I will do has not been decided yet and that I still have causal influence on what I will do. Hence the information about what I will do cannot have been available to the predictor. This implies that the predictor cannot have made a perfect prediction about my behaviour. Indeed, nothing in (1) implies that they have; the information given is not about my future action at all. After I have made my decision, it might turn out, of course, that it happens to coincide with what the predictor has predicted. But that is irrelevant for my choice, as it would only imply that the predictor was lucky this time. What should I make of information (1)? If I am confident that I still have a choice, that information is of no significance for the decision problem at hand and I should two-box. If I am confident that I don't have a choice but have decided already, the reasoning of the previous paragraph applies and I should hope to observe that I will one-box.
What if I am unsure whether or not I still have a choice? I might have the impression that I can try to move my muscles this way or that way, without being perfectly confident that they will obey. What action should I then decide to try? I should decide to try two-boxing. Why? Because that decision is the dominant strategy: if it turns out that indeed I can decide my action now, then we're in a world where the predictor was not perfect but merely lucky and in that world two-boxing is dominant; if it instead turns out that I was not able to override my earlier decision at this point, then we're in a world where what I try now makes no difference. In either case, trying to two-box is undominated by any other strategy.
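To make the dominance claim concrete, here is a toy enumeration under the reading that the prediction is already fixed, using the standard payoff amounts (assumed here: $1,000 in the visible box, $1,000,000 in the opaque one):

```python
def payoff(action, prediction):
    """Total winnings given my action and the predictor's (fixed) prediction."""
    opaque = 1_000_000 if prediction == "one-box" else 0
    visible = 1_000
    return opaque + (visible if action == "two-box" else 0)

for prediction in ("one-box", "two-box"):
    for action in ("one-box", "two-box"):
        print(f"predicted {prediction}, chose {action}: ${payoff(action, prediction):,}")
# For either fixed prediction, two-boxing yields exactly $1,000 more,
# which is the dominance argument above.
```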