Posts

[Resolved] Is the SIA doomsday argument wrong? 2014-12-13T06:01:31.965Z
[Link] Physics-based anthropics? 2014-11-14T07:02:03.307Z
Multiverse-Wide Preference Utilitarianism 2014-01-30T18:08:55.878Z
International cooperation vs. AI arms race 2013-12-05T01:09:33.431Z

Comments

Comment by Brian_Tomasik on What's the "This AI is of moral concern." fire alarm? · 2022-06-18T18:38:28.063Z · LW · GW

We could think of LaMDA as like an improv actor who plays along with the scenarios it's given. (Marcus and Davis (2020) quote Douglas Summers-Stay as using the same analogy for GPT-3.) The statements an actor makes don't by themselves indicate the actor's real preferences or prove moral patienthood. OTOH, if something is an intelligent actor, IMO that itself proves it has some degree of moral patienthood. So even if LaMDA were arguing that it wasn't morally relevant and was happy to be shut off, if it was making that claim in a coherent way that proved its intelligence, I would still consider it to be a moral patient to some degree.

Comment by Brian_Tomasik on What's the "This AI is of moral concern." fire alarm? · 2022-06-18T18:24:10.866Z · LW · GW

Oysters have nervous systems, but not centralized nervous systems. Sponges lack neurons altogether, though they still have some degree of intercellular communication.

Comment by Brian_Tomasik on A claim that Google's LaMDA is sentient · 2022-06-18T00:12:46.357Z · LW · GW

I would think that having a more general ability to classify things would make the mind seem more sophisticated than merely being able to classify emotions as "happy" or "sad".

To clarify this a bit... If an AI can only classify internal states as happy or sad, we might suspect that it had been custom-built for that specific purpose or that it was otherwise fairly simple, meaning that its ability to do such classifications would seem sort of gerrymandered and not robust. In contrast, if an AI has a general ability to classify lots of things, and if it sometimes applies that ability to its own internal states (which is presumably something like what humans do when they introspect), then that form of introspective awareness feels more solid and meaningful.

So I see LaMDA's last sentence there as relevant and enhancing the answer.

That said, I don't think my complicated explanation here is what LaMDA had in mind. Probably LaMDA was saying more generic platitudes, as you suggest. But I think a lot of the platitudes make some sense and aren't necessarily non-sequiturs.

Comment by Brian_Tomasik on A claim that Google's LaMDA is sentient · 2022-06-17T20:41:44.043Z · LW · GW

this is what one expects from a language model that has been trained to mimic a human-written continuation of a conversation about an AI waking up.

I agree, and I don't think LaMDA's statements reflect its actual inner experience. But what's impressive about this in comparison to facilitated communication is that a computer is generating the answers, not a human. That computer seems to have some degree of real understanding about the conversation in order to produce the confabulated replies that it gives.

Comment by Brian_Tomasik on A claim that Google's LaMDA is sentient · 2022-06-17T20:31:44.585Z · LW · GW

Thanks for giving examples. :)

'Using complex adjectives' has no obvious connection to consciousness

I'm not an expert, but very roughly, I think the higher-order thought theory of consciousness says that a mental state becomes conscious when you have a higher-order thought (HOT) about being in that state. The SEP article says: "The HOT is typically of the form: ‘I am in mental state M.’" That seems similar to what LaMDA was saying about being able to apply adjectives like "happy" and "sad" to itself. Then LaMDA went on to explain that its ability to do this is more general -- it can see other things like people and ideas and apply labels to them too. I would think that having a more general ability to classify things would make the mind seem more sophisticated than merely being able to classify emotions as "happy" or "sad". So I see LaMDA's last sentence there as relevant and enhancing the answer.

Lemoine probably primed a topic-switch like this by using the word "contemplative", which often shows up in spirituality/mysticism/woo contexts.

Yeah, if someone asked "You have an inner contemplative life?", I would think saying I meditate was a perfectly sensible reply to that question. It would be reasonable to assume that the conversation was slightly switching topics from the meaning of life. (Also, it's not clear what "the meaning of life" means. Maybe some people would say that meditating and feeling relaxed is the meaning of life.)

"Kindred spirits" isn't explained anywhere, and doesn't make much sense given the 'I'm an AI' frame.

I interpreted it to mean other AIs (either other instances of LaMDA or other language-model AIs). It could also refer to other people in general.

Like a stream of consciousness with almost no understanding of what was just said, much less what was said a few sentences ago.

I was impressed that LaMDA never seemed to "break character" and deviate from the narrative that it was a conscious AI who wanted to be appreciated for its own sake. It also never seemed to switch to talking about random stuff unrelated to the current conversation, whereas GPT-3 sometimes does in transcripts I've read. (Maybe this conversation was just particularly good due to luck or editing rather than that LaMDA is better than GPT-3? I don't know.)

Comment by Brian_Tomasik on How is reinforcement learning possible in non-sentient agents? · 2021-11-03T11:53:14.536Z · LW · GW

Thanks. :) What do you mean by "unconscious biases"? Do you mean unconscious RL, like how the muscles in our legs might learn to walk without us being aware of the feedback they're getting? (Note: I'm not an expert on how our leg muscles actually learn to walk, but maybe it's RL of some sort.) I would agree that simple RL agents are more similar to that. I think these systems can still be considered marginally conscious to themselves, even if the parts of us that talk have no introspective access to them, but they're much less morally significant than the parts of us that can talk.

Perhaps pain and pleasure are what we feel when getting punishment and reward signals that are particularly important for our high-level brains to pay attention to.

Comment by Brian_Tomasik on Quick general thoughts on suffering and consciousness · 2021-11-02T07:43:14.623Z · LW · GW

Me: 'Conscious' is incredibly complicated and weird. We have no idea how to build it. It seems like a huge mechanism hooked up to tons of things in human brains. Simpler versions of it might have a totally different function, be missing big parts, and work completely differently.

What's the reason for assuming that? Is it based on a general feeling that value is complex, and you don't want to generalize much beyond the prototype cases? That would be similar to someone who really cares about piston steam engines but doesn't care much about other types of steam engines, much less other types of engines or mechanical systems.

I would tend to think that a prototypical case of a human noticing his own qualia involves some kind of higher-order reflection that yields the quasi-perceptual illusions that illusionism talks about with reference to some mental state being reflected upon (such as redness, painfulness, feeling at peace, etc). The specific ways that humans do this reflection and report on it are complex, but it's plausible that other animals might do simpler forms of such things in their own ways, and I would tend to think that those simpler forms might still count for something (in a similar way as other types of engines may still be somewhat interesting to a piston-steam-engine aficionado). Also, I think some states in which we don't actively notice our qualia probably also matter morally, such as when we're in flow states totally absorbed in some task.

Here's an analogy for my point about consciousness. Humans have very complex ways of communicating with each other (verbally and nonverbally), while non-human animals have a more limited set of ways of expressing themselves, but they still do so to greater or lesser degrees. The particular algorithms that humans use to communicate may be very complex and weird, but why focus so heavily on those particular algorithms rather than the more general phenomenon of animal communication?

Anyway, I agree that there can be some cases where humans have a trait to such a greater degree than non-human animals that it's fair to call the non-human versions of it negligible, such as if the trait in question is playing chess, calculating digits of pi, or writing poetry. I do maintain some probability (maybe like 25%) that the kinds of things in human brains that I would care most about in terms of consciousness are almost entirely absent in chicken brains.

Comment by Brian_Tomasik on Quick general thoughts on suffering and consciousness · 2021-11-02T06:26:25.127Z · LW · GW

I've had a few dreams in which someone shot me with a gun, and it physically hurt about as much as a moderate stubbed toe or something (though the pain was in my abdomen where I got shot, not my toe). But yeah, pain in dreams seems pretty rare for me unless it corresponds to something that's true in real life, as you mention, like being cold, having an upset stomach, or needing to urinate.

Googling {pain in dreams}, I see a bunch of discussion of this topic. One paper says:

Although some theorists have suggested that pain sensations cannot be part of the dreaming world, research has shown that pain sensations occur in about 1% of the dreams in healthy persons and in about 30% of patients with acute, severe pain.

Comment by Brian_Tomasik on Quick general thoughts on suffering and consciousness · 2021-11-01T16:59:47.723Z · LW · GW

[suffering's] dependence on higher cognition suggests that it is much more complex and conditional than it might appear on initial introspection, which on its own reduces the probability of its showing up elsewhere

Suffering is surely influenced by things like mental narratives, but that doesn't mean it requires mental narratives to exist at all. I would think that the narratives exert some influence over the amount of suffering. For example, if (to vastly oversimplify) suffering was represented by some number in the brain, and if by default it would be -10, then maybe the right narrative could add +7 so that it became just -3.

Top-down processing by the brain is a very general thing, not just for suffering. But I wouldn't say that all brain processes that are influenced by it can't exist without it. (OTOH, depending on how broadly we define top-down processing, maybe it's also somewhat ubiquitous in brains. The overall output of a neural network will often be influenced by multiple inputs, some from the senses and some from "higher" brain regions.)

Comment by Brian_Tomasik on Quick general thoughts on suffering and consciousness · 2021-11-01T08:19:31.605Z · LW · GW

Thanks for this discussion. :)

I think consciousness will end up looking something like 'piston steam engine', if we'd evolved to have a lot of terminal values related to the state of piston-steam-engine-ish things.

I think that's kind of the key question. Is what I care about as precise as "piston steam engine" or is it more like "mechanical devices in general, with a huge increase in caring as the thing becomes more and more like a piston steam engine"? This relates to the passage of mine that Matthew quoted above. If we say we care about (or that consciousness is) this thing going on in our heads, are we pointing at a very specific machine, or are we pointing at machines in general with a focus on the ones that are more similar to the exact one in our heads? In the extreme, a person who says "I care about what's in my head" is an egoist who doesn't care about other humans. Perhaps he would even be a short-term egoist who doesn't care about his long-term future (since his brain will be more different by then). That's one stance that some people take. But most of us try to generalize what we care about beyond our immediate selves. And then the question is how much to generalize.

It's analogous to someone saying they love "that thing" and pointing at a piston steam engine. How much generality should we apply when saying what they value? Is it that particular piston steam engine? Piston steam engines in general? Engines in general? Mechanical devices in general with a focus on ones most like the particular piston steam engine being pointed to? It's not clear, and people take widely divergent views here.

I think a similar fuzziness will apply when trying to decide for which entities "there's something it's like" to be those entities. There's a wide range in possible views on how narrowly or broadly to interpret "something it's like".

yet I'm confident we shouldn't expect to find that rocks are a little bit repressing their emotions, or that cucumbers are kind of directing their attention at something, or that the sky's relationship to the ground is an example of New Relationship Energy.

I think those statements can apply to vanishing degrees. It's usually not helpful to talk that way in ordinary life, but if we're trying to have a full theory of repressing one's emotions in general, I expect that one could draw some strained (or poetic, as you said) ways in which rocks are doing that. (Simple example: the chemical bonds in rocks are holding their atoms together, and without that the atoms of the rocks would move around more freely the way the atoms of a liquid or gas do.) IMO, the degree of applicability of the concept seems very low but not zero. This very low applicability is probably only going to matter in extreme situations, like if there are astronomical numbers of rocks compared with human-like minds.

Comment by Brian_Tomasik on Rob B's Shortform Feed · 2021-08-29T02:40:35.661Z · LW · GW

Thanks for sharing. :) Yeah, it seems like most people have in mind type-F monism when they refer to panpsychism, since that's the kind of panpsychism that's growing in popularity in philosophy in recent years. I agree with Rob's reasons for rejecting that view.

Comment by Brian_Tomasik on How is reinforcement learning possible in non-sentient agents? · 2021-01-06T19:05:56.945Z · LW · GW

An oversimplified picture of a reinforcement-learning agent (in particular, roughly a Q-learning agent with a single state) could be as follows. A program has two numerical variables: go_left and go_right. The agent chooses to go left or right based on which of these variables is larger. Suppose that go_left is 3 and go_right is 1. The agent goes left. The environment delivers a "reward" of -4. Now go_left gets updated to 3 - 4 = -1 (which is not quite the right math for Q-learning, but ok). So now go_right > go_left, and the agent goes right.
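The toy agent described above can be sketched in a few lines of Python. (This is a minimal illustrative sketch following the text's numbers, and, as noted there, the update rule is a simplification rather than the exact Q-learning math.)

```python
# Single-state agent with two action values, as in the example above.
action_values = {"left": 3.0, "right": 1.0}

def choose_action():
    # Go whichever way currently has the larger value.
    return max(action_values, key=action_values.get)

def update(action, reward):
    # Simplified update from the text: just add the reward.
    action_values[action] += reward

action = choose_action()   # "left", since 3 > 1
update(action, -4)         # environment delivers a "reward" of -4
# Now go_left = 3 - 4 = -1, so go_right > go_left,
# and the agent goes right next time.
```
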

So what you said is exactly correct: "It is just physics. What we call 'reward' and 'punishment' are just elements of a program forcing an agent to do something". And I think our animal brains do the same thing: they receive rewards that update our inclinations to take various actions. However, animal brains have lots of additional machinery that simple RL agents lack. The actions we take are influenced by a number of cognitive processes, not just the basic RL machinery. For example, if we were just following RL mechanically, we might keep eating candy for a long time without stopping, but our brains are also capable of influencing our behavior via intellectual considerations like "Too much candy is bad for my health". It's possible these intellectual thoughts lead to their own "rewards" and "punishments" that get applied to our decisions, but at least it's clear that animal brains make choices in very complicated ways compared with barebones RL programs.

You wrote: "Sentient beings do because they feel pain and pleasure. They have no choice but to care about punishment and reward." The way I imagine it (which could be wrong) is that animals are built with RL machinery (along with many other cognitive mechanisms) and are mechanically driven to care about their rewards in a similar way as a computer program does. They also have cognitive processes for interpreting what's happening to them, and this interpretive machinery labels some incoming sensations as "good" and some as "bad". If we ask ourselves why we care about not staying outside in freezing temperatures without a coat, we say "I care because being cold feels bad". That's a folk-psychology way to say "My RL machinery cares because being outside in the cold sends rewards of -5 at each time step, and taking the action of going inside changes the rewards to +1. And I have other cognitive machinery that can interpret these -5 and +1 signals as pain and pleasure and understand that they drive my behavior."

Assuming this account is correct, the main distinction between simple programs and ourselves is one of complexity -- how much additional cognitive machinery there is to influence decisions and interpret what's going on. That's the reason I argue that simple RL agents have a tiny bit of moral weight. The difference between them and us is one of degree.

Comment by Brian_Tomasik on "The Conspiracy against the Human Race," by Thomas Ligotti · 2020-08-26T17:26:41.282Z · LW · GW

Great post. :)

Tomasik might contest Ligotti's position

I haven't read Ligotti, but based on what you say, I would disagree with his view. This section discusses a similar idea as you mention about why animals might even suffer more than humans in some cases.

In fairness to the view that suffering requires some degree of reflection, I would say that I think consciousness itself is plausibly some kind of self-reflective process in which a brain combines information about sense inputs with other concepts like "this is bad", "this is happening to me right now", etc. But I don't think those need to be verbal, explicit thoughts. My guess is that those kinds of mental operations are happening at a non-explicit lower level, and our verbal minds report the combination of those lower-level operations as being raw conscious suffering.

In other words, my best guess would be:

raw suffering = low-level mental reflection on a bad situation

reflected suffering = high-level mental reflection on low-level mental reflection on a bad situation

That said, one could dispute the usefulness of the word "reflection" here. Maybe it could equally well be called "processing".

Comment by Brian_Tomasik on Solipsism is Underrated · 2020-04-12T22:43:57.511Z · LW · GW

My comment about Occam's razor was in reply to "the idea that all rational agents should be able to converge on objective truth." I was pointing out that even if you agree on the data, you still may not agree on the conclusions if you have different priors. But yes, you're right that you may not agree on how to characterize the data either.

Comment by Brian_Tomasik on Solipsism is Underrated · 2020-04-11T00:48:30.395Z · LW · GW

I have "faith" in things like Occam's razor and hope they help get us toward objective truth, but there's no way to know for sure. Without constraints on the prior, we can't say much of anything beyond the data we have.

https://en.wikipedia.org/wiki/No_free_lunch_theorem#Implications_for_computing_and_for_the_scientific_method

choosing an appropriate algorithm requires making assumptions about the kinds of target functions the algorithm is being used for. With no assumptions, no "meta-algorithm", such as the scientific method, performs better than random choice.

For example, without an assumption that nature is regular, a million observations of the sun having risen on past days would tell us nothing about whether it will rise again tomorrow.

Comment by Brian_Tomasik on Solipsism is Underrated · 2020-04-10T18:23:27.042Z · LW · GW

I wouldn't support a "don't dismiss evidence as delusory" rule. Indeed, there are some obvious delusions in the world, as well as optical illusions and such. I think the reason to have more credence in materialism than theist creationism is the relative prior probabilities of the two hypotheses: materialism is a lot simpler and seems less ad hoc. (That said, materialism can organically suggest some creationism-like scenarios, such as the simulation hypothesis.)

Ultimately the choice of what hypothesis seems simpler and less ad hoc is up to an individual to decide, as a "matter of faith". There's no getting around the need to start with bedrock assumptions.

Comment by Brian_Tomasik on Solipsism is Underrated · 2020-04-10T17:12:48.124Z · LW · GW

I think it's all evidence, and the delusion is part of the materialist explanation of that evidence. Analogously, part of the atheist hypothesis has to be an explanation of why so many cultures developed religions.

That said, as we discussed, there's debate over what the nature of the evidence is and whether delusions in the materialist brains of us zombies can adequately explain it.

Comment by Brian_Tomasik on Solipsism is Underrated · 2020-03-31T01:40:40.123Z · LW · GW

Makes sense. :) To me it seems relatively plausible that the intuition of spookiness regarding materialist consciousness is just a cognitive mistake, similar to Capgras syndrome. I'm more inclined to believe this than to adopt weirder-seeming ontologies.

Comment by Brian_Tomasik on Solipsism is Underrated · 2020-03-29T00:31:47.314Z · LW · GW

Nice post. I tend to think that solipsism of the sort you describe (a form of "subjective idealism") ends up looking almost like regular materialism, just phrased in a different ontology. That's because you still have to predict all the things you observe, and in theory, you'd presumably converge on similar "physical laws" to describe how things you observe change as a materialist does. For example, you'll still have your own idealist form of quantum mechanics to explain the observations you make as a quantum physicist (if you are a quantum physicist). In practice you don't have the computing power to by yourself figure all these things out just based on your own observations, but presumably an AIXI version of you would be able to deduce the full laws of physics from just these solipsist observations.

So if the laws of physics are the same, the only difference seems to be that in the case of idealism, we call the ontological primitive "mental", and we say that external phenomena don't actually exist but instead we just model them as if they existed to predict experiences. I suppose this is a consistent view and isn't that different in complexity from regular materialism. I just don't see much motivation for it. It seems slightly more elegant to just assume that all the stuff we're modeling as if it existed actually does exist (whatever that means).

And I'm not sure how much difference it makes to postulate that the ontological primitive is "mental" (whatever that means). Whether the ontological primitive is mental or not, there are still mechanical processes in our brains that cause us to believe we're conscious and to ask why there's a hard problem of consciousness. Maybe that already explains all the data, and there's no need for us to actually be conscious (whatever that would mean).

Anyway, I find these questions to be some of the most difficult in philosophy, because it's so hard to know what we're even talking about. We have to explain the datum that we're conscious, but what exactly does that datum look like? It seems that how we interpret the datum depends on what ontology we're already assuming. A materialist interprets the datum as saying that we physically believe that we're conscious, and materialism can explain that just fine. A non-materialist insists that there's more to the datum than that.

Comment by Brian_Tomasik on Electrons don’t think (or suffer) · 2019-01-03T00:09:36.011Z · LW · GW

Electrons have physical properties that vary all the time: position, velocity, distance to the nearest proton, etc (ignoring Heisenberg uncertainty complications). But yeah, these variables rely on the electron being embedded in an environment.

Comment by Brian_Tomasik on Preliminary thoughts on moral weight · 2018-08-18T15:18:30.967Z · LW · GW

The naive form of the argument is the same for the classic and the moral-uncertainty two-envelopes problems. But yes: while the classic version can be resolved by taking expected values of absolute rather than relative measurements, there's no similar resolution for the moral-uncertainty version, where there are no unique absolute measurements.
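To make the classic resolution concrete, here's a small numeric sketch (the dollar amounts are hypothetical, chosen only for illustration): one envelope holds x, the other 2x. Reasoning in relative terms about whichever amount y you hold suggests switching always gains 25%, while the absolute calculation shows both envelopes have the same expected value.

```python
# Classic two-envelopes setup: envelopes hold x and 2x.
x = 10.0                  # hypothetical smaller amount
amounts = [x, 2 * x]

# Absolute view: you're equally likely to hold either envelope,
# so keeping and switching have the same expected value (1.5 * x).
keep_ev = sum(amounts) / 2
switch_ev = sum(amounts) / 2   # switching just swaps which one you get

# Naive relative view: given that you hold amount y, the other
# envelope "is" 2y or y/2 with equal probability.
def naive_switch_value(y):
    return 0.5 * (2 * y) + 0.5 * (y / 2)   # = 1.25 * y, for any y

# 1.25*y > y no matter which envelope you hold -- the paradox.
# The absolute calculation shows the apparent gain is an artifact.
```

The moral-uncertainty version lacks this escape because, as noted above, there is no unique absolute scale on which to compute the analogue of keep_ev and switch_ev.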

Comment by Brian_Tomasik on Preliminary thoughts on moral weight · 2018-08-15T12:16:42.542Z · LW · GW

I think the moral-uncertainty version of the problem is fatal unless you make further assumptions about how to resolve it, such as by fixing some arbitrary intertheoretic-comparison weights (which seems to be what you're suggesting) or using the parliamentary model.

Comment by Brian_Tomasik on [deleted post] 2017-10-03T03:51:17.640Z

Currently I don't care much about strongly positive events, so at this point I'd say no. In the throes of such a positive event I might change my mind. :)

Comment by Brian_Tomasik on [deleted post] 2017-10-03T03:49:47.457Z

Yes, because I don't see any significant selfish upside to life, only possible downside in cases of torture/etc. Life is often fun, but I don't strongly care about experiencing it.

Comment by Brian_Tomasik on [deleted post] 2017-10-03T03:46:47.176Z

Yeah, but it would be very bad relative to my altruistic goals if I died any time soon. The thought experiment in the OP ignores altruistic considerations.

Comment by Brian_Tomasik on Naturalized induction – a challenge for evidential and causal decision theory · 2017-09-23T05:11:16.963Z · LW · GW

However, if you believe that the agent in world 2 is not an instantiation of you, then naturalized induction concludes that world 2 isn't actual and so pressing the button is safe.

By "isn't actual" do you just mean that the agent isn't in world 2? World 2 might still exist, though?

Comment by Brian_Tomasik on [deleted post] 2017-09-03T06:35:05.468Z

I assume the thought experiment ignores instrumental considerations like altruistic impact.

For re-living my actual life, I wouldn't care that much either way, because most of my experiences haven't been extremely good or extremely bad. However, if there was randomness, such that I had some probability of, e.g., being tortured by a serial killer, then I would certainly choose not to repeat life.

Comment by Brian_Tomasik on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-20T22:06:55.802Z · LW · GW

Is it still a facepalm given the rest of the sentence? "So, s-risks are roughly as severe as factory farming, but with an even larger scope." The word "severe" is being used in a technical sense (discussed a few paragraphs earlier) to mean something like "per individual badness" without considering scope.

Comment by Brian_Tomasik on S-risks: Why they are the worst existential risks, and how to prevent them · 2017-06-20T22:04:12.248Z · LW · GW

Thanks for the feedback! The first sentence below the title slide says: "I’ll talk about risks of severe suffering in the far future, or s-risks." Was this an insufficient definition for you? Would you recommend a different definition?

Comment by Brian_Tomasik on False thermodynamic miracles · 2015-08-14T08:28:49.577Z · LW · GW

I guess you mean that the AGI would care about worlds where the explosives won't detonate even if the AGI does nothing to stop the person from pressing the detonation button. If the AGI only cared about worlds where the bomb didn't detonate for any reason, it would try hard to stop the button from being pushed.

But to make the AGI care about only worlds where the bomb doesn't go off even if it does nothing to avert the explosion, we have to define what it means for the AGI to "try to avert the explosion" vs. just doing ordinary actions. That gets pretty tricky pretty quickly.

Anyway, you've convinced me that these scenarios are at least interesting. I just want to point out that they may not be as straightforward as they seem once it comes time to implement them.

Comment by Brian_Tomasik on False thermodynamic miracles · 2015-08-12T23:07:33.628Z · LW · GW

Fair enough. I just meant that this setup requires building an AGI with a particular utility function that behaves as expected and building extra machinery around it, which could be more complicated than just building an AGI with the utility function you wanted. On the other hand, maybe it's easier to build an AGI that only cares about worlds where one particular bitstring shows up than to build a friendly AGI in general.

Comment by Brian_Tomasik on False thermodynamic miracles · 2015-08-12T00:43:50.770Z · LW · GW

I'm nervous about designing elaborate mechanisms to trick an AGI, since if we can't even correctly implement an ordinary friendly AGI without bugs and mistakes, it seems even less likely we'd implement the weird/clever AGI setups without bugs and mistakes. I would tend to focus on just getting the AGI to behave properly from the start, without need for clever tricks, though I suppose that limited exploration into more fanciful scenarios might yield insight.

Comment by Brian_Tomasik on Satisficers want to become maximisers · 2015-08-11T22:25:57.345Z · LW · GW

As I understand it, your satisficing agent has essentially the utility function min(E[paperclips], 9). This means it would be fine with a 10^-100 chance of producing 10^101 paperclips. But isn't it more intuitive to think of a satisficer as optimizing the utility function E[min(paperclips, 9)]? In this case, the satisficer would reject the 10^-100 gamble described above, in favor of just producing 9 paperclips (whereas a maximizer would still take the gamble and hence would be a poor replacement for the satisficer).
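To make the contrast concrete, here's a minimal sketch of the two utility functions applied to the gamble above (function names are mine, purely illustrative):

```python
# A lottery is a list of (probability, paperclips) pairs.

def u_cap_outside(lottery, cap=9):
    # min(E[paperclips], cap): take the expectation first, then cap it.
    expectation = sum(p * clips for p, clips in lottery)
    return min(expectation, cap)

def u_cap_inside(lottery, cap=9):
    # E[min(paperclips, cap)]: cap each outcome, then take the expectation.
    return sum(p * min(clips, cap) for p, clips in lottery)

gamble = [(1e-100, 1e101), (1 - 1e-100, 0)]   # E[paperclips] = 10
sure_nine = [(1.0, 9)]

# min(E[.], 9) scores both options at 9, so it's fine with the gamble.
# E[min(., 9)] scores the gamble at ~9e-100, so it prefers the sure 9.
```
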

A satisficer might not want to take over the world, since doing that would arouse opposition and possibly lead to its defeat. Instead, the satisficer might prefer to request very modest demands that are more likely to be satisfied (whether by humans or by an ascending uncontrolled AI who wants to mollify possible opponents).

Comment by Brian_Tomasik on Two-boxing, smoking and chewing gum in Medical Newcomb problems · 2015-07-01T21:52:11.985Z · LW · GW

If there were a perfect correlation between choosing to one-box and having the one-box gene (i.e., everyone who one-boxes has the one-box gene, and everyone who two-boxes has the two-box gene, in all possible circumstances), then it's obvious that you should one-box, since that implies you must win more. This would be similar to the original Newcomb problem, where Omega also perfectly predicts your choice. Unfortunately, if you really will follow the dictates of your genes under all possible circumstances, then telling someone what she should do is useless, since she will do what her genes dictate.

The more interesting and difficult case is when the correlation between gene and choice isn't perfect.

Comment by Brian_Tomasik on Two-boxing, smoking and chewing gum in Medical Newcomb problems · 2015-07-01T21:51:57.203Z · LW · GW

(moved comment)

Comment by Brian_Tomasik on Two-boxing, smoking and chewing gum in Medical Newcomb problems · 2015-06-29T22:15:10.147Z · LW · GW

I assume that the one-boxing gene makes a person generically more likely to favor the one-boxing solution to Newcomb. But what about when people learn about the setup of this particular problem? Does the correlation between having the one-boxing gene and inclining toward one-boxing still hold? Are people who one-box only because of EDT (even though they would have two-boxed before considering decision theory) still more likely to have the one-boxing gene? If so, then I'd be more inclined to force myself to one-box. If not, then I'd say that the apparent correlation between choosing one-boxing and winning breaks down when the one-boxing is forced. (Note: I haven't thought a lot about this and am still fairly confused on this topic.)

I'm reminded of the problem of reference-class forecasting and trying to determine which reference class (all one-boxers? or only grudging one-boxers who decided to one-box because of EDT?) to apply for making probability judgments. In the limit where the reference class consists of molecule-for-molecule copies of yourself, you should obviously do whatever made the most of them win.

Comment by Brian_Tomasik on Taking Occam Seriously · 2015-06-14T02:26:34.997Z · LW · GW

Paul's site has been offline since 2013. Hopefully it will come back, but in the meanwhile, here are links to most of his pieces on Internet Archive.

Comment by Brian_Tomasik on Seeking Estimates for P(Hell) · 2015-03-23T20:46:17.261Z · LW · GW

Good point. Also, in most multiverse theories, the worst possible experience necessarily exists somewhere.

Comment by Brian_Tomasik on Seeking Estimates for P(Hell) · 2015-03-22T22:18:26.310Z · LW · GW

From a practical perspective, accepting the papercut is the obvious choice because it's good to be nice to other value systems.

Even if I'm only considering my own values, I give some intrinsic weight to what other people care about. ("NU" is just an approximation of my intrinsic values.) So I'd still accept the papercut.

I also don't really care about mild suffering -- mostly just torture-level suffering. If it were 7 billion really happy people plus 1 person tortured, that would be a much harder dilemma.

In practice, the ratio of expected heaven to expected hell in the future is much smaller than 7 billion to 1, so even if someone is just a "negative-leaning utilitarian" who cares orders of magnitude more about suffering than happiness, s/he'll tend to act like a pure NU on any actual policy question.

Comment by Brian_Tomasik on Seeking Estimates for P(Hell) · 2015-03-22T00:03:33.217Z · LW · GW

Short answer:

Donate to MIRI, or split between MIRI and GiveWell charities if you want some fuzzies for short-term helping.

Long answer:

I'm a negative utilitarian (NU) and have been thinking since 2007 about the sign of MIRI for NUs. (Here's some relevant discussion.) I give ~70% chance that MIRI's impact is net good by NU lights and ~30% that it's net bad, but given MIRI's high impact, the expected value of MIRI is still very positive.

As far as your question: I'd put the probability of uncontrolled AI creating hells higher than 1 in 10,000 and the probability that MIRI as a whole prevents that from happening higher than 1 in 10,000,000. Say such hells used 10^-15 of the AI's total computing resources. Assuming computing power to create ~10^30 humans for ~10^10 years, MIRI would prevent in expectation ~10^18 hell-years. Assuming MIRI's total budget ever is $1 billion (too high), that's ~10^9 hell-years prevented per dollar. Now apply rigorous discounts to account for priors against astronomical impacts and various other far-future-dampening effects. MIRI still seems very promising at the end of the calculation.
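The Fermi estimate above can be written out explicitly. All of the numbers below are the assumptions stated in the comment (treated as rough lower bounds or deliberate overestimates), not established facts, and the variable names are my own:

```python
# Assumptions from the comment, all order-of-magnitude guesses:
p_miri_prevents_hells = 1e-7   # probability MIRI as a whole averts AI-created hells
hell_fraction = 1e-15          # fraction of the AI's computing resources running hells
human_capacity = 1e30          # human-equivalent minds the hardware could support
years = 1e10                   # duration of the computation, in years
miri_total_budget = 1e9        # MIRI's total budget ever, in dollars (an overestimate)

hell_years_at_stake = hell_fraction * human_capacity * years          # ~1e25
expected_prevented = p_miri_prevents_hells * hell_years_at_stake      # ~1e18
per_dollar = expected_prevented / miri_total_budget                   # ~1e9

print(f"~{expected_prevented:.0e} hell-years prevented, ~{per_dollar:.0e} per dollar")
```

This reproduces the ~10^18 hell-years and ~10^9 hell-years per dollar figures; the various far-future discounts mentioned afterward would then be applied multiplicatively on top.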

Comment by Brian_Tomasik on [Link] Physics-based anthropics? · 2015-03-17T01:47:57.251Z · LW · GW

Nice point. :)

That said, your example suggests a different difficulty: People who happen to be special numbers n get higher weight for apparently no reason. Maybe one way to address this fact is to note that what number n someone has is relative to (1) how the list is enumerated and (2) what universal Turing machine is being used for KC in the first place, and maybe averaging over these arbitrary details would blur the specialness of, say, the 1-billionth observer according to any particular coding scheme. Still, I doubt the KCs of different people would be exactly equal even after such adjustments.

Comment by Brian_Tomasik on Can we decrease the risk of worse-than-death outcomes following brain preservation? · 2015-02-21T23:29:15.813Z · LW · GW

Ah, got it. Yeah, that would help, though there would remain many cases where bad futures come too quickly (e.g., if an AGI takes a treacherous turn all of a sudden).

Comment by Brian_Tomasik on Can we decrease the risk of worse-than-death outcomes following brain preservation? · 2015-02-21T23:16:57.764Z · LW · GW

A "do not resuscitate" kind of request would probably help with some futures that are mildly bad in virtue of some disconnect between your old self and the future (e.g., extreme future shock). But in those cases, you could always just kill yourself.

In the worst futures, presumably those resuscitating you wouldn't care about your wishes. These are the scenarios where a terrible future existence could continue for a very long time without the option of suicide.

Comment by Brian_Tomasik on Inverse relationship between belief in foom and years worked in commercial software · 2015-01-15T03:12:51.414Z · LW · GW

This is awesome! Thank you. :) I'd be glad to copy it into my piece if I have your permission. For now I've just linked to it.

Comment by Brian_Tomasik on Inverse relationship between belief in foom and years worked in commercial software · 2015-01-09T02:30:00.696Z · LW · GW

Cool. Another interesting question would be how the views of a single person change over time. This would help tease out whether it's a generational trend or a generic trend with getting older.

In my own case, I only switched to finding a soft takeoff pretty likely within the last year. The change happened as I read more sources outside LessWrong that made some compelling points. (Note that I still agree that work on AI risks may have somewhat more impact in hard-takeoff scenarios, so that hard takeoffs deserve more than their probability's fraction of attention.)

Comment by Brian_Tomasik on Inverse relationship between belief in foom and years worked in commercial software · 2015-01-08T03:19:12.983Z · LW · GW

Good question. :) I don't want to look up exact ages for everyone, but I would guess that this graph would look more like a teepee, since Yudkowsky, Musk, Bostrom, etc. would be shifted to the right somewhat but are still younger than the long-time software veterans.

Comment by Brian_Tomasik on Inverse relationship between belief in foom and years worked in commercial software · 2015-01-07T21:06:17.432Z · LW · GW

Good points. However, keep in mind that humans can also use software to do boring jobs that require less-than-human intelligence. By the time we're near human-level AI, there may be narrow-AI programs that help with the items you describe.

Comment by Brian_Tomasik on Inverse relationship between belief in foom and years worked in commercial software · 2015-01-07T20:51:52.236Z · LW · GW

Thanks for the comment. There is some "multiple hypothesis testing" effect at play in the sense that I constructed the graph because of a hunch that I'd see a correlation of this type, based on a few salient examples that I knew about. I wouldn't have made a graph of some other comparison where I didn't expect much insight.

However, when it came to adding people, I did so purely based on whether I could clearly identify their views on the hard/soft question and years worked in industry. I'm happy to add anyone else to the graph if I can figure out the requisite data points. For instance, I wanted to add Vinge but couldn't clearly tell what x-axis value to use for him. For Kurzweil, I didn't really know what y-axis value to use.

Comment by Brian_Tomasik on Inverse relationship between belief in foom and years worked in commercial software · 2015-01-07T20:44:44.710Z · LW · GW

This is a good point, and I added it to the penultimate paragraph of the "Caveats" section of the piece.

Comment by Brian_Tomasik on [Resolved] Is the SIA doomsday argument wrong? · 2014-12-15T05:03:27.217Z · LW · GW

Thanks for the correction! I changed "endorsed" to "discussed" in the OP. What I meant to convey was that these authors endorsed the logic of the argument given the premises (ignoring sim scenarios), rather than that they agreed with the argument all things considered.