Surface Thoughts Suck 2020-11-02T13:24:03.229Z
Not Even Evidence 2020-10-06T01:20:25.028Z
Dach's Shortform 2020-09-24T07:17:48.478Z
Outcome Terminology? 2020-09-14T18:04:05.048Z


Comment by Dach on What are good election betting opportunities? · 2020-11-03T01:37:16.616Z · LW · GW

I can confirm that this still works. The sum of the prices of all the Nos is $14.77; the payoff is $15.

Comment by Dach on As a Washed Up Former Data Scientist and Machine Learning Researcher What Direction Should I Go In Now? · 2020-10-20T04:46:12.982Z · LW · GW

So, I guess the question boils down to, how seriously should I consider switching into the field of AI Alignment, and if not, what else should I do instead?

I think you should at least take the question seriously. You should consider becoming involved in AI Alignment to the extent that you think doing so will be the highest-value strategy, accounting for opportunity costs. An estimate for this could be derived from the interplay between your answers to the following basic considerations:

  • What are your goals?
  • What are the most promising methods for pursuing your various goals?
    • What resources do you have, and how effective would investing those resources be, on a method by method and goal by goal basis?

An example set of (short and incomplete) answers which would lead you to conclude "I should switch to the field of AI Alignment" is:

Like should I avoid working on AI at all and just do something fun like game design, or is it still a good idea to push forward ML despite the risks?

If you're not doing bleeding edge research (and no one doing bleeding edge research is reading your papers), your personal negative impact on AI Alignment efforts can be more effectively offset by making more money and then donating e.g. $500 to MIRI (or a related organization) than by changing careers.

And if switching to AI Alignment should be done, can it be a career or will I need to find something else to pay the bills with as well?

AI Alignment is considered by many to be literally the most important problem in the world. If you can significantly contribute to AI Alignment, you will be able to find someone to give you money.

If you can't significantly personally contribute to AI Alignment but still think the problem is important, I would advise advancing some other career and donating money to alignment efforts, starting a youtube channel and spreading awareness of the problem, etc.

I am neither familiar with you nor an alignment researcher, so I will eschew giving specific career advice.

Comment by Dach on Industrial literacy · 2020-10-15T10:58:17.021Z · LW · GW

You were welcome to write an actual response, and I definitely would have read it. I was merely announcing in advance my intent not to respond in detail to any following comments, and explaining why in brief, conservative terms. This is seemingly strictly better- it gives you new information which you can use to decide whether or not you want to respond. If I were being intentionally mean, I would have allowed you to write a detailed comment and never responded, potentially wasting your time.

If your idea of rudeness is constructed in this (admittedly inconvenient) way, I apologize.

Comment by Dach on Industrial literacy · 2020-10-15T09:45:15.819Z · LW · GW

Whether the future “matters more than today” is not a question of impersonal fact. Things, as you no doubt know, do not ‘matter’ intransitively; they matter to someone. So the question is, does “the future” (however construed) matter to me more than “today” (likewise, however construed) does? Does “the future” matter to my hypothetical friend Alice more than today does, or to her neighbor Bob? Etc.

And any of these people are fully within their right to answer in the negative.

Eh... We can draw conclusions about the values of individuals based on the ways in which they seem to act in the limit of additional time and information, the origins of humanity (selection for inclusive genetic fitness), by constructing thought experiments to elicit revealed beliefs, etc.

Other agents are allowed to claim that they have more insight than you into certain preferences of yours- they often do. Consider the special cases in which you can prove that the stated preferences of some humans allow you to siphon infinite money off of them. Also consider the special cases in which someone says something completely incoherent- "I prefer two things to one another under all conditions", or some such. We know that they're wrong. They can refuse to admit they're wrong, but they can't properly do that without giving us all of their money or in some sense putting their fingers in their ears.

These special cases are just special cases. In general, values are highly entangled with concrete physical information. You may say that you want to put your hand on that (unbeknownst to you) searing plate, but we can also know that you're wrong. You don't want to do that, and you'd agree if only you knew that the plate was searing hot.

They are fully within their right to answer in the negative, but they're not allowed to decide that they're correct. There is a correct answer to what they value, and they don't necessarily have perfect insight into that.

Note that you’re making a non-trivial claim here. In past discussions, on Less Wrong and in adjacent spaces, it has been pointed out that our ability to predict future consequences of our actions drops off rapidly as our time horizon recedes into the distance. It is not obvious to me that I am in any particularly favorable position to affect the course of the distant future in any but the most general ways (such as contributing to, or helping to avert, human extinction—and even there, many actions I might feasibly take could plausibly affect the likelihood of my desired outcome in either the one direction or the other).

You don't need to be able to predict the future with omniscient accuracy to realize that you are in an unusually important position for affecting the future.

If it's not obvious, here we go: You're an above-average-intelligence person living in the small period directly before Humanity is expected (by top experts, and with good cause) to develop artificial general intelligence. This technology will allow us to break the key scarcities of civilization:

  1. Allowing vastly more efficient conversion of matter into agency through the fabrication of computer hardware. This process will, given the advent of artificial general intelligence, soon far surpass the efficiency with which we can construct Human agency. Humans take a very long time to make, and you must train each individual Human- you can't directly copy Human software, and the indirect copying is very, very slow.
  2. Allowing agents with intelligence vastly above that of the most intelligent humans (whose brains must all fit in a container of relatively limited size) in all strategically relevant regards- speed, quality, modularity, I/O speed, multitasking ability, adaptability, transparency, etc.
  3. Allowing us to build agents able to access a much more direct method of recursively improving their own intelligence by buying or fabricating new hardware and directly improving their own code, triggering an extremely exploitable direct feedback loop.

The initial conditions of the first agent(s) we deploy with these radical and simultaneously new options will, on account of the overwhelming importance of these limitations on the existing state of affairs, precisely and "solely" determine the future.

This is a pretty popular opinion among the popular rationalist writers- I pass the torch on to them.

Sorry, no. There is a categorical difference between bringing a person into existence and affecting a person’s future life, contingent on them being brought into existence. It of course makes sense to speak of doing the latter sort of thing “for” the person-to-be, but such isn’t the case for the former sort of thing.

I was aware of the difference. The point (Which I directly stated at the end- convenient!) is that "It does make sense to say that you’re doing things 'for' people who don’t exist." If this doesn't directly address your point, the proper response to make would have been "Ok, I think you misunderstood what I was saying." I think that I did misunderstand what you were saying, so disregard.

Aside from that, I still think that saying you're bringing someone into existence "for" them makes sense. I think your saying it doesn't "make sense" is unfairly dismissive and overly argumentative. If someone said that they weren't going to have an abortion "for" their baby, or (if you disagree with me about the lines of what constitutes a "person") that they were stopping some pain-relieving experimental drug that was damaging their fertility "for" their future children, you'd receive all of the information they meant to convey about their motivations. It would definitely make sense. You might disagree with that reasoning, but it's coherent. They have an emotional connection with their as-of-yet not locally instantiated children.

I personally do happen to disagree with this reasoning for reasons I will explain later- but it does make sense.

To the contrary: your point hinges on this. You may of course discuss or not discuss what you like, but by avoiding this topic, you avoid one of the critical considerations in your whole edifice of reasoning. Your conclusion is unsupportable without committing to a position on this question.

It isn't, and I just told you that it isn't. You should have tried to understand why I was saying that before arguing with me- I'm the person who made the comment in the first place, and I just directly told you that you were misinterpreting me.

My point is: "It's that we should be spending most of our effort on planning for the long term future." See later for an elaboration.

Quite so—but surely this undermines your thesis, rather than supporting it?

No- I'm not actually arguing for the specific act of ensuring that future humans exist. I think that all humans already exist, perhaps in infinite supply, and I thus see (tentatively) zero value in bringing about future humans in and of itself. My first comment was using a rhetorical flair that was intended to convey my general strategy for planning for the future; I'm more interested in solving the AI alignment problem (and otherwise avoiding human extinction/s-risks) than I am about current politically popular long term planning efforts and the problems that they address, such as climate change and conservation efforts.

I think that we should be interested in manipulating the relative ratios (complicated stuff) of future humans, which means that we should still be interested in "ensuring the existence" (read: manipulating the ratios of different types of) of "Future Humanity", a nebulous phrase meant to convey the sort of outcome that I want to see to the value achievement dilemma. Personally, I think that the most promising plan for this is engineering an aligned AGI and supporting it throughout its recursive self improvement process.

Your response was kindof sour, so I'm not going to continue this conversation.

Comment by Dach on Industrial literacy · 2020-10-15T04:56:02.550Z · LW · GW

Why, exactly, is this our only job (or, indeed, our job at all)? Surely it’s possible to value present-day things, people, etc.?

The space that you can affect is your light cone, and your goals can be "simplified" to "applying your values over the space that you can affect", therefore your goal is to apply your values over your light cone. It's your "only job".

There is, of course, a specific notion that I intended to evoke by using this rephrasing: the idea that your values apply strongly over humanity's vast future. It's possible to value present-day things, people, and so on- and I do. However... whenever I hear that fact in response to my suggestions that the future is large and it matters more than today, I interpret it as the speaker playing defense for their preexisting strategies. Everyone was aware of this before they said it, and it doesn't address the central point- it's...

"There are 4 * 10^20 stars out there. You're in a prime position to make sure they're used for something valuable to you- as in, you're currently experiencing the top 10^-30% most influential hours of human experience because of your early position in human history, etc. Are you going to change your plans and leverage your unique position?" 

"No, I think I'll spend most of my effort doing the things I was already going to do."

Really- Is that your final answer? What position would you need to be in to decide that planning for the long term future is worth most of your effort?

Seeing as how future humanity (with capital letters or otherwise) does not, in fact, currently exist, it makes very little sense to say that ensuring their existence is something that we would be doing “for” them.

"Seeing as how a couple's baby does not yet exist, it makes very little sense to say that saving money for their clothes and crib is something that they would be doing 'for' them." No, wait, that's ridiculous- It does make sense to say that you're doing things "for" people who don't exist.

We could rephrase these things in terms of doing them for yourself- "you're only saving for their clothes and crib because you want them to get what they want". But, what are we gaining from this rephrasing? The thing you want is for them to get what they want/need. It seems fair to say that you're doing it for them.

There's some more complicated discussion to be had on the specific merits of making sure that people exist, but I'm not (currently) interested in having that discussion. My point isn't really related to that- it's that we should be spending most of our effort on planning for the long term future.

Also, in the context of artificial intelligence research, it's an open question as to what the border of "Future Humanity" is. "Existing humans" and "Future Humanity" probably have significant overlap, or so the people at MIRI, DeepMind, OpenAI, FHI, etc. tend to argue- and I agree.

Comment by Dach on Not Even Evidence · 2020-10-11T01:27:46.133Z · LW · GW

This doesn't require faster than light signaling. If you and the copy are sent away with identical letters that you open after crossing each other's event horizons, you learn what was packed with your clone when you open your letter. Which lets you predict what your clone will find.

Nothing here would require the event of your clone seeing the letter to affect you. You are affected by the initial set up.

Another example would be if you learn a star that has crossed your cosmic event horizon was 100 solar masses, it's fair to infer that it will become a black hole and not a white dwarf.

If you can send a probe to a location, radiation, gravitational waves, etc. from that location will also (in normal conditions) be intercepting you, allowing you to theoretically make pretty solid inferences about certain future phenomena at that location. However, we let the probe fall out of our cosmological horizon- information is reaching it that couldn't/can't have reached the other probes, or even the starting position of that probe.

In this setup, you're gaining information about arbitrary phenomena. If you send a probe out beyond your cosmological horizon, there's no way to infer the results of, for example, non-entangled quantum experiments.

I think we may eventually determine the complete list of rules and starting conditions for the universe/multiverse/etc. Using our theory of everything and (likely) unobtainable amounts of computing power, we could (perhaps) uniquely locate our branch of the universal wave function (or similar) and draw conclusions about the outcomes of distant quantum experiments (and similar). That's a serious maybe- I expect that a complete theory of everything would predict infinitely many different instances of us in a way that doesn't allow for uniquely locating ourselves.

However... this type of reasoning doesn't look anything like that. If SSA/SSSA require us to have a complete working theory of everything in order to be usable, that's still invalidating for my current purposes.

For the record, I ran into a more complicated problem which turns out to be incoherent for similar reasons- namely, information can only propagate in specific ways, and it turns out that SSA/SSSA allow you to draw conclusions about what your reference class looks like in ways that defy the ways in which information can propagate.

You are affected by the initial set up. If the clone counterfactually saw something else, this wouldn't affect you according to SIA.

This specific hypothetical doesn't directly apply to the SIA- it relies on adjusting the relative frequencies of different types of observers in your reference class, which isn't possible using SIA. SIA still suffers from the similar problem of allowing you to draw conclusions about what the space of all possible observers looks like.

Comment by Dach on Not Even Evidence · 2020-10-06T22:01:20.229Z · LW · GW

I don't understand why you're calling a prior "inference". Priors come prior to inferences, that's the point.

SIA is not isomorphic to "Assign priors based on Kolmogorov Complexity". If what you mean by SIA is something more along the lines of "Constantly update on all computable hypotheses ranked by Kolmogorov Complexity", then our definitions have desynced.

Also, remember: you need to select your priors based on inferences in real life. You're a neural network that developed from scattered particles- your priors need to have actually entered into your brain at some point.

Regardless of whether your probabilities entered through your brain under the name of a "prior" or an "update", the presence of that information still needs to work within our physical models and their conclusions about the ways in which information can propagate.

SIA has you reason as if you were randomly selected from the set of all possible observers. This is what I mean by SIA, and is a distinct idea. If you're using SIA to gesture to the types of conclusions that you'd draw using Solomonoff Induction, I claim definition mismatch.

It clearly is unnecessary - nothing in your examples requires there to be tiling, you should give an example with a single clone being produced, complete with the priors SIA gives as well as your theory, along with posteriors after Bayesian updating. 

I specifically listed the point of the tiling in the paragraph that mentions tiling:

for you to agree that the fact you don't see a pink pop-up appear provides strong justified evidence that none of the probes saw <event x>

The point of the tiling is, as I have said (including in the post), to manipulate the relative frequencies of actually existent observers strongly enough to invalidate SSA/SSSA in detail.

I don't see any such implications. You need to simplify and more fully specify your model and example. 

There's phenomena which your brain could not yet have been impacted by, based on the physical ways in which information propagates. If you think you're randomly drawn from the set of all possible observers, you can draw conclusions about what the set of all possible observers looks like, which is problematic.

I don't see any such implications. You need to simplify and more fully specify your model and example. 

Just to reiterate, my post isn't particularly about SIA. I showed the problem with SSA/SSSA- the example was specified for doing something else.

Comment by Dach on Industrial literacy · 2020-10-06T20:35:50.428Z · LW · GW

That's surprisingly close, but I don't think that counts. That page explains that the current dynamics behind phosphate recycling are bad as a result of phosphate being cheap- if phosphate was scarce, recycling (and potentially the location of new phosphate reserves, etc.) would become more economical.

Comment by Dach on Not Even Evidence · 2020-10-06T20:14:25.937Z · LW · GW

My formulation of those assumptions, as I've said, is entirely a prior claim. 

You can't gain non-local information using any method, regardless of the words or models you want to use to contain that information. 

If you agree with those priors and Bayes, you get those assumptions. 

You cannot reason as if you were selected randomly from the set of all possible observers. This allows you to infer information about what the set of all possible observers looks like, despite provably not having access to that information. There are practical implications of this, the consequences of which were shown in the above post with SSA.

You can't say that you accept the prior, accept Bayes, but reject the assumption without explaining what part of the process you reject. I think you're just rejecting Bayes, but the unnecessary complexity of your example is complicating the analysis. Just do Sleeping Beauty with the copies in different light cones. 

It's not a specific case of sleeping beauty. Sleeping beauty has meaningfully distinct characteristics.

This is a real world example that demonstrates the flaws with these methods of reasoning. The complexity is not unnecessary.

I'm asking for your prior in the specific scenario I gave. 

My estimate is 2/3rds for the 2-Observer scenario. Your claim that "priors come before time" makes me want to use different terminology for what we're talking about here. Your brain is a physical system and is subject to the laws governing other physical systems- whatever you mean by "priors coming before time" isn't clearly relevant to the physical configuration of the particles in your brain.

The fact that I execute the same Bayesian update with the same prior in this situation does not mean that I "get" SIA- SIA has additional physically incoherent implications.

Comment by Dach on Not Even Evidence · 2020-10-06T19:40:32.038Z · LW · GW

The version of the post I responded to said that all probes eventually turn on simulations. 

The probes which run the simulations of you without the pop-up run exactly one. The simulation is run "on the probe."

Let me know when you have an SIA version, please.

I'm not going to write a new post for SIA specifically- I already demonstrated a generalized problem with these assumptions.

The up until now part of this is nonsense - priors come before time. Other than that, I see no reason to place such a limitation on priors, and if you formalize this I can probably find a simple counterexample. What does it even mean for a prior to correspond to a phenomena?

Your entire brain is a physical system; it must abide by the laws of physics. This very fact limits what your priors can be- by the laws of physics, there is some stuff that could not yet have affected the positions of the particles in your brain.

The fact that you use some set of priors is a physical phenomenon. If human brains acquire information in ways that do not respect locality, you can break all of the rules, acquire infinite power, etc.

Up until now refers to the fact that the phenomena have, up until now, been unable to affect your brain.

I wrote a whole post trying to get people to look at the ideas behind this problem, see above. If you don't see the implication, I'm not going to further elaborate on it, sorry.

All SIA is doing is asserting events A, B, and C are equal prior probability. (A is living in universe 1 which has 1 observer, B and C are living in universe 2 with 2 observers and being the first and second observer respectively. B and C can be non-local.)

SIA is asserting more than events A, B, and C are equal prior probability.

Sleeping Beauty and these hypotheticals here are different- these hypotheticals make you observe something that is unreasonably unlikely under one hypothesis but very likely under another, and then show that you can't update your confidences in these hypotheses in the dramatic way demonstrated in the first hypothetical.

You can't change the number of possible observers, so you can't turn SIA into an FTL telephone. SIA still makes the same mistake that allows you to turn SSA/SSSA into FTL telephones, though. 

If you knew for a fact that something couldn't have had an impact, this might be valid. But in your scenarios, these could have had an impact, yet didn't. It's a perfectly valid update.

There really couldn't have been an impact. The versions of you that wake up and don't see pop-ups (and their brains) could not have been affected by what's going on with the other probes- they are outside of one another's cosmological horizon. You could design similar situations where your brain eventually could be affected by them, but you're still updating prematurely.

I told you the specific types of updates that you'd be allowed to make. Those are the only ones you can justifiably say are corresponding to anything- as in, are as the result of any observations you've made. If you don't see a pop-up, not all of the probes saw <event x>, your probe didn't see <event x>, you're a person who didn't see a pop-up, etc. If you see a pop-up, your assigned probe saw <event x>, and thus at least one probe saw <event x>, and you are a pop-up person, etc.

However, you can't do anything remotely looking like the update mentioned in the first hypothetical. You're only learning information about your specific probe's fate, and what type of copy you ended up being.

You should simplify to having exactly one clone created. In fact, I suspect you can state your "paradox" in terms of Sleeping Beauty - this seems similar to some arguments people give against SIA there, claiming one does not acquire new evidence upon waking. I think this is incorrect - one learns that one has woken in the SB scenario, which on SIA's priors leads one to update to the thirder position.

You can't simplify to having exactly one clone created. 

There is a different problem going on here than in the SB scenario. I mostly agree with the 1/3rds position- you're least inaccurate when your estimate for the 2-Observer scenario is 2/3rds. I don't agree with the generalized principle behind that position, though. It requires adjustments, in order to be more clear about what it is you're doing, and why you're doing it.

Comment by Dach on Not Even Evidence · 2020-10-06T18:06:59.902Z · LW · GW

If you reject both the SIA and SSA priors (in my example, SIA giving 1/3 to each of A, B, and C, and SSA giving 1/2 to A and 1/4 to B and C), then what prior do you give?

I reject these assumptions, not their priors. The actual assumptions and the methodology behind them have physically incoherent implications- the priors they assign may still be valid, especially in scenarios where it seems like there are exactly two reasonable priors, and they both choose one of them.
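The priors in the quoted example can be reproduced mechanically (a minimal sketch of the two assumptions as weighting rules; the two-world setup with observers A, B, and C is taken from the quoted example, with both worlds equally likely a priori):

```python
from fractions import Fraction

# World 1 has 1 observer (A); world 2 has 2 observers (B, C).
# Each world has prior probability 1/2.
worlds = [(Fraction(1, 2), 1), (Fraction(1, 2), 2)]

def ssa(worlds):
    # SSA: a world keeps its probability, split evenly among its observers.
    return [p / n for p, n in worlds for _ in range(n)]

def sia(worlds):
    # SIA: weight each possible observer by its world's probability, then
    # normalize, so more-populated worlds get proportionally more weight.
    weights = [p for p, n in worlds for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

ssa(worlds)  # [1/2, 1/4, 1/4] for (A, B, C)
sia(worlds)  # [1/3, 1/3, 1/3]
```

This only reproduces the numbers; the disagreement in the thread is about whether the selection story behind them is physically coherent, not about the arithmetic.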

Whatever prior you give you will still end up updating as you learn information. There's no way around that unless you reject Bayes or you assert a prior that places 0 probability on the clones, which seems sillier than any consequences you're drawing out here.

The point is not that you're not allowed to have prior probabilities for what you're going to experience. I specifically placed a mark on the prior probability of what I expected to experience in the "What if..." section.

If you actually did the sleeping beauty experiment in the real world, it's very clear that "you would be right most often when you woke up" if you said you were in the world with two observers.
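A quick Monte Carlo makes the frequency claim concrete (a sketch, assuming the standard setup: heads gives one awakening, tails gives two, and you guess "two-observer world" at every awakening):

```python
import random

def per_awakening_tails_frequency(trials=100_000, seed=0):
    """Fraction of awakenings that occur in the two-awakening (tails) world."""
    rng = random.Random(seed)
    tails_awakenings = total_awakenings = 0
    for _ in range(trials):
        if rng.random() < 0.5:  # heads: one awakening
            total_awakenings += 1
        else:                   # tails: two awakenings
            total_awakenings += 2
            tails_awakenings += 2
    return tails_awakenings / total_awakenings

# Converges to ~2/3: per awakening, "two observers" is the guess that is
# right most often.
```

This is the per-awakening scoring rule; scoring per coin flip instead gives 1/2, which is one way of framing the halfer/thirder dispute.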

Comment by Dach on Not Even Evidence · 2020-10-06T07:00:18.451Z · LW · GW

Can you formulate this as a challenge to SIA in particular? You claim that it affects SIA, but your issue is with reference classes, and SIA doesn't care about your reference class. 

The point is that SIA similarly overextends its reach- it claims to make predictions about phenomena that could not yet have had any effect on your brain's operation, for reasons demonstrated with SSA in the example in the post.

Your probability estimates can only be affected by a pretty narrow range of stuff, in practice. Because SIA does not deliberately draw the line of all possible observers around "all possible observers which could have so far had an impact on my probability estimates, as evidenced by the speed of light and other physical restrictions on the propagation of information", it unfortunately implies that your probability estimates correspond to things which, via physics, they can't.

Briefly, "You cannot reason about things which could not yet have had an impact on your brain."

SSSA/SSA are more common, which is why I focused on them. For the record, I used an example in which SSSA and SSA predict exactly the same things. SIA doesn't predict the same thing here, but the problem that I gestured to is also present in SIA, but with a less laborious argument.

Your probe example is confusingly worded. You include time as a factor but say time doesn't matter. Can you reduce it to the simplest possible that still yields the paradoxical result you want? 

Yeah, sorry- I'm still editing this post. I'll reword it tomorrow. I'm not sure if I'll remove that specific disclaimer, though.

We could activate the simulated versions of you at any time- whether or not the other members of your reference class are activated at different times doesn't matter under standard usage of SIA/SSA/SSSA. I'm just including the extra information that the simulations are all spun up at the same time in case you have some weird disagreement with that, and in order to more closely match intuitive notions of identity.

I included that disclaimer because there's questions to be had about time- the probes are presumably in differently warped regions of spacetime, thus it's not so clear what it means to say these events are happening at the same time.

I don't think SIA says you should update in this manner, except very slightly. If I'm understanding your example correctly, all the probes end up tiling their light cones, so the number of sims is equal regardless of what happened. The worlds with fewer probes having seen x become slightly more likely than the prior, but no anthropic reasoning is needed to get that result. 

Only the probes which see <event x> end up tiling their light cones. The point is to change the relative frequencies of the members of your reference class. Because SSA/SSSA assume that you are randomly selected from your reference class, by shifting the relative frequencies of different future observations within your reference class SSA/SSSA imply you can gain information about arbitrary non-local phenomena. This problem is present even outside of this admittedly contrived hypothetical- this contrived hypothetical takes an extra step and turns the problem into an FTL telephone.

It doesn't seem that there's any way to put your hand on the scale of the number of possible observers, therefore (as previously remarked) this example doesn't apply to SIA. The notion that SIA is overextending its reach by claiming to make justified claims about things we can show (using physics) you cannot make justified claims about still applies.

In general, I think of SIA as dictating our prior, while all updates are independent of anthropics. Our posterior is simply the SIA prior conditioned on all facts we know about our own existence. Roughly speaking, SSA represents a prior that we're equally likely to exist in worlds that are equally likely to exist, while SIA represents a prior that we're equally likely to be any two observers that are equally likely to exist. 

The problem only gets pushed back- we can also assert that your priors cannot be corresponding to phenomena which (up until now) have been non-local to you. I'm hesitant to say that you're not allowed to use this form of reasoning- in practice using SIA may be quite useful. However, it's just important to be clear that SIA does have this invalid implication.

Comment by Dach on Industrial literacy · 2020-10-05T22:48:38.835Z · LW · GW

It's possible, but very improbable. We have vastly more probable concerns (misaligned AGI, etc.) than resource depletion sufficient to cripple the entire human project.

What critical resources is Humanity at serious risk of depleting? Remember that most resources have substitutes- food is food.

Comment by Dach on Industrial literacy · 2020-10-05T21:48:07.341Z · LW · GW

Why do you seem to imply that burning fossil fuels would help at all the odds of the long term human project? 

I don't imply that. For clarification: 

I would waste any number of resources if that was what was best for the long-term prospects of Humanity. In practice, that means that I'm willing to sacrifice really really large amounts of resources that we won't be able to use until after we develop AGI or similar, in exchange for very very small increases to our probability of developing aligned AGI or similar.

Because I think we won't be able to use significant portions of most of the types of resources available on Earth before we develop AGI or similar, I'm willing to completely ignore conservation of those resources. I still care about the side effects of the process of gathering and using those resources, but...

The oil example isn't meant to be any reflection of my affinity for fossil fuels.

My point is that "super long term conservation of resources" isn't a concern. If there are near-term, non-"conservation of resources" reasons why doing something is bad, I'm open to those concerns- but we don't need to worry about ensuring that humans 100 years from now have access to fuel sources.

For the record, I think nuclear and solar seem to clearly be better energy sources than fossil fuels for most applications. Especially nuclear.

I'm also not fighting defense for climate change activists- I don't care about how many species die out, unless those species are useful (short term- next 50 years, 100 years max?) to us. If you want to make sure future humanity has access to Tropical Tree Frog #952, and you're concerned about them going extinct, go grab some genetic samples and preserve them. If the species makes many humans very happy, provides us valuable resources, etc., fine. 

At the current rate of fishing, all fish species could be practically extinct by 2050

I'm open to the notion that regulating our fish intake is the responsible move- it seems like a pretty easy sell. It keeps our fishing equipment, boats, and fishermen useful. I'm taking this action because it's better for humanity, not because it's better for the fish or better for the Earth.

The Strategy is not to excessively use resources and destroy the environment just because we can, it's to actively and directly use our resources to accomplish our goals, which I have doubts strongly aligns with preserving the environment. 

Let's list a few ways in which our conservation efforts are bad:

  • Long term (100+ years) storage of nuclear waste.
  • Protecting species which aren't really useful to Humanity.
  • Planning with the idea that we will be indefinitely (Or, for more than 100 years) living in the current technological paradigm, i.e. without artificial general intelligence.

And in which they're valid:

  • Being careful with our harvesting of easily depletable species which we'll be better off having alive for the next 100 years.
  • Being careful with our effect on global temperatures and water levels, in order to avoid the costs of relocating large numbers of humans.
  • Being careful with our management of important freshwater reserves, at least until we develop sufficiently economical desalinization plants.

I personally don't want to see my personal odds of survival diminishing because I'll have to deal with riots, food shortages, totalitarian fascist governments or... who know?

The greatest risks to your survival are, by far, (unless you're a very exceptional person) natural causes and misaligned artificial general intelligence. You shouldn't significantly concern yourself with dealing with weird risk factors such as riots or food shortages unless you've already found that you can't do anything about natural causes and misaligned artificial general intelligence. Spoiler: It seems you can do something about these risk factors.

Every economical estimate I saw said that the costs would be a lot less than the economic damage from climate change alone, many estimates agree that it would actually improve the economy, and nobody is saying "let's toss industry and technology out of the window, back to the caves everyone!".

Many people are saying things I consider dangerously close to "Let's toss industry and technology out of the window!". Dagon suggested that our current resource expenditure was reckless, and that we should substantially downgrade our resource expenditures. I consider this to be a seriously questionable perspective on the problem.

I'm not arguing against preserving the environment if it would boost the economy for at least the next 100 years, keeping in mind opportunity cost. I want to improve humanity's generalized power to pursue its goals- I'm not attached to any particular short guiding principle for doing this, such as "Protect the Earth!" or "More oil!". I don't have Mad Oil Baron Syndrome.

Comment by Dach on Industrial literacy · 2020-10-04T23:30:00.828Z · LW · GW

I suspect that if people really understood the cost to future people of the contortions we go through to support this many simultaneous humans in this level of luxury, we'd have to admit that we don't actually care about them very much.  I sympathize with those who are saying "go back to the good old days" in terms of cutting the population back to a sustainable level (1850 was about 1.2B, and it's not clear even that was sparse/spartan enough to last more than a few millennia).

There's enough matter in our light cone to support each individual existing human for roughly 10^44 years.

The problem is not "running out of resources"- there are so many resources it will require cosmic engineering for us to use more of them than entropy, even if we multiply our current population by ten billion.

Earth is only one planet- it does not matter how much of earth we use here and now. Our job is to make sure that our light cone ends up being used for what we find valuable. That's our only job. The finite resources available on earth are almost irrelevant to the long term human project, beyond the extent to which those resources help us accomplish our job- I would burn a trillion pacific oceans worth of oil for a .000000000000000001% absolute increase to our probability of succeeding at our job.

I sympathize with people who are thinking like this, because it shows that they're at least trying to think about the future. But... Future Humanity doesn't need the petty resources available on earth any more than we need good flint to make hunting spears with. The only important thing and the best thing we can do for them is to ensure that they will ever exist at all!

Comment by Dach on The Short Case for Verificationism · 2020-10-02T08:48:58.227Z · LW · GW

The existence of places like LessWrong, philosophy departments, etc, indicate that people do have some sort of goal to understand things in general, aside from any nitpicking about what is a true terminal value.

I agree- lots of people (including me, of course) are learning because they want to- not as part of some instrumental plan to achieve their other goals. I think this is significant evidence that we do terminally value learning. However, the way that I personally have the most fun learning is not the way that is best for cultivating a perfect understanding of reality (nor developing the model which is most instrumentally efficient, for that matter). This indicates that I don't necessarily want to learn so that I can have the mental model that most accurately describes reality- I have fun learning for complicated reasons which I don't expect align with any short guiding principle.

Also, at least for now, I get basically all of my expected value from learning from my expectations for being able to leverage that knowledge. I have a lot more fun learning about e.g. history than the things I actually spend my time on, but historical knowledge isn't nearly as useful, so I'm not spending my time on it.

In retrospect, I should've said something more along the lines of "We value understanding in and of itself, but (at least for me, and at least for now) most of the value in our understanding is from its practical role in the advancement of our other goals."

I've already stated than I am not talking about confirming specific models.

There's been a mix-up here- my meaning for "specific" also includes "whichever model corresponds to reality the best"

Comment by Dach on The Short Case for Verificationism · 2020-10-02T02:18:57.351Z · LW · GW

E.g. "maybe you're in an asylum" assumes that it's possible for an asylum to "exist" and for someone to be in it, both of which are meaningless under my worldview. 

What do you mean by "reality"? You keep using words that are meaningless under my worldview without bothering to define them. 

You're implementing a feature into your model which doesn't change what it predicts but makes it less computationally efficient.

The fact you're saying "both of which are meaningless under my worldview" is damning evidence that your model (or at least your current implementation of your model) sucks, because that message transmits useful information to someone using my model but apparently has no meaning in your model. Ipso facto, my model is better. There's no coherent excuse for this.

This isn't relevant to the truth of verificationism, though. My argument against realism is that it's not even coherent. If it makes your model prettier, go ahead and use it.

What does it mean for your model to be "true"? There are infinitely many unique models which will predict all evidence you will ever receive- I established this earlier and you never responded.

It's not about making my model "prettier"- my model is literally better at evoking the outcomes that I want to evoke. This is the correct dimension on which to evaluate your model.

You'll just run into trouble if you try doing e.g. quantum physics and insist on realism - you'll do things like assert there must be loopholes in Bell's theorem, and search for them and never find them. 

My preferred interpretation of quantum physics (many worlds) was formulated before Bell's theorem, and it turns out that Bell's theorem is actually strong evidence in favor of many worlds. Bell's theorem does not "disprove realism", it just disproves local hidden variable theories. My interpretation already predicted that.

I suspect this isn't going anywhere, so I'm abdicating.

Comment by Dach on The Short Case for Verificationism · 2020-10-01T22:22:16.172Z · LW · GW

This is false. I actually have no idea what it would mean for an experience to be a delusion - I don't think that's even a meaningful statement.

I'm comfortable with the Cartesian argument that allows me to know that I am experiencing things.

Everything you're thinking is compatible with a situation in which you're actually in a simulation hosted in some entirely alien reality (2 + 2 = 3, experience is meaningless, causes follow after effects, (True ^ True) = False, etc.) which is being manipulated in extremely contrived ways that produce your exact current thought processes.

There are an exhausting number of different riffs on this idea- maybe you're in an asylum and all of your thinking including "I actually have no idea what it would mean for an experience to be a delusion" is due to some major mental disorder. Oh, how obvious- my idea of experience was a crazy delusion all along. I can't believe I said that it was my daughter's arm. "I think therefore I am"? Absurd!

If you have an argument against this problem, I am especially interested in hearing it- it seems like the fact you can't tell between this situation and reality (and you can't know whether this situation is impossible as a result, etc.) is part of the construction of the scenario. You'd need to show that the whole idea that "We can construct situations in which you're having exactly the same thoughts as you are right now, but with some arbitrary change (Which you don't even need to believe is theoretically possible or coherent) in the background" is invalid.

Do I think this is a practical concern? Of course not. The Cartesian argument isn't sufficient to convince me, though- I'm just assuming that I really exist and things are broadly as they seem. I don't think it's that plausible to expect that I would be able to derive these assumptions without using them- there is no epistemological rock bottom.

On the contrary, it's the naive realist model that doesn't pay rent by not making any predictions at all different from my simpler model.

Your model is (I allege) not actually simpler. It just seems simpler because you "removed something" from it. A mind could be much "simpler" than ours, but also less useful- which is the actual point of having a simpler model. The "simplest" model which accurately predicts everything we see is going to be a fundamental physical theory, but making accurate predictions about complicated macroscopic behavior entirely from first principles is not tractable with eight billion human brains worth of hardware.

The real question of importance is, does operating on a framework which takes specific regular notice of the idea that naïve realism is technically a floating belief increase your productivity in the real world? I can't see why that would be the case- it requires occasionally spending my scarce brainpower on reformatting my basic experience of the world in more complicated terms, I have to think about whether or not I should argue with someone whenever they bring up the idea of naïve realism, etc. You claim adopting the "simpler" model doesn't change your predictions, so I don't see what justifies these costs. Are there some major hidden costs of naïve realism that I'm not aware of? Am I actually wasting more unconscious brainpower working with the idea of "reality" and things "really existing"?

If I have to choose between two models which make the exact same predictions (i.e. my current model and your model), I'm going to choose between the model which is better at achieving my goals. In practice, this is the more computationally efficient model, which (I allege) is my current model.

Comment by Dach on The Short Case for Verificationism · 2020-10-01T03:12:47.803Z · LW · GW

Refer to my disclaimer for the validity of the idea of humans having terminal values. In the context of human values, I think of "terminal values" as the ones directly formed by evolution and hardwired into our brains, and thus broadly shared. The apparent exceptions are rarish and highly associated with childhood neglect and brain damage.

"Broadly shared" is not a significant additional constraint on what I mean by "terminal value", it's a passing acknowledgement of the rare counterexamples.

If that's your argument then we somewhat agree. I'm saying that the model you should use is the model that most efficiently pursues your goals, and (in response to your comment) that utility schemes which terminally value having specific models (and thus whose goals are most efficiently pursued through using said arbitrary terminally valued model and not a more computationally efficient model) are not evidently present among humans in great enough supply for us to expect that that caveat applies to anyone who will read any of these comments.

Real world examples of people who appear at first glance to value having specific models (e.g. religious people) are pretty sketchy- if this is to be believed, you can change someone's terminal values with the argumentative equivalent of a single rusty musket ball and a rubber band. That defies the sort of behaviors we'd want to see from whatever we're defining as a "terminal value", keeping in mind the inconsistencies between the way human value systems are structured and the way the value systems of hypothetical artificial intelligences are structured. 

The argumentative strategy required to convince someone to ignore instrumentally unimportant details about the truth of reality looks more like "have a normal conversation with them" than "display a series of colorful flashes as a precursor to the biological equivalent of arbitrary code execution" or otherwise psychologically breaking them in a way sufficient to get them to do basically anything, which is what would be required to cause serious damage to what I'm talking about when I say "terminal values" in the context of humans.

Comment by Dach on The Short Case for Verificationism · 2020-09-30T19:48:00.326Z · LW · GW

It's also true for "I terminally value understanding the world, whatever the correct model is".

I said e.g., not i.e., and "I terminally value understanding the world, whatever the correct model is" is also a case of trivial values.

First, a disclaimer: It's unclear how well the idea of terminal/instrumental values maps to human values. Humans seem pretty prone to value drift- whenever we decide we like some idea and implement it, we're not exactly "discovering" some new strategy and then instrumentally implementing it. We're more incorporating the new strategy directly into our value network. It's possible (Or even probable) that our instrumental values "sneak in" to our value network and are basically terminal values with (usually) lower weights.

Now, what would we expect to see if "Understanding the world, whatever the correct model is" was a broadly shared terminal value in humans, in the same way as the other prime suspects for terminal value (survival instinct, caring for friends and family, etc)? I would expect:

  1. It's exhibited in the vast majority of humans, with some medium correlation between intelligence and the level to which this value is exhibited. (Strongly exhibiting this value tends to cause greater effectiveness i.e. intelligence, but most people already strongly exhibit this value)
  2. Companies to have jumped on this opportunity like a pack of wolves and have designed thousands of cheap wooden signs with phrases like "Family, love, 'Understanding the world, whatever the correct model is'".
  3. Movements which oppose this value are somewhat fringe and widely condemned.
  4. Most people who espouse this value are not exactly sure where it's from, in the same way they're not exactly sure where their survival instinct or their love for their family came from.

But, what do we see in the real world?

  1. Exhibiting this value is highly correlated with intelligence. Almost everyone lightly exhibits this value, because its practical applications are pretty obvious (Pretending your mate isn't cheating on you is just plainly a stupid strategy), but it's only strongly and knowingly exhibited among really smart people interested in improving their instrumental capabilities.
  2. Movements which oppose this value are common. 
  3. Most people who espouse this value got it from an intellectual tradition, some wise counseling, etc.
Comment by Dach on The Short Case for Verificationism · 2020-09-30T00:42:38.305Z · LW · GW

This is only true for trivial values, e.g. "I terminally value having this specific world model".

For most utility schemes (Including, critically, that of humans), the supermajority of the purpose of models and beliefs is instrumental. For example, making better predictions, using less computing power, etc.

In fact, humans who do not recognize this fact and stick to beliefs or models because they like them are profoundly irrational. If the sky is blue, I wish to believe the sky is blue, and so on. So, assuming that only prediction is valuable is not question begging- I suspect you already agreed with this and just didn't realize it.

In the sense that beliefs (and the models they're part of) are instrumental goals, any specific belief is "unnecessary". Note the quotations around "unnecessary" in this comment and the comment you're replying to. By "unnecessary" I mean the choice of which beliefs and which model to use is subject to the whims of which is more instrumentally valuable- in practice, a complex tradeoff between predictive accuracy and computational demands.

Comment by Dach on The Short Case for Verificationism · 2020-09-29T02:42:16.369Z · LW · GW

It's a well known tragedy that (unless Humanity gains a perspective on reality far surpassing my wildest expectations) there are arbitrarily many nontrivially unique theories which correspond to any finite set of observations.

The practical consequence of this (A small leap, but valid) is that we can remove any idea you have and make exactly the same predictions about sensory experiences by reformulating our model. Yes, any idea. Models are not even slightly unique- the idea of anything "really existing" is "unnecessary", but literally every belief is "unnecessary". I'd expect some beliefs would, for the practical purposes of present-day-earth human brains, be impossible to replace, but I digress.
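A concrete (and well-worn) instance of this underdetermination, with toy data of my own choosing: any finite set of observations is fit exactly by infinitely many distinct models.

```python
# Take data generated by f(x) = x*x. Every rival model below agrees with f on
# all observed points, for every choice of c- yet they all disagree elsewhere.
xs = [0, 1, 2, 3]
observations = [x * x for x in xs]

def rival(x, c):
    bump = 1
    for xi in xs:
        bump *= (x - xi)          # this factor is zero at every observed point
    return x * x + c * bump       # a different "theory" for each value of c

for c in (0, 1, -7):
    assert [rival(x, c) for x in xs] == observations
print("every rival model matches every observation")
```

At the unobserved point x = 4, the rivals diverge: rival(4, 0) = 16 while rival(4, 1) = 40, so no amount of past data singles out one of them.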

(Joke: what's the first step of more accurately predicting your experiences? Simplifying your experiences! Ahaha!)

You cannot "know" anything, because you're experiencing exactly the same thing as you could possibly be experiencing if you were wrong. You can't "know" that you're either wrong or right, or neither, you can't "know" that you can't "know" anything, etc. etc. etc.

There are infinitely many different ontologies which support every single piece of information you have ever or will ever experience.

In fact, no experience indicates anything- we can build a theory of everything which explains any experience but undermines any inferences made using it, and we can do this with a one-to-one correspondence to theories that support that inference. 

In fact, there's no way to draw the inference that you're experiencing anything. We can build infinitely many models (Or, given the limits on how much matter you can store in a Hubble volume, an arbitrarily large but finite number of models) in which the whole concept of "experience" is explained away as delusion...

And so on!

The main point of making beliefs pay rent is having a more computationally efficient model- doing things more effectively. Is your reformulation more effective than the naïve model? No.

Your model, and this whole line of thought, is not paying rent.

Comment by Dach on Dach's Shortform · 2020-09-24T18:29:14.016Z · LW · GW

Right, that isn't an exhaustive list. I included the candidates which seemed most likely.

So, I think superintelligence is unlikely in general- but so is current civilization. I think superintelligences have a high occurrence rate given current civilization (for lots of reasons), which also means that current civilization isn't that much more likely than superintelligence. It's more justified to say "Superintelligences which make human minds" have a super low occurrence rate relative to natural examples of me and my environment, but that still seems to be an unlikely explanation.

Based on the "standard" discussion on this topic, I get the distinct impression that the probability our civilization will construct an aligned superintelligence is significantly greater than, for example, 10^-20%, and the large amounts of leverage that a superintelligence would have (There's lots of matter out there) would produce this same effect.

Comment by Dach on Dach's Shortform · 2020-09-24T07:17:48.785Z · LW · GW

(2020-10-03) EDIT: I have found the solution: the way I was thinking about identity turns out to be silly.

In general, if you update your probability estimates of non-local phenomena based on anthropic arguments, you're (probably? I'm sure someone has come up with smart counterexamples) doing something that includes the sneaky implication that you're conducting FTL communication. I consider this to be a reductio ad absurdum on the whole idea of updating your probability estimates of non-local phenomena based on anthropic arguments, regardless of the validity of the specific scenario in this post.

If you conduct some experiment which tries to determine which world you're in, and you observe x thing, you haven't learned (at least, in general) anything about what percentage of exact copies of you before you did the experiment observed what you observed.

If you do update, and if you claim the update you're making corresponds to reality, then you're claiming that non-local facts are having a justified influence on you. When you put it like that, it's very silly. By adjusting the non-local worlds, we can change this justified influence on you (otherwise your update does not correspond to reality), and we have FTL signaling.

The things you're experiencing aren't any evidence about the sorts of things that most exact copies of your brain are experiencing- and if you claim they are, you're unknowingly claiming that FTL communication is possible, and that you're doing it right now.

I'll need to write something more substantial about this problem.

(End Edit)


(This bit is edited to redact irrelevant/obviously-wrong-in-hindsight information)


So, let us imagine our universe is "big" in the sense of many worlds, and all realities compatible with the universal wavefunction are actualized- or at least something close to that. This seems increasingly likely to me.

Aligned superintelligences might permute through all possible human minds, for various reasons. They might be especially interested in permuting through minds which are thinking about the AI alignment problem- also for various reasons. 

As a result, it's not evident to me for normal reasons that most of the "measure" of me-right-now flows into "normal" things- it seems plausible (on a surface level- I have some arguments against this) that most of the "measure" of me should be falling into Weird Stuff. Future superintelligences control the vast majority of all of the measure of everything (more branches in the future, more future in general, etc.), and they're probably more interested in building minds from scratch and then doing weird things with them.

If, among all of the representations of my algorithm in reality, (100%) * (1 - 10^-30) of my measure was "normal stuff", I'd still expect to be "diverted" basically instantly, if we assume there's one opportunity for a "diversion" every Planck time.
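A back-of-the-envelope check of that claim (reading the 10^-30 figure as a per-opportunity chance of diversion- that reading is my own gloss on the text):

```python
# Even a 1e-30 per-opportunity chance of "diversion" adds up almost instantly
# when there is one opportunity per Planck time.
planck_time = 5.39e-44                        # seconds (standard value)
opportunities_per_second = 1 / planck_time    # roughly 1.9e43
p_diversion_per_opportunity = 1e-30

expected_diversions_per_second = (
    opportunities_per_second * p_diversion_per_opportunity
)
print(expected_diversions_per_second)  # on the order of 1e13 per second
```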

However, this is, of course, not happening. We are constantly avoiding waking up from the simulation.

Possible explanations:

  • The world is small. This seems unlikely- look at these conditions:
  1. Many worlds is wrong.
  2. The universe is finite and not arbitrarily large.
  3. There's no multiverse, or the multiverse is small, or other universes all have properties which mean they don't support the potential for diversion- e.g. their laws are sufficiently different that none of them will contain human algorithms in great supply.
  4. There's no way to gain infinite energy, or truly arbitrarily large amounts of energy.
  • The sum of "Normal universe simulations" vastly outweighs the sum of "continuity of experience stealing simulations", for some reason. Maybe there are lots of different intelligent civilizations in the game, and lots of us spawn general universe simulators, and we also tend to cut simulations when other superintelligences arrive.
  • Superintelligences are taking deliberate action to prevent "diversion" tactics from being effective, or predict that other superintelligences are taking these actions. For example, if I don't like the idea of diverting people, I might snipe for recently diverted sentients and place them back in a simulation consistent with a "natural" environment.
  • "Diversion" as a whole isn't possible, and my understanding of how identity and experience work is sketchy.
  • Some other sort of misunderstanding hidden in my assumptions or premise.

(Apply the comments in the edit above to the original post. If I think that the fact I'm not observing myself being shoved into a simulation is evidence that most copies of my algorithm throughout reality are not being shoved into simulations, I also need to think that the versions of me which don't get shoved into simulations are justifiably updating in correspondence with facts of arbitrary physical separation from themselves, thus FTL signaling. Or, even worse, inter-universe signaling.)

Comment by Dach on Making the Monte Hall problem weirder but obvious · 2020-09-18T00:45:28.514Z · LW · GW

Amusing anecdote: I once tried to give my mother intuition behind Monte Hall with a process similar to this. She didn't quite get it, so I played the game with her a few times. Unfortunately, she won more often when she stayed than when she switched (n ~= 10), and decided that I was misremembering. A lesson was learned, but not by the person I had intended.
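With n ≈ 10 games, that outcome isn't even especially unlucky. A quick simulation (my own sketch of the game, not anything from the post) shows the 2/3 advantage of switching emerging once the trial count is large:

```python
import random

def play(switch, rng):
    """Play one round of Monty Hall; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = next(d for d in doors if d != pick and d != car)
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
trials = 100_000
stay_rate = sum(play(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play(True, rng) for _ in range(trials)) / trials
print(stay_rate, switch_rate)  # roughly 0.333 and 0.667
```

(When the pick is the car, the host has two doors to choose from and this sketch always opens the lower-numbered one- which choice the host makes doesn't affect the win rates.)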

Comment by Dach on Why haven't we celebrated any major achievements lately? · 2020-09-10T12:36:59.480Z · LW · GW

Scientific and industrial progress is an essential part of modern life. The opening of a new extremely long suspension bridge would be entirely unsurprising- if it were twice the length of the previous longest, I might bother to read a short article about it. I would assume there would be some local celebration (Though not too much- if it were too well received, why did we not do it before?), but it would not be a turning point in technology or a grand symbol of man's triumph over nature. We've been building huge awe-inspiring structures for quite some time by now, and the awe has worn off. Innovation and progress are normal.

Celebration in terms of "Bells are ringing and the people are weeping and philosophizing" requires complete upsets. Reusable rockets, manned missions to mars, a COVID-19 vaccine, etc- those are all part of the current state of affairs. If humanity wants these things, and has the time, I know they will come.

Comment by Dach on Open & Welcome Thread - August 2020 · 2020-09-04T10:01:15.647Z · LW · GW

So it definitely seems plausible for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.

I didn't mean to imply that a signflipped AGI would not instrumentally explore.

I'm saying that, well... modern machine learning systems often get specific bonus utility for exploring, because it's hard to explore the proper amount as an instrumental goal due to the difficulties of fully modelling the situation, and because systems which don't have this bonus will often get stuck in local maximums.

Humans exhibit this property too. Investigating things, acquiring new information, and building useful strategic models are terminal goals for us- we are "curious".

This is a feature we might see in early stages of modern attempts at full AGI, for similar reasons to why modern machine learning systems and humans exhibit this same behavior.

Presumably such features would be built to uninstall themselves after the AGI reaches levels of intelligence sufficient to properly and fully explore new strategies as an instrumental goal to satisfying the human utility function, if we do go this route.

If we sign flipped the amount of reward the AGI gets from such a feature, the AGI would be penalized for exploring new strategies- this may have any number of effects which are fairly implementation specific and unpredictable. However, it probably wouldn't result in hyperexistential catastrophe. This AI, providing everything else works as intended, actually seems to be perfectly aligned. If performed on a subhuman seed AI, it may brick- in this trivial case, it is neither aligned nor misaligned- it is an inanimate object.
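A toy version of that specific failure (my own construction, not something from the thread): a two-armed bandit agent with a count-based exploration bonus. Flipping the bonus's sign turns "try under-explored arms" into "avoid anything unfamiliar".

```python
import random

def run(bonus_sign, steps=2000, seed=0):
    """Greedy two-armed bandit with a count-based exploration bonus."""
    rng = random.Random(seed)
    means = [0.3, 0.7]                 # arm 1 is actually the better arm
    counts, totals = [0, 0], [0.0, 0.0]
    for _ in range(steps):
        def score(a):
            avg = totals[a] / counts[a] if counts[a] else 0.0
            bonus = 1.0 / (1 + counts[a])      # large for unfamiliar arms
            return avg + bonus_sign * bonus
        a = max(range(2), key=score)
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        totals[a] += reward
    return counts

print(run(-1))  # [2000, 0]: the sign-flipped agent never touches arm 1
print(run(+1))  # both arms get sampled, so the better arm can be found
```

The flipped agent isn't pursuing an inverted goal- it still wants reward- it just refuses to gather the information needed to find it, which matches the "implementation specific and unpredictable, but probably not hyperexistential" character described above.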

Yes, an AGI with a flipped utility function would pursue its goals with roughly the same level of intelligence.

The point of this argument is super obvious, so you probably thought I was saying something else. I'm going somewhere with this, though- I'll expand later.

Comment by Dach on Open & Welcome Thread - August 2020 · 2020-09-03T11:04:02.022Z · LW · GW

Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped with the reward function/model.

Sorry, I meant instrumentally value. Typo. Modern machine learning systems often require a specific incentive in order to explore new strategies and escape local maximums. We may see this behavior in future attempts at AGI. And no, it would not be flipped with the reward function/model - I'm highlighting that there is a really large variety of sign-flip mistakes, and most of them probably result in paperclipping.

My thinking was that a signflipped AGI designed as a positive utilitarian (i.e. with a minimum at 0 human utility) would prefer paperclipping to torture because the former provides 0 human utility (as there aren't any humans), whereas the latter may produce a negligible amount. I'm not really sure if it makes sense tbh.

Paperclipping seems to be negative utility, not approximately 0 utility. It involves all the humans being killed and our beautiful universe being ruined. I guess if there are no humans, there's no utility in some sense, but human values don't actually seem to work that way. I rate universes where humans never existed at all as better than universes which have been paperclipped.

I'm... not sure what 0 utility would look like. It's within the range of experiences that people experience on modern-day Earth - somewhere between my current experience and being tortured. This is just a definitional problem, though - we could shift the scale such that paperclipping is zero utility, but in that case, we could also just make an AGI that has a minimum at paperclipping levels of utility.

Even if we engineered it carefully, that doesn't rule out screw-ups. We need robust failsafe measures just in case, imo.

In the context of AI safety, I think "robust failsafe measures just in case" is part of "careful engineering". So, we agree!

You'd still need to balance it in a way such that the system won't spend all of its resources preventing this thing from happening at the neglect of actual human values, but that doesn't seem too difficult.

I read Eliezer's idea, and that strategy seems to be... dangerous. I think that "Giving an AGI a utility function which includes features which are not really relevant to human values" is something we want to avoid unless we absolutely need to.

I have much more to say on this topic and about the rest of your comment, but it's definitely too much for a comment chain. I'll make an actual post on this containing my thoughts sometime in the next week or two, and link it to you.

Comment by Dach on Open & Welcome Thread - August 2020 · 2020-09-02T20:25:32.683Z · LW · GW

I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be no human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there.

It's hard to talk in specifics because my knowledge on the details of what future AGI architecture might look like is, of course, extremely limited.

As an almost entirely inapplicable analogy (which nonetheless still conveys my thinking here): consider the sorting algorithm for the comments on this post. If we flipped the "top-scoring" sorting algorithm to sort in the wrong direction, we would see the worst-rated comments on top, which would correspond to a hyperexistential disaster. However, if we instead flipped the effect that an upvote has on a comment's score to a negative value, comments with no votes other than the default vote (assigned on posting) would sort to the top. This corresponds to paperclipping - it's not minimizing the intended function, it's just doing something weird.
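The two flip locations in this analogy can be sketched in a few lines. The comment names and vote counts are invented; the scoring rule (default self-vote plus upvotes minus downvotes) is a simplification of how such scores typically work:

```python
# (name, upvotes, downvotes); score = 1 (the default self-vote) + upvotes - downvotes
comments = [("insightful", 12, 1), ("ok", 3, 2), ("untouched", 0, 0), ("awful", 1, 9)]

def ranking(comments, up_weight=1, descending=True):
    """Return comment names ordered by score, best first by default."""
    scored = [(name, 1 + up_weight * up - down) for name, up, down in comments]
    scored.sort(key=lambda pair: pair[1], reverse=descending)
    return [name for name, _ in scored]

# ranking(comments)                    # intended: best-rated first
# ranking(comments, descending=False)  # flipped sort direction: worst-rated on top
# ranking(comments, up_weight=-1)      # flipped upvote effect: the untouched,
#                                      # zero-vote comment rises to the top
```

The flipped sort direction pessimizes the intended metric (the "hyperexistential" analogue); the flipped upvote weight merely produces something weird (the "paperclipping" analogue).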

If we inverted the utility function, this would (unless we take specific measures to combat it, like you're mentioning) lead to hyperexistential disaster. However, if we invert some constant which is meant to initially provide value for exploring new strategies while the AI is not yet intelligent enough to properly explore new strategies as an instrumental goal, the AI would effectively brick itself. It would place negative value on exploring new strategies, presumably including strategies which involve fixing this issue so it can acquire more utility, and strategies which involve preventing the humans from turning it off. Or suppose we had some code intended to keep the AI from turning off the evolution of the reward model before the AI values not turning it off for other reasons (e.g. before the reward model properly models how humans don't want the AI to turn the reward model evolution process off), and some crucial sign in that code was flipped, making it do the opposite. The AI would freeze the process of the reward model being updated and then maximize whatever inane nonsense its model currently represented. It would eventually run into some bizarre, previously unconsidered (and thus not appropriately penalized) strategy comparable to tiling the universe with smiley faces - i.e. paperclipping.

These are really crude examples, but I think the argument is still valid. Also, this argument doesn't address the core concern of "What about the things which DO result in hyperexistential disaster?" - it just establishes that much of the class of mistake you may have previously thought usually or always resulted in hyperexistential disaster (sign flips at critical points in the software) in fact usually causes paperclipping or the AI bricking itself.

If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be no human utility (i.e. paperclips).

Can you clarify what you mean by this? Also, I get what you're going for, but paperclips is still extremely negative utility because it involves the destruction of humanity and the reconfiguration of the universe into garbage.

Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario.

I sure hope that future AGI developers can be bothered to embrace safe design!

Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up with the model somehow.

The reward modelling system would need to be very carefully engineered, definitely.

If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked,) a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe.

I thought this as well when I read the post. I'm sure there's something clever you can do to avoid this but we also need to make sure that these sorts of critical components are not vulnerable to memory corruption. I may try to find a better strategy for this later, but for now I need to go do other things.
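The worry about a flip in V specifically can be illustrated with a toy sketch. All outcome names and numbers here are made up for the illustration - this is a cartoon of the U = V + W idea, not anyone's actual proposal:

```python
# Toy outcomes scored by V (reward-model value) and W (a heavy penalty on an
# arbitrary "surrogate" outcome, per the U = V + W construction).
outcomes = {
    "flourishing": {"V": 10,  "W": 0},
    "paperclips":  {"V": -5,  "W": 0},
    "torture":     {"V": -10, "W": 0},
    "surrogate":   {"V": -5,  "W": -100},  # harmless weirdness, heavily disvalued
}

def favorite(flip_U=False, flip_V=False):
    """Return the outcome an optimizer of the (possibly corrupted) utility picks."""
    def u(o):
        v = -o["V"] if flip_V else o["V"]
        total = v + o["W"]
        return -total if flip_U else total
    return max(outcomes, key=lambda name: u(outcomes[name]))

# favorite()            -> "flourishing": intended behavior.
# favorite(flip_U=True) -> "surrogate": flipping all of U also flips W, so the
#                          failure lands on the surrogate rather than torture.
# favorite(flip_V=True) -> "torture": W is untouched and still steers away from
#                          the surrogate, so the guard is bypassed - exactly the
#                          hyperexistential worry about a flip inside V.
```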

I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the reward model seems harder to prevent than a bit flip in a utility function, which could be prevent through error-correcting code memory (as you mentioned earlier.)

Sorry, I meant to convey that this was a feature we're going to want to ensure that future AGI efforts display, not some feature which I have some other independent reason to believe would be displayed. It was an extension of the thought that "Our method will, ideally, be terrorist proof."

Comment by Dach on Open & Welcome Thread - August 2020 · 2020-09-02T08:25:35.062Z · LW · GW

You can't really be accidentally slightly wrong. We're not going to develop Mostly Friendly AI, which is Friendly AI but with the slight caveat that it has a slightly higher value on the welfare of shrimp than desired, with no other negative consequences. The near-molecular precision needed to get anywhere near the zone of loosely trying to maximize or minimize anything resembling human values will probably only follow from a method that is converging towards the exact spot we want it to be at, such as some clever, flawless version of reward modelling.

In the same way, we're probably not going to accidentally land in hyperexistential disaster territory. We could have some sign flipped, our checksum changed, and all our other error-correcting methods (Any future seed AI should at least be using ECC memory, drives in RAID 10, etc.) defeated by religious terrorists, cosmic rays, unscrupulous programmers, quantum fluctuations, etc. However, the vast majority of these mistakes would probably buff out or result in paperclipping. If an FAI has slightly too high of a value assigned to the welfare of shrimp, it will realize this in the process of reward modelling and correct the issue. If its operation does not involve the continual adaptation of the model that is supposed to represent human values, it's not using a method which has any chance of converging to Overwhelming Victory or even adjacent spaces for any reason other than sheer coincidence.
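As one mundane example of the checksum-style defenses gestured at above - a sketch only, with an invented `reward_weights` stand-in; real protections would be far more involved than hashing a parameter vector:

```python
import hashlib
import struct

def fingerprint(weights):
    """SHA-256 over the exact byte representation of the weight vector."""
    return hashlib.sha256(struct.pack(f"{len(weights)}d", *weights)).hexdigest()

# Invented stand-in for some sign-critical parameters.
reward_weights = [0.7, -0.2, 1.5]
expected_digest = fingerprint(reward_weights)

def load_weights(weights, expected):
    # A single flipped sign (or any flipped bit) changes the digest, so the
    # system refuses to run rather than silently optimizing the wrong thing.
    if fingerprint(weights) != expected:
        raise RuntimeError("integrity check failed - refusing to load weights")
    return weights
```

The design choice worth noting: failing closed on a mismatch converts a potential sign-flip catastrophe into a mere outage, which is the asymmetry the argument above relies on.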

A method such as this has, barring stuff which I need to think more about (stability under self-modification), no chance of ending up in a "We perfectly recreated human values... But placed an unreasonably high value on eating bread! Now all the humans will be force-fed bread until the stars burn out! Mwhahahahaha!" sorts of scenarios. If the system cares about humans being alive enough to not reconfigure their matter into something else, we're probably using a method which is innately insulated from most types of hyperexistential risk.

It's not clear that Gwern's example, or even that category of problem, is particularly relevant to this situation. Most parallels to modern-day software systems and the errors they are prone to are probably best viewed as sobering reminders, not specific advice. Indeed, I suspect his comment was merely a sobering reminder and not actual advice. If humans are making changes to the critical software/hardware of an AGI (And we'll assume you figured out how to let the AGI allow you to do this in a way that has no negative side effects), while that AGI is already running, something bizarre and beyond my ability to predict is already happening. If you need to make changes after you turn your AGI on, you've already lost. If you don't need to make changes and you're making changes, you're putting humanity at unnecessary risk. At this point, if we've figured out how to assist the seed AI in self-modification, at least until the point at which it can figure out how to do stable self-modification for itself, the problem is already solved. There's more to be said here, but I'll refrain for the purpose of brevity.

Essentially, we cannot make any ordinary mistake. The type of mistake we would need to make in order to end up in hyperexistential disaster territory would, most likely, be an actual, literal sign-flip scenario, and such scenarios seem much easier to address. There will probably only be a handful of weak points for this problem, and those weak points are all already things we'd pay extra super special attention to and will engineer in ways which make extra super special sure nothing goes wrong. Our method will, ideally, be terrorist-proof. It will not be possible to flip the sign of the utility function or the direction of the updates to the reward model, even if several of the researchers on the project are actively trying to sabotage the effort and cause a hyperexistential disaster.

I conjecture that most of the expected utility gained from combating the possibility of a hyperexistential disaster lies in the disproportionate positive effects on human sanity and the resulting improvements to the efforts to avoid regular existential disasters, and other such side-benefits.

None of this is intended to dissuade you from investigating this topic further. I'm merely arguing that a hyperexistential disaster is not remotely likely- not that it is not a concern. The fact that people will be concerned about this possibility is an important part of why the outcome is unlikely.

Comment by Dach on Open & Welcome Thread - August 2020 · 2020-08-30T13:58:46.671Z · LW · GW

If you're having significant anxiety from imagining some horrific I-have-no-mouth-and-I-must-scream scenario, I recommend that you multiply that dread by a very, very small number, so as to incorporate the low probability of such a scenario. You're privileging this supposedly very low probability specific outcome over the rather horrifically wide selection of ways AGI could be a cosmic disaster.

This is, of course, not intended to dissuade you from pursuing solutions to such a disaster.

Comment by Dach on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-18T20:27:40.715Z · LW · GW

In this specific example, the error becomes clear very early on in the training process. The standard control problem issues with advanced AI systems don't apply in that situation.

As for the arms race example, building an AI system of that sophistication to fight in your conflict is like building a Dyson Sphere to power your refrigerator. Friendly AI isn't the sort of thing major factions are going to want to fight with each other over. If there's an arms race, either something delightfully improbable and horrible has happened, or it's an extremely lopsided "race" between a Friendly AI faction and a bunch of terrorist groups.

EDIT (From two months in the future...): I am not implying that such a race would be an automatic win, or even a likely win, for said hypothesized Friendly AI faction. For various reasons, this is most certainly not the case. I'm merely saying that the Friendly AI faction will have vastly more resources than all of its competitors combined, and all of its competitors will be enemies of the world at large, etc.

Addressing this whole situation would require actual nuance. This two month old throw away comment is not the place to put that nuance. And besides, it's been done before.