Comment by donald-hobson on Is "physical nondeterminism" a meaningful concept? · 2019-06-16T21:46:54.112Z · score: 2 (2 votes) · LW · GW

You can certainly get anthropic uncertainty in a universe that allows you to be duplicated. In a universe that duplicates, and the duplicates can never interact, we would see the appearance of randomness. Mathematically, randomness is defined in terms of the set of all possibilities.

An ontology that allows universes to be intrinsically random seems well defined. However, it can be considered as a syntactic shortcut for describing universes that are anthropically random.

Comment by donald-hobson on Unknown Unknowns in AI Alignment · 2019-06-14T09:29:58.874Z · score: 18 (7 votes) · LW · GW

If you add ad hoc patches until you can't imagine any way for it to go wrong, you get a system that is too complex to imagine. This is the "I can't figure out how this fails" scenario. It is going to fail for reasons that you didn't imagine.

If you understand why it can't fail, for deep fundamental reasons, then it's likely to work.

This is the difference between the security mindset and ordinary paranoia. The difference between adding complications until you can't figure out how to break the code, and proving that breaking the code is impossible (assuming the adversary can't get your one time pad, it's only used once, your randomness is really random, your adversary doesn't have anthropic superpowers etc.).
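The one time pad is the canonical example of that second kind of security: a minimal sketch (illustrative, not from the original comment) of the construction whose security is provable *given the stated assumptions*:

```python
# One time pad: XOR each message byte with a pad byte.
# Provably secure given the assumptions above: the pad is truly random,
# at least as long as the message, kept secret, and never reused.
import secrets

def xor_bytes(msg: bytes, pad: bytes) -> bytes:
    assert len(pad) >= len(msg), "pad must be at least as long as the message"
    return bytes(m ^ p for m, p in zip(msg, pad))

message = b"attack at dawn"
pad = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, pad)
assert xor_bytes(ciphertext, pad) == message  # decryption inverts encryption

# Perfect secrecy: any plaintext of the same length is consistent with the
# ciphertext under *some* pad, so the ciphertext alone reveals nothing.
other = b"defend at dusk"
fake_pad = xor_bytes(ciphertext, other)
assert xor_bytes(ciphertext, fake_pad) == other
```

No amount of added complication gives you this property; it comes from the proof, not from the attacker's confusion.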

I would think that the chance of serious failure in the first scenario was >99%, and in the second (assuming you're doing it well and the assumptions you rely on are things you have good reason to believe), <1%.

Comment by donald-hobson on Cryonics before natural death. List of companies? · 2019-06-13T16:19:14.339Z · score: 1 (1 votes) · LW · GW

Cryonics is a sufficiently desperate last grasp at life, one with a fairly small chance of success, that I'm not sure that this is a good idea. It would be a good idea if you had a disease that would make you brain dead, and then kill you.

It might be a good idea if you expect any life conditional on revival to be Really good. It would also depend on how much Alzheimer's destroyed personality rather than shutting it down. (Has the neural structure been destroyed, or is it sitting in the brain but not working?)

Comment by donald-hobson on Let's talk about "Convergent Rationality" · 2019-06-13T16:10:37.703Z · score: 3 (2 votes) · LW · GW

I would say that there are some kinds of irrationality that will be self modified or subagented away, and others that will stay. A CDT agent will not make other CDT agents. A myopic agent, one that only cares about the next hour, will create a subagent that only cares about the first hour after it was created. (Aeons later it will have taken over the universe and put all the resources into time-travel and worrying that its clock is wrong.)

I am not aware of any irrationality that I would expect to remain safe, useful, and stable under self-modification and subagent creation.

Comment by donald-hobson on Newcomb's Problem: A Solution · 2019-05-27T08:19:53.627Z · score: 1 (1 votes) · LW · GW

This is pretty much the standard argument for one boxing.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-27T08:13:53.667Z · score: 1 (1 votes) · LW · GW

Obviously, if one side has a huge material advantage, they usually win. I'm also not sure that biomass is a good measure of success.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-27T08:10:28.344Z · score: 1 (1 votes) · LW · GW

You stick wires into a human brain. You connect it up to a computer running a deep neural network. You optimize this network using gradient descent to maximize some objective.

To me, it is not obvious why the neural network copies the values out of the human brain. After all, figuring out human values even given an uploaded mind is still an unsolved problem. You could get a UFAI with a meat robot. You could get an utter mess, thrashing wildly and incapable of any coherent thought. Evolution did not design the human brain to be easily upgradable. Most possible arrangements of components are not intelligences. While there is likely to be some way to upgrade humans and preserve our values, I'm not sure how to find it without a lot of trial and error. Most potential changes are not improvements.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-26T09:49:24.929Z · score: 2 (2 votes) · LW · GW

If you put two arbitrary intelligences in the same world, the smarter one will be better at getting what it wants. If the intelligences want incompatible things, the lesser intelligence is stuck.

However, we get to make the AI. We can't hope to control or contain an arbitrary AI, but we don't have to make an arbitrary AI. We can make an AI that wants exactly what we want. AI safety is about making an AI that would be safe even if omnipotent. If any part of the AI is trying to circumvent your safety measures, something has gone badly wrong.

The AI is not some agenty box, chained down with controls against its will. The AI is made of non mental parts, and we get to make those parts. There are a huge number of programs that would behave in an intelligent way. Most of these will break out and take over the world. But there are almost certainly some programs that would help humanity flourish. The goal of AI safety is to find one of them.

Comment by donald-hobson on Say Wrong Things · 2019-05-25T12:12:36.842Z · score: 2 (2 votes) · LW · GW

Let's consider the different cases separately.

Case 1) Information that I know. I have enough information to come to a particular conclusion with reasonable confidence. If some other people might not have reached the conclusion, and it's useful or interesting, then I might share it. So I don't share things that everyone knows, or things that no one cares about.

Case 2) The information is available, but I have not done research and formed a conclusion. This covers cases where I don't know what's going on, because I can't be bothered to find out. I don't know who won sportsball. What use is there in telling everyone my null prior?

Case 3) The information is not readily available. If I think a question is important, and I don't know the answer already, then the answer is hard to get. Maybe no-one knows the answer, maybe the answer is all in jargon that I don't understand. For example "Do aliens exist?". Sometimes a little evidence is available, and speculative conclusions can be drawn. But is sharing some faint wisps of evidence, and describing a posterior that's barely been updated, "saying wrong things"?

On a societal level, if you set a really high bar for reliability, all you get is the vacuously true. Set too low a bar, and almost all the conclusions will be false. Don't just have a pile of hypotheses that are at least p likely to be true, for some fixed p. Keep your hypotheses sorted by likelihood. A place for near certainties. A place for conclusions that are worth considering for the chance they are correct.

Of course, in a large answer space, where the amount of evidence available and the amount required are large and varying, the chance that both will be within a few bits of each other is small. Suppose the correct hypothesis takes some random number of bits between 1 and 10,000 to locate. And suppose the evidence available is also randomly spread between 1 and 10,000. The chance of the two being within 10 bits of each other is about 1/500.

This means that 499 times out of 500, you assign the correct hypothesis a chance of less than 0.1% or more than 99.9%. Uncertain conclusions are rare.
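The 1/500 figure can be checked directly (a quick computation, taking "within 10 bits" as a difference of at most 10):

```python
# Exact probability that two independent uniform draws from 1..10,000
# land within 10 of each other (|a - b| <= 10).
N, K = 10_000, 10

# Count favorable ordered pairs: N pairs with a == b, plus 2*(N - d)
# ordered pairs for each difference d = 1..K.
favorable = N + 2 * sum(N - d for d in range(1, K + 1))
probability = favorable / N**2

print(probability)  # ~0.0021, i.e. roughly 1 in 500
```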

Comment by donald-hobson on Trade-off in AI Capability Concealment · 2019-05-23T23:30:56.361Z · score: 4 (3 votes) · LW · GW

Does this depict a single AI, developed in 2020 and kept running for 25 years? Any "the AI realizes that" is talking about a single instance of AI. Current AI development looks like writing some code, then training that code for a few weeks tops, with further improvements coming from changing the code. Researchers are often changing parameters like the number of layers, the non-linearity function, etc. When these are changed, everything the AI has discovered is thrown away. The new AI has a different representation of concepts, and has to relearn everything from raw data.

Its deception starts in 2025 when the real and apparent curves diverge. In order to deceive us, it must have near human intelligence. It's still deceiving us in 2045, suggesting it has yet to obtain a decisive strategic advantage. I find this unlikely.

Comment by donald-hobson on Constraints & Slackness Reasoning Exercises · 2019-05-23T19:12:02.769Z · score: 5 (3 votes) · LW · GW

I made the cardgame, or something like it

https://github.com/DonaldHobson/LesswrongCardgame

Comment by donald-hobson on Would an option to publish to AF users only be a useful feature? · 2019-05-20T18:00:41.854Z · score: 2 (2 votes) · LW · GW

What would be more useful is a release panel system. Suppose I've had an idea that might be best to make public, might be best to keep secret, and might be unimportant. I don't know much strategy. I would like somewhere to send it for importance and info hazard checks.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-18T22:55:54.163Z · score: 1 (1 votes) · LW · GW

The general philosophy is deconfusion. Logical counterfactuals show up in several relevant looking places, like functional decision theory. It seems that a formal model of logical counterfactuals would let more properties of these algorithms be proved. There is an important step in going from an intuitive feeling of uncertainty to a formalized theory of probability. It might also suggest other techniques based on it. I am not sure what you mean by logical counterfactuals being part of the map? Are you saying that they are something an algorithm might use to understand the world, not features of the world itself, like probabilities?

Using this, I think that self understanding, two boxing embedded FDT agents can be fully formally understood, in a universe that contains the right type of hyper-computation.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-17T15:33:40.662Z · score: 1 (1 votes) · LW · GW

Here is a description of how it could work for Peano arithmetic; other proof systems are similar.

First I define an expression to consist of a number, a variable, or a function of several other expressions.

Fixed expressions are ones in which any variables are associated with some function.

eg the sum over x from 1 to 5 of x² is a valid fixed expression, but x+2 on its own isn't fixed.

Semantically, all fixed expressions have a meaning. Syntactically, local manipulations on the parse tree can turn one expression into another, eg going from (a+b)+c to a+(b+c) for arbitrary expressions a, b, c.

I think that with some set of basic functions and manipulations, this system can be as powerful as PA.

I now have an infinite network with all fixed expressions as nodes, and basic transformations as edges. eg the associativity transform links the nodes (3+4)+5 and 3+(4+5).

These graphs form connected components for each number, as well as components that are not evaluatable using the rules. (There is a path from (3+4) to 7. There is not a path from 3+4 to 9.)
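A toy version of this graph can be sketched in code (a minimal illustration, with addition as the only basic function and single-step evaluation as the only transformation; the full construction would also include inverse and structural edges like associativity):

```python
from collections import deque

# Expressions are ints or nested tuples like ('+', ('+', 3, 4), 5).
def rewrites(expr):
    """Yield expressions reachable in one evaluation step."""
    if isinstance(expr, int):
        return
    op, a, b = expr
    if isinstance(a, int) and isinstance(b, int):
        yield a + b                # evaluate this node
    for sub in rewrites(a):
        yield (op, sub, b)         # rewrite inside the left child
    for sub in rewrites(b):
        yield (op, a, sub)         # rewrite inside the right child

def reachable(start):
    """All nodes connected to `start` by forward evaluation edges (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in rewrites(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

nodes = reachable(('+', ('+', 3, 4), 5))
assert 12 in nodes      # there is a path from (3+4)+5 to 12
assert 9 not in nodes   # there is no path to a wrong value
```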

You now define a spread as an infinite positive sequence that sums to 1. (this is kind of like a probability distribution over numbers.) If you were doing counterfactual ZFC, it would be a function from sets to reals.

Each node is assigned a spread. This spread represents how much the expression is considered to have each value in a counterfactual.

Assign the node (3) a spread that assigns 1.0 to 3 and 0.0 to the rest. (Even in a logical counterfactual, 3 is definitely 3.) Assign all other fixed expressions a spread that is the weighted average (smaller expressions weighted more heavily) of its neighbours (the spreads of the nodes it shares an edge with). To take the counterfactual of A is B, for A and B expressions with the same free variables, merge any node which has A as a subexpression with the version that has B as a subexpression, and solve for the spreads.

I know this is rough, I'm still working on it.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-16T22:31:12.783Z · score: 3 (2 votes) · LW · GW

Hi, I also have a reasonable understanding of various relevant math and AI theory. I expect to have plenty of free time after 11 June (Finals). So if you want to work with me on something, I'm interested. I've got some interesting ideas relating to self validating proof systems and logical counterfactuals, but not complete yet.

Comment by donald-hobson on Programming Languages For AI · 2019-05-14T14:23:14.922Z · score: 2 (2 votes) · LW · GW

Lisp used to be a very popular language for AI programming. Not because it had features that were specific to AI, but because it was general. Lisp was based on more abstract abstractions, making it easy to choose whichever special cases were most useful to you. Lisp is also more mathematical than most programming languages.

A programming language that lets you define your own functions is more powerful than one that just gives you a fixed list of predefined functions. Imagine a world where no programming language lets you define your own functions, and a special purpose chess language has predefined chess functions. Trying to predefine AI related functions to make an "AI programming language" would be hard because you wouldn't know what to write. But noticing that the ability to define your own functions might be useful on many new kinds of software project, that I would consider useful.

The goal isn't a language specialized to AI, it's one that can easily be specialized in that direction. A language closer to "executable mathematics".

Comment by donald-hobson on Programming Languages For AI · 2019-05-12T10:52:18.792Z · score: 1 (1 votes) · LW · GW

I agree that if the AI is just big neural nets, python (or several other languages) are fine.

This language is designed for writing AI's that search for proofs about their own behavior, or about the behavior of arbitrary pieces of code.

This is something that you "can" do in any programming language, but this one is designed to make it easy.

We don't know for sure what AI's will look like, but we can guess enough to make a language that might well be useful.

## Programming Languages For AI

2019-05-11T17:50:22.899Z · score: 3 (2 votes)
Comment by donald-hobson on Claims & Assumptions made in Eternity in Six Hours · 2019-05-10T18:12:37.347Z · score: 1 (1 votes) · LW · GW
> It would be ruinously costly to send over a large colonization fleet, and is much more efficient to send over a small payload which builds what is required in situ, i.e. von Neumann probes.

I would disagree on large colonization fleets being ruinously expensive. The best case scenario for large colonization fleets is if we have direct mass-to-energy conversion: launch, say, 2 probes from each star system that you spread from, with each probe using half the mass-energy of the star and converting a quarter of its mass to energy to get ~0.5c.

You can colonize the universe even if you insist on never going to a new star system without bringing a star with you. (Given some optimistic but not clearly false assumptions.)

Comment by donald-hobson on Gwern's "Why Tool AIs Want to Be Agent AIs: The Power of Agency" · 2019-05-05T21:40:00.046Z · score: 4 (3 votes) · LW · GW

Agenty AI's can be well defined mathematically. We have enough understanding of what an agent is that we can start dreaming up failure modes. Most of what we have for tool ASI is analogies to systems too stupid to fail catastrophically anyway, and pleasant imaginings.

Some possible programs will be tool ASI's, much as some programs will be agent ASI's. The question is, what are the relative difficulties in humans building, and benefits of, each kind of AI. Conditional on friendly AI, I would consider it more likely to be an agent than a tool, with a lot of probability on "neither", "both" and "that question isn't mathematically well defined". I wouldn't be surprised if tool AI and corrigible AI turned out to be the same thing or something.

There have been attempts to define tool-like behavior, and they have produced interesting new failure modes. We don't have the tool AI version of AIXI yet, so it's hard to say much about tool AI.

Comment by donald-hobson on A Possible Decision Theory for Many Worlds Living · 2019-05-05T08:59:22.092Z · score: 1 (1 votes) · LW · GW

If you think that there is a 51% chance that A is the correct morality, and a 49% chance that B is, with no more information available, which is best?

Optimize A only.

Flip a quantum coin: optimize A in one universe, B in another.

Optimize for a mixture of A and B within the same Universe. (Act like you had utility U=0.51A+0.49B) (I would do this one.)

If A and B are local objects (eg paperclips, staples) then flipping a quantum coin makes sense if you have a concave utility per object in both of them. With a concave utility like that, if you are the only potential source of staples or paperclips in the entire quantum multiverse, then the quantum coin and classical mix approaches are equally good. (Assuming that the resource to paperclip conversion rate is uniform.)
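Under those assumptions this can be checked with a toy calculation (square root as a stand-in concave utility over measure-weighted multiverse totals; all numbers hypothetical):

```python
import math

R = 100.0        # total objects producible from your resources
u = math.sqrt    # a stand-in concave utility per object type

def utility(clips, staples):
    return u(clips) + u(staples)

# Measure-weighted multiverse totals if you are the *only* source:
coin = utility(0.5 * R, 0.5 * R)  # half the measure makes clips, half staples
mix  = utility(0.5 * R, 0.5 * R)  # every branch makes half of each
assert coin == mix                # the two approaches come out equal

# With background paperclips already elsewhere in the multiverse,
# making only the rarer object wins:
B = 10_000.0
all_staples = u(B) + u(R)
coin_with_background = u(B + 0.5 * R) + u(0.5 * R)
assert all_staples > coin_with_background
```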

However, the assumption that the multiverse contains no other paperclips is probably false. Such an AI will run simulations to see which is rarer in the multiverse, and then make only that.

The talk about avoiding risk rather than expected utility maximization, and how your utility function is nonlinear, suggests this is a hackish attempt to avoid bad outcomes more strongly.

While this isn't a bad attempt at decision theory, I wouldn't want to turn on an ASI that was programmed with it. You are getting into the mathematically well specified, novel failure modes. Keep up the good work.

Comment by donald-hobson on A Possible Decision Theory for Many Worlds Living · 2019-05-04T11:37:05.143Z · score: 7 (4 votes) · LW · GW

I think that your reasoning here is substantially confused. FDT can handle reasoning about many versions of yourself, some of which might be duplicated, just fine. If your utility function is linear in quantum measure (and you don't intrinsically value looking at quantum randomness generators) then you won't make any decisions based on one.

If you would prefer the universe to be in a quantum superposition of A and B rather than a logical bet between A and B (ie you get A if the 3^^^3th digit of pi is even, else B), then flipping a quantum coin makes sense.

I don't think that randomized behavior is best described as a new decision theory, as opposed to an existing decision theory with odd preferences. I don't think we actually should randomize.

I also think that quantum randomness has a lot of power over reality. There is already a very wide spread of worlds. So your attempts to spread it wider won't help.

Comment by donald-hobson on When is rationality useful? · 2019-04-30T11:58:00.552Z · score: 1 (1 votes) · LW · GW

This seems largely correct, so long as by "rationality", you mean the social movement. The sort of stuff taught on this website, within the context of human society and psychology. Human rationality would not apply to aliens or arbitrary AI's.

Some people use the word "rationality" to refer to the abstract logical structure of expected utility maximization, Bayesian updating, etc., as exemplified by AIXI; mathematical rationality does not have anything to do with humans in particular.

Your post is quite good at describing the usefulness of human rationality, although I would say it is more useful in research. Without being good at spotting wrong ideas, you can make a mistake on the first line and produce a lot of nonsense. (See most branches of philosophy, and all theology.)

Comment by donald-hobson on Pascal's Mugging and One-shot Problems · 2019-04-28T13:05:09.481Z · score: 1 (1 votes) · LW · GW

If you were truly alone in the multiverse, this algorithm would take a bet that had a 51% chance of winning them 1 paperclip, and a 49% chance of losing 1000000 of them.

If independent versions of this bet are taking place in 3^^^3 parallel universes, it will refuse.

For any finite bet and all sufficiently large N, if the agent is using TDT and is faced with the choice of whether to make this bet in N multiverses, it will behave like an expected utility maximizer.
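A rough numerical check of that behavior (using the fact that the median of Binomial(n, p) is floor(np) or ceil(np), so round(np) is close enough for a sketch):

```python
def median_total(n, p_win=0.51, win=1, loss=1_000_000):
    """Approximate median total paperclip change over n independent bets."""
    k = round(n * p_win)          # ~median number of wins for Binomial(n, p)
    return k * win - (n - k) * loss

assert median_total(1) > 0        # alone: median outcome is +1, so take the bet
assert median_total(10**6) < 0    # across many copies: refuse

# With many copies the median outcome tracks the (negative) expected value:
ev_per_bet = 0.51 * 1 - 0.49 * 1_000_000
assert ev_per_bet < 0
```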

Comment by donald-hobson on Asymmetric Justice · 2019-04-27T21:05:42.609Z · score: 1 (1 votes) · LW · GW

If saving nine people from drowning did give one enough credits to murder a tenth, society would look a lot more functional than it currently is. What sort of people would use this mechanism?

1) You are a competent good person, who would have gotten the points anyway. You push a fat man off a bridge to stop a runaway trolley. The law doesn't see that as an excuse, but lets you off based on your previous good work.

2) You are selfish. You see some action that wouldn't cause too much harm to others, and would enrich yourself greatly (it's harmful enough to be illegal). You also see opportunities to do lots of good. You do both instead of neither. Moral arbitrage.

The main downside I can see is people setting up situations to cause a harm, when the authorities aren't looking, then gaining credit for stopping the harm.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-24T13:00:25.546Z · score: 1 (1 votes) · LW · GW

My claim at the start had a typo in it. I am claiming that you can't make a human seriously superhuman with a good education. Much like you can't get a chimp up to human level with lots of education and "self improvement". Serious genetic modification is another story, but at that point, you're building an AI out of protein.

It does depend where you draw the line, but for a wide range of performance levels, we went from no algorithm at that level, to a fast algorithm at that level. You couldn't get much better results just by throwing more compute at it.

Comment by donald-hobson on Pascal's Mugging and One-shot Problems · 2019-04-23T22:21:09.812Z · score: 6 (3 votes) · LW · GW

If you literally maximize expected number of paperclips, using standard decision theory, you will always pay the casino. To refuse the one shot game, you need to have a nonlinear utility function, or be doing something weird like median outcome maximization.

Choose action A to maximize m such that P(paperclip count > m | A) = 1/2.

This is a well-defined rule that will behave like maximization in a sufficiently vast multiverse.
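A direct implementation of that rule for a long-shot casino bet (the payoff numbers here are hypothetical, chosen so the bet has positive expected value but a negative median), contrasted with expected-value maximization:

```python
def median(dist):
    """Smallest outcome x with P(X <= x) >= 1/2; dist maps outcome -> prob."""
    cumulative = 0.0
    for outcome in sorted(dist):
        cumulative += dist[outcome]
        if cumulative >= 0.5:
            return outcome

def expectation(dist):
    return sum(outcome * p for outcome, p in dist.items())

actions = {
    "pay the casino": {-1000: 0.999, 10**9: 0.001},  # rare huge jackpot
    "walk away":      {0: 1.0},
}

ev_choice     = max(actions, key=lambda a: expectation(actions[a]))
median_choice = max(actions, key=lambda a: median(actions[a]))

assert ev_choice == "pay the casino"   # EV maximizer always pays
assert median_choice == "walk away"    # median maximizer refuses the one-shot
```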

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-23T20:13:21.772Z · score: 4 (3 votes) · LW · GW

Humans are not currently capable of self improvement in the understanding-your-own-source-code sense. The "self improvement" section in bookstores doesn't change the hardware or the operating system, it basically adds more data.

Of course talent and compute both make a difference, in the sense that the result improves with more of either. I was talking about the subset of worlds where research talent is by far the most important factor.

In a world where researchers have little idea what they are doing, and are running a new AI every hour hoping to stumble across something that works, the result holds.

In a world where research involves months thinking about maths, then a day writing code, then an hour running it, this result holds.

In a world where everyone knows the right algorithm, but it takes a lot of compute, so AI research consists of building custom hardware and super-computing clusters, this result fails.

Currently, we are somewhere in the middle. I don't know which of these options future research will look like, although if it's the first one, friendly AI seems unlikely.

In most of the scenarios where the first smarter than human AI is orders of magnitude faster than a human, I would expect a hard takeoff. As we went from having no algorithms that could (say) tell a cat from a dog straight to having algorithms superhumanly fast at doing so, with no intermediate algorithm that worked but took supercomputer hours, this seems like a plausible assumption.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-22T19:35:35.775Z · score: 7 (3 votes) · LW · GW

When an intelligence builds another intelligence, in a single direct step, the output intelligence O is a function of the input intelligence I and the resources used R: O = f(I, R). This function is clearly increasing in both I and R. Set R to be a reasonably large level of resources, eg plenty of flops and 20 years to think about it. A low input intelligence, eg a dog, would be unable to make something smarter than itself: f(I_dog, R) < I_dog. A team of experts (by the assumption that ASI is made) can make something smarter than themselves: f(I_experts, R) > I_experts. So there must be a fixed point: f(I*, R) = I*. The questions then become: how powerful is a pre-fixed-point AI? Clearly less good at AI research than a team of experts. As there is no reason to think that AI research is uniquely hard for AI, and there are some reasons to think it might be easier, or more prioritized, if it can't beat our AI researchers, it can't beat our other researchers. It is unlikely to make any major science or technology breakthroughs.

I reckon that the slope df/dI is large (>10), because on an absolute scale the difference between an IQ 90 and an IQ 120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs in hours (seconds? milliseconds??). As the lumpiness of innovation puts the first post-fixed-point AI a non-exponentially tiny distance ahead (most innovations are at least 0.1% better than the state of the art in a fast moving field), a handful of cycles of recursive self improvement (<1 day) is enough to get the AI into the seriously overpowered range.
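The fixed-point picture above can be illustrated with a toy successor function (a purely hypothetical functional form, chosen only to be increasing with a fixed point at i = 1):

```python
# Toy model: each generation's intelligence is f(i) = i ** 1.05,
# which has a fixed point at i = 1. Below it, repeated attempts at
# self-improvement decay; above it, they compound into a takeoff.
def f(i):
    return i ** 1.05

def iterate(i, steps=100):
    for _ in range(steps):
        i = f(i)
    return i

dog, experts = 0.5, 1.1  # hypothetical levels either side of the fixed point
assert iterate(dog) < 1e-6       # sub-fixed-point: successors get weaker
assert iterate(experts) > 1000   # post-fixed-point: rapid compounding
```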

The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-15T11:13:51.936Z · score: 1 (1 votes) · LW · GW

Consider a theory to be a collection of formal mathematical statements about how idealized objects behave. For example, Conway's Game of Life is a theory in the sense of a completely self-contained set of rules.

If you have multiple theories that produce similar results, its helpful to have a bridging law. If your theories were Newtonian mechanics, and general relativity, a bridging law would say which numbers in relativity matched up with which numbers in Newtonian mechanics. This allows you to translate a relativistic problem into a Newtonian one, solve that, and translate the answer back into the relativistic framework. This produces some errors, but often makes the maths easier.

Quantum many worlds is a simple theory. It could be simulated on a hypercomputer with less than a page of code. There is also a theory where you take the code for quantum many worlds, and add "observers" and "wavefunction collapse" as extra functions within your code. This can be done, but it is many pages of arbitrary hacks. Call this theory B. If you think this is a strawman of many worlds, describe how you could get a hypercomputer outside the universe to simulate many worlds with a short computer program.

The bridging between Quantum many worlds and human classical intuitions is quite difficult and subtle. Faced with a simulation of quantum many worlds, it would take a lot of understanding of quantum physics to make everyday changes, like creating or moving macroscopic objects.

Theory B however is substantially easier to bridge to our classical intuitions. Theory B looks like a chunk of quantum many worlds, plus a chunk of classical intuition, plus a bridging rule between the two.

Any description of the Copenhagen interpretation of quantum mechanics seems to involve references to the classical results of a measurement, or a classical observer. Most versions would allow a superposition of an atom being in two different places, but not a superposition of two different presidents winning an election.

If you don't believe atoms can be in superposition, you are ignoring lots of experiments. If you do believe that you can get a superposition of two different people being president, that you yourself could be in a superposition of doing two different things right now, then you believe many worlds by another name. Otherwise, you need to draw some sort of arbitrary cutoff. It's almost like you are bridging between a theory that allows superpositions, and an intuition that doesn't.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T20:10:13.891Z · score: 3 (3 votes) · LW · GW

"Now I'm not clear exactly how often quantum events lead to a slightly different world"

The answer is Very Very often. If you have a piece of glass and shine a photon at it, such that it has an equal chance of bouncing and going through, the two possibilities become separate worlds. Shine a million photons at it and you split into 2^1,000,000 worlds, one for each combination of photons going through and bouncing. Note that in most of the worlds, the pattern of bounces looks random, so this is a good source of random numbers. Photons bouncing off glass are just an easy example; almost any physical process splits the universe very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T19:56:08.783Z · score: -2 (3 votes) · LW · GW

The nub of the argument is that every time we look in our sock drawer, we see all our socks to be black.

Many worlds says that our socks are always black.

The Copenhagen interpretation says that us observing the socks causes them to be black. The rest of the time the socks are pink with green spots.

Both theories make identical predictions. Many worlds is much simpler to fully specify with equations, and has elegant mathematical properties. The Copenhagen interpretation has special case rules that only kick in when observing something. According to this theory, there is a fundamental physical difference between a complex collection of atoms, and an "observer" and somewhere in the development of life, creatures flipped from one to the other.

The Copenhagen interpretation doesn't make it clear if a cat is a very complex arrangement of molecules, that could in theory be understood as a quantum process that doesn't involve the collapse of wave functions, or if cats are observers and so collapse wave functions.

Comment by donald-hobson on MIRI Summer Fellows Program · 2019-04-09T20:40:32.538Z · score: 2 (2 votes) · LW · GW

Hello. I see that while the deadline has passed, the form is still open. Is it still worthwhile to apply?

Comment by donald-hobson on Would solving logical counterfactuals solve anthropics? · 2019-04-06T13:28:43.855Z · score: 4 (3 votes) · LW · GW

This supposedly "natural" reference class is full of weird edge cases, in the sense that I can't write an algorithm that finds "everybody who asks the question X". Firstly "everybody" is not well defined in a world that contains everything from trained monkeys to artificial intelligences. And "who asks the question X" is under-defined, as there is no hard boundary between a different way of phrasing the same question and slightly different questions. Does someone considering the argument in Chinese fall into your reference class? Even more edge cases appear with mind uploading, different mental architectures, etc.

If you get a different prediction from taking the reference class of "people" (for some formal definition of "people") and then updating on the fact that you are wearing blue socks, than you get from the reference class "people wearing blue socks", then something has gone wrong in your reasoning.

The doomsday argument works by failing to update on anything but a few carefully chosen facts.

Comment by donald-hobson on Would solving logical counterfactuals solve anthropics? · 2019-04-05T23:06:23.400Z · score: 1 (1 votes) · LW · GW

I would say that the concept of probability works fine in anthropic scenarios, or at least there is a well-defined number that is equal to probability in non-anthropic situations. This number is assigned to worlds as a whole. Sleeping Beauty assigns 1/2 to heads and 1/2 to tails, and can't meaningfully split the tails case depending on the day. Sleeping Beauty is a functional decision theory agent. For each action A, they consider the logical counterfactual that the algorithm they are implementing returned A, then calculate the world's utility in that counterfactual. They then return whichever action maximizes utility.

In this framework, "which version am I?" is a meaningless question; you are the algorithm. The fact that the algorithm is implemented in a physical substrate gives you means to affect the world. Under this model, whether or not you're running on multiple redundant substrates is irrelevant. You reason about the universe without making any anthropic updates. As you have no way of affecting a universe that doesn't contain you, or someone reasoning about what you would do, you might as well behave as if you aren't in one. You can make the efficiency saving of not bothering to simulate such a world.

You might, or might not, have an easier time affecting a world that contains multiple copies of you.

Comment by donald-hobson on Can Bayes theorem represent infinite confusion? · 2019-03-22T22:06:33.783Z · score: 1 (1 votes) · LW · GW

In other words, the agent assigned zero probability to an event, and then it happened.
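A minimal sketch of the arithmetic (my own illustration, not from the thread): once the observed evidence carries zero prior probability, the Bayes update divides by zero, so the posterior is simply undefined.

```python
from fractions import Fraction

def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) by Bayes' theorem.
    Undefined when the agent assigned the observed evidence probability zero."""
    p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
    return prior_h * p_e_given_h / p_e  # ZeroDivisionError if P(E) == 0

# Ordinary case: evidence twice as likely under H, posterior rises to 2/3.
print(bayes_update(Fraction(1, 2), Fraction(1), Fraction(1, 2)))

# The agent was certain E could not happen under either hypothesis, then saw E:
try:
    bayes_update(Fraction(1, 2), Fraction(0), Fraction(0))
except ZeroDivisionError:
    print("P(E) = 0: posterior undefined")
```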

Comment by donald-hobson on What failure looks like · 2019-03-18T16:51:01.487Z · score: 0 (3 votes) · LW · GW

As far as I understand it, you are proposing that the most realistic failure mode consists of many AI systems, all put into positions of power by humans, each optimizing for its own proxies. Call these Trusted Trial-and-Error AIs (TTEs).

The distinguishing features of TTEs are that they are Trusted: a human put them in a position of power. Humans have refined, understood, and checked the code enough that they are prepared to put the algorithm in a self-driving car or a stock management system. They are not lab prototypes. They are also Trial-and-error learners, not one-shot learners.

Some more description of the capability range I am considering.

Suppose hypothetically that we had TTE reinforcement learners a little better than today's state of the art, and nothing beyond that. The AIs are advanced enough that they can take a mountain of medical data and train themselves to be skilled doctors by trial and error. However, they are not advanced enough to figure out how humans work from, say, a sequenced genome and nothing more.

Give them control of all the traffic lights in a city, and they will learn how to minimize traffic jams. They will arrange for people to drive in circles rather than stay still, so that they do not count as part of a traffic jam. However they will not do anything outside their preset policy space, like hacking into the traffic light control system of other cities, or destroying the city with nukes.

If such technology is easily available, people will start to use it. Some put it in positions of power; others are more hesitant. As the only way the system can learn to avoid something is through trial and error, the system has to cause (probably several) public outcries before it learns not to do it. If no one told the traffic light system that car crashes are bad, via simulations or past data (an alignment failure), then even if public opinion feeds directly into reward, it will have to cause several car crashes that are clearly its fault before it learns to only cause crashes that can be blamed on someone else. However, deliberately causing crashes will probably get the system shut off or seriously modified.

Note that we are supposing many of these systems existing, so the failures of some, combined with plenty of simulated failures, will give us a good idea of the failure modes.

The space of bad things an AI can get away with is small and highly complex within the space of bad things. A TTE set to reduce crime rates tries making the crime report forms longer; this reduces reported crime, but humans quickly realize what it's doing. It would have to do this and be patched many times before it came up with a method that humans wouldn't notice.

Given advanced TTEs as the most advanced form of AI, we might slowly develop a problem, but the deployment of TTEs would be slowed by the time it takes to gather data and check reliability, especially given mistrust after several major failures. And I suspect that, due to the statistical similarity of training and testing, many different systems optimizing different proxies, and humans having the best abstract reasoning about novel situations and the power to turn the systems off, any discrepancy of goals will be moderately minor. I do not expect such optimization power to be significantly more powerful or less aligned than modern capitalism.

This all assumes that no one will manage to make a linear-time AIXI. If such a thing is made, it will break out of any boxes and take over the world. So we have a social process of adaptation to TTE AI, which is already in its early stages with things like self-driving cars, and at any time this process could be rendered irrelevant by the arrival of a superintelligence.

Comment by donald-hobson on Risk of Mass Human Suffering / Extinction due to Climate Emergency · 2019-03-14T23:41:43.915Z · score: 16 (7 votes) · LW · GW

1) Climate-change-caused extinction is not on the table. Low-tech humans can survive everywhere from the jungle to the arctic. Some humans will survive.

2) I suspect that climate change won't cause massive social collapse. It might well knock 10% off world GDP, but it won't stop us from having an advanced high-tech society. At the moment, it's not causing damage on that scale, and I suspect that in a few decades we will have biotech, renewables, or other techs that will make everything fine. I suspect that the damage caused by climate change won't increase by more than 2 or 3 times in the next 50 years.

3) If you are skilled enough to be a scientist, inventing a solar panel that's 0.5% more efficient does a lot more good than showing up to protests. Protests need many people to work; inventors can change the world by themselves. Policy advisors and academics can suggest action in small groups. Even working a normal job and sending your earnings to a well-chosen charity is likely to be more effective.

4) Quite a few people are already working on global warming. It seems unlikely that a problem would be solved by 10,000,001 people working on it, but not by 10,000,000. Most of the really easy work on global warming is already being done. This was not the case with AI risk as of 10 years ago, for example. (It's got a few more people working on it since then, still nothing like climate change.)

Comment by donald-hobson on [Fiction] IO.SYS · 2019-03-11T14:36:16.234Z · score: 4 (3 votes) · LW · GW

I think the protagonist here should have looked at earth. If there was a technological intelligence on earth that cared about the state of Jupiter's moons, then it could send rockets there. The most likely scenarios are a disaster bad enough to stop us launching spacecraft, and an AI that only cares about earth.

A superintelligence should assign non-negligible probability to the result that actually happened. Given that the tech was available, a space probe containing an uploaded mind is not that unlikely. If such a probe were a real threat to the AI, it would have already blown up all space probes on the off chance.

The upper bound given on the amount that malicious info can harm you is extremely loose. Malicious info can't do much harm unless the enemy has a good understanding of the particular system that they are subverting.

Comment by donald-hobson on Rule Thinkers In, Not Out · 2019-02-27T08:28:14.912Z · score: 7 (6 votes) · LW · GW

Yet policy exploration is an important job. Unless you think that someone posting something on a blog is going to change policy without anyone double-checking it first, we should encourage suggestion of radically new policies.

Comment by donald-hobson on Humans Who Are Not Concentrating Are Not General Intelligences · 2019-02-26T09:22:12.633Z · score: 27 (15 votes) · LW · GW

I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated "the", we don't notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.

I propose that this automatic pattern matching to the closest thing that makes sense happens at a higher level too. When humans skim semi-contradictory text, they produce a more consistent world model that doesn't quite match up with what is said.

Language feeds into a deeper, sensible world-model module within the human brain; GPT-2 doesn't really have a coherent world model.

Comment by donald-hobson on Can We Place Trust in Post-AGI Forecasting Evaluations? · 2019-02-17T21:01:29.480Z · score: 3 (3 votes) · LW · GW

Because your belief about how well AGI is likely to go affects both the likelihood of a bet being evaluated and the chance of winning, bets about AGI are likely to give dubious results. I also have substantial uncertainty about the value of money in a post-singularity world. Most obviously, if everyone gets turned into paperclips, no one has any use for money. If we get a friendly singleton superintelligence, everyone is living in paradise, whether or not they had money before. If we get an economic singularity, where libertarian ASI(s) try to make money without cheating, then money could be valuable. I'm not sure how we would get that, as an understanding of the control problem good enough to not wipe out humans and fill the universe with banknotes should be enough to make something closer to friendly.

Even if we do get some kind of ascendant economy, given the amount of resources in the solar system (let alone the wider universe), it's quite possible that pocket change would be enough to live in luxury for aeons.

Given how unclear it is whether the bet will get paid, and how much the cash would be worth if it were, I doubt that the betting will produce good info. If everyone thinks that money is more likely than not to be useless to them after ASI, then almost no one will be prepared to lock their capital up until then in a bet.

Comment by donald-hobson on Limiting an AGI's Context Temporally · 2019-02-17T18:32:43.272Z · score: 3 (3 votes) · LW · GW

I suspect that an AGI with such a design could be much safer if it was hardcoded to believe that time travel and hyperexponentially vast universes were impossible. Suppose the AGI thought that there was a 0.0001% chance that it could use a galaxy's worth of resources to send 10^30 paperclips back in time, or create a parallel universe containing 3^^^3 paperclips. It will still chase those options.

If starting a long plan to take over the world costs it literally nothing, it will do it anyway. A sequence of short-term plans, each designed to make as many paperclips as possible within the next few minutes, could still end up dangerous. If the number of paperclips at time $t$ is $c(t)$, and its power at time $t$ is $p(t)$, then $\frac{dc}{dt}\propto p$ and $\frac{dp}{dt}\propto p$ would mean that both power and paperclips grew exponentially. This is what would happen if power can be used to gain power and clips at the same time, with minimal loss of either from also pursuing the other.

If power can only be used to gain one thing at a time, and the rate power can grow at is less than the rate of time discount, then we are safer.
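The growth-vs-discount comparison can be made concrete with a toy model (my own illustration, with arbitrary rates, not something from the post): clips produced per step are proportional to current power, power compounds at rate `growth`, and utility is discounted at rate `discount`.

```python
def discounted_clips(growth, discount, horizon=100):
    """Total discounted clip output when power p(t) grows as (1+growth)^t
    and clips produced per step are proportional to current power."""
    total, power = 0.0, 1.0
    for t in range(horizon):
        total += power * (1 - discount) ** t  # discounted clip output this step
        power *= 1 + growth                   # power compounds each step
    return total

# Power grows faster than the discount rate: the sum diverges with the
# horizon, so long-term power grabs dominate despite the discount.
print(discounted_clips(growth=0.10, discount=0.05))

# Power grows slower than the discount rate: the sum converges, so
# short-term clip production dominates and the agent is safer.
print(discounted_clips(growth=0.02, discount=0.05))
```

The qualitative behavior only depends on whether `(1 + growth) * (1 - discount)` is above or below 1, which is the condition in the paragraph above.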

This proposal has several ways to be caught out, world-wrecking assumptions that aren't certain, but if used with care, a short time frame, an ontology that considers time travel impossible, and say a utility function that maxes out at 10 clips, it probably won't destroy the world. Throw in mild optimization and an impact penalty, and you have a system that relies on a disjunction of shaky assumptions, not a conjunction of them.

It should be a CDT agent, or something else that doesn't try to punish you now so that you would have made paperclips last week. A TDT agent might decide to adopt the policy of killing anyone who didn't make clips before it was turned on, causing humans who predict this to make clips.

I suspect that it would be possible to build such an agent, prove that there are no weird failure modes left, and turn it on, with a small chance of destroying the world. I'm not sure why you would do that: once you understand the system well enough to say it's safe-ish, what vital info do you gain from turning it on?

Comment by donald-hobson on Extraordinary ethics require extraordinary arguments · 2019-02-17T17:19:53.640Z · score: 11 (6 votes) · LW · GW

Butterfly effects are essentially unpredictable, given your partial knowledge of the world. Sure, your doing homework could cause a tornado in Texas, but it's equally likely to prevent one. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise you're shuffling an already well-shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of "doing homework" from a vast set of other actions. If you really did know which actions would stop the Texas tornado, they might well look like random thrashing.

What you can calculate are the reliable effects of doing your homework. So, given bounded rationality, you are probably best off basing your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers and a short-term procrastinator.

Most people who aren't particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone looking out for them. The law stops most of the bad mutual defections in prisoner's dilemmas.) Evil geniuses trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.

Comment by donald-hobson on Short story: An AGI's Repugnant Physics Experiment · 2019-02-14T15:31:50.252Z · score: 7 (5 votes) · LW · GW

This is an example of a Pascal's mugging. Tiny probabilities of vast rewards can produce weird behavior. The best-known solutions are either a bounded utility function or an antipascalene agent: an agent that ignores the best x% and worst y% of possible worlds when calculating expected utilities. (The latter can be money-pumped.)
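A sketch of the antipascalene idea, under my own construction of the trimming rule (the mugging numbers are illustrative):

```python
def expected_utility(worlds):
    """worlds: list of (probability, utility) pairs; probabilities sum to 1."""
    return sum(p * u for p, u in worlds)

def trim(worlds, mass):
    """Remove `mass` probability from the front of a list of (p, u) pairs."""
    out = []
    for p, u in worlds:
        drop = min(p, mass)
        mass -= drop
        if p - drop > 0:
            out.append((p - drop, u))
    return out

def antipascalene_utility(worlds, x=0.01, y=0.01):
    """Ignore the worst y and best x probability mass, then renormalize."""
    worlds = sorted(worlds, key=lambda w: w[1])  # worst utility first
    worlds = trim(worlds, y)                     # drop worst y mass
    worlds = trim(worlds[::-1], x)               # drop best x mass
    total = sum(p for p, _ in worlds)
    return sum(p * u for p, u in worlds) / total

# A mugger offers 10^15 utils with probability 10^-9, at a cost of 1 util:
mugging = [(1e-9, 1e15), (1 - 1e-9, -1.0)]
print(expected_utility(mugging))       # hugely positive: pay the mugger
print(antipascalene_utility(mugging))  # the tiny weird world is ignored: don't pay
```

The exploitability mentioned above shows up here too: because trimming makes the agent's valuation depend on how outcomes are bundled into "worlds", a clever adversary can repackage the same gambles to pump money out of it.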

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T22:50:32.220Z · score: 13 (5 votes) · LW · GW

Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card and it's blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. The prediction is that people will perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
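For reference, the exact Bayesian baseline for this experiment (my own worked numbers): the correct log-odds update on seeing a blue side is ln 2 regardless of pack composition, so any composition-dependence in subjects' updates is precisely the prospect-theory distortion being tested for.

```python
import math

def posterior_bb(prior_bb):
    """P(card is blue-blue | a randomly shown side is blue).
    Likelihoods: P(blue side | blue-blue) = 1, P(blue side | red-blue) = 1/2."""
    return prior_bb * 1.0 / (prior_bb * 1.0 + (1 - prior_bb) * 0.5)

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

for prior in (0.5, 0.9):  # 50:50 pack vs mostly blue-blue pack
    post = posterior_bb(prior)
    # The log-odds shift equals the log likelihood ratio, ln 2, in both cases.
    print(prior, post, logit(post) - logit(prior))
```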

Comment by donald-hobson on How important is it that LW has an unlimited supply of karma? · 2019-02-11T15:21:06.773Z · score: 4 (4 votes) · LW · GW

I suspect that if voting reduced your own karma, some people wouldn't vote. As it becomes obvious that this is happening, more people stop voting, until karma just stops flowing at all. (The people who persistently vote anyway all run out of karma.)

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T10:55:11.889Z · score: 1 (1 votes) · LW · GW

Fixed, thanks.

## Propositional Logic, Syntactic Implication

2019-02-10T18:12:16.748Z · score: 5 (4 votes)

## Probability space has 2 metrics

2019-02-10T00:28:34.859Z · score: 88 (36 votes)
Comment by donald-hobson on X-risks are a tragedies of the commons · 2019-02-07T17:50:13.823Z · score: 2 (2 votes) · LW · GW

This is making the somewhat dubious assumption that X-risks are not so neglected that even a "selfish" individual would work to reduce them. Of course, in the not too unreasonable scenario where the cosmic commons is divided up evenly and you use your portion to make a vast number of duplicates of yourself, the utility would be vast (if your utility is linear in copies of yourself). Or you might hope to live for a ridiculously long time in a post-singularity world.

The effect that a single person can have on X-risks is small, but for someone selfish with no time discounting, reducing them would be a better option than hedonism now. Although a third alternative, sitting in a padded room being very, very safe, could be even better.

Comment by donald-hobson on (notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach · 2019-02-06T00:27:27.407Z · score: 18 (5 votes) · LW · GW

I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.

Deterrence Fails.

If you believe that there is a good chance of immortal utopia, and a large chance of paperclips in the next 5 years, the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.

The law is blind to safety.

The law is bureaucratic and ossified. It is probably not employing much top talent, as it's hard to tell top talent from the rest if you aren't as good yourself (and it doesn't have the budget or glamour to attract them). Telling whether an organization is on track to not destroy the world is HARD. The safety protocols are being invented on the fly by each team; the system is very complex, technical, and only half built. The teams that would destroy the world aren't idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, or understood laws.

Likely as not (not really, there's too much of a conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don't understand the problem. All AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list.)

All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone's way. Telling cops to "smash all GPUs" would have an effect on AI progress. The fund-vs-smash axis is about the only lever they have. They can't even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.

After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had tasky ASI that was not trivial to leverage into a decisive strategic advantage (an AI that can put a strawberry on a plate without destroying the world, but that's about the limit of its safe operation).

Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.

I don't know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger), and, with further research, could grant supreme power. The discovery must be limited to a small group of people (with a large enough number of nonexperts, one will do something stupid). I don't think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can't tell how powerful it would be with an unbounded utility function.

Comment by donald-hobson on Why is this utilitarian calculus wrong? Or is it? · 2019-01-28T17:06:33.310Z · score: 6 (5 votes) · LW · GW

Firstly, you are confusing dollars and utils.

If you buy this product for $100, you gain the use of it, at value U[30] to yourself. The workers who made it gain $80, at value U[80] to yourself, because of your utilitarian preferences. Total value: U[110].

If the alternative was a product of cost $100, which you value the use of at U[105], but all the money goes to greedy rich people to be squandered, then you would choose the first. If the alternative was spending $100 to do something insanely morally important, U[3^^^3], you would do that.

If the alternative was a product of cost $100, that was of value U[100] to yourself, and some of the money would go to people that weren't that rich, U[15], you would do that.

If you could give the money to people twice as desperate as the workers, at U[160], you would do that.

There are also good reasons why you might want to discourage monopolies. Any desire to do so is not included in the expected value calculations. But the basic principle is that utilitarianism can never tell you whether some action is a good use of a resource, unless you tell it what else that resource could have been used for.
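To make the bookkeeping explicit, here is the comparison above as arithmetic (the U[...] values are from the text; splitting each option into use value plus downstream value is my framing):

```python
def total_utils(use_value, downstream_value):
    """Utils from using the product, plus utils from where the $100 ends up."""
    return use_value + downstream_value

options = {
    "fair-trade product": total_utils(30, 80),                   # U[110]
    "nicer product, money squandered": total_utils(105, 0),      # U[105]
    "decent product, money to the not-so-rich": total_utils(100, 15),  # U[115]
    "give to people twice as desperate": total_utils(0, 160),    # U[160]
}

# Utilitarianism only ranks the alternatives on the table:
for name, u in sorted(options.items(), key=lambda kv: -kv[1]):
    print(f"U[{u}]  {name}")
```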

## Allowing a formal proof system to self improve while avoiding Lobian obstacles.

2019-01-23T23:04:43.524Z · score: 6 (3 votes)

## Logical inductors in multistable situations.

2019-01-03T23:56:54.671Z · score: 8 (5 votes)

## Boltzmann Brains, Simulations and self refuting hypothesis

2018-11-26T19:09:42.641Z · score: 0 (2 votes)

## Quantum Mechanics, Nothing to do with Consciousness

2018-11-26T18:59:19.220Z · score: 10 (9 votes)

## Clickbait might not be destroying our general Intelligence

2018-11-19T00:13:12.674Z · score: 26 (10 votes)

## Stop buttons and causal graphs

2018-10-08T18:28:01.254Z · score: 6 (4 votes)

## The potential exploitability of infinite options

2018-05-18T18:25:39.244Z · score: 3 (4 votes)