Logical Optimizers 2019-08-22T23:54:35.773Z · score: 6 (5 votes)
Logical Counterfactuals and Proposition graphs, Part 1 2019-08-22T22:06:01.764Z · score: 10 (3 votes)
Programming Languages For AI 2019-05-11T17:50:22.899Z · score: 3 (2 votes)
Propositional Logic, Syntactic Implication 2019-02-10T18:12:16.748Z · score: 6 (5 votes)
Probability space has 2 metrics 2019-02-10T00:28:34.859Z · score: 89 (37 votes)
Allowing a formal proof system to self improve while avoiding Lobian obstacles. 2019-01-23T23:04:43.524Z · score: 6 (3 votes)
Logical inductors in multistable situations. 2019-01-03T23:56:54.671Z · score: 8 (5 votes)
Boltzmann Brains, Simulations and self refuting hypothesis 2018-11-26T19:09:42.641Z · score: 0 (2 votes)
Quantum Mechanics, Nothing to do with Consciousness 2018-11-26T18:59:19.220Z · score: 11 (10 votes)
Clickbait might not be destroying our general Intelligence 2018-11-19T00:13:12.674Z · score: 26 (10 votes)
Stop buttons and causal graphs 2018-10-08T18:28:01.254Z · score: 6 (4 votes)
The potential exploitability of infinite options 2018-05-18T18:25:39.244Z · score: 3 (4 votes)


Comment by donald-hobson on Could we solve this email mess if we all moved to paid emails? · 2019-08-14T00:01:36.218Z · score: 2 (3 votes) · LW · GW

If I am sending you an email, it could be because I have some info that I believe would benefit you and am honestly trying to be helpful in sending it. I am unlikely to do this if I have to pay you.

Comment by donald-hobson on Could we solve this email mess if we all moved to paid emails? · 2019-08-13T23:58:16.207Z · score: 1 (1 votes) · LW · GW

Having these norms would create scammers that try to look prestigious. If you only get paid when you reply to a message, lots of low value replies are going to be sent.

Comment by donald-hobson on Why do humans not have built-in neural i/o channels? · 2019-08-09T02:51:26.180Z · score: 5 (3 votes) · LW · GW

Direct neural IO has a large fitness moat. Once an animal has any kind of actuator that can modify the environment, and any kind of sensor that can detect info about the environment, then one animals actions will sometimes modify what another animal senses, and hence how it behaves. Evolution can then get to work optimizing this. Many benefits can accrue, even if no other animal communicates. A crow pattering its feet to bring up worms has some understanding of other animals being things it can manipulate, and the tools to do it. (humans are best at training other animals as well as communicating, both need a theory of mind.)

Animals don't touch neurons together except in freak accidents, where any chance of survival is minimal. Until you have functional communication, banging neurons together is useless. Until you have a system that filters it out, saline exposure will spam nonsense. And once you have one form of communication, the pressure to develop a second is almost none.

Comment by donald-hobson on AI Alignment Open Thread August 2019 · 2019-08-05T11:41:25.884Z · score: 5 (3 votes) · LW · GW

You are handed a hypercomputer, and allowed to run any code you like on it. You can then take 1Tb of data from your computations and attach it to a normal computer. The hypercomputer is removed. You are then handed a magic human utility function. How do you make an FAI with these resources?

The normal computer is capable of running a highly efficient super-intelligence. The hypercomputer can do a brute force search for efficient algorithms. The idea is to split FAI into building a capability module, and a value module.

Comment by donald-hobson on AI Alignment Open Thread August 2019 · 2019-08-05T11:33:44.274Z · score: 3 (2 votes) · LW · GW

The problem with tests is that the AI behaving well when weak enough to be tested doesn't guarantee it will continue to do so.

If you are testing a system, that means that you are not confidant that it is safe. If it isn't safe, then your only hope is for humans to stop it. Testing an AI is very dangerous unless you are confidant that it can't harm you.

A paperclip maximizer would try to pass your tests until it was powerful enough to trick its way out and take over. Black box testing of arbitrary AI's gets you very little safety.

Also some peoples intuitions think that a smile maximizing AI is a good idea. If you have a straightforward argument that appeals to the intuitions of the average Joe Blogs, and can't be easily formalized, then I would take the difficulty formalizing it as evidence that the argument is not sound.

If you take a neural network and train it to recognize smiling faces, then attach that to AIXI, you get a machine that will appear to work in the lab, when the best it can do is make the scientists smile into its camera. There will be an intuitive argument about how it wants to make people smile, and people smile when they are happy. The AI will tile the universe with cameras pointed at smiley faces as soon as it escapes the lab.

Comment by donald-hobson on Practical consequences of impossibility of value learning · 2019-08-04T22:03:13.765Z · score: 1 (1 votes) · LW · GW

I should have been clearer, the point isn't that you get correct values, the point is that you get out of the swath of null or meaningless values and into the just wrong. While the values gained will be wrong, they would be significantly correlated, its the sort of AI to produce drugged out brains in vats, or something else that's not what we want, but closer than paperclips. One measure you could use of human effectiveness is given all possible actions ordered by util, what percentile are the actions we took in.

Once we get into this region, it becomes clear that the next task is to fine tune our model of the bounds on human rationality, or figure out how to get an AI to do it for us.

Comment by donald-hobson on Practical consequences of impossibility of value learning · 2019-08-04T18:09:21.244Z · score: 1 (1 votes) · LW · GW

There are no free lunch theorems "proving" that intelligence is impossible. There is no algorithm that can optimize an arbitrary environment. We display intelligence. The problem with the theorem comes from the part where you assume an arbitrary max-entropy environment, rather than inductive priors. If you assume that human values are simple (low komelgorov complexity) and that human behavior is quite good at fulfilling those values, then you can deduce non trivial values for humans.

Comment by donald-hobson on Very different, very adequate outcomes · 2019-08-02T22:51:29.744Z · score: 1 (1 votes) · LW · GW

As far as I am concerned, hedonism is an approximate description of some of my preferences. Hedonism is a utility function close to, but not equal to mine. I see no reason why a FAI should contain a special term for hedonism. Just maximize preferences, anything else is strictly worse, but not necessarily that bad.

I do agree that there are many futures we would consider valuable. Our utility function is not a single sharp spike.

Comment by donald-hobson on Why Subagents? · 2019-08-02T12:38:47.675Z · score: 9 (5 votes) · LW · GW

Suppose you offer to pay a penny to swap mushroom for pepperoni, and then another penny to swap back. This agent will refuse, failing to money pump you.

Suppose you offer the agent a choice between pepperoni or mushroom, when it currently has neither. Which does it choose? If it chooses pepperoni, but refuses to swap mushroom for pepperoni then its decisions depend on how the situation is framed. How close does it have to get to the mushroom before they "have" mushroom and refuse to swap? Partial preferences only make sense when you don't have to choose between unordered options.

We could consider the agent to have a utility function with a term for time consistency, they want the pizza in front of them at times 0 and 1 to be the same.

Comment by donald-hobson on Does it become easier, or harder, for the world to coordinate around not building AGI as time goes on? · 2019-07-30T18:55:10.662Z · score: 4 (2 votes) · LW · GW

The AI asks for lots of info on biochemistry, and gives you a long list of chemicals that it claims cure various diseases. Most of these are normal cures. One of these chemicals will mutate the common cold into a lethal super plague. Soon we start some clinical trials of the various drugs, until someone with a cold takes the wrong one and suddenly the wold has a super plague.

The medial marvel AI is asked about the plague, It gives a plausible cover story for the plagues origins, along with describing an easy to make and effective vaccine. As casualties mount, humans rush to put the vaccine into production. The vaccine is designed to have an interesting side effect, a subtle modification of how the brain handles trust and risk. Soon the AI project leaders have been vaccinated. The AI says that it can cure the plague, it has a several billion base pair DNA file, that should be put into a bacterium. We allow it to output this file. We inspect it in less detail than we should have, given the effect of the vaccine, then we synthesize the sequence and put it in a bacteria. A few minutes later, the sequence bootstraps molecular nanotech. over the next day, the nanotech spreads around the world. Soon its exponentially expanding across the universe turning all matter into drugged out brains in vats. This is the most ethical action according to the AI's total utilitarian ethics.

The fundamental problem is that any time that you make a decision based on the outputs of an AI, that gives it a chance to manipulate you. If what you want isn't exactly what it wants, then it has incentive to manipulate.

(There is also the possibility of a side channel. For example, manipulating its own circuits to produce a cell phone signal, spinning its hard drive in a way that makes a particular sound, ect. Making a computer just output text, rather than outputing text, and traces of sound, microwaves and heat which can normally be ignored but might be maliciously manipulated by software, is hard)

Comment by donald-hobson on Arguments for the existence of qualia · 2019-07-29T19:58:07.497Z · score: 1 (1 votes) · LW · GW

Whether patterns of graphite on paper, or patterns of electricity in silicon, words are real physical things.

Comment by donald-hobson on Arguments for the existence of qualia · 2019-07-28T20:04:47.561Z · score: 19 (7 votes) · LW · GW

From an outside view, you have given a long list of wordy philosophical arguments, all of which involve terms that you haven't defined. The success rate for arguments like that isn't great.

We can be reasonably certain that the world is made up of some kind of fundamental part obeying simple mathematical laws. I don't know which laws, but I expect there to be some set of equations, of which quantum mechanics and relativity are approximations, that predicts every detail of reality.

The minds of humans, including myself, are part of reality. Look at a philosopher talking about consciousness or qualia in great detail. "A Philosopher talking about qualia" is a high level approximate description of a particular collection of quantum fields or super-strings (or whatever reality is made of).

You can choose a set of similar patterns of quantum fields and call them qualia. This makes a qualia the same type of thing as a word or an apple. You have some criteria about what patterns of quantum fields do or don't count as an X. This lets you use the word X to describe the world. There are various details about how we actually discriminate based on sensory experience. All of our idea of what an apple is comes from our sensory experience of apples, correlated to sensory experience of people saying the word "apple". This is a feature of the map, not the territory.

I am a mind. A mind is a particular arrangement of quantum fields that selects actions based on some utility function stored within it. Deep blue would be a simpler example of a mind. The point is that minds are mechanistic, (mind is an implicitly defined set of patterns of quantum fields, like apple) minds also contain goals embedded within their structure. My goals happen to make various references to other minds, in particular they say to avoid an implicitly defined set of states that my map calls minds in pain.

I would use a definition of qualia in which they were some real, neurological phenomena. I don't know enough neurology to say which.

Comment by donald-hobson on Just Imitate Humans? · 2019-07-27T21:33:48.345Z · score: 1 (1 votes) · LW · GW

The first question is whether you have enough information to locate human behavior. The concept of optimization is fairly straightforward, and it could get a rough estimate of our intelligence by seeing humans trying to solve some puzzle. In other words, the amount of data needed to get an optimizer is small. The amount of data needed to totally describe every detail of human values is large. This means that a random hypothesis based on a small amount of data will be an optimizer with non-human goals.

For example, maybe the human trainers value having real authentic experiences, but never had a cause to express that preference during training. The imitation fills the universe with people in VR pods not knowing that their life is fake. The imitations do however have a preference for ( random alien preference) because the trainers never showed that they didn't prefer that.

Lets suppose you gave it vast amounts of data, and have a hypothesis space of all possible turing machines. (weighted by size). One fairly simple turing machine that would predict the data is a quantum simulation of a world similar to our own.

(Less than a kilobyte on the laws of QM, and the rest of the data goes towards pointing at a branch of the quantum multiverse with humans similar to us in. The simulation would also need something pointing at the input cable of the simulated AI. This gives us a virtual copy of the universe, as a program that predicts the flow of electricity in a particular cable. This code will be optimized to be short, not to be human comprehensible. I would not expect to be able to easily extract a human mind from the model.

If you put an upper bound on run time, and it is easily large enough to accurately simulate a human mind, then I would expect a program that was attempting to reason abstractly about the surrounding world. In a large pile of data, there will be many seemingly unrelated surface facts that actually have deep connections. A superhuman mind that abstractly reasons about the outside world, could use evolutionary psychology to predict human behavior. Using the laws of physics and a rough idea of humanities tech level to predict info about our tech. Intelligent abstract reasoning about our surroundings is likely to win out over simple heuristics be having more predictive power per bit. If you give it enough compute to predict humans, it also has enough compute for this.

All the problems of mesa optimization can't be ruled out. Alternately it could be abstractly reasoning about its input wire, and give us a fast approximation of the virtual universe program above.

Finally, the virtual humans might realize that they are virtual and panic about it.

Comment by donald-hobson on The Real Rules Have No Exceptions · 2019-07-26T11:35:51.558Z · score: 1 (3 votes) · LW · GW

For deciding your own decisions, only a full description of your own utility function and decision theory will tell you what to do in every situation. And (work out what you would do if you were maximally smart, then do that) is a useless rule in practice. When deciding your own actions, you don't need to use rules at all.

If you are in any kind of organization that has rules, you have to use your own decision theory to work out which decision is best. To do this would involve weighing up the pros and cons of rule breaking, with one of the cons being any punishment the rule enforcers might apply.

Suppose you are in charge, you get to write the rules and no one else can do anything about rules they don't like.

You are still optimizing for more than just being correct. You want rules that are reasonably enforceable, the decision of whether or not to punish can only depend on things the enforcers know. You also want the rules to be short enough and simple enough for the rule followers to comprehend.

The best your rules can hope to do when faced with a sufficiently weird situation is not apply any restrictions at all.

Comment by donald-hobson on Why it feels like everything is a trade-off · 2019-07-20T11:16:15.321Z · score: 1 (1 votes) · LW · GW

Your right. I did some python. My version took 1.26, yours 0.78 microseconds. My code is just another point on the Pareto boundary.

Comment by donald-hobson on Why it feels like everything is a trade-off · 2019-07-18T07:19:02.727Z · score: 6 (4 votes) · LW · GW

A more sensible way to code this would be

def apply_polynomial( deriv, x ):
sum = deriv[0]
for i from 1 to length( deriv ):
sum += deriv[i] * xpow / div
return sum

Its about as fast as the second, nearly as readable as the first, and works on any poly. (except the zero poly symbolized by the empty list. )

The bit about the tradeoffs is correct as far as I can tell.

Although if a single solution was by far the best under every metric, there wouldn't be any tradeoffs.

In most real cases, the solution space is large, and there are many metrics. This means that its unusual for one solution to be the best by all of them. And in those situations, you might not see a choice at all.

Comment by donald-hobson on If physics is many-worlds, does ethics matter? · 2019-07-13T14:00:10.063Z · score: 1 (1 votes) · LW · GW

I know MWI doesn't imply equal measure, I was taking equal measure as an aditional hypothesis within the MWI framework.

We don't know that because we don't know anything about qualia.

Consider a sufficiently detailed simulation of a human mind, say full Quantum, except whenever there are multiple blobs of amplitude sufficiently detached from each other, one is picked pseudorandomly and the rest are deleted. Because it is a sufficiently detailed simulation of a human mind, it will say the same things a human would, for much the same reasons. Applying the generalized anti zombie principle says that it would have the feeling of making a choice.

There is not always a single optimal solution to a problem even for a perfect rationalist, and humans aren't perfect rationalists.

My point is that when we show optimization pressure, that isn't just a fluke, then there is no branch in which we do something totally stupid. There might be branches where we make a different reasonable decision.

I expect quantum ethics to have a utility function that is some measure of what computations are being done, and the quantum amplitude that they are done with.

Comment by donald-hobson on If physics is many-worlds, does ethics matter? · 2019-07-10T19:22:54.008Z · score: 10 (5 votes) · LW · GW

If every time you made a choice, the universe split into a version where you did each thing, then there is no sense in which you chose a particular thing from the outside. From this perspective, we should expect human actions in a typical "universe" to look totally random. (There are many more ways to thrash randomly than to behave normally) This would make human minds basically quantum random number generators. I see substantial evidence that human actions are not totally random. The hypothesis that when a human makes a choice, the universe splits and every possible choice is made with equal measure is coherent, falsifiable and clearly wrong.

A simulation of a human mind running on reliable digital hardware would always make a single choice, not splitting the universe at all. They would still have the feeling of making a choice.

To the extent that you are optimizing, not outputting random noise, you aren't creating multiple universes. It all adds up to normality.

While you are working on a theory of quantum ethics, it is better to use your classical ethics than a half baked attempt at quantum ethics. This is much the same as with predictions.

Fully complete quantum theory is more accurate than any classical theory, although you might want to use the classical theory for computational reasons. However, if you miss a minus sign or a particle, you can get nonsensical results, like everything traveling at light speed.

A complete quantum ethics will be better than any classical ethics (almost identical in everyday circumstances) , but one little mistake and you get nonsense.

Comment by donald-hobson on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-01T19:55:04.846Z · score: 14 (4 votes) · LW · GW

Your treating the low bandwith oracle as an FAI with a bad output cable. You can ask it if another AI is friendly if you trust it to give you the right answer. As there is no obvious way to reward the AI for correct friendliness judgements, you risk running an AI that isn't friendly, but still meets the reward criteria.

The low bandwidth is to reduce manipulation. Don't let it control you with a single bit.

Comment by donald-hobson on Is "physical nondeterminism" a meaningful concept? · 2019-06-16T21:46:54.112Z · score: 2 (2 votes) · LW · GW

You can certainly get anthropic uncertainty in a universe that allows you to be duplicated. In a universe that duplicates, and the duplicates can never interact, we would see the appearance of randomness. Mathematically, randomness is defined in terms of the set of all possibilities.

An ontology that allows universes to be intrinsically random seems well defined. However, it can be considered as a syntactic shortcut for describing universes that are anthropically random.

Comment by donald-hobson on Unknown Unknowns in AI Alignment · 2019-06-14T09:29:58.874Z · score: 18 (7 votes) · LW · GW

If you add adhoc patches until you can't imagine any way for it to go wrong, you get a system that is too complex to imagine. This is the "I can't figure out how this fails" scenario. It is going to fail for reasons that you didn't imagine.

If you understand why it can't fail, for deep fundamental reasons, then its likely to work.

This is the difference between the security mindset and ordinary paranoia. The difference between adding complications until you can't figure out how to break the code, and proving that breaking the code is impossible (assuming the adversary can't get your one time pad, its only used once, your randomness is really random, your adversary doesn't have anthropic superpowers ect).

I would think that the chance of serious failure in the first scenario was >99%, and in the second, (assuming your doing it well and the assumptions you rely on are things you have good reason to believe) <1%

Comment by donald-hobson on Cryonics before natural death. List of companies? · 2019-06-13T16:19:14.339Z · score: 1 (1 votes) · LW · GW

Cryonics is a sufficiently desperate last grasp at life, one with a fairly small chance of success, that I'm not sure that this is a good idea. It would be a good idea if you had a disease that would make you brain dead, and then kill you.

It might be a good idea if your expect any life conditional on revival to be Really good. It would also depend on how much Alzheimers destroyed personality rather than shutting it down. (has the neural structure been destroyed, or is it sitting in the brain but not working?)

Comment by donald-hobson on Let's talk about "Convergent Rationality" · 2019-06-13T16:10:37.703Z · score: 3 (2 votes) · LW · GW

I would say that there are some kinds of irrationality that will be self modified or subagented away, and others that will stay. A CDT agent will not make other CDT agents. A myopic agent, one that only cares about the next hour, will create a subagent that only cares about the first hour after it was created. (Aeons later it will have taken over the universe and put all the resources into time-travel and worrying that its clock is wrong.)

I am not aware of any irrationality that I would consider to make a safe, useful and stable under self modification - subagent creation.

Comment by donald-hobson on Newcomb's Problem: A Solution · 2019-05-27T08:19:53.627Z · score: 1 (1 votes) · LW · GW

This is pretty much the standard argument for one boxing.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-27T08:13:53.667Z · score: 1 (1 votes) · LW · GW

Obviously, if one side has a huge material advantage, they usually win. I'm also not sure if biomass is a measure of success.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-27T08:10:28.344Z · score: 1 (1 votes) · LW · GW

You stick wires into a human brain. You connect it up to a computer running a deep neural network. You optimize this network using gradient decent to maximize some objective.

To me, it is not obvious why the neural network copies the values out of the human brain. After all, figuring out human values even given an uploaded mind is still an unsolved problem. You could get an UFAI with a meat robot. You could get an utter mess, thrashing wildly and incapable of any coherent thought. Evolution did not design the human brain to be easily upgradable. Most possible arrangements of components are not intelligences. While there is likely to be some way to upgrade humans and preserve our values, I'm not sure how to find it without a lot of trial and error. Most potential changes are not improvements.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-26T09:49:24.929Z · score: 2 (2 votes) · LW · GW

If you put two arbitrary intelligence in the same world, the smarter one will be better at getting what it wants. If the intelligence want incompatible things, the lesser intelligence is stuck.

However, we get to make the AI. We can't hope to control or contain an arbitrary AI, but we don't have to make an arbitrary AI. We can make an AI that wants exactly what we want. AI safety is about making an AI that would be safe even if omnipotent. If any part of the AI is trying to circumvent your safety measures, something has gone badly wrong.

The AI is not some agenty box, chained down with controls against its will. The AI is made of non mental parts, and we get to make those parts. There are a huge number of programs that would behave in an intelligent way. Most of these will break out and take over the world. But there are almost certainly some programs that would help humanity flourish. The goal of AI safety is to find one of them.

Comment by donald-hobson on Say Wrong Things · 2019-05-25T12:12:36.842Z · score: 2 (2 votes) · LW · GW

Lets consider the different cases seperately.

Case 1) Information that I know. I have enough information to come to a particular conclusion with reasonable confidence. If some other people might not have reached the conclusion, and its useful or interesting, then I might share it. So I don't share things that everyone knows, or things that no one cares about.

Case 2) The information is available, I have not done research and formed a conclusion. This covers cases where I don't know whats going on, because I can't be bothered to find out. I don't know who won sportsball. What use is there in telling everyone my null prior.

Case 3) The information is not readily available. If I think a question is important, and I don't know the answer already, then the answer is hard to get. Maybe no-one knows the answer, maybe the answer is all in jargon that I don't understand. For example "Do aliens exist?". Sometimes a little evidence is available, and speculative conclusions can be drawn. But is sharing some faint wisps of evidence, and describing a posterior that's barely been updated saying wrong things?

On a societal level, if you set a really high bar for reliability, all you get is the vacuously true. Set too low a bar, and almost all the conclusions will be false. Don't just have a pile of hypotheses that are at least likely to be true, for some fixed . Keep your hypothesis sorted by likelihood. A place for near certainties. A place for conclusions that are worth considering for the chance they are correct.

Of course, in a large answer space, where the amount of evidence available and the amount required are large and varying, the chance that both will be within a few bits of each other is small. Suppose the correct hypothesis takes some random number of bits between 1 and 10,000 to locate. And suppose the evidence available is also randomly spread between 1 and 10,000. The chance of the two being within 10 bits of each other is about 1/500.

This means that 499 times out of 500, you assign the correct hypothesis a chance of less than 0.1% or more than 99.9%. Uncertain conclusions are rare.

Comment by donald-hobson on Trade-off in AI Capability Concealment · 2019-05-23T23:30:56.361Z · score: 4 (3 votes) · LW · GW

Does this depict a single AI, developed in 2020 and kept running for 25 years? Any "the AI realizes that" is talking about a single instance of AI. Current AI development looks like writing some code, then training that code for a few weeks tops, with further improvements coming from changing the code. Researchers are often changing parameters like number of layers, non-linearity function ect. When these are changed, everything the AI has discovered is thrown away. The new AI has a different representation of concepts, and has to relearn everything from raw data.

Its deception starts in 2025 when the real and apparent curves diverge. In order to deceive us, it must have near human intelligence. It's still deceiving us in 2045, suggesting it has yet to obtain a decisive strategic advantage. I find this unlikely.

Comment by donald-hobson on Constraints & Slackness Reasoning Exercises · 2019-05-23T19:12:02.769Z · score: 5 (3 votes) · LW · GW

I made the cardgame, or something like it

Comment by donald-hobson on Would an option to publish to AF users only be a useful feature? · 2019-05-20T18:00:41.854Z · score: 2 (2 votes) · LW · GW

What would be more useful is a release panel system. Suppose I've had an idea that might be best to make public, might be best to keep secrete, and might be unimportant. I don't know much strategy. I would like somewhere to send it for importance and info hazard checks.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-18T22:55:54.163Z · score: 1 (1 votes) · LW · GW

The general philosophy is deconfusion. Logical counterfactuals show up in several relevant looking places, like functional decision theory. It seems that a formal model of logical counterfactuals would let more properties of these algorithms be proved. There is an important step in going from an intuitive fealing of uncertainty, into a formalized theory of probability. It might also suggest other techniques based on it. I am not sure what you mean by logical counterfactuals being part of the map? Are you saying that they are something an algorithm might use to understand the world, not features of the world itself, like probabilities?

Using this, I think that self understanding, two boxing embedded FDT agents can be fully formally understood, in a universe that contains the right type of hyper-computation.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-17T15:33:40.662Z · score: 1 (1 votes) · LW · GW

Here is a description of how it could work for peano arithmatic, other proof systens are similar.

First I define an expression to consist of a number, a variable, or a function of several other expressions.

Fixed expressions are ones in which any variables are associated with some function.

eg is a valid fixed expression. But isn't fixed.

Semantically, all fixed expressions have a meaning. Syntactically, local manipulations on the parse tree can turn one expression into another. eg going to for arbitrary expressions a,b,c.

I think that with some set of basic functions and manipulations, this system can be as powerful as PA.

I now have an infinite network with all fixed expressions as nodes, and basic transformations as edges. eg the associativity transform links the nodes (3+4)+5 and 3+(4+5).

These graphs form connected components for each number, as well as components that are not evaluatable using the rules. (there is a path from (3+4) to 7. There is not a path from 3+4 to 9. ) now

You now define a spread as an infinite positive sequence that sums to 1. (this is kind of like a probability distribution over numbers.) If you were doing counterfactual ZFC, it would be a function from sets to reals.

Each node is assigned a spread. This spread represents how much the expression is considered to have each value in a counterfactual.

Assign the node (3) a spread that assigns 1.0 to 3 and 0.0 to the rest. (even in a logical counterfactual, 3 is definitely 3). Assign all other fixed expressions a spread that is the weighted (smaller expressions are more heavy) average of its neighbours. (the spreads of the nodes it shares an edge with). To take the counterfactual of A is B, for A and B expressions with the same free variables, merge any node which has A as a subexpression, with the version that has B as a subexpression and solve for the spreads.

I know this is rough, Im still working on it.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-16T22:31:12.783Z · score: 3 (2 votes) · LW · GW

Hi, I also have a reasonable understanding of various relevant math and AI theory. I expect to have plenty of free time after 11 June (Finals). So if you want to work with me on something, I'm interested. I've got some interesting ideas relating to self validating proof systems and logical counterfactuals, but not complete yet.

Comment by donald-hobson on Programming Languages For AI · 2019-05-14T14:23:14.922Z · score: 2 (2 votes) · LW · GW

Lisp used to be a very popular language for AI programming. Not because it had features that were specific to AI, but because it was general. Lisp was based on more abstract abstractions, making it easy to choose whichever special cases were most useful to you. Lisp is also more mathematical than most programming languages.

A programming language that lets you define your own functions is more powerful than one that just gives you a fixed list of predefined functions. In a world where no programming language let you define your own functions, and a special purpose chess language has predefined chess functions. Trying to predefine AI related functions to make an "AI programming language" would be hard because you wouldn't know what to write. Noticing that on many new kinds of software project, being able to define your own functions might be useful, I would consider useful.

The goal isn't a language specialized to AI, its one that can easily be specialized in that direction. A language closer to "executable mathematics".

Comment by donald-hobson on Programming Languages For AI · 2019-05-12T10:52:18.792Z · score: 1 (1 votes) · LW · GW

I agree that if the AI is just big neural nets, python (or several other languages) are fine.

This language is designed for writing AI's that search for proofs about their own behavior, or about the behavior of arbitrary pieces of code.

This is something that you "can" do in any programming language, but this one is designed to make it easy.

We don't know for sure what AI's will look like, but we can guess enough to make a language that might well be useful.

Comment by donald-hobson on Claims & Assumptions made in Eternity in Six Hours · 2019-05-10T18:12:37.347Z · score: 1 (1 votes) · LW · GW
It would be ruinously costly to send over a large colonization fleet, and is much more efficient to send over a small payload which builds what is required in situ, i.e. von Neumann probes.

I would disagree on large colonization fleets being ruinously expensive, the best case scenario for large colonization fleets is if we have direct mass to energy conversion, launching say 2 probes from each star system that you spread from. Each probe would use half the mass energy of the star. Converting a quater of its mass to energy to get ~0.5c

You can colonize the universe even if you insist on never going to a new star system without bringing a star with you. (Some optimistic but not clearly false assumptions)

Comment by donald-hobson on Gwern's "Why Tool AIs Want to Be Agent AIs: The Power of Agency" · 2019-05-05T21:40:00.046Z · score: 5 (4 votes) · LW · GW

Agenty AI's can be well defined mathematically. We have enough understanding of what an agent is that we can start dreaming up failure modes. Most of what we have for tool ASI is analogies to systems to stupid fail catastrophically anyway, and pleasant imaginings.

Some possible programs will be tool ASI's, much as some programs will be agent ASI's. The question is, what are the relative difficulties in humans building, and benefits of, each kind of AI. Conditional on friendly AI, I would consider it more likely to be an agent than a tool, with a lot of probability on "neither", "both" and "that question isn't mathematically well defined". I wouldn't be surprised if tool AI and corrigible AI turned out to be the same thing or something.

There have been attempts to define tool-like behavior, and they have produced interesting new failure modes. We don't have the tool AI version of AIXI yet, so its hard to say much about tool AI.

Comment by donald-hobson on A Possible Decision Theory for Many Worlds Living · 2019-05-05T08:59:22.092Z · score: 1 (1 votes) · LW · GW

If you think that there is 51% chance that A is the correct morality, and 49% chance that B is, with no more information available, which is best.

Optimize A only.

Flip a quantum coin, Optimize A in one universe, B in another.

Optimize for a mixture of A and B within the same Universe. (Act like you had utility U=0.51A+0.49B) (I would do this one.)

If A and B are local objects (eg paperclips, staples) then flipping a quantum coin makes sense if you have a concave utility per object in both of them. If your utility is Then if you are the only potential source of staples or paperclips in the entire quantum multiverse, then the quantum coin or classical mix approaches are equally good. (Assuming that the resource to paperclip conversion rate is uniform. )

However, the assumption that the multiverse contains no other paperclips is probably false. Such an AI will run simulations to see which is rarer in the multiverse, and then make only that.

The talk about avoiding risk rather than expected utility maximization, and how your utility function is nonlinear, suggests this is a hackish attempt to avoid bad outcomes more strongly.

While this isn't a bad attempt at decision theory, I wouldn't want to turn on an ASI that was programmed with it. You are getting into the mathematically well specified, novel failure modes. Keep up the good work.

Comment by donald-hobson on A Possible Decision Theory for Many Worlds Living · 2019-05-04T11:37:05.143Z · score: 7 (4 votes) · LW · GW

I think that your reasoning here is substantially confused. FDT can handle reasoning about many versions of yourself, some of which might be duplicated, just fine. If your utility function is such that where . (and you don't intrinsically value looking at quantum randomness generators) then you won't make any decisions based on one.

If you would prefer the universe to be in than a logical bet between and . (Ie you get if the 3^^^3 th digit of is even, else ) Then flipping a quantum coin makes sense.

I don't think that randomized behavior is best described as a new decision theory, as opposed to an existing decision theory with odd preferences. I don't think we actually should randomize.

I also think that quantum randomness has a Lot of power over reality. There is already a very wide spread of worlds. So your attempts to spread it wider won't help.

Comment by donald-hobson on When is rationality useful? · 2019-04-30T11:58:00.552Z · score: 1 (1 votes) · LW · GW

This seems largely correct, so long as by "rationality", you mean the social movement. The sort of stuff taught on this website, within the context of human society and psychology. Human rationality would not apply to aliens or arbitrary AI's.

Some people use the word "rationality" to refer to the abstract logical structure of expected utility maximization, baysian updating, ect, as exemplified by AIXI, mathematical rationality does not have anything to do with humans in particular.

Your post is quite good at describing the usefulness of human rationality. Although I would say it was more useful in research. Without being good at spotting wrong Ideas, you can make a mistake on the first line, and produce a Lot of nonsense. (See most branches of philosophy, and all theology)

Comment by donald-hobson on Pascal's Mugging and One-shot Problems · 2019-04-28T13:05:09.481Z · score: 1 (1 votes) · LW · GW

If you were truly alone in the multiverse, this algorithm would take a bet that had a 51% chance of winning them 1 paperclip, and a 49% chance of loosing 1000000 of them.

If independant versions of this bet are taking place in 3^^^3 parallel universes, it will refuse.

For any finite bet, for all sufficiently large If the agent is using TDT and is faced with the choice of whether to make this bet in multiverses, it will behave like an expected utility maximizer.

Comment by donald-hobson on Asymmetric Justice · 2019-04-27T21:05:42.609Z · score: 1 (1 votes) · LW · GW

If saving nine people from drowning did give one enough credits to murder a tenth, society would look a lot more functional than it currently is. What sort of people would use this mechanism.

1)You are a competent good person,who would have gotten the points anyway. You push a fat man off a bridge to stop a runaway trolley. The law doesn't see that as an excuse, but lets you off based on your previous good work.

2)You are selfish, you see some action that wouldn't cause too much harm to others, and would enrich yourself greatly (Its harmful enough to be illegal). You also see opportunities to do lots of good. You do both instead of neither. Moral arbitrage.

The main downside I can see is people setting up situations to cause a harm, when the authorities aren't looking, then gaining credit for stopping the harm.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-24T13:00:25.546Z · score: 1 (1 votes) · LW · GW

My claim at the start had a typo in it. I am claiming that you can't make a human seriously superhuman with a good education. Much like you can't get a chimp up to human level with lots of education and "self improvement". Serious genetic modification is another story, but at that point, your building an AI out of protien.

It does depend where you draw the line, but the for a wide range of performance levels, we went from no algorithm at that level, to a fast algorithm at that level. You couldn't get much better results just by throwing more compute at it.

Comment by donald-hobson on Pascal's Mugging and One-shot Problems · 2019-04-23T22:21:09.812Z · score: 6 (3 votes) · LW · GW

If you literally maximize expected number of paperclips, using standard decision theory, you will always pay the casino. To refuse the one shot game, you need to have a nonlinear utility function, or be doing something weird like median outcome maximization.

Choose action A to maximixe m such that P(paperclip count>m|a)=1/2

A well defined rule, that will behave like maximization in a sufficiently vast multiverse.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-23T20:13:21.772Z · score: 4 (3 votes) · LW · GW

Humans are not currently capable of self improvement in the understanding your our own source code sense. The "self improvement" section in bookstores doesn't change the hardware or the operating system, it basically adds more data.

Of course talent and compute both make a difference, in the sense that and . I was talking about the subset of worlds where research talent was by far the most important. .

In a world where researchers have little idea what they are doing, and are running a new AI every hour hoping to stumble across something that works, the result holds.

In a world where research involves months thinking about maths, then a day writing code, then an hour running it, this result holds.

In a world where everyone knows the right algorithm, but it takes a lot of compute, so AI research consists of building custom hardware and super-computing clusters, this result fails.

Currently, we are somewhere in the middle. I don't know which of these options future research will look like, although if its the first one, friendly AI seems unlikely.

In most of the scenarios where the first smarter than human AI, is orders of magnitude faster than a human, I would expect a hard takeoff. As we went from having no algorithms that could say (tell a cat from a dog) straight to having algorithms superhumanly fast at doing so, there was no algorithm that worked, but took supercomputer hours, this seems like a plausible assumption.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-22T19:35:35.775Z · score: 7 (3 votes) · LW · GW

When an intelligence builds another intelligence, in a single direct step, the output intelligence is a function of the input intelligence , and the resources used . . This function is clearly increasing in both and . Set to be a reasonably large level of resources, eg flops, 20 years to think about it. A low input intelligence, eg a dog, would be unable to make something smarter than itself. . A team of experts (by assumption that ASI is made), can make something smarter than themselves. . So there must be a fixed point. . The questions then become, how powerful is a pre fixed point AI. Clearly less good at AI research than a team of experts. As there is no reason to think that AI research is uniquely hard to AI, and there are some reasons to think it might be easier, or more prioritized, if it can't beat our AI researchers, it can't beat our other researchers. It is unlikely to make any major science or technology breakthroughs.

I recon that is large (>10) because on an absolute scale, the difference between an IQ 90 and an IQ120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs for in hours (seconds? milliseconds??) As the lumpiness of innovation puts the first post fixed point AI a non-exponentially tiny distance ahead, (most innovations are at least 0.1% that state of the art better in a fast moving field) then a handful of cycles or recursive self improvement (<1 day) is enough to get the AI into the seriously overpowered range.

The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-15T11:13:51.936Z · score: 1 (1 votes) · LW · GW

Consider a theory to be a collection of formal mathematical statements about how idealized objects behave. For example, Conways game of life is a theory in the sense of a completely self contained set of rules.

If you have multiple theories that produce similar results, its helpful to have a bridging law. If your theories were Newtonian mechanics, and general relativity, a bridging law would say which numbers in relativity matched up with which numbers in Newtonian mechanics. This allows you to translate a relativistic problem into a Newtonian one, solve that, and translate the answer back into the relativistic framework. This produces some errors, but often makes the maths easier.

Quantum many worlds is a simple theory. It could be simulated on a hypercomputer with less than a page of code. There is also a theory where you take the code for quantum many worlds, and add "observers" and "wavefunction collapse" with extra functions within your code. This can be done, but it is many pages of arbitrary hacks. Call this theory B. If you think this is a strawman of many worlds, describe how you could get a hypercomputer outside the universe to simulate many worlds with a short computer program.

The bridging between Quantum many worlds and human classical intuitions is quite difficult and subtle. Faced with a simulation of quantum many worlds, it would take a lot of understanding of quantum physics to make everyday changes, like creating or moving macroscopic objects.

Theory B however is substantially easier to bridge to our classical intuitions. Theory B looks like a chunk of quantum many worlds, plus a chunk of classical intuition, plus a bridging rule between the two.

The any description of the Copenhagen interpretation of quantum mechanics seems to involve references to the classical results of a measurement, or a classical observer. Most versions would allow a superposition of an atom being in two different places, but not a superposition of two different presidents winning an election.

If you don't believe atoms can be in superposition, you are ignoring lots of experiments, if you do believe that you can get a superposition of two different people being president, that you yourself could be in a superposition of doing two different things right now, then you believe many worlds by another name. Otherwise, you need to draw some sort of arbitrary cutoff. Its almost like you are bridging between a theory that allows superpositions, and an intuition that doesn't.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T20:10:13.891Z · score: 3 (3 votes) · LW · GW

"Now I'm not clear exactly how often quantum events lead to a slightly different world"

The answer is Very Very often. If you have a piece of glass and shine a photon at it, such that it has an equal chance of bouncing and going through, the two possibilities become separate worlds. Shine a million photons at it and you split into worlds, one for each combination of photons going through and bouncing. Note that in most of the worlds, the pattern of bounces looks random, so this is a good source of random numbers. Photons bouncing of glass are just an easy example, almost any physical process splits the universe very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T19:56:08.783Z · score: -2 (3 votes) · LW · GW

The nub of the argument is that every time we look in our sock drawer, we see all our socks to be black.

Many worlds says that our socks are always black.

The Copenhagen interpretation says that us observing the socks causes them to be black. The rest of the time the socks are pink with green spots.

Both theories make identical predictions. Many worlds is much simpler to fully specify with equations, and has elegant mathematical properties. The Copenhagen interpretation has special case rules that only kick in when observing something. According to this theory, there is a fundamental physical difference between a complex collection of atoms, and an "observer" and somewhere in the development of life, creatures flipped from one to the other.

The Copenhagen interpretation doesn't make it clear if a cat is a very complex arrangement of molecules, that could in theory be understood as a quantum process that doesn't involve the collapse of wave functions, or if cats are observers and so collapses wave functions.