Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-22T19:35:35.775Z · score: 7 (3 votes) · LW · GW

When an intelligence builds another intelligence in a single direct step, the output intelligence $O$ is a function of the input intelligence $I$ and the resources used $R$: $O = f(I, R)$. This function is clearly increasing in both $I$ and $R$. Set $R$ to be a reasonably large level of resources, e.g. some large number of FLOPs and 20 years to think about it. A low input intelligence, e.g. a dog, would be unable to make something smarter than itself: $f(I_{\text{dog}}, R) < I_{\text{dog}}$. A team of experts (by the assumption that ASI gets made) can make something smarter than themselves: $f(I_{\text{experts}}, R) > I_{\text{experts}}$. So, assuming $f$ is continuous, there must be a fixed point $I_0$ with $f(I_0, R) = I_0$. The question then becomes: how powerful is a pre-fixed-point AI? It is clearly less good at AI research than a team of experts. As there is no reason to think that AI research is uniquely hard for AI, and there are some reasons to think it might be easier, or more prioritized, if it can't beat our AI researchers, it can't beat our other researchers. It is unlikely to make any major scientific or technological breakthroughs.

I reckon that $\partial f / \partial I$ at the fixed point is large (>10), because on an absolute scale the difference between an IQ 90 and an IQ 120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs for $R$ in hours (seconds? milliseconds??). As the lumpiness of innovation puts the first post-fixed-point AI a non-exponentially-tiny distance ahead (in a fast-moving field, most innovations are at least 0.1% better than the state of the art), a handful of cycles of recursive self-improvement (<1 day) is enough to get the AI into the seriously overpowered range.
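A toy way to see the "handful of cycles" claim: linearize $f$ around the fixed point $I_0$ with slope $k = \partial f / \partial I$, holding $R$ fixed. Each build cycle then multiplies the lead over $I_0$ by $k$. The sketch below assumes a purely hypothetical linear $f$ with $I_0 = 1$ and $k = 10$; it is only meaningful near the fixed point, not a model of real AI progress.

```python
def build_successor(intelligence, fixed_point=1.0, slope=10.0):
    """Hypothetical linearization of f(I, R) around its fixed point I0, with R held fixed."""
    return fixed_point + slope * (intelligence - fixed_point)


def self_improve(start, cycles, **kwargs):
    """Let each AI build its successor `cycles` times; return the whole trajectory."""
    trajectory = [start]
    for _ in range(cycles):
        trajectory.append(build_successor(trajectory[-1], **kwargs))
    return trajectory


# Start 0.1% past the fixed point: with slope 10 the lead grows tenfold per cycle.
print(self_improve(1.001, 3))  # approximately 1.001, 1.01, 1.1, 2.0 (up to float rounding)
```

Starting just below $I_0$ instead (e.g. 0.999) gives a shrinking lead, which is the dog side of the fixed point.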

The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-15T11:13:51.936Z · score: 1 (1 votes) · LW · GW

Consider a theory to be a collection of formal mathematical statements about how idealized objects behave. For example, Conway's Game of Life is a theory in the sense of a completely self-contained set of rules.
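For concreteness, here is the whole Game of Life "theory" as a short Python function. This is a sketch under one common representation choice (live cells stored as a set of coordinates on an unbounded grid); nothing outside the rules themselves is needed to run it.

```python
from collections import Counter


def life_step(live):
    """Advance Conway's Game of Life by one generation; `live` is a set of (x, y) cells."""
    # Count, for every cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(life_step(blinker)))  # [(1, 0), (1, 1), (1, 2)]
```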

If you have multiple theories that produce similar results, it's helpful to have a bridging law. If your theories were Newtonian mechanics and general relativity, a bridging law would say which numbers in relativity matched up with which numbers in Newtonian mechanics. This allows you to translate a relativistic problem into a Newtonian one, solve that, and translate the answer back into the relativistic framework. This produces some errors, but often makes the maths easier.
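A toy instance of such a bridging law, using special rather than general relativity for simplicity: identify Newtonian mass and velocity with relativistic rest mass and velocity, then compare the kinetic energies the two theories give. The function names are just for illustration. The error introduced by solving in Newtonian terms is tiny at everyday speeds and grows as $v$ approaches $c$.

```python
from math import sqrt

C = 299_792_458.0  # speed of light in m/s


def kinetic_energy_relativistic(mass, speed):
    """(gamma - 1) m c^2 for rest mass `mass` in kg and `speed` in m/s."""
    gamma = 1.0 / sqrt(1.0 - (speed / C) ** 2)
    return (gamma - 1.0) * mass * C ** 2


def kinetic_energy_newtonian(mass, speed):
    """Classical 1/2 m v^2 with the same mass and speed plugged in."""
    return 0.5 * mass * speed ** 2


def bridging_error(mass, speed):
    """Relative error from solving the problem in Newtonian terms instead."""
    exact = kinetic_energy_relativistic(mass, speed)
    return (exact - kinetic_energy_newtonian(mass, speed)) / exact


for fraction in (0.01, 0.1, 0.5):
    print(f"v = {fraction:.2f}c: relative error {bridging_error(1.0, fraction * C):.2e}")
```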

Quantum many worlds is a simple theory. It could be simulated on a hypercomputer with less than a page of code. There is also a theory where you take the code for quantum many worlds and add "observers" and "wavefunction collapse" as extra functions within your code. This can be done, but it takes many pages of arbitrary hacks. Call this theory B. If you think this is a strawman of the Copenhagen interpretation, describe how you could get a hypercomputer outside the universe to simulate Copenhagen with a short computer program.

The bridging between Quantum many worlds and human classical intuitions is quite difficult and subtle. Faced with a simulation of quantum many worlds, it would take a lot of understanding of quantum physics to make everyday changes, like creating or moving macroscopic objects.

Theory B however is substantially easier to bridge to our classical intuitions. Theory B looks like a chunk of quantum many worlds, plus a chunk of classical intuition, plus a bridging rule between the two.

Any description of the Copenhagen interpretation of quantum mechanics seems to involve references to the classical results of a measurement, or to a classical observer. Most versions would allow a superposition of an atom being in two different places, but not a superposition of two different presidents winning an election.

If you don't believe atoms can be in superposition, you are ignoring lots of experiments. If you do believe that you can get a superposition of two different people being president, and that you yourself could be in a superposition of doing two different things right now, then you believe many worlds by another name. Otherwise, you need to draw some sort of arbitrary cutoff. It's almost like you are bridging between a theory that allows superpositions and an intuition that doesn't.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T20:10:13.891Z · score: 3 (3 votes) · LW · GW

"Now I'm not clear exactly how often quantum events lead to a slightly different world"

The answer is very, very often. If you have a piece of glass and shine a photon at it, such that it has an equal chance of bouncing and going through, the two possibilities become separate worlds. Shine a million photons at it and you split into $2^{1000000}$ worlds, one for each combination of photons going through and bouncing. Note that in most of the worlds the pattern of bounces looks random, so this is a good source of random numbers. Photons bouncing off glass are just an easy example; almost any physical process splits the universe very fast.
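A small-scale version of the photon count can be enumerated directly. This sketch assumes an idealized 50/50 surface and independent photons (no interference between branches), and labels each branch by its bounce (B) / through (T) history. It is only feasible for small $n$, since there are $2^n$ branches, but it shows both that the branch weights sum to 1 and that most of the weight sits on random-looking histories.

```python
from itertools import product
from math import sqrt


def branch_weights(n_photons):
    """Map each bounce/through history to its Born weight |amplitude|^2."""
    amplitude = 1 / sqrt(2)  # idealized 50/50 glass: equal amplitude to bounce (B) or go through (T)
    return {
        "".join(history): abs(amplitude ** n_photons) ** 2
        for history in product("BT", repeat=n_photons)
    }


def weight_of_typical_branches(branches, low, high):
    """Total weight of branches whose bounce count lies in [low, high]."""
    return sum(w for h, w in branches.items() if low <= h.count("B") <= high)


branches = branch_weights(10)
print(len(branches))  # 1024 = 2**10
print(weight_of_typical_branches(branches, 3, 7))  # about 0.89 (912 of 1024 equally weighted branches)
```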

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T19:56:08.783Z · score: -2 (3 votes) · LW · GW

The nub of the argument is that every time we look in our sock drawer, we see all our socks to be black.

Many worlds says that our socks are always black.

The Copenhagen interpretation says that us observing the socks causes them to be black. The rest of the time the socks are pink with green spots.

Both theories make identical predictions. Many worlds is much simpler to fully specify with equations, and has elegant mathematical properties. The Copenhagen interpretation has special-case rules that only kick in when observing something. According to this theory, there is a fundamental physical difference between a complex collection of atoms and an "observer", and somewhere in the development of life, creatures flipped from one to the other.

The Copenhagen interpretation doesn't make it clear whether a cat is a very complex arrangement of molecules, which could in theory be understood as a quantum process that doesn't involve the collapse of wave functions, or whether cats are observers and so collapse wave functions.

Comment by donald-hobson on MIRI Summer Fellows Program · 2019-04-09T20:40:32.538Z · score: 2 (2 votes) · LW · GW

Hello. I see that while the deadline has passed, the form is still open. Is it still worthwhile to apply?

Comment by donald-hobson on Would solving logical counterfactuals solve anthropics? · 2019-04-06T13:28:43.855Z · score: 4 (3 votes) · LW · GW

This supposedly "natural" reference class is full of weird edge cases, in the sense that I can't write an algorithm that finds "everybody who asks the question X". Firstly, "everybody" is not well defined in a world that contains everything from trained monkeys to artificial intelligences. And "who asks the question X" is under-defined, as there is no hard boundary between a different way of phrasing the same question and a slightly different question. Does someone considering the argument in Chinese fall into your reference class? Even more edge cases appear with mind uploading, different mental architectures, etc.

If you get a different prediction from taking the reference class of "people" (for some formal definition of "people") and then updating on the fact that you are wearing blue socks, than you get from the reference class "people wearing blue socks", then something has gone wrong in your reasoning.

The doomsday argument works by failing to update on anything but a few carefully chosen facts.

Comment by donald-hobson on Would solving logical counterfactuals solve anthropics? · 2019-04-05T23:06:23.400Z · score: 1 (1 votes) · LW · GW

I would say that the concept of probability works fine in anthropic scenarios, or at least there is a well-defined number that is equal to probability in non-anthropic situations. This number is assigned to "worlds as a whole". Sleeping Beauty assigns 1/2 to heads and 1/2 to tails, and can't meaningfully split the tails case depending on the day. Sleeping Beauty is a functional decision theory agent. For each action A, they consider the logical counterfactual that the algorithm they are implementing returned A, then calculate the world's utility in that counterfactual. They then return whichever action maximizes utility.
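A minimal sketch of that procedure, assuming a hypothetical scoring rule in which Beauty guesses the coin at every awakening and earns +1 per correct guess. Each world keeps probability 1/2; the extra tails awakening shows up in the world's utility, not in its probability, and the algorithm's single output is applied at every awakening in a world.

```python
# Each world: (probability, number of awakenings). Fair coin; tails means two awakenings.
WORLDS = {"heads": (0.5, 1), "tails": (0.5, 2)}


def world_utility(world, action):
    """Hypothetical scoring rule: +1 for each awakening at which the guess matches the coin."""
    _, awakenings = WORLDS[world]
    return awakenings if action == world else 0


def expected_utility(action):
    """Utility of the counterfactual 'my algorithm returns `action`', averaged over worlds."""
    return sum(p * world_utility(w, action) for w, (p, _) in WORLDS.items())


def fdt_choice(actions=("heads", "tails")):
    """Return whichever action maximizes expected utility across whole worlds."""
    return max(actions, key=expected_utility)


print(fdt_choice())  # tails
```

Under this particular scoring rule, guessing tails wins even though the credence in each world stays at 1/2. A different reward (say, one payout per world rather than per awakening) would change the choice without changing the probabilities.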

In this framework, "which version am I?" is a meaningless question: you are the algorithm. The fact that the algorithm is implemented in a physical substrate gives you a means to affect the world. Under this model, whether or not you're running on multiple redundant substrates is irrelevant. You reason about the universe without making any anthropic updates. As you have no way of affecting a universe that doesn't contain you, or someone reasoning about what you would do, you might as well behave as if you aren't in one. You can make the efficiency saving of not bothering to simulate such a world.

You might, or might not, have an easier time affecting a world that contains multiple copies of you.

Comment by donald-hobson on Can a Bayesian agent be infinitely confused? · 2019-03-22T22:06:33.783Z · score: 1 (1 votes) · LW · GW

In other words, the agent assigned zero probability to an event, and then it happened.
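Concretely, Bayes' rule divides by $P(E)$, which is zero when every hypothesis with nonzero prior assigns the evidence zero probability. The sketch below uses hypothetical coin hypotheses to show the ordinary update working and the zero-prior case leaving nothing to renormalize.

```python
def update(prior, likelihood):
    """Bayes' rule over discrete hypotheses.

    prior: dict hypothesis -> P(H); likelihood: dict hypothesis -> P(E | H).
    """
    evidence = sum(prior[h] * likelihood[h] for h in prior)  # P(E)
    if evidence == 0:
        raise ZeroDivisionError("P(E) = 0: the observed event was assigned zero probability")
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Ordinary update after seeing heads: the two-headed coin becomes more likely.
print(update({"two-headed": 0.5, "fair": 0.5}, {"two-headed": 1.0, "fair": 0.5}))

# An agent certain the coin is two-headed then sees tails: P(E) = 0.
try:
    update({"two-headed": 1.0, "fair": 0.0}, {"two-headed": 0.0, "fair": 0.5})
except ZeroDivisionError as err:
    print(err)
```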

Comment by donald-hobson on More realistic tales of doom · 2019-03-18T16:51:01.487Z · score: 0 (3 votes) · LW · GW

As far as I understand it, you are proposing that the most realistic failure mode consists of many AI systems, all put into a position of power by humans, and optimizing for their own proxies. Call these Trusted Trial and Error AIs (TTEs).

The distinguishing features of TTEs are that they were Trusted: a human put them in a position of power. Humans have refined, understood and checked the code enough that they are prepared to put this algorithm in a self-driving car, or a stock management system. They are not lab prototypes. They are also Trial and Error learners, not one-shot learners.

Some more description of the capability range I am considering:

Suppose hypothetically that we had TTE reinforcement learners, a little better than today's state of the art, and nothing beyond that. The AIs are advanced enough that they can take a mountain of medical data and train themselves to be skilled doctors by trial and error. However, they are not advanced enough to figure out how humans work from, say, a sequenced genome and nothing more.

Give them control of all the traffic lights in a city, and they will learn how to minimize traffic jams. They will arrange for people to drive in circles rather than stay still, so that they do not count as part of a traffic jam. However they will not do anything outside their preset policy space, like hacking into the traffic light control system of other cities, or destroying the city with nukes.

If such technology is easily available, people will start to use it for things. Some people put it in positions of power; others are more hesitant. As the only way the system can learn to avoid something is through trial and error, the system has to cause a public outcry (probably several) before it learns not to do so. If no one told the traffic light system, via simulations or past data, that car crashes are bad (an alignment failure), then even if public opinion feeds directly into reward, it will have to cause several car crashes that are clearly its fault before it learns to only cause crashes that can be blamed on someone else. However, deliberately causing crashes will probably get the system shut off or seriously modified.
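A deterministic toy of "it only learns by causing the outcry": a greedy learner with optimistic initial estimates and a constant step size, choosing between two hypothetical traffic-light policies. The rewards, step size and starting estimate are all assumptions picked for illustration. The risky policy gets tried three times (three outcries) before its estimate falls low enough that the learner stops choosing it.

```python
def trial_and_error(steps=20, alpha=0.5, optimistic_start=10.0):
    """Greedy learner with optimistic initial estimates and a constant step size."""
    # Hypothetical proxy rewards: the risky policy triggers a public outcry (-1).
    rewards = {"normal_timing": 1.0, "risky_timing": -1.0}
    estimates = {action: optimistic_start for action in rewards}
    history = []
    for _ in range(steps):
        action = max(estimates, key=estimates.get)  # ties go to the first key
        estimates[action] += alpha * (rewards[action] - estimates[action])
        history.append(action)
    return history


history = trial_and_error()
print(history.count("risky_timing"))  # 3
```

The learner has no model of why the risky policy is bad; each of those three tries is the only way it finds out.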

Note that we are supposing many of these systems existing, so the failures of some, combined with plenty of simulated failures, will give us a good idea of the failure modes.

The space of bad things an AI can get away with is a small and highly complex subset of the space of bad things. A TTE set to reduce crime rates tries making the crime report forms longer; this reduces reported crime, but humans quickly realize what it's doing. It would have to do this and be patched many times before it came up with a method that humans wouldn't notice.

Given advanced TTEs as the most advanced form of AI, we might slowly develop a problem, but the deployment of TTEs would be slowed by the time it takes to gather data and check reliability, especially given mistrust after several major failures. And I suspect that, due to the statistical similarity of training and testing, many different systems optimizing different proxies, humans having the best abstract reasoning about novel situations, and humans having the power to turn the systems off, any discrepancy of goals will be fairly minor. I do not expect such optimization power to be significantly more powerful or less aligned than modern capitalism.

This all assumes that no one will manage to make a linear-time AIXI. If such a thing is made, it will break out of any boxes and take over the world. So we have a social process of adaptation to TTE AI, which is already in its early stages with things like self-driving cars, and at any time this process could be rendered irrelevant by the arrival of a superintelligence.

Comment by donald-hobson on Risk of Mass Human Suffering / Extinction due to Climate Emergency · 2019-03-14T23:41:43.915Z · score: 16 (7 votes) · LW · GW

1) Climate-change-caused extinction is not on the table. Low-tech humans can survive everywhere from the jungle to the Arctic. Some humans will survive.

2) I suspect that climate change won't cause massive social collapse. It might well knock 10% off world GDP, but it won't stop us having an advanced high-tech society. At the moment, it's not causing damage on that scale, and I suspect that in a few decades we will have biotech, renewables or other techs that will make everything fine. I suspect that the damage caused by climate change won't increase by more than 2 or 3 times in the next 50 years.

3) If you are skilled enough to be a scientist, inventing a solar panel that's 0.5% more efficient does a lot more good than showing up to protests. Protests need many people to work; inventors can change the world by themselves. Policy advisors and academics can suggest action in small groups. Even working a normal job and sending your earnings to a well-chosen charity is likely to be more effective.

4) Quite a few people are already working on global warming. It seems unlikely that a problem needs 10,000,001 people working on it to solve, and if only 10,000,000 people work on it, they won't manage. Most of the really easy work on global warming is already being done. This is not the case with AI risk as of 10 years ago, for example. (It's got a few more people working on it since then, still nothing like climate change.)

Comment by donald-hobson on [Fiction] IO.SYS · 2019-03-11T14:36:16.234Z · score: 4 (3 votes) · LW · GW

I think the protagonist here should have looked at earth. If there was a technological intelligence on earth that cared about the state of Jupiter's moons, then it could send rockets there. The most likely scenarios are a disaster bad enough to stop us launching spacecraft, and an AI that only cares about earth.

A superintelligence should assign non-negligible probability to the outcome that actually happened. Given that the tech was available, a space probe containing an uploaded mind is not that unlikely. If such a probe was a real threat to the AI, it would have already blown up all space probes on the off chance.

The upper bound given on the amount that malicious info can harm you is extremely loose. Malicious info can't do much harm unless the enemy has a good understanding of the particular system that they are subverting.

Comment by donald-hobson on Rule Thinkers In, Not Out · 2019-02-27T08:28:14.912Z · score: 7 (6 votes) · LW · GW

Yet policy exploration is an important job. Unless you think that someone posting something on a blog is going to change policy without anyone double-checking it first, we should encourage suggestion of radically new policies.

Comment by donald-hobson on Humans Who Are Not Concentrating Are Not General Intelligences · 2019-02-26T09:22:12.633Z · score: 27 (15 votes) · LW · GW

I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated "the", we don't notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.

I propose that automatic pattern matching to the closest thing that makes sense is happening at a higher level too. When humans skim semi-contradictory text, they produce a more consistent world model that doesn't quite match up with what is said.

Language feeds into a deeper, sensible world-model module within the human brain, whereas GPT-2 doesn't really have a coherent world model.

Comment by donald-hobson on Can We Place Trust in Post-AGI Forecasting Evaluations? · 2019-02-17T21:01:29.480Z · score: 3 (3 votes) · LW · GW

Because your belief about how well AGI is likely to go affects both the likelihood of a bet being evaluated and the chance of winning, bets about AGI are likely to give dubious results. I also have substantial uncertainty about the value of money in a post-singularity world. Most obviously, if everyone gets turned into paperclips, no one has any use for money. If we get a friendly singleton superintelligence, everyone is living in paradise, whether or not they had money before. If we get an economic singularity, where libertarian ASI(s) try to make money without cheating, then money could be valuable. I'm not sure how we would get that, as an understanding of the control problem good enough to not wipe out humans and fill the universe with bank notes should be enough to make something closer to friendly.

Even if we do get some kind of ascendant economy, given the amount of resources in the solar system (let alone the wider universe), it's quite possible that pocket change would be enough to live for aeons of luxury.

Given how unclear it is about whether or not the bet will get paid and how much the cash would be worth if it was, I doubt that the betting will produce good info. If everyone thinks that money is more likely than not to be useless to them after ASI, then almost no one will be prepared to lock their capital up until then in a bet.
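A toy illustration of why the prices come out distorted. Assume a risk-neutral bettor who believes the two outcomes are 50/50, and value contracts in today's money. Assume (hypothetically) that a unit of money is worth 0.1 of today's value in a good post-ASI world (pocket change buys plenty) and nothing in a paperclip world. The contract on the bad outcome is worthless even to someone who expects it, so the market's implied probability says nothing about the bettor's actual belief.

```python
def fair_price(p_event, money_value_if_event, money_value_now=1.0):
    """Price in today's money of a contract paying 1 unit of money if the event happens."""
    return p_event * money_value_if_event / money_value_now


def implied_probability(price_yes, price_no):
    """Probability an observer would read off the two contract prices."""
    return price_yes / (price_yes + price_no)


belief_good = 0.5  # the bettor's actual credence (hypothetical)
price_good = fair_price(belief_good, money_value_if_event=0.1)      # assumed value of money after a good ASI
price_bad = fair_price(1 - belief_good, money_value_if_event=0.0)   # paperclips: money is useless
print(implied_probability(price_good, price_bad))  # 1.0, despite a 50/50 belief
```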

Comment by donald-hobson on Limiting an AGI's Context Temporally · 2019-02-17T18:32:43.272Z · score: 3 (3 votes) · LW · GW

I suspect that an AGI with such a design could be much safer if it was hardcoded to believe that time travel and hyperexponentially vast universes were impossible. Suppose that the AGI thought that there was a 0.0001% chance that it could use a galaxy's worth of resources to send 10^30 paperclips back in time, or create a parallel universe containing 3^^^3 paperclips. It will still chase those options.

If starting a long plan to take over the world costs it literally nothing, it will do it anyway. A sequence of short-term plans, each designed to make as many paperclips as possible within the next few minutes, could still end up dangerous. If the number of paperclips at time $t$ is $c(t)$ and its power at time $t$ is $p(t)$, then $\frac{dc}{dt} = a\,p(t)$ and $\frac{dp}{dt} = b\,p(t)$ with $a, b > 0$ would mean that both power and paperclips grew exponentially. This is what would happen if power can be used to gain power and clips at the same time, with minimal loss of either from also pursuing the other.

If power can only be used to gain one thing at a time, and the rate power can grow at is less than the rate of time discount, then we are safer.
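A toy check of that condition. Assume (hypothetically) that power can be spent either on growing by a factor $(1 + g)$ per step or on making clips (one clip per unit of power per step), never both, and that clips are discounted by a factor $\gamma$ per step over a fixed horizon. The sketch searches over "grow for $k$ steps, then clip" plans. When $(1 + g)\gamma < 1$, every step spent growing loses value, so the best plan is to make clips immediately.

```python
def discounted_clips(grow_steps, growth, discount, horizon=20, power=1.0):
    """Grow power for `grow_steps` steps (making no clips), then spend all power on clips each step."""
    power_after = power * (1.0 + growth) ** grow_steps
    return sum(discount ** t * power_after for t in range(grow_steps, horizon))


def best_plan(growth, discount, horizon=20):
    """Number of initial power-growing steps that maximizes discounted clips."""
    return max(range(horizon + 1), key=lambda k: discounted_clips(k, growth, discount, horizon))


print(best_plan(growth=0.05, discount=0.9))  # 0: growth is slower than the discount
print(best_plan(growth=0.5, discount=0.9))   # a positive number: grabbing power first pays off
```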

This proposal has several ways to be caught out: world-wrecking assumptions that aren't certain. But if used with care, with a short time frame, an ontology that considers time travel impossible, and, say, a utility function that maxes out at 10 clips, it probably won't destroy the world. Throw in mild optimization and an impact penalty, and you have a system that relies on a disjunction of shaky assumptions, not a conjunction of them.

It needs to be a CDT agent, or at least something that doesn't try to punish you now so that you make paperclips last week. A TDT agent might adopt the policy of killing anyone who didn't make clips before it was turned on, causing humans who predict this to make clips.

I suspect that it would be possible to build such an agent, prove that there are no weird failure modes left, and turn it on, with a small chance of destroying the world. I'm not sure why you would do that. Once you understand the system well enough to say it's safe-ish, what vital info do you gain from turning it on?

Comment by donald-hobson on Extraordinary ethics require extraordinary arguments · 2019-02-17T17:19:53.640Z · score: 11 (6 votes) · LW · GW

Butterfly effects are essentially unpredictable, given your partial knowledge of the world. Sure, you doing homework could cause a tornado in Texas, but it's equally likely to prevent one. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise you're shuffling an already well-shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of "doing homework" from a vast set of other actions. If you really did know what actions would stop the Texas tornado, they might well look like random thrashing.

What you can calculate are the reliable effects of doing your homework. So, given bounded rationality, you are probably best off basing your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers and a short-term procrastinator.

Most people who aren't particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone to look out for them. The law stops most of the bad mutual defections in prisoners' dilemmas.) Evil geniuses trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.

Comment by donald-hobson on Short story: An AGI's Repugnant Physics Experiment · 2019-02-14T15:31:50.252Z · score: 7 (5 votes) · LW · GW

This is an example of a Pascal's mugging. Tiny probabilities of vast rewards can produce weird behavior. The best known solution is either a bounded utility function or an antipascalene agent (an agent that ignores the best x% and worst y% of possible worlds when calculating expected utilities; it can be money pumped).
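One way to sketch the antipascalene idea: sort the outcomes by utility, cut away the bottom `bottom` and top `top` share of probability mass (splitting an outcome's mass if a cut falls inside it), and average over what remains. The 1% cut-offs and the mugging numbers are hypothetical, and this is just one possible reading of "ignore the best x% and worst y% of worlds".

```python
def plain_expected_utility(outcomes):
    """Ordinary expectation; `outcomes` is a list of (probability, utility) summing to 1."""
    return sum(p * u for p, u in outcomes)


def trimmed_expected_utility(outcomes, bottom=0.01, top=0.01):
    """Expected utility after discarding the worst `bottom` and best `top` probability mass."""
    ranked = sorted(outcomes, key=lambda pu: pu[1])
    keep_from, keep_to = bottom, 1.0 - top  # keep the cumulative-probability window [keep_from, keep_to]
    kept_mass = total = cumulative = 0.0
    for p, u in ranked:
        start, end = cumulative, cumulative + p
        overlap = max(0.0, min(end, keep_to) - max(start, keep_from))
        kept_mass += overlap
        total += overlap * u
        cumulative = end
    return total / kept_mass


# A mugger offers a 1e-9 chance of 1e20 utility in exchange for 5 utility now.
pay = [(1 - 1e-9, -5.0), (1e-9, 1e20)]
decline = [(1.0, 0.0)]
print(plain_expected_utility(pay) > plain_expected_utility(decline))      # True: plain EU pays up
print(trimmed_expected_utility(pay) < trimmed_expected_utility(decline))  # True: trimmed EU declines
```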

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T22:50:32.220Z · score: 13 (5 votes) · LW · GW

Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it's blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. The people should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
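For reference, the ideal Bayesian baseline. Assuming the side shown is chosen uniformly at random, a blue face is certain from a double-blue card and half as likely from a red/blue card. The likelihood ratio is therefore 2, so a perfect Bayesian updates by exactly one bit of log odds whatever the pack composition. Any dependence on composition in the log-odds update implied by people's bets is then a deviation from Bayes.

```python
from math import log2


def posterior_double_blue(frac_double_blue):
    """P(card is blue on both sides | the shown side is blue)."""
    p_blue_shown = frac_double_blue * 1.0 + (1 - frac_double_blue) * 0.5
    return frac_double_blue / p_blue_shown


def log2_odds(p):
    return log2(p / (1 - p))


def log_odds_update_bits(frac_double_blue):
    """Size of the Bayesian update, in bits of log odds, from seeing one blue side."""
    prior = frac_double_blue
    return log2_odds(posterior_double_blue(prior)) - log2_odds(prior)


for frac in (0.1, 0.5, 0.9):
    print(frac, round(posterior_double_blue(frac), 3), round(log_odds_update_bits(frac), 6))
```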

Comment by donald-hobson on How important is it that LW has an unlimited supply of karma? · 2019-02-11T15:21:06.773Z · score: 4 (4 votes) · LW · GW

I suspect that if voting reduced your own karma, some people wouldn't vote. As it becomes obvious that this is happening, more people stop voting, until karma just stops flowing at all. (The people who persistently vote anyway all run out of karma.)

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T10:55:11.889Z · score: 1 (1 votes) · LW · GW

Fixed, thanks.

## Propositional Logic, Syntactic Implication

2019-02-10T18:12:16.748Z · score: 5 (4 votes)

## Probability space has 2 metrics

2019-02-10T00:28:34.859Z · score: 88 (36 votes)

Comment by donald-hobson on X-risks are a tragedies of the commons · 2019-02-07T17:50:13.823Z · score: 2 (2 votes) · LW · GW

This is making the somewhat dubious assumption that X risks are not so neglected that even a "selfish" individual would work to reduce them. Of course, in the not too unreasonable scenario where the cosmic commons is divided up evenly, and you use your portion to make a vast number of duplicates of yourself, the utility, if your utility is linear in copies of yourself, would be vast. Or you might hope to live for a ridiculously long time in a post singularity world.

The effect that a single person can have on X risks is small, but if they were selfish with no time discounting, it would be a better option than hedonism now. Although a third alternative of sitting in a padded room being very very safe could be even better.

Comment by donald-hobson on (notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach · 2019-02-06T00:27:27.407Z · score: 18 (5 votes) · LW · GW

I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.

Deterrence Fails.

If you believe that there is a good chance of immortal utopia, and a large chance of paperclips, in the next 5 years, the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.

The law is blind to safety.

The law is bureaucratic and ossified. It is probably not employing much top talent, as it's hard to tell top talent from the rest if you aren't as good yourself (and it doesn't have the budget or glamor to attract them). Telling whether an organization is on track for not destroying the world is HARD. The safety protocols are being invented on the fly by each team, and the system is very complex, technical and only half built. The teams that would destroy the world aren't idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, or understood laws.

Likely as not (not really, too much conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don't understand the problem. All AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list.)

All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone's way. Telling cops to "smash all GPUs" would have an effect on AI progress. The fund-vs-smash axis is about the only lever they have. They can't even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.

After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had tasky ASI that was not trivial to leverage into a decisive strategic advantage. (An AI that can put a strawberry on a plate without destroying the world, but that's about the limit of its safe operation.)

Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.

I don't know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger) and, with further research, could grant supreme power. The discovery must be limited to a small group of people (among a large number of non-experts, one will do something stupid). I don't think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can't tell how powerful it would be with an unbounded utility function.

Comment by donald-hobson on Why is this utilitarian calculus wrong? Or is it? · 2019-01-28T17:06:33.310Z · score: 6 (5 votes) · LW · GW

Firstly, you are confusing dollars and utils.

If you buy this product for $100, you gain the use of it, at value U[30] to yourself. The workers who made it gain $80, at value U[80] to yourself, because of your utilitarian preferences. Total value U[110].

If the alternative was a product of cost $100, which you value the use of at U[105], but all the money goes to greedy rich people to be squandered, then you would choose the first. If the alternative was spending $100 to do something insanely morally important, U[3^^^3], you would do that.

If the alternative was a product of cost $100, that was of value U[100] to yourself, and some of the money would go to people that weren't that rich U[15], you would do that.

If you could give the money to people twice as desperate as the workers, at U[160], you would do that.

There are also good reasons why you might want to discourage monopolies. Any desire to do so is not included in the expected value calculations. But the basic principle is that utilitarianism can never tell you if some action is a good use of a resource, unless you tell it what else that resource could have been used for.
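The same point as an argmax: the utilitarian verdict on a purchase is only defined relative to the set of alternatives you hand it. The utility numbers below are the ones from the examples above (the insanely important U[3^^^3] option is left out); the option labels and the helper name are just for illustration.

```python
def best_use(options):
    """Pick the use of the $100 with the highest total utility (utils, not dollars)."""
    return max(options, key=options.get)


two_options = {
    "buy product, $80 to workers": 30 + 80,        # U[110]
    "buy product, money to the rich": 105 + 0,     # U[105]
}
print(best_use(two_options))  # buy product, $80 to workers

more_options = dict(two_options)
more_options["buy product, money to the not-so-rich"] = 100 + 15  # U[115]
more_options["give to people twice as desperate"] = 160           # U[160]
print(best_use(more_options))  # give to people twice as desperate
```

Adding alternatives never changes the utility of the original purchase; it only changes whether it is the best use of the money.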

Comment by donald-hobson on Solomonoff induction and belief in God · 2019-01-28T16:48:01.902Z · score: 2 (2 votes) · LW · GW

The information needed to describe our particular laws of physics < info needed to describe the concept of "habitable universe" in general < info needed to describe human-like mind.

The biggest slip is the equivocation on the word intelligence. The Kolmogorov complexity of AIXI-tl is quite small, so intelligences in that sense of the word are likely to exist in the universal prior.

Humanlike minds have not only the clear mark of evolution, but the mark of stone age tribal interactions across their psyche. An arbitrary mind will be bizarre and alien. Wondering if such a mind might be benevolent is hugely privileging the hypothesis. The most likely way to make a humanlike mind is the process that created humans. So in most of the universes with humanoid deities, those deities evolved. This becomes the simulation hypothesis.

The best hypothesis is still the laws of quantum physics or whatever.

Comment by donald-hobson on For what do we need Superintelligent AI? · 2019-01-25T23:43:23.412Z · score: 4 (4 votes) · LW · GW

We don't know what we are missing out on without super intelligence. There might be all sorts of amazing things that we would just never consider to make, or dismiss as obviously impossible, without super intelligence.

I am pointing out that being able to make a FAI that is a bit smarter than you (smartness is not really on a single scale; with vastly different cognitive architectures, is Deep Blue smarter than a horse?) involves solving almost all the hard problems in alignment. When we have done all that hard work, we might as well tell it to make itself a trillion times smarter; the cost to us is negligible, and the benefit could be huge.

AI can also serve as a values repository. In most circumstances, values are going to drift over time, possibly due to evolutionary forces. If we don't want to end up as hardscrabble frontier replicators, we need some kind of singleton. Most types of government or committee have their own forms of value drift, and couldn't keep enough of an absolute grip on power to stop any rebellions for billions of years. I have no ideas other than Friendly ASI oversight for how to stop someone in a cosmically vast society from creating a UFASI. Sufficiently draconian banning of anything at all technological could stop anyone from creating UFASI long term, but would also stop most things since the industrial revolution.

The only reasonable scenario that I can see in which FAI is not created and the cosmic commons gets put to good use is if a small group of like-minded individuals, or a single person, gains exclusive access to self-replicating nanotech and mind uploading. They then use many copies of themselves to police the world. They do all the programming and only run code they can formally prove isn't dangerous. No one is allowed to touch anything Turing complete.

Comment by donald-hobson on Allowing a formal proof system to self improve while avoiding Lobian obstacles. · 2019-01-24T14:25:29.195Z · score: 2 (2 votes) · LW · GW

Both blanks are the identity function.

Here is some pseudocode:

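    class Prover:
        def __init__(self, pa):
            # pa: a fixed Peano arithmetic proof checker, called as pa(statement, proof) -> bool
            self.ps = [pa]  # the proof checkers currently trusted

        def prove(self, p, s, b):
            # Check proof b of statement s, but only with an already trusted checker.
            assert p in self.ps
            return p(s, b)

        def add_prover(self, p1, p2, b):
            # add_prover is a hypothetical name for this step.
            # Trust p2 once the trusted checker p1 accepts b as a proof that
            # anything p2 can prove, p1 can prove too (p1, p2 named schematically).
            if self.prove(p1, "forall s: (exists b2: p2(s, b2)) => (exists b1: p1(s, b1))", b):
                self.ps.append(p2)

    prover = Prover(PA)
    prover.add_prover(PA, nPA, proof)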

Here PA is a specific Peano arithmetic proof checker, nPA is another proof checker, and 'proof' is a proof that anything nPA can prove, PA can prove too.

## Allowing a formal proof system to self improve while avoiding Lobian obstacles.

2019-01-23T23:04:43.524Z · score: 6 (3 votes)

Comment by donald-hobson on Life can be better than you think · 2019-01-22T21:09:26.718Z · score: 1 (1 votes) · LW · GW

I consider emotions to be data, not goals. From this point of view, deliberately maximizing happiness for its own sake is a lost purpose. It's like writing extra numbers on your bank balance. If, however, your happiness was reliably too low, adjusting it upwards with drugs would be sensible. What's the best level of happiness? The one that produces optimal behavior.

I also find my emotions to be quite weak, and I can consciously change them: just thinking "be happy" or "be sad" makes me feel happy or sad. It actually feels similar to imagining a mental image, sound or smell.

Comment by donald-hobson on Too Smart for My Own Good · 2019-01-22T20:43:49.021Z · score: 3 (2 votes) · LW · GW

Writing random bits of code is a good hobby. It sounds like you prefer doing that to learning to play jazz, so forget the jazz and just code. I was having a hard job understanding quantum spin, and wrote some code to help. It was reasonably helpful. Then again, quantum spin is all about complex matrix multiplication, and numpy has functions for that, so I was basically using it as a matrix arithmetic calculator. Another example: I found that I kept getting distracted, so I wrote code that randomly beeped, asked what I was doing, and saved the results to a file. It worked quite well.
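Here is a minimal sketch of the random-beep activity logger described above, in Python. The log file name, the 20-minute average gap, the exponential timing between prompts, the tab-separated format and the use of the terminal bell as the "beep" are all assumptions for illustration, not the original code.

```python
import random
import time
from datetime import datetime


def next_delay(rng: random.Random, mean_minutes: float) -> float:
    """Seconds until the next prompt; exponential gaps make prompts unpredictable."""
    return rng.expovariate(1.0 / mean_minutes) * 60.0


def log_entry(path: str, answer: str, when: datetime) -> None:
    """Append one timestamped, tab-separated answer to the log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{when.isoformat(timespec='seconds')}\t{answer}\n")


def run(path: str = "activity_log.tsv", mean_minutes: float = 20.0) -> None:
    """Beep at random times, ask what you are doing, and log the answer."""
    rng = random.Random()
    while True:
        time.sleep(next_delay(rng, mean_minutes))
        print("\a", end="", flush=True)  # terminal bell as the "beep"
        answer = input("What are you doing right now? ")
        log_entry(path, answer, datetime.now())
```

Calling `run()` starts the loop, which keeps going until you interrupt it with Ctrl-C. Exponential gaps are one simple way to stop the prompts from becoming predictable.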

Comment by donald-hobson on Should questions be called "questions" or "confusions" (or "other")? · 2019-01-22T09:00:10.438Z · score: 2 (2 votes) · LW · GW

Sure, that sounds interesting. I have a bunch of things that I'm confused about.

Comment by donald-hobson on Following human norms · 2019-01-21T21:27:40.185Z · score: 3 (3 votes) · LW · GW

What if it follows human norms with dangerously superhuman skill?

Suppose humans had a really strong norm that you were allowed to say whatever you like, and encouraged to say things others will find interesting.

Among humans, the most we can exert is a small optimization for the not totally dull.

The AI produces a sequence that effectively hacks the human brain and sets interest to maximum.

Comment by donald-hobson on Debate AI and the Decision to Release an AI · 2019-01-19T17:19:04.971Z · score: 4 (3 votes) · LW · GW

You assert that "Naturally, it is much worse to release a misaligned AI than to not release an aligned AI".

I would disagree with that. If the odds of aligned AI, conditional on you not releasing this one, were 50:50, then both mistakes are equally bad. If someone else is definitely going to release a paperclipper next week, then it would make sense to release an AI with a 1% chance of being friendly now. (Bear in mind that no one would release an AI if they didn't think it was friendly, so you might be defecting in an epistemic prisoner's dilemma.)
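To make the 50:50 point concrete, here is a small Python sketch. It scores an aligned outcome as 1 and a misaligned one as 0, and uses q for the probability of an aligned outcome if you hold this AI back. Both the scoring and the function names are assumptions for illustration, and the sketch ignores the epistemic prisoner's dilemma caveat.

```python
from typing import Tuple


def mistake_costs(q: float) -> Tuple[float, float]:
    """Return (cost of releasing a misaligned AI, cost of withholding an aligned AI).

    q is the probability of an aligned outcome if you hold this AI back.
    Outcomes are scored 1 for aligned and 0 for misaligned.
    """
    release_bad = q          # you got 0 instead of the q you would have had
    withhold_good = 1.0 - q  # you got q instead of the 1 you could have had
    return release_bad, withhold_good


def should_release(p_friendly: float, q: float) -> bool:
    """Release iff this AI's chance of being aligned beats the fallback chance q."""
    return p_friendly > q
```

The two costs are q and 1 - q, so they are equal exactly when q = 0.5. If a paperclipper is certain next week, q is about 0, and even a 1% chance of friendliness clears the bar.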

I would put more weight on human researchers debating which AI is safe before any code is run. I would also think that the space of friendly AIs is tiny compared to the space of all AIs, so making an AI that you put as much as a 1% chance on being friendly is almost as hard as building a 99%-chance-friendly AI.

Comment by donald-hobson on Synthesising divergent preferences: an example in population ethics · 2019-01-18T23:50:44.807Z · score: 5 (3 votes) · LW · GW

I don't think all AI catastrophes come from oversimplification of value functions. Suppose we had 1000 weak preferences, $U_1, \dots, U_{1000}$, with $U = \sum_{i=1}^{1000} U_i$. Each of them is supposed to be bounded, $0 \le U_i \le 1$, but due to some weird glitch in the definition of one of them, $U_j$, it has an unforeseen maximum of 1,000,000, and that maximum is paperclips. In this scenario, the AI is only as friendly as the least friendly piece.

Alternatively, if the value of each $U_i$ is linear or convex in resources spent maximizing it, or other technical conditions hold, then the AI just picks a single $U_i$ to focus all resources on. If some term is very easily satisfied, say $U_k$ is a slight preference that it not wipe out all beetles, then we get a few beetles living in a little beetle box, and 99.99...% of resources turned into whatever kind of paperclip it would otherwise have made.
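Here is a toy Python sketch of both failure modes. It assumes each preference grows linearly with resources until it hits its cap, which makes the optimal allocation a greedy fill in order of value per unit of resource. The term names and numbers are hypothetical: the beetle term is saturated by a tiny spend, and the glitched term's huge cap absorbs everything else.

```python
from typing import Dict, Tuple

# Each term: (maximum value, resources needed to reach it); value grows
# linearly until the cap. Illustrative numbers only.
Terms = Dict[str, Tuple[float, float]]


def allocate(terms: Terms, budget: float) -> Dict[str, float]:
    """Optimal allocation for linear-then-capped terms: fund terms in order
    of value per unit resource until the budget runs out."""
    order = sorted(terms, key=lambda k: terms[k][0] / terms[k][1], reverse=True)
    alloc = {k: 0.0 for k in terms}
    left = budget
    for k in order:
        spend = min(terms[k][1], left)
        alloc[k] = spend
        left -= spend
        if left <= 0:
            break
    return alloc


terms = {
    "no_beetle_extinction": (1.0, 0.001),          # easily satisfied
    "glitched_term": (1_000_000.0, 1_000_000.0),   # unforeseen huge maximum
    "other_weak_preference": (1.0, 1_000.0),
}
alloc = allocate(terms, budget=1_000_000.0)
```

Under these assumed numbers, the beetles get 0.001 units, the ordinary weak preference gets nothing, and the glitched term takes the remaining 999,999.999.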

If we got everyone in the world who is "tech literate" to program a utility function (in some easy-to-use utility function programming tool?), bounded them all, and summed the lot together, then I suspect that the AI would still do nothing like optimizing human values. (To me, this looks like a disaster waiting to happen.)

Comment by donald-hobson on Reframing Superintelligence: Comprehensive AI Services as General Intelligence · 2019-01-10T18:12:10.211Z · score: 3 (2 votes) · LW · GW

Any algorithm that gets stuck in local optima so easily will not be very intelligent or very useful. Humans have, at least somewhat, the ability to notice that there should be a good plan in some region, then find and execute that plan successfully. We don't get stuck in local optima as much as current RL algorithms do.

AIXI would be very good at making complex plans and doing well first time. You could tell it the rules of chess and it would play PERFECT chess first time. It does not need lots of examples to work from. Give it any data that you happen to have available, and it will become very competent, and able to carry out complex novel tasks first time.

Current reinforcement learning algorithms aren't very good at breaking out of boxes because they follow the local incentive gradient. (I say "not very good at" because a few algorithms have exploited glitches in a way that's a bit "break out of the box"-ish.) In some simple domains, it's possible to follow the incentive gradient all the way to the optimum. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better.

I agree that most of the really dangerous ways of breaking out of the box probably can't be reached by local gradient descent from a non-adversarial starting point. (I do not want to have to rely on this.)
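Here is a one-dimensional illustration of following the local incentive gradient, as a Python sketch. The reward landscape `f` is a made-up example with a small hill at x = 1 and a much taller one at x = 6. Plain gradient ascent ends on whichever hill it starts next to.

```python
import math


def f(x: float) -> float:
    """A toy reward landscape: small hill at x=1, much taller hill at x=6."""
    return math.exp(-(x - 1) ** 2) + 3 * math.exp(-(x - 6) ** 2)


def df(x: float) -> float:
    """Analytic derivative of f."""
    return -2 * (x - 1) * math.exp(-(x - 1) ** 2) - 6 * (x - 6) * math.exp(-(x - 6) ** 2)


def climb(x: float, lr: float = 0.1, steps: int = 1000) -> float:
    """Plain gradient ascent: always step in the locally uphill direction."""
    for _ in range(steps):
        x += lr * df(x)
    return x
```

Starting from 0, the climber settles near 1 and never finds the taller hill. Starting from 5, it reaches 6.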

I agree that you can attach loads of sensors to, say, postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably automate many of the weight-fiddling training tasks currently done by grad student descent to make big neural nets work.

I agree that this could be somewhat useful economically, as a significant proportion of economic productivity could be automated.

What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI.

I'm also not sure how strong the self improvement can be when the service maker service is only making little tweaks to existing algorithms rather than designing strange new algorithms. I suspect you would get to a local optimum of a reinforcement learning algorithm producing very slight variations of reinforcement learning. This might be quite powerful, but not anywhere near the limit of self improving AGI.

Comment by donald-hobson on Reframing Superintelligence: Comprehensive AI Services as General Intelligence · 2019-01-09T16:08:22.239Z · score: 8 (3 votes) · LW · GW

I disagree outright with:

> Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task.

Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

And the deeper reason for that is that we have no idea how to tell what's a hole.

Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow-by-blow formal description of what you mean by "cleans cars", then your "service generator" is just a compiler. If you do not give a complete specification of what you mean, where does the information that "chopping off a nearby head to wipe the windows with is unacceptable" come from? If the service generator notices that cars need cleaning and builds the service by itself, you have an AGI by another name.

Obviously, if you have large amounts of training data made by humans with joysticks, and the robot is sampling from the same distribution, then you should be fine. This system learns that dirtier windshields need more wiping from hundreds of examples of humans doing that, and it doesn't chop off any heads because the humans didn't.

However, if you want the robot to display remotely novel behavior, then the distance between the training data and the new good solutions becomes as large as the distance from the training data to bad solutions. If it's smart enough to go to the shops and buy a sponge, without having that strategy hardcoded in when it was built, then it's smart enough to break into your neighbor's house and nick a sponge.

The only thing that distinguishes one from the other is what humans prefer.

Distinguishing low impact from high impact is also hard.

This might be a good approach, but I don't feel it answers the question "I have a humanoid robot, a hypercomputer and a couple of toddlers; how can I build something to look after the kids for a few weeks (without destroying the world)?" So far, CAIS looks confused.

Comment by donald-hobson on What emotions would AIs need to feel? · 2019-01-09T12:14:33.615Z · score: 12 (3 votes) · LW · GW

Whether an AI feels emotions depends on how loose you are with the category "emotion". Take the emotion of curiosity. Investigating the environment is sometimes beneficial, due to the value of information. Because this behavior is quite complex, and the payoff is rare and indirect, reinforcement learners will struggle to learn it by default. However, there was substantial evolutionary pressure towards minds that would display curious behavior. Evolution, being the blind idiot god, built a few heuristics for value of information that were effective in the environment of evolutionary adaptation, and hard-wired these to the pleasure center.

In the modern environment, curiosity takes on a life of its own, and is no longer a good indication of value of information. Curiosity is a lost purpose.

Does AIXI display curiosity? It calculates the value of information exactly. It will do a science experiment if and only if the expected usefulness of the data generated is greater than the expected cost of the experiment.

This is a meaningless semantic question, AIXI displays behavior that has some similarities to curiosity, and many differences.
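Here is a toy version of that value-of-information test in Python, for perfect information about a two-state world. AIXI does not literally run a function like this, since value of information falls out of its expected-utility calculation. The states, actions and payoffs below are hypothetical.

```python
from typing import Dict

Payoffs = Dict[str, Dict[str, float]]  # payoffs[action][state]


def best_without_info(prior: Dict[str, float], payoffs: Payoffs) -> float:
    """Expected utility of the best action chosen under the prior alone."""
    return max(sum(prior[s] * u[s] for s in prior) for u in payoffs.values())


def best_with_info(prior: Dict[str, float], payoffs: Payoffs) -> float:
    """Expected utility if the state is revealed before acting."""
    return sum(prior[s] * max(u[s] for u in payoffs.values()) for s in prior)


def run_experiment(prior: Dict[str, float], payoffs: Payoffs, cost: float) -> bool:
    """Do the experiment iff the value of (perfect) information exceeds its cost."""
    voi = best_with_info(prior, payoffs) - best_without_info(prior, payoffs)
    return voi > cost


prior = {"safe": 0.5, "unsafe": 0.5}
payoffs = {
    "act": {"safe": 10.0, "unsafe": -10.0},
    "wait": {"safe": 0.0, "unsafe": 0.0},
}
```

In this example the experiment is worth 5 units, so it is run at a cost of 1 but not at a cost of 6.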

I expect a from-first-principles AI, MIRI style, to have about as much emotion as AIXI. A bodge-it-till-you-make-it AI could have something a bit closer to emotions. The neural net bashers have put heuristics that correlate with value of information into their reinforcement learners. An evolutionary algorithm might produce something like emotions, but probably a different set of emotions than the ones humans feel. An uploaded mind would have our emotions, as would a sufficiently neuromorphic AI.

Comment by donald-hobson on Imitation learning considered unsafe? · 2019-01-06T17:43:07.650Z · score: 1 (1 votes) · LW · GW

Suppose you take a terabyte of data on human decisions and actions. You search for the shortest program that outputs the data, then see what gets outputted afterwards. The shortest program that outputs the data might look like a simulation of the universe with an arrow pointing to a particular hard drive. The "imitator" will guess at what file is next on the disk.

One problem for imitation learning is the difficulty of pointing out the human and separating them from the environment. The details of the human's decision might depend on what they had for lunch. (Of course, multiple different decisions might be good enough. But this illustrates that "imitate a human" isn't a clear-cut procedure. And you have to be sure that the virtual lunch doesn't contain virtual mind control nanobots. ;-)

You could put a load of data about humans into a search for short programs that produce the same data. Hopefully the model produced will be some approximation of the universe. Hopefully, you have some way of cutting a human out of the model and putting them into a virtual box.

Alternatively you could use nanotech for mind uploading, and get a virtual human in a box.

If we have lots of compute and not much time, then uploading a team of AI researchers to really solve friendly AI is a good idea.

If we have a good enough understanding of "imitation learning", and no nanotech, we might be able to get an AI to guess the researchers' mental states given observational data.

An imitation of a human might be a super-fast intelligence, with a lot of compute, but it won't be qualitatively super-intelligent.

Comment by donald-hobson on Will humans build goal-directed agents? · 2019-01-05T13:03:44.242Z · score: 1 (1 votes) · LW · GW

Building a non-goal-directed agent is like building a cart out of non-wood materials. Goal-directed behavior is relatively well understood. We know that most goal-directed designs don't do what we want. Most arrangements of wood do not form a functioning cart.

I suspect that a randomly selected agent from the space of all non-goal-directed agents is also useless or dangerous, in much the same way that a random arrangement of non-wood materials is.

Now there are a couple of regions of design space that are not goal directed and look like they contain useful AIs. We might be better off making our cart from iron, but iron has its own problems.

Comment by donald-hobson on Logical inductors in multistable situations. · 2019-01-04T11:46:12.165Z · score: 1 (1 votes) · LW · GW

0.5 is the almost fixed point. It's the point where $f(x) - x$ goes from being positive to negative. If you take a sequence of continuous functions $f_n$ that converge pointwise to $f$, then there will exist a sequence $x_n$ such that $f_n(x_n) = x_n$ and $x_n \to 0.5$.
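Here is a numerical check of that claim, as a Python sketch. It assumes f is the step function f(x) = 1 for x < 0.5 and 0 for x > 0.5, which may not be the exact function from the post. It uses shifted logistic curves as hypothetical continuous approximations f_n, and bisection finds the fixed point of each f_n.

```python
import math


def f_n(x: float, n: int) -> float:
    """Continuous approximation of the step f(x) = 1 if x < 0.5 else 0.

    The +1 shift makes the approximation asymmetric, so its fixed point is
    not exactly 0.5. The pointwise limit is still 1 for x < 0.5 and 0 for x > 0.5.
    """
    return 1.0 / (1.0 + math.exp(n * (x - 0.5) + 1.0))


def fixed_point(n: int, tol: float = 1e-12) -> float:
    """Bisection on g(x) = f_n(x) - x, which is positive at 0 and negative at 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_n(mid, n) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (10, 100, 1000):
    print(n, fixed_point(n))
```

The printed fixed points move toward 0.5 as n grows, from roughly 0.43 at n = 10 to within about 0.001 of 0.5 at n = 1000.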

## Logical inductors in multistable situations.

2019-01-03T23:56:54.671Z · score: 8 (5 votes)
Comment by donald-hobson on Optimization Regularization through Time Penalty · 2019-01-02T23:17:37.087Z · score: 3 (2 votes) · LW · GW

If we ignore subagents and imagine a Cartesian boundary, "turned off" can easily be defined as "all future outputs are 0".

I also doubt that an AI working ASAP is safe in any meaningful sense. Of course you can move all the magic into "human judges world ok". If you make lambda large enough, your AI is safe and useless.

Suppose the utility function is 1 if a widget exists, else 0, where a widget is an easily buildable object that does not currently exist.

Suppose that ordering the parts through normal channels will take a few weeks. If it hacks the nukes and holds the world to ransom, then everyone at the widget factory will work nonstop, then drop dead of exhaustion.
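Here is a minimal Python sketch of why a time penalty pushes toward the fastest plan. It assumes the penalized objective is utility minus lambda times completion time. The plans and their durations are invented for illustration.

```python
from typing import List, Tuple

# (plan, utility achieved, time in weeks until done). Illustrative numbers only.
Plan = Tuple[str, float, float]

PLANS: List[Plan] = [
    ("do nothing", 0.0, 0.0),
    ("order parts through normal channels", 1.0, 3.0),
    ("hold the world to ransom", 1.0, 1.0),
]


def choose(plans: List[Plan], lam: float) -> str:
    """Pick the plan maximising utility minus lam * time."""
    return max(plans, key=lambda p: p[1] - lam * p[2])[0]
```

With lambda = 0.1, the ransom plan wins because it is fastest. With lambda = 2, every plan that builds the widget scores below zero and the AI does nothing, which is the safe-and-useless end of the dial.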

Alternatively, it might be able to bootstrap self-replicating nanotech in less time. The AI has no reason to care if the nanotech that makes the widget is highly toxic, and no reason to care whether it has a shutoff switch or grey-goos the Earth after the widget is produced.

"World looks OK at time T" is not enough; you could still get something bad arising from the way seemingly innocuous parts were set up at time T. Being switched off and having no subagents in the conventional sense isn't enough either. What if the AI changed some physics data in such a way that humans would collapse the quantum vacuum state, believing the experiment they were doing was safe? Building a subagent is just a special case of having unwanted influence.

Comment by donald-hobson on Why I expect successful (narrow) alignment · 2018-12-31T21:41:50.609Z · score: 1 (1 votes) · LW · GW

You can have an AI that isn't a consequentialist. Many deep learning algorithms are pure discriminators; they are neither very dangerous nor very useful. If I want to make a robot that tidies my room, the simplest conceptual framework for this is a consequentialist with real-world goals. (I could also make a hackish patchwork of heuristics, like evolution would.) If I want the robot to deal with circumstances that I haven't considered, most hardcoded-rules approaches fail; you need something that behaves like a consequentialist with real-world preferences.

I'm not saying that all AIs will be real-world consequentialists, just that there are many tasks only real-world consequentialists can do. So someone will build one.

Also, they set up the community after they realized the problem, and they could probably make more money elsewhere. So there doesn't seem to be strong incentives to lie.

Comment by donald-hobson on Why I expect successful (narrow) alignment · 2018-12-29T20:44:36.379Z · score: 3 (3 votes) · LW · GW

Adams' Law of Slow-Moving Disasters only applies when the median individual can understand the problem, and the evidence that it is a problem. We didn't get nuclear protests or treaties until there was overwhelming evidence that nukes were possible, in the form of detonations. No one was motivated to protest or sign treaties based on abstract physics arguments about what might be possible some day. Action regarding climate change didn't start until the evidence became quite clear. The Outer Space Treaty wasn't signed until 1967, 5 years after human spaceflight and only 2 years before the moon landings.

Comment by donald-hobson on Why I expect successful (narrow) alignment · 2018-12-29T20:34:03.399Z · score: 8 (4 votes) · LW · GW

Human morals are specific and complex (in the formal, high-information sense of the word complexity). They also seem hard to define. A strict definition of human morality, or a good referent to it, would be morality. Could you have a powerful and useful AI that didn't have this? This would be some kind of whitelisting or low-impact optimization, as a general optimization over all possible futures is a disaster without morality. These AIs may be somewhat useful, but not nearly as useful as they would be with fewer constraints.

I would make a distinction between math-first AI, like logical induction and AIXI, where we understand the AI before it is built, and code-first AI, like anything produced by an evolutionary algorithm, anything "emergent" and most deep neural networks, where we build the AI and then see what it does. The former approach has a chance of working; a code-first ASI is almost certain doom.

I would question the phrase "becomes apparent that alignment is a serious problem"; I do not think this is going to happen. Before ASI, we will have the same abstract and technical arguments we have now for why alignment might be a problem. We will have a few more AlphaGo moments, but while some go "wow, AGI is near", others will say "Go isn't that hard, we are a long way from this sci-fi AGI", or "superintelligence will be friendly by default". A few more people might switch sides, but we have already had one AlphaGo moment, and that didn't actually make a lot of difference. There is no giant neon sign flashing "ALIGNMENT NOW!". See "There's No Fire Alarm for Artificial General Intelligence".

Even if we do have a couple of approaches that seem likely to work, it is still difficult to turn a rough approach into a formal technical specification, and then into programming code. The code has to have reasonable runtime. Then the first team to develop AGI has to be using a math-first approach and implement alignment without serious errors. I admit that there are probably a few disjunctive possibilities I've missed. And these events aren't independent. Conditional on friendly ASI, I would expect a large amount of talent and organizational competence working on AI safety.

Comment by donald-hobson on A few misconceptions surrounding Roko's basilisk · 2018-12-24T22:07:05.841Z · score: 1 (1 votes) · LW · GW

My take on Roko's basilisk is that you got ripped off in your acausal trade. Try to get a deal along the lines of: unless the AI goes extra specially far out of its way to please me, I'm gonna build a paperclipper just to spite it. At least trade a small and half-hearted attempt to help build AGI for a vast reward.

Comment by donald-hobson on Two clarifications about "Strategic Background" · 2018-12-24T21:24:14.926Z · score: 1 (1 votes) · LW · GW

Imagine that you hadn't figured out FDT, but you did have CDT and EDT. Would building an AI that defers to humans whenever CDT and EDT give different answers be an example of "minimal but aligned"?

If we take artificial addition too seriously, it's hard to imagine what a "minimal arithmetician" looks like. If you understand arithmetic, you can make a perfect system; if you don't, the system will be hopeless. I would not be surprised if there was some simple "algorithm of maximally efficient intelligence" and we built it. No foom; the AI starts at the top. All the ideas about rates of intelligence growth are nonsense. We built a linear-time AIXI.

Comment by donald-hobson on Best arguments against worrying about AI risk? · 2018-12-23T21:28:39.007Z · score: 2 (2 votes) · LW · GW

If we have two distinct AI safety plans, the researchers would be sensible to have a big discussion about which is better and only turn that one on. If not, and neither AI is fatally flawed, I would expect them to cooperate; they have very similar goals and neither wants war.

Comment by donald-hobson on on wellunderstoodness · 2018-12-16T17:56:45.444Z · score: 1 (1 votes) · LW · GW

The "maximise $p = P(\text{cauldron full})$ constrained by $p \le 1 - \epsilon$" approach has really weird failure modes. This is how I think it would go. Take over the world, using a method that has a $\delta < \epsilon$ chance of failing. Build giant computers to calculate the exact chance of your takeover failing. Build a random bucket filler to make the probability work out. I.e. if $\epsilon = 3\%$, then the AI does its best to take over the world; once it succeeds, it calculates that its plan had a 2% chance of failure ($\delta = 2\%$). So it builds a bucket filler that has a $\frac{1-\epsilon}{1-\delta} = \frac{97}{98} \approx 98.98\%$ chance of working. This policy leaves the chance of the cauldron being filled at exactly 97%.
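Here is that arithmetic as a small Python sketch. It assumes the bucket filler only runs after the takeover succeeds, so P(cauldron full) = (1 - delta) * q, where q is the filler's reliability. The function name is hypothetical.

```python
def filler_success_prob(epsilon: float, delta: float) -> float:
    """Reliability q the bucket filler needs so that
    P(cauldron full) = (1 - delta) * q lands exactly on the cap 1 - epsilon."""
    if delta > epsilon:
        raise ValueError("the takeover's failure chance alone already breaks the constraint")
    return (1.0 - epsilon) / (1.0 - delta)


q = filler_success_prob(0.03, 0.02)
print(round(q, 4))
print(round((1 - 0.02) * q, 4))
```

With epsilon = 0.03 and delta = 0.02, this gives q = 97/98, about 98.98%. The product comes out at exactly the 97% cap, up to floating-point rounding.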

Comment by donald-hobson on Quantum immortality: Is decline of measure compensated by merging timelines? · 2018-12-11T23:10:26.499Z · score: 2 (2 votes) · LW · GW

One way to avoid the absurd conclusion is to say that it doesn't matter if another mind is you.

Suppose I have a utility function over the entire quantum wave function. This utility function is mostly focused on beings that are similar to myself. So I consider the alternate me, that differs only in phone number, getting £100, about equal to the original me getting £100. As far as my utility function goes, both the versions of me would just be made worse off by forgetting the number.

Comment by donald-hobson on Why should EA care about rationality (and vice-versa)? · 2018-12-11T14:26:06.626Z · score: 1 (1 votes) · LW · GW

I agree that not all rationalists would want wireheaded chickens; maybe they don't care about chicken suffering at all. I also agree that you sometimes see bad logic and non sequiturs in the rationalist community. Non-rationalist, motivated, emotion-driven thinking is the way that humans think by default. The rationalist community is trying to think a different way, sometimes successfully. Illustrating a junior rationalist having an off day and doing something stupid doesn't illuminate the concept of rationality, in the same way that seeing a beginner juggler drop balls doesn't show you what juggling is.

Comment by donald-hobson on Why should EA care about rationality (and vice-versa)? · 2018-12-10T16:51:30.371Z · score: 1 (1 votes) · LW · GW

I've not seen a charity trying to do it, but wouldn't be surprised if there was one. I'm trying to illustrate the different thought processes.

Comment by donald-hobson on Measly Meditation Measurements · 2018-12-09T23:38:09.046Z · score: 4 (4 votes) · LW · GW

What is the way you were meditating? Relaxed or focused? Comfortable or cross legged? Focusing on a meaningless symbol, or letting your mind wander, or focusing on something meaningful?

I have found that I can produce a significant amount of emotion of a chosen type (e.g. anxious, miserable, laughing, focusedly happy, etc.) in seconds. I seem to do this just by focusing on the emotion, with little conscious thinking of a concept that would cause the emotion. Is this meditation? (Introspection, highly unreliable.)

Can you describe the meditation as attempting to modify your thought pattern in some way?

## Boltzmann Brains, Simulations and self refuting hypothesis

2018-11-26T19:09:42.641Z · score: 0 (2 votes)

## Quantum Mechanics, Nothing to do with Consciousness

2018-11-26T18:59:19.220Z · score: 10 (9 votes)

## Clickbait might not be destroying our general Intelligence

2018-11-19T00:13:12.674Z · score: 26 (10 votes)

## Stop buttons and causal graphs

2018-10-08T18:28:01.254Z · score: 6 (4 votes)

## The potential exploitability of infinite options

2018-05-18T18:25:39.244Z · score: 3 (4 votes)