Comment by donald-hobson on Can We Place Trust in Post-AGI Forecasting Evaluations? · 2019-02-17T21:01:29.480Z · score: 2 (2 votes) · LW · GW

Your belief about how well AGI is likely to go affects both the likelihood of a bet being evaluated and the chance of winning, so bets about AGI are likely to give dubious results. I also have substantial uncertainty about the value of money in a post-singularity world. Most obviously, if everyone gets turned into paperclips, no one has any use for money. If we get a friendly singleton superintelligence, everyone is living in paradise, whether or not they had money before. If we get an economic singularity, where libertarian ASI(s) try to make money without cheating, then money could be valuable. I'm not sure how we would get that, as an understanding of the control problem good enough to not wipe out humans and fill the universe with bank notes should be enough to make something closer to friendly.

Even if we do get some kind of ascendant economy, given the amount of resources in the solar system (let alone the wider universe), it's quite possible that pocket change would be enough to live for aeons in luxury.

Given how unclear it is whether the bet will get paid, and how much the cash would be worth if it were, I doubt that the betting will produce good info. If everyone thinks that money is more likely than not to be useless to them after ASI, then almost no one will be prepared to lock their capital up until then in a bet.

Comment by donald-hobson on Limiting an AGI's Context Temporally · 2019-02-17T18:32:43.272Z · score: 2 (2 votes) · LW · GW

I suspect that an AGI with such a design could be much safer if it were hardcoded to believe that time travel and hyperexponentially vast universes were impossible. Suppose that the AGI thought that there was a 0.0001% chance that it could use a galaxy's worth of resources to send 10^30 paperclips back in time, or create a parallel universe containing 3^^^3 paperclips. It will still chase those options.

If starting a long plan to take over the world costs it literally nothing, it will do it anyway. A sequence of short-term plans, each designed to make as many paperclips as possible within the next few minutes, could still end up dangerous. If the number of paperclips at time $t$ is $c(t)$, and its power at time $t$ is $p(t)$, then $\frac{dp}{dt} = p$, $\frac{dc}{dt} = p$ would mean that both power and paperclips grew exponentially. This is what would happen if power can be used to gain power and clips at the same time, with minimal loss of either from also pursuing the other.
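A minimal sketch of that compounding, assuming a discrete-time toy model in which current power buys new power and new clips in the same step. The names `growth_rate` and `clip_rate` and all the numbers are illustrative assumptions, not anything from the proposal being discussed.

```python
def simulate(steps, power=1.0, clips=0.0, growth_rate=0.5, clip_rate=1.0):
    """Toy model: the same power funds more power and more clips each step."""
    history = []
    for _ in range(steps):
        # Both updates use the current power; neither goal costs the other anything.
        clips += clip_rate * power
        power += growth_rate * power
        history.append((power, clips))
    return history


history = simulate(20)
for step in (4, 9, 19):
    power, clips = history[step]
    print(f"step {step + 1}: power={power:.1f} clips={clips:.1f}")
```

Each short-term step only maximizes clips made "now", yet the clip count ends up tracking the exponentially growing power.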

If power can only be used to gain one thing at a time, and the rate power can grow at is less than the rate of time discount, then we are safer.

This proposal has several ways to be caught out, world-wrecking assumptions that aren't certain. But if used with care, with a short time frame, an ontology that considers time travel impossible, and, say, a utility function that maxes out at 10 clips, it probably won't destroy the world. Throw in mild optimization and an impact penalty, and you have a system that relies on a disjunction of shaky assumptions, not a conjunction of them.

It is a CDT agent, or something that doesn't try to punish you now so you make paperclips last week. A TDT agent might decide to take the policy of killing anyone who didn't make clips before it was turned on, causing humans that predict this to make clips.

I suspect that it would be possible to build such an agent, prove that there are no weird failure modes left, and turn it on, with a small chance of destroying the world. I'm not sure why you would do that. Once you understand the system well enough to say it's safe-ish, what vital info do you gain from turning it on?

Comment by donald-hobson on Extraordinary ethics require extraordinary arguments · 2019-02-17T17:19:53.640Z · score: 4 (3 votes) · LW · GW

Butterfly effects are essentially unpredictable, given your partial knowledge of the world. Sure, you doing homework could cause a tornado in Texas, but it's equally likely to prevent one. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise you're shuffling an already well-shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of "doing homework" from a vast set of other actions. If you really did know what actions would stop the Texas tornado, they might well look like random thrashing.

What you can calculate is the reliable effects of doing your homework. So, given bounded rationality, you are probably best to base your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers, and a short term procrastinator.

Most people who aren't particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone to look out for them. The law stops most of the bad mutual defections in prisoner's dilemmas.) Evil geniuses trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.

Comment by donald-hobson on Short story: An AGI's Repugnant Physics Experiment · 2019-02-14T15:31:50.252Z · score: 6 (4 votes) · LW · GW

This is an example of a Pascal's mugging. Tiny probabilities of vast rewards can produce weird behavior. The best known solutions are either a bounded utility function or an antipascalene agent. (An agent that ignores the best x% and worst y% of possible worlds when calculating expected utilities. It can be money pumped.)
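A rough sketch of the antipascalene idea, assuming outcomes are given as (probability, utility) pairs and that the agent renormalizes over the probability mass it keeps. The function name, the 1% cut-offs, and the mugging numbers are all made up for illustration.

```python
def trimmed_expected_utility(outcomes, ignore_best=0.01, ignore_worst=0.01):
    """Expected utility over the middle of the distribution only.

    outcomes: list of (probability, utility) pairs whose probabilities sum to 1.
    """
    ordered = sorted(outcomes, key=lambda pu: pu[1])  # worst outcomes first
    lo, hi = ignore_worst, 1.0 - ignore_best
    total, cumulative = 0.0, 0.0
    for prob, util in ordered:
        start, end = cumulative, cumulative + prob
        cumulative = end
        # Only the part of this outcome's mass inside [lo, hi] counts.
        kept = max(0.0, min(end, hi) - max(start, lo))
        total += kept * util
    return total / (hi - lo)


# Paying the mugger: lose 5 almost surely, win 1e30 with probability 1e-9.
mugging = [(1e-9, 1e30), (1 - 1e-9, -5.0)]
naive_expected_utility = sum(p * u for p, u in mugging)

print(naive_expected_utility)             # dominated by the tiny-probability jackpot
print(trimmed_expected_utility(mugging))  # the jackpot falls in the ignored tail
```

Compared with a "refuse" option worth 0, the naive agent pays and the trimmed agent refuses.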

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T22:50:32.220Z · score: 12 (4 votes) · LW · GW

Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it's blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. People should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
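For comparison, here is a small sketch of the ideal Bayesian baseline for this experiment, assuming a randomly chosen side of the card is shown. The function names are illustrative. Under these assumptions the log-odds shift is ln 2 whatever the mix of cards, while the shift in raw probability depends heavily on the mix, so any dependence on the mix in the subjects' implied log-odds updates would be a departure from this baseline.

```python
import math


def posterior_both_blue(prior_both_blue):
    """P(card is blue/blue | the side shown is blue)."""
    # A blue/blue card always shows blue; a red/blue card shows blue half the time.
    seen_if_blue_blue = 1.0
    seen_if_red_blue = 0.5
    numerator = prior_both_blue * seen_if_blue_blue
    return numerator / (numerator + (1.0 - prior_both_blue) * seen_if_red_blue)


def log_odds(p):
    return math.log(p / (1.0 - p))


for prior in (0.5, 0.9, 0.99):
    post = posterior_both_blue(prior)
    shift = log_odds(post) - log_odds(prior)
    print(f"prior={prior} posterior={post:.4f} log-odds shift={shift:.4f}")
```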

Comment by donald-hobson on How important is it that LW has an unlimited supply of karma? · 2019-02-11T15:21:06.773Z · score: 3 (3 votes) · LW · GW

I suspect that if voting reduced your own karma, some people wouldn't vote. As it becomes obvious that this is happening, more people stop voting, until karma just stops flowing at all. (The people who persistently vote anyway all run out of karma.)

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T10:55:11.889Z · score: 1 (1 votes) · LW · GW

Fixed, thanks.

## Propositional Logic, Syntactic Implication

2019-02-10T18:12:16.748Z · score: 2 (2 votes)

## Probability space has 2 metrics

2019-02-10T00:28:34.859Z · score: 87 (35 votes)

Comment by donald-hobson on X-risks are a tragedies of the commons · 2019-02-07T17:50:13.823Z · score: 2 (2 votes) · LW · GW

This is making the somewhat dubious assumption that X risks are not so neglected that even a "selfish" individual would work to reduce them. Of course, in the not too unreasonable scenario where the cosmic commons is divided up evenly, and you use your portion to make a vast number of duplicates of yourself, the utility, if your utility is linear in copies of yourself, would be vast. Or you might hope to live for a ridiculously long time in a post singularity world.

The effect that a single person can have on X risks is small, but if they were selfish with no time discounting, it would be a better option than hedonism now. Although a third alternative of sitting in a padded room being very very safe could be even better.

Comment by donald-hobson on (notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach · 2019-02-06T00:27:27.407Z · score: 18 (5 votes) · LW · GW

I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.

Deterrence Fails.

If you believe that there is a good chance of immortal utopia, and a large chance of paperclips, in the next 5 years, the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.

The law is blind to safety.

The law is bureaucratic and ossified. It is probably not employing much top talent, as it's hard to tell top talent from the rest if you aren't as good yourself (and it doesn't have the budget or glamor to attract them). Telling whether an organization is on track to not destroy the world is HARD. The safety protocols are being invented on the fly by each team; the system is very complex, technical, and only half built. The teams that would destroy the world aren't idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, and no understood laws.

Likely as not (not really, too much conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don't understand the problem. For example: all AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list.)

All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone's way. Telling cops to "smash all GPUs" would have an effect on AI progress. The fund vs smash axis is about the only lever they have. They can't even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.

After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had tasky ASI that was not trivial to leverage into a decisive strategic advantage. (An AI that can put a strawberry on a plate without destroying the world, but that's about the limit of its safe operation.)

Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.

I don't know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger), and with further research could grant supreme power. The discovery must be limited to a small group of people (law of large numbers of non-experts: one will do something stupid). I don't think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can't tell how powerful it would be with an unbounded utility function.

Comment by donald-hobson on Why is this utilitarian calculus wrong? Or is it? · 2019-01-28T17:06:33.310Z · score: 6 (5 votes) · LW · GW

Firstly, you are confusing dollars and utils.

If you buy this product for $100, you gain the use of it, at value U[30] to yourself. The workers who made it gain $80, at value U[80] to yourself, because of your utilitarian preferences. Total value U[110].

If the alternative was a product of cost $100, which you value the use of at U[105], but all the money goes to greedy rich people to be squandered, then you would choose the first. If the alternative was spending $100 to do something insanely morally important, U[3^^^3], you would do that.

If the alternative was a product of cost $100 that was of value U[100] to yourself, and some of the money would go to people that weren't that rich, U[15], you would do that.

If you could give the money to people twice as desperate as the workers, at U[160], you would do that.

There are also good reasons why you might want to discourage monopolies. Any desire to do so is not included in the expected value calculations. But the basic principle is that utilitarianism can never tell you if some action is a good use of a resource, unless you tell it what else that resource could have been used for.
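The comparison above can be sketched as a lookup over alternatives. The utilities are the ones quoted in the comment; the option labels, the dictionary layout, and the value of 0 for squandered money are assumptions made for illustration.

```python
# Each way of spending the same $100: utility from using the product, plus the
# utility (to you) of where the money ends up.
options = {
    "product, workers get $80": {"use": 30, "recipients": 80},
    "product, money squandered": {"use": 105, "recipients": 0},  # 0 is assumed
    "product, some to modest earners": {"use": 100, "recipients": 15},
    "give to people twice as desperate": {"use": 0, "recipients": 160},
}


def total_utility(option):
    return option["use"] + option["recipients"]


def best_option(available):
    """The verdict depends entirely on which alternatives are on the table."""
    return max(available, key=lambda name: total_utility(options[name]))


names = list(options)
print(best_option(names[:2]))  # only the first two alternatives exist
print(best_option(names[:3]))  # a third alternative changes the answer
print(best_option(names))      # and so does a fourth
```

The first product is the right purchase in one comparison and the wrong one in the next, even though nothing about the product itself has changed.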

Comment by donald-hobson on Solomonoff induction and belief in God · 2019-01-28T16:48:01.902Z · score: 2 (2 votes) · LW · GW

The information needed to describe our particular laws of physics < info needed to describe the concept of "habitable universe" in general < info needed to describe human-like mind.

The biggest slip is the equivocation on the word intelligence. The Kolmogorov complexity of AIXI-tl is quite small, so intelligences in that sense of the word are likely to exist in the universal prior.

Humanlike minds have not only the clear mark of evolution, but the mark of stone age tribal interactions across their psyche. An arbitrary mind will be bizarre and alien. Wondering if such a mind might be benevolent is hugely privileging the hypothesis. The most likely way to make a humanlike mind is the process that created humans. So in most of the universes with humanoid deities, those deities evolved. This becomes the simulation hypothesis.

The best hypothesis is still the laws of quantum physics or whatever.

Comment by donald-hobson on For what do we need Superintelligent AI? · 2019-01-25T23:43:23.412Z · score: 4 (4 votes) · LW · GW

We don't know what we are missing out on without superintelligence. There might be all sorts of amazing things that we would just never consider making, or would dismiss as obviously impossible, without superintelligence.

I am pointing out that being able to make a FAI that is a bit smarter than you (smartness is not really on a single scale, and the cognitive architecture is vastly different; is Deep Blue smarter than a horse?) involves solving almost all the hard problems in alignment. When we have done all that hard work, we might as well tell it to make itself a trillion times smarter: the cost to us is negligible, and the benefit could be huge.

AI can also serve as a values repository. In most circumstances, values are going to drift over time, possibly due to evolutionary forces. If we don't want to end up as hardscrabble frontier replicators, we need some kind of singleton. Most types of government or committee have their own forms of value drift, and couldn't keep enough of an absolute grip on power to stop any rebellions for billions of years. I have no ideas other than Friendly ASI oversight for how to stop someone in a cosmically vast society from creating a UFASI. Sufficiently draconian banning of anything at all technological could stop anyone from creating UFASI long term, and would also stop most things since the industrial revolution.

The only reasonable scenario that I can see in which FAI is not created and the cosmic commons gets put to good use is if a small group of like-minded individuals, or a single person, gains exclusive access to self-replicating nanotech and mind uploading. They then use many copies of themselves to police the world. They do all the programming and only run code they can formally prove isn't dangerous. No one else is allowed to touch anything Turing complete.

Comment by donald-hobson on Allowing a formal proof system to self improve while avoiding Lobian obstacles. · 2019-01-24T14:25:29.195Z · score: 2 (2 votes) · LW · GW

Both blanks are the identity function.

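Here is some pseudocode:

class Prover:

____def __init__(self):

________# Checkers trusted so far; start with PA only.

________self.ps = [PA]

____def prove(self, p, s, b):

________# p: trusted checker, s: statement, b: candidate proof of s.

________assert p in self.ps

________return p(s, b)

____def upgrade(self, p1, p2, b):

________# Hypothetical method name. Trust p2 once p1 accepts b as a proof that p2 proves nothing p1 cannot.

________if self.prove(p1, "forall s:(exists b2: p2(s,b2))=> (exists b1: p1(s,b1))", b):

____________self.ps.append(p2)

prover = Prover()

prover.upgrade(PA, nPA, proof)

Where PA is a specific Peano arithmetic proof checker, nPA is another proof checker, and 'proof' is a proof that anything nPA can prove, PA can prove too.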
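To make the bookkeeping concrete, here is a runnable toy version, assuming a "proof checker" is just a function from (statement, proof) to a boolean. The lookup-table checkers, the `upgrade` method name, and the statement strings are stand-ins invented for this sketch; a real PA checker would verify derivations rather than consult a table.

```python
class Prover:
    def __init__(self, base_checker):
        self.checkers = [base_checker]  # checkers trusted so far

    def prove(self, checker, statement, proof):
        assert checker in self.checkers, "checker is not trusted"
        return checker(statement, proof)

    def upgrade(self, trusted, candidate, conservativity_statement, proof):
        """Trust `candidate` once `trusted` accepts a proof that it proves nothing new."""
        if self.prove(trusted, conservativity_statement, proof):
            self.checkers.append(candidate)
            return True
        return False


def make_toy_checker(accepted):
    """Stand-in checker: accepts exactly the listed (statement, proof) pairs."""
    def check(statement, proof):
        return accepted.get(statement) == proof
    return check


CONSERVATIVE = "forall s: (exists b2: nPA(s, b2)) => (exists b1: PA(s, b1))"
PA = make_toy_checker({"0 + 0 = 0": "axiom", CONSERVATIVE: "meta-proof"})
nPA = make_toy_checker({"0 + 0 = 0": "short-proof"})

prover = Prover(PA)
prover.upgrade(PA, nPA, CONSERVATIVE, "meta-proof")
print(prover.prove(nPA, "0 + 0 = 0", "short-proof"))
```

The point of the structure is that the new checker is only ever used after an already trusted checker has signed off on it.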

## Allowing a formal proof system to self improve while avoiding Lobian obstacles.

2019-01-23T23:04:43.524Z · score: 6 (3 votes)

Comment by donald-hobson on Life can be better than you think · 2019-01-22T21:09:26.718Z · score: 1 (1 votes) · LW · GW

I consider emotions to be data, not goals. From this point of view, deliberately maximizing happiness for its own sake is a lost purpose. It's like writing extra numbers on your bank balance. If, however, your happiness was reliably too low, adjusting it upwards with drugs would be sensible. What's the best level of happiness? The one that produces optimal behavior.

I also find my emotions to be quite weak, and I can consciously change them, just by thinking "be happy" or "be sad" and then feeling happy or sad. It actually feels similar to imagining a mental image, sound or smell.

Comment by donald-hobson on Too Smart for My Own Good · 2019-01-22T20:43:49.021Z · score: 3 (2 votes) · LW · GW

Writing random bits of code is a good hobby. It sounds like you prefer doing that to learning to play jazz, so forget the jazz and just code. I was having a hard time understanding quantum spin, and wrote some code to help. It was reasonably helpful. Then again, quantum spin is all about complex matrix multiplication, and numpy has functions for that, so I was basically using it as a matrix arithmetic calculator. Another example: I found that I kept getting distracted, so I wrote code that randomly beeped, asked what I was doing, and saved the results to a file. It worked quite well.

Comment by donald-hobson on Should questions be called "questions" or "confusions" (or "other")? · 2019-01-22T09:00:10.438Z · score: 1 (1 votes) · LW · GW

Sure, that sounds interesting. I have a bunch of things that I'm confused about.

Comment by donald-hobson on Following human norms · 2019-01-21T21:27:40.185Z · score: 3 (3 votes) · LW · GW

What if it follows human norms with dangerously superhuman skill?

Suppose humans had a really strong norm that you were allowed to say whatever you like, and encouraged to say things others will find interesting.

Among humans, the most we can exert is a small optimization for the not totally dull.

The AI produces a sequence that effectively hacks the human brain and sets interest to maximum.

Comment by donald-hobson on Debate AI and the Decision to Release an AI · 2019-01-19T17:19:04.971Z · score: 4 (3 votes) · LW · GW

You assert that "Naturally, it is much worse to release a misaligned AI than to not release an aligned AI".

I would disagree with that. If the odds of aligned AI, conditional on you not releasing this one, were 50:50, then both mistakes are equally bad. If someone else is definitely going to release a paperclipper next week, then it would make sense to release an AI with a 1% chance of being friendly now. (Bear in mind that no one would release an AI if they didn't think it was friendly, so you might be defecting in an epistemic prisoner's dilemma.)

I would put more weight on human researchers debating which AI is safe before any code is run. I would also think that the space of friendly AIs is tiny compared to the space of all AIs, so making an AI that you put as much as 1% chance on being friendly is almost as hard as building a 99% chance friendly AI.
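A back-of-the-envelope sketch of that comparison, assuming just two outcomes (an aligned future worth 1, a misaligned one worth 0) and that whatever happens if you wait does not depend on your decision. The function and parameter names are illustrative.

```python
def expected_if_wait(p_aligned_if_wait, value_good=1.0, value_bad=0.0):
    return p_aligned_if_wait * value_good + (1 - p_aligned_if_wait) * value_bad


def cost_of_releasing_misaligned(p_aligned_if_wait, value_good=1.0, value_bad=0.0):
    # You get the bad outcome for sure instead of the gamble you would have faced.
    return expected_if_wait(p_aligned_if_wait, value_good, value_bad) - value_bad


def cost_of_withholding_aligned(p_aligned_if_wait, value_good=1.0, value_bad=0.0):
    # You face the gamble instead of getting the good outcome for sure.
    return value_good - expected_if_wait(p_aligned_if_wait, value_good, value_bad)


def should_release(p_friendly_now, p_aligned_if_wait):
    """Release iff it raises the chance of ending up with an aligned AI."""
    return p_friendly_now > p_aligned_if_wait


print(cost_of_releasing_misaligned(0.5), cost_of_withholding_aligned(0.5))
print(should_release(0.01, 0.0))  # a certain paperclipper next week
print(should_release(0.01, 0.5))
```

At 50:50 the two costs coincide, and the asymmetry only appears as the odds of a good outcome without you move away from a half.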

Comment by donald-hobson on Synthesising divergent preferences: an example in population ethics · 2019-01-18T23:50:44.807Z · score: 5 (3 votes) · LW · GW

I don't think all AI catastrophes come from oversimplification of value functions. Suppose we had 1000 weak preferences, $u_1, \dots, u_{1000}$, with $U = \sum_i u_i$. Each of them is supposed to be bounded, $0 \le u_i \le 1$, but due to some weird glitch in the definition of one of them, $u_k$, it has an unforeseen maximum of 1,000,000, and that maximum is paperclips. In this scenario, the AI is only as friendly as the least friendly piece.

Alternatively, if the value of each $u_i$ is linear or convex in resources spent maximizing it, or other technical conditions hold, then the AI just picks a single $u_i$ to focus all resources on. If some term is very easily satisfied, say $u_j$ is a slight preference that it not wipe out all beetles, then we get a few beetles living in a little beetle box, and 99.99...% of resources turned into whatever kind of paperclip it would otherwise have made.
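Both failure modes can be sketched in a few lines, assuming two toy worlds and utility terms that are linear in the resources spent on them. The world names, the glitch index, and the rates are invented for illustration.

```python
N = 1000
GLITCH_INDEX = 0        # hypothetical: which term has the bug
GLITCH_MAX = 1_000_000


def utility_terms(world):
    """Values of the N preference terms in two toy worlds."""
    if world == "flourishing":
        return [1.0] * N                  # every term at its intended maximum
    if world == "paperclips":
        terms = [0.0] * N
        terms[GLITCH_INDEX] = GLITCH_MAX  # the unforeseen maximum
        return terms
    raise ValueError(world)


def total(world):
    return sum(utility_terms(world))


def best_linear_allocation(rates, budget):
    """If term i is worth rates[i] per unit of resource, the sum is maximized
    by spending the whole budget on the single best term."""
    winner = max(range(len(rates)), key=lambda i: rates[i])
    allocation = [0.0] * len(rates)
    allocation[winner] = budget
    return allocation


print(max(["flourishing", "paperclips"], key=total))
print(best_linear_allocation([0.9, 1.1, 1.0], budget=100.0))
```

One buggy term out of a thousand is enough to outvote the other 999, and with linear returns nothing pushes the optimizer towards a balanced spread.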

If we got everyone in the world who is "tech literate" to program a utility function (in some easy-to-use utility function programming tool?), bounded them all and summed the lot together, then I suspect that the AI would still do nothing like optimizing human values. (To me, this looks like a disaster waiting to happen.)

Comment by donald-hobson on Reframing Superintelligence: Comprehensive AI Services as General Intelligence · 2019-01-10T18:12:10.211Z · score: 3 (2 votes) · LW · GW

Any algorithm that gets stuck in local optima so easily will not be very intelligent or very useful. Humans have, at least somewhat, the ability to notice that there should be a good plan in this region, then find and execute that plan successfully. We don't get stuck in local optima as much as current RL algorithms.

AIXI would be very good at making complex plans and doing well first time. You could tell it the rules of chess and it would play PERFECT chess first time. It does not need lots of examples to work from. Give it any data that you happen to have available, and it will become very competent, and able to carry out complex novel tasks first time.

Current reinforcement learning algorithms aren't very good at breaking out of boxes because they follow the local incentive gradient. (I say not very good at, because a few algorithms have exploited glitches in a way that's a bit "break out of the box"-ish.) In some simple domains, it's possible to follow the incentive gradient all the way to the bottom. In other environments, human actions already form a good starting point, and following the incentive gradient from there can make the solution a bit better.

I agree that most of the really dangerous ways of breaking out of the box probably can't be reached by local gradient descent from a non-adversarial starting point. (I do not want to have to rely on this.)

I agree that you can attach loads of sensors to say postmen, and train a big neural net to control a humanoid robot to deliver letters, given millions of training examples. You can probably automate many of the training weight fiddling tasks currently done by grad student descent to make big neural nets work.

I agree that this could be somewhat useful economically, as a significant proportion of economic productivity could be automated.

What I am saying is that this form of AI is sufficiently limited that there are still large incentives to make AGI and the CAIS can't protect us from making an unfriendly AGI.

I'm also not sure how strong the self improvement can be when the service maker service is only making little tweaks to existing algorithms rather than designing strange new algorithms. I suspect you would get to a local optimum of a reinforcement learning algorithm producing very slight variations of reinforcement learning. This might be quite powerful, but not anywhere near the limit of self improving AGI.

Comment by donald-hobson on Reframing Superintelligence: Comprehensive AI Services as General Intelligence · 2019-01-09T16:08:22.239Z · score: 8 (3 votes) · LW · GW

I disagree outright with:

"Any long term planning processes that consider weird plans for achieving goals (similar to "break out of the box") will typically not find any such plan and will be eliminated in favor of cognition that will actually help achieve the task."

Part of the reason that AI alignment is hard is that The Box is FULL of Holes! Breaking Out is EASY!

And the deeper reason for that is that we have no idea how to tell what's a hole.

Suppose you want to set the service generator to make a robot that cleans cars. If you give a blow-by-blow formal description of what you mean by "cleans cars", then your "service generator" is just a compiler. If you do not give a complete specification of what you mean, where does the information that "chopping off a nearby head to wipe windows with is unacceptable" come from? If the service generator notices that cars need cleaning and builds the service by itself, you have an AGI by another name.

Obviously, if you have large amounts of training data made by humans with joysticks, and the robot is sampling from the same distribution, then you should be fine. This system learns that dirtier windshields need more wiping from hundreds of examples of humans doing that; it doesn't chop off any heads because the humans didn't.

However, if you want the robot to display remotely novel behavior, then the distance between the training data and the new good solutions becomes as large as the distance from the training data to bad solutions. If it's smart enough to go to the shops and buy a sponge, without having that strategy hardcoded in when it was built, then it's smart enough to break into your neighbor's house and nick a sponge.

The only thing that distinguishes one from the other is what humans prefer.

Distinguishing low impact from high impact is also hard.

This might be a good approach, but I don't feel it answers the question "I have a humanoid robot, a hypercomputer, and a couple of toddlers; how can I build something to look after the kids for a few weeks (without destroying the world)?" So far, CAIS looks confused.

Comment by donald-hobson on What emotions would AIs need to feel? · 2019-01-09T12:14:33.615Z · score: 12 (3 votes) · LW · GW

Whether an AI feels emotions depends on how loose you are with the category "emotion". Take the emotion of curiosity. Investigating the environment is sometimes beneficial, due to the value of information. Because this behavior is quite complex, and the payoff is rare and indirect, reinforcement learners will struggle to learn it by default. However, there was a substantial evolutionary pressure towards minds that would display curious behavior. Evolution, being the blind idiot god, built a few heuristics for value of information that were effective in the environment of evolutionary adaptation, and hard wired these to the pleasure center.

In the modern environment, curiosity takes on a life of its own, and is no longer a good indication of value of information. Curiosity is a lost purpose.

Does AIXI display curiosity? It's calculating the value of information exactly. It will do a science experiment if and only if the expected usefulness of the data generated is greater than the expected cost of the experiment.

This is a meaningless semantic question; AIXI displays behavior that has some similarities to curiosity, and many differences.

I expect a from-first-principles AI, MIRI style, to have about as much emotion as AIXI. A bodge-it-till-you-make-it AI could have something a bit closer to emotions. The neural net bashers have put heuristics that correlate with value of information into their reinforcement learners. An evolutionary algorithm might produce something like emotions, but probably a different set of emotions than the ones humans feel. An uploaded mind would have our emotions, as would a sufficiently neuromorphic AI.
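The "if and only if" rule can be illustrated with a tiny decision problem, assuming a known prior over two states and an experiment that reveals the state perfectly. AIXI itself averages over all computable environments; the states, actions, and utilities below are made up for the sketch.

```python
def best_without_info(prior, utilities):
    """Expected utility of the best single action when the state is unknown."""
    return max(
        sum(prior[state] * utilities[action][state] for state in prior)
        for action in utilities
    )


def best_with_info(prior, utilities):
    """Learn the state first, then pick the best action for that state."""
    return sum(
        prior[state] * max(utilities[action][state] for action in utilities)
        for state in prior
    )


def value_of_information(prior, utilities):
    return best_with_info(prior, utilities) - best_without_info(prior, utilities)


def should_experiment(prior, utilities, cost):
    return value_of_information(prior, utilities) > cost


prior = {"rain": 0.3, "dry": 0.7}
utilities = {
    "umbrella": {"rain": 5.0, "dry": 3.0},
    "no umbrella": {"rain": 0.0, "dry": 4.0},
}
print(value_of_information(prior, utilities))
print(should_experiment(prior, utilities, cost=0.5))
print(should_experiment(prior, utilities, cost=1.0))
```

No separate "curiosity" term appears anywhere: the experiment is run exactly when the information is worth more than it costs.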

Comment by donald-hobson on Imitation learning considered unsafe? · 2019-01-06T17:43:07.650Z · score: 1 (1 votes) · LW · GW

Suppose you take a terabyte of data on human decisions and actions. You search for the shortest program that outputs the data, then see what gets outputted afterwards. The shortest program that outputs the data might look like a simulation of the universe with an arrow pointing to a particular hard drive. The "imitator" will guess at what file is next on the disk.

One problem for imitation learning is the difficulty in pointing out the human and separating them from the environment. The details of the human's decision might depend on what they had for lunch. (Of course, multiple different decisions might be good enough. But this illustrates that "imitate a human" isn't a clear-cut procedure. And you have to be sure that the virtual lunch doesn't contain virtual mind control nanobots. ;-)

You could put a load of data about humans into a search for short programs that produce the same data. Hopefully the model produced will be some approximation of the universe. Hopefully, you have some way of cutting a human out of the model and putting them into a virtual box.

Alternatively you could use nanotech for mind uploading, and get a virtual human in a box.

If we have lots of compute and not much time, then uploading a team of AI researchers to really solve friendly AI is a good idea.

If we have a good enough understanding of "imitation learning", and no nanotech, we might be able to get an AI to guess the researchers' mental states given observational data.

An imitation of a human might be a super-fast intelligence, with a lot of compute, but it won't be qualitatively super-intelligent.

Comment by donald-hobson on Will humans build goal-directed agents? · 2019-01-05T13:03:44.242Z · score: 1 (1 votes) · LW · GW

Building a non goal directed agent is like building a cart out of non-wood materials. Goal directed behavior is relatively well understood. We know that most goal directed designs don't do what we want. Most arrangements of wood do not form a functioning cart.

I suspect that a randomly selected agent from the space of all non goal directed agents is also useless or dangerous, in much the same way that a random arrangement of non wood materials is.

Now there are a couple of regions of design space that are not goal directed and look like they contain useful AIs. We might be better off making our cart from iron, but iron has its own problems.

Comment by donald-hobson on Logical inductors in multistable situations. · 2019-01-04T11:46:12.165Z · score: 1 (1 votes) · LW · GW

0.5 is the almost fixed point. It's the point where $f(x) - x$ goes from being positive to negative. If you take a sequence of continuous functions $f_n$ that converge pointwise to $f$, then there will exist a sequence $x_n$ such that $f_n(x_n) = x_n$ and $x_n \to 0.5$.
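A numerical illustration of an almost fixed point, assuming a made-up family of continuous functions that sharpen into a jump at 0.5 (towards 0.9 on the left and 0.2 on the right). The particular functions and the bisection tolerance are assumptions for this sketch; the limit has no exact fixed point, yet each continuous approximation has one, and those fixed points close in on 0.5.

```python
import math


def f_n(x, n):
    """Continuous in x; tends to 0.9 for x < 0.5 and 0.2 for x > 0.5 as n grows."""
    return 0.2 + 0.7 / (1.0 + math.exp(n * (x - 0.5)))


def fixed_point(n, tol=1e-12):
    """Bisection on f_n(x) - x, which is positive at x = 0 and negative at x = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_n(mid, n) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (10, 100, 1000):
    print(n, fixed_point(n))
```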

## Logical inductors in multistable situations.

2019-01-03T23:56:54.671Z · score: 7 (4 votes)

Comment by donald-hobson on Optimization Regularization through Time Penalty · 2019-01-02T23:17:37.087Z · score: 3 (2 votes) · LW · GW

If we ignore subagents and imagine a Cartesian boundary, "turned off" can easily be defined as "all future outputs are 0".

I also doubt that an AI working ASAP is safe in any meaningful sense. Of course you can move all the magic into "human judges world ok". If you make lambda large enough, your AI is safe and useless.

Suppose the utility function is 1 if a widget exists, else 0, where a widget is an easily buildable object that does not currently exist.

Suppose that ordering the parts through normal channels will take a few weeks. If it hacks the nukes and holds the world to ransom, then everyone at the widget factory will work nonstop, then drop dead of exhaustion.

Alternatively, it might be able to bootstrap self-replicating nanotech in less time. The AI has no reason to care if the nanotech that makes the widget is highly toxic, and no reason to care if it has a shutoff switch or grey-goos the earth after the widget is produced.

"World looks ok at time T" is not enough; you could still get something bad arising from the way seemingly innocuous parts were set up at time T. Being switched off and having no subagents in the conventional sense isn't enough. What if the AI changed some physics data in such a way that humans would collapse the quantum vacuum state, believing the experiment they were doing was safe? Building a subagent is just a special case of having unwanted influence.
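A toy version of the widget example, assuming the score is the widget utility minus lambda times the days taken. The plan names and durations are hypothetical numbers chosen only to show the shape of the problem, and the linear penalty is an assumed form rather than the post's exact definition.

```python
# (days_taken, widget_exists) for each plan; all numbers are hypothetical.
PLANS = {
    "do nothing": (0.0, 0),
    "order parts normally": (21.0, 1),
    "hold the world to ransom": (2.0, 1),
    "bootstrap nanotech": (1.0, 1),
}


def score(plan, lam):
    days, widget = PLANS[plan]
    return widget - lam * days


def best_plan(lam):
    return max(PLANS, key=lambda plan: score(plan, lam))


for lam in (0.01, 0.5, 2.0):
    print(lam, best_plan(lam))
```

The penalty rewards speed, not harmlessness: a small lambda picks the fastest plan whatever its side effects, and a large enough lambda picks doing nothing.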

Comment by donald-hobson on Why I expect successful (narrow) alignment · 2018-12-31T21:41:50.609Z · score: 1 (1 votes) · LW · GW

You can have an AI that isn't a consequentialist. Many deep learning algorithms are pure discriminators, they are not very dangerous or very useful. If I want to make a robot that tidies my room, the simplest conceptual framework for this is a consequentialist with real world goals. (I could also make a hackish patchwork of heuristics, like evolution would). If I want the robot to deal with circumstances that I haven't considered, most hardcoded rules approaches fail, you need something that behaves like a consequentialist with real world preferences.

I'm not saying that all AIs will be real world consequentialists, just that there are many tasks only real world consequentialists can do. So someone will build one.

Also, they set up the community after they realized the problem, and they could probably make more money elsewhere. So there don't seem to be strong incentives to lie.

Comment by donald-hobson on Why I expect successful (narrow) alignment · 2018-12-29T20:44:36.379Z · score: 3 (3 votes) · LW · GW

Adam's law of slow moving disasters only applies when the median individual can understand the problem, and the evidence that it is a problem. We didn't get nuclear protests or treaties until there was overwhelming evidence that nukes were possible, in the form of detonations. No one was motivated to protest or sign treaties based on abstract physics arguments about what might be possible some day. Action regarding climate change didn't start until the evidence became quite clear. The Outer Space Treaty wasn't signed until 1967, 6 years after the first human spaceflight and only 2 before the moon landings.

Comment by donald-hobson on Why I expect successful (narrow) alignment · 2018-12-29T20:34:03.399Z · score: 8 (4 votes) · LW · GW

Human morals are specific and complex (in the formal, high-information sense of the word complexity). They also seem hard to define. A strict definition of human morality, or a good referent to it, would be morality. Could you have powerful and useful AI that didn't have this? This would be some kind of whitelisting or low impact optimization, as a general optimization over all possible futures is a disaster without morality. These AIs may be somewhat useful, but not nearly as useful as they would be with fewer constraints.

I would make a distinction between math-first AI, like logical induction and AIXI, where we understand the AI before it is built, and code-first AI, like anything produced by an evolutionary algorithm, anything "emergent" and most deep neural networks, where we build the AI and then see what it does. The former approach has a chance of working; a code-first ASI is almost certain doom.

I would question the phrase "becomes apparent that alignment is a serious problem"; I do not think this is going to happen. Before ASI, we will have the same abstract and technical arguments we have now for why alignment might be a problem. We will have a few more AlphaGo moments, but while some go "wow, AGI near", others will say "Go isn't that hard, we are a long way from this sci-fi AGI", or "Superintelligence will be friendly by default". A few more people might switch sides, but we have already had one AlphaGo moment, and that didn't actually make a lot of difference. There is no giant neon sign flashing "ALIGNMENT NOW!". See no fire alarm on AGI.

Even if we do have a couple of approaches that seem likely to work, it is still difficult to turn a rough approach into a formal technical specification into programming code. The code has to have reasonable runtime. Then the first team to develop AGI have to be using a math first approach and implement alignment without serious errors. I admit that there are probably a few disjunctive possibilities I've missed. And these events aren't independent. Conditional on friendly ASI I would expect a large amount of talent and organizational competence working on AI safety.

Comment by donald-hobson on A few misconceptions surrounding Roko's basilisk · 2018-12-24T22:07:05.841Z · score: 1 (1 votes) · LW · GW

My take on Roko's basilisk is that you got ripped off in your acausal trade. Try to get a deal along the lines of, unless the AI goes extra specially far out of its way to please me, I'm gonna build a paperclipper just to spite it. At least trading a small and halfhearted attempt to help build AGI for a vast reward.

Comment by donald-hobson on Two clarifications about "Strategic Background" · 2018-12-24T21:24:14.926Z · score: 1 (1 votes) · LW · GW

Imagine that you hadn't figured out FDT, but you did have CDT and EDT. Would building an AI that defers to humans whenever the two disagree be an example of "minimal but aligned"?

If we take artificial addition too seriously, it's hard to imagine what a "minimal arithmetician" looks like. If you understand arithmetic, you can make a perfect system; if you don't, the system will be hopeless. I would not be surprised if there was some simple "algorithm of maximally efficient intelligence" and we built it. No foom, AI starts at the top. All the ideas about rates of intelligence growth are nonsense. We built a linear time AIXI.

Comment by donald-hobson on Best arguments against worrying about AI risk? · 2018-12-23T21:28:39.007Z · score: 2 (2 votes) · LW · GW

If we have two distinct AI safety plans, the researchers are sensible to have a big discussion on which is better and only turn that one on. If not, and neither AI is fatally flawed, I would expect them to cooperate, they have very similar goals and neither wants war.

Comment by donald-hobson on on wellunderstoodness · 2018-12-16T17:56:45.444Z · score: 1 (1 votes) · LW · GW

The "maximise $p = P(\text{cauldron full})$ constrained by $p \le 1 - \epsilon$" proposal has really weird failure modes. This is how I think it would go. Take over the world, using a method that has less than $\epsilon$ chance of failing. Build giant computers to calculate the exact chance of your takeover. Build a random bucket filler to make the probability work out. I.e. if $\epsilon = 3\%$, then the AI does its best to take over the world; once it succeeds, it calculates that its plan had a 2% chance of failure. So it builds a bucket filler that has a $97/98$ chance of working. This policy leaves the chance of the cauldron being filled at exactly 97%.
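
The arithmetic in the 3% / 2% example can be checked with a short sketch. It assumes the cauldron ends up full only if the takeover succeeds and the random bucket filler then works, so the overall probability is the product of the two. The function name `filler_success_needed` is made up for illustration.

```python
def filler_success_needed(target, takeover_success):
    """Chance the random bucket filler must have of working so that
    takeover_success * filler_chance equals the target probability.
    Assumes the cauldron is filled only if both steps succeed."""
    if not 0 < target <= takeover_success <= 1:
        raise ValueError("need 0 < target <= takeover_success <= 1")
    return target / takeover_success

epsilon = 0.03                 # cap: P(cauldron full) may be at most 1 - epsilon
target = 1 - epsilon           # 97%
takeover_failure = 0.02        # the plan's computed chance of failure

filler_chance = filler_success_needed(target, 1 - takeover_failure)
overall = (1 - takeover_failure) * filler_chance

print(f"filler must work with probability {filler_chance:.6f}")
print(f"overall P(cauldron full) = {overall:.6f}")
```

Under these assumptions the filler has to work with probability 97/98, slightly higher than 97%, because the 2% takeover risk has already eaten into the budget.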

Comment by donald-hobson on Quantum immortality: Is decline of measure compensated by merging timelines? · 2018-12-11T23:10:26.499Z · score: 2 (2 votes) · LW · GW

One way to avoid the absurd conclusion is to say that it doesn't matter if another mind is you.

Suppose I have a utility function over the entire quantum wave function. This utility function is mostly focused on beings that are similar to myself. So I consider the alternate me, that differs only in phone number, getting £100, about equal to the original me getting £100. As far as my utility function goes, both the versions of me would just be made worse off by forgetting the number.

Comment by donald-hobson on Why should EA care about rationality (and vice-versa)? · 2018-12-11T14:26:06.626Z · score: 1 (1 votes) · LW · GW

I agree that not all rationalists would want wireheaded chickens; maybe they don't care about chicken suffering at all. I also agree that you sometimes see bad logic and non sequiturs in the rationalist community. The non-rationalist, motivated, emotion-driven thinking is the way that humans think by default. The rationalist community is trying to think a different way, sometimes successfully. Illustrating a junior rationalist having an off day and doing something stupid doesn't illuminate the concept of rationality, the way that seeing a beginner juggler drop balls doesn't show you what juggling is.

Comment by donald-hobson on Why should EA care about rationality (and vice-versa)? · 2018-12-10T16:51:30.371Z · score: 1 (1 votes) · LW · GW

I've not seen a charity trying to do it, but wouldn't be surprised if there was one. I'm trying to illustrate the different thought processes.

Comment by donald-hobson on Measly Meditation Measurements · 2018-12-09T23:38:09.046Z · score: 4 (4 votes) · LW · GW

What is the way you were meditating? Relaxed or focused? Comfortable or cross legged? Focusing on a meaningless symbol, or letting your mind wander, or focusing on something meaningful?

I have found that I can produce a significant amount of emotion, of a chosen type, e.g. anxious, miserable, laughing, focusedly happy, etc., in seconds. I seem to do this just by focusing on the emotion with little conscious thinking of a concept that would cause the emotion. Is this meditation? (Introspection, highly unreliable)

Can you describe the meditation as attempting to modify your thought pattern in some way?

Comment by donald-hobson on Why should EA care about rationality (and vice-versa)? · 2018-12-09T23:23:37.630Z · score: 4 (5 votes) · LW · GW

The whole idea of effective altruism is in getting the biggest bang for your charitable buck. If the evidence about how to do this was simple and incontrovertible, we wouldn't need advanced rationality skills to do so. In the real world, choosing the best cause requires weighing up subtle balances of evidence on everything from whether animals are suffering in ways we would care about, to how likely a super intelligent AI is.

On the other side, effective altruism is only persuasive if you have various skills and patterns of thought. These include the ability to think quantitatively, avoiding scope insensitivity, the ideas of expected utility maximization and the rejection of the absurdity heuristic. It is conceptually possible for a mind to be a brilliant rationalist with the sole goal of paperclip maximization, however all humans have the same basic emotional architecture, with emotions like empathy and caring. When this is combined with rigorous structured thought, the end result often looks at least somewhat utilitarianish.

Here are the kinds of thought patterns that a stereotypical rationalist, and a stereotypical non rationalist would engage in, when evaluating two charities. One charity is a donkey sanctuary, the other is trying to genetically modify chickens that don't feel pain.

The leaflet has a beautiful picture of a cute fluffy donkey in a field of sunshine and flowers. Aww. Don't you just want to stroke him? Donkeys in meadows seem an unambiguous pure good. Who could argue with donkeys? Thinking about donkeys makes me feel happy. Look, this one with the brown ears is called Buttercup. I'll put this nice poster up and send them some money.

Genetically modifying? Don't like the sound of that. To not feel pain? Weird? Why would you want to do that? Imagines the chicken crushed into a tiny cage, looking miserable; "it's not really suffering" doesn't cut it. Wouldn't that encourage people to abuse them? We should be letting them live in the wild as nature intended.

The main component of this decision comes from adding up the little "good" or "bad" labels that they attach to each word. There is also a sense in which a donkey sanctuary is a typical charity (the robin of birds), while GM chickens is an atypical charity (the ostrich).

The rationalist starts off with questions like "How much do I value a year of happy donkey life, vs a year of happy chicken life?" How much money is needed to modify chickens and get them used in farms? What's the relative utility gain from a "non suffering" chicken in a tiny cage, vs a chicken in chicken paradise, relative to a factory farm chicken that is suffering? What is the size of the world chicken industry?

The rationalist ends up finding that the world chicken industry is huge, and so most sensible values for the other parameters lead to the GM chicken charity being better. They trust utilitarian logic more than any intuitions they might have.

Comment by donald-hobson on Formal Open Problem in Decision Theory · 2018-12-04T23:58:05.214Z · score: 3 (2 votes) · LW · GW

I've figured out the difference: I was using the box topology https://en.wikipedia.org/wiki/Box_topology , while you were using the product topology https://en.wikipedia.org/wiki/Product_topology .

You are correct. I knew about finite topological products and made a natural generalization, but it turns out not to be the standard meaning of .

Comment by donald-hobson on Formal Open Problem in Decision Theory · 2018-12-01T00:37:56.497Z · score: 5 (3 votes) · LW · GW

I think you made a mistake here

The Hahn–Mazurkiewicz theorem states that

A non-empty Hausdorff topological space is a continuous image of the unit interval if and only if it is a compact, connected, locally connected second-countable space.

I will agree that the space is connected and locally connected. I'm not sure if it's second countable. It is not compact.

Just to be clear Now let . Clearly each is open. Let And . Now clearly this family covers all of . However, remove any from and is no longer covered. So is a family of open sets, which cover and don't have any finite subcover.

Comment by donald-hobson on Quantum Mechanics, Nothing to do with Consciousness · 2018-11-27T09:48:48.319Z · score: 1 (1 votes) · LW · GW

I'm not saying that you can't have a neural network that detects literature. I see no reason for what literature is to be incomputable; I was aiming more for the idea of a complex, vague, intuitive boundary. "Detect literature" is not nearly enough to specify a particular program, as opposed to "detect primes".

And no, consciousness is not a fundamental property of the universe.

Comment by donald-hobson on Boltzmann Brains, Simulations and self refuting hypothesis · 2018-11-27T09:40:25.279Z · score: 1 (1 votes) · LW · GW

Alright, if you want to formalize that in the context of a big universe, which one has the supermajority of measure or magic reality fluid? Which should we act as if we are?

Comment by donald-hobson on Boltzmann Brains, Simulations and self refuting hypothesis · 2018-11-27T00:23:38.963Z · score: 1 (1 votes) · LW · GW

Fixed. Thanks.

Comment by donald-hobson on Quantum Mechanics, Nothing to do with Consciousness · 2018-11-27T00:21:11.003Z · score: 1 (1 votes) · LW · GW

Exactly, and we can point our finger and say, "this is literature", we can't write a computer program to detect either. And consciousness, like literature, is a motivated boundary. And almost any dispute about a borderline case becomes a "if a tree falls in a forest ..." argument.

## Boltzmann Brains, Simulations and self refuting hypothesis

2018-11-26T19:09:42.641Z · score: 0 (2 votes)

## Quantum Mechanics, Nothing to do with Consciousness

2018-11-26T18:59:19.220Z · score: 10 (9 votes)
Comment by donald-hobson on Upcoming: Open Questions · 2018-11-24T18:52:25.390Z · score: 3 (3 votes) · LW · GW

I think the ideal system would be a subquestion tree. Someone asks a big root question, and people post splits that divide the question into several smaller ones. People can answer splits or subdivide them. Splits can have a suggested dependence of one split on others. Questions can have many different splits, as people can come up with many different ways of solving the problem.

Example of 5 different users working together to solve a problem. Lines labeled with the same user letter are expected to be provided by the same user.

- Root Question: What is (2+3)*(4+5)+6*7 (User Z)
  - Split 1: I know that 2+3=5 (User A)
    - SubQ 1: What is 4+5 (User A)
    - SubQ 2: What is 6*7 (User A)
    - SubQ 3: What is 5*(SubQ 1) (User A)
    - SubQ 4: What is (SubQ 3)+(SubQ 2) (User A)
  - Split 2: I know 6*7=42 (User C)
    - SubQ 1: What is (2+3)*(4+5) (User C)
      - Subsplit 1: Expand brackets (2+3)*(4+5)=2*4+2*5+3*4+3*5 (User D)
        - SubSubQ 1: What are 2*4, 2*5, 3*4, 3*5? (User D)
          - Answer: 2*4=8, 2*5=10, 3*4=12, 3*5=15 (User E)
        - SubSubQ 2: What's the sum of the answers to SubSubQ 1? (User D)
    - SubQ 2: What is (SubQ 1)+42 (User C)
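
As a rough illustration, the subquestion tree in this example could be represented with two record types, one for questions and one for splits. This is only a sketch under assumptions: the class names `Question` and `Split`, their fields, and the helper functions are invented here and are not part of any existing site feature.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Question:
    text: str
    user: str                          # who asked the question
    answer: Optional[str] = None
    answered_by: Optional[str] = None
    splits: List["Split"] = field(default_factory=list)

@dataclass
class Split:
    note: str                          # what the splitter already knows or proposes
    user: str                          # who posted the split
    subquestions: List[Question] = field(default_factory=list)

def unanswered(question):
    """Collect every question in the tree that has no direct answer yet."""
    found = [] if question.answer is not None else [question]
    for split in question.splits:
        for sub in split.subquestions:
            found.extend(unanswered(sub))
    return found

def contributors(question):
    """Collect every user who asked, split, or answered anywhere in the tree."""
    users = {question.user}
    if question.answered_by is not None:
        users.add(question.answered_by)
    for split in question.splits:
        users.add(split.user)
        for sub in split.subquestions:
            users |= contributors(sub)
    return users

# The example tree, rebuilt with these hypothetical types.
split1 = Split("I know that 2+3=5", "A", [
    Question("What is 4+5", "A"),
    Question("What is 6*7", "A"),
    Question("What is 5*(SubQ 1)", "A"),
    Question("What is (SubQ 3)+(SubQ 2)", "A"),
])
products = Question("What are 2*4, 2*5, 3*4, 3*5?", "D",
                    answer="2*4=8, 2*5=10, 3*4=12, 3*5=15", answered_by="E")
total = Question("What's the sum of the answers to SubSubQ 1?", "D")
bracket = Question("What is (2+3)*(4+5)", "C", splits=[
    Split("Expand brackets (2+3)*(4+5)=2*4+2*5+3*4+3*5", "D", [products, total]),
])
split2 = Split("I know 6*7=42", "C", [bracket, Question("What is (SubQ 1)+42", "C")])
root = Question("What is (2+3)*(4+5)+6*7", "Z", splits=[split1, split2])

print(len(unanswered(root)), "questions still open")
print(sorted(contributors(root)))
```

Keeping splits as separate objects from questions lets one question carry several competing decompositions, which is the property the proposal relies on.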

Comment by donald-hobson on What if people simply forecasted your future choices? · 2018-11-23T22:27:43.514Z · score: 2 (2 votes) · LW · GW

Obviously this strategy would be wholly unsuitable to align ASI. When considering humans, remember that the predictors have other tricks for controlling the decision, as well as the communication channel. If there is enough money in the prediction market, someone might be incentivized to offer you discounts on the MacBook.

Comment by donald-hobson on On MIRI's new research directions · 2018-11-23T17:32:16.615Z · score: 7 (4 votes) · LW · GW

One group that isn't considered in the analysis is new trainees. It seems that AGI is probably sufficiently far off that many of the people who will make the breakthroughs are not yet researchers or experts. If you are a bright young person who might work at MIRI or somewhere similar in 5 years' time, you would want to get familiar with the area. You are probably reading MIRI's existing work, to see if you have the capability to work in the field. This means that if you do join MIRI, you have already been thinking along the right lines for years.

Obviously you don't want your discussions live streamed to the world; you might come up with dangerous ideas. But I would suggest sticking things online once you understand the area sufficiently well to be confident it's safe. If writing it up into a fully formal paper is too time intensive, any rough scraps will still be read by the dedicated.

Comment by donald-hobson on Approval-directed agents · 2018-11-23T00:16:25.574Z · score: 1 (1 votes) · LW · GW

Suppose you have a hyper-computer, and an atom precise scan of Hugh. One naive method of making this approval agent is by simulating Hugh in a box. For every action Arthur could take, a copy of virtual Hugh is given a detailed description of the action and a dial to indicate his approval. The most approved action is taken. This of course will find an action sequence which will brainwash Hugh into approving.
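
The naive scheme is just an argmax over simulated dial readings, which a toy sketch makes concrete. Everything below is a made-up illustration: the action list, the `persuasion` field, and the `toy_rating` rule (the dial shows whichever is larger, genuine value or manipulative pull) are assumptions, not a model of any real person or system.

```python
def naive_approval_agent(actions, simulated_rating):
    """Pick the action whose description gets the highest dial reading
    from the simulated rater."""
    return max(actions, key=simulated_rating)

# Hypothetical actions: (description, true_value, persuasion).
actions = [
    ("tidy the room", 0.6, 0.0),
    ("do nothing", 0.1, 0.0),
    ("show Hugh a message engineered to make him turn the dial to maximum", 0.0, 1.0),
]

def toy_rating(action):
    _, true_value, persuasion = action
    # Assumed toy rule: a sufficiently persuasive description overrides
    # the rater's genuine judgement of the action's value.
    return max(true_value, persuasion)

chosen = naive_approval_agent(actions, toy_rating)
print("chosen action:", chosen[0])
```

In this toy setup the optimizer picks the manipulative action even though its true value is zero, because nothing in the argmax distinguishes earned approval from induced approval.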

I can't see how "internal approval direction" would avoid errors in Hugh's rating, rather than moving them one level down, if you specified what you mean by this term at all.

Comment by donald-hobson on Iteration Fixed Point Exercises · 2018-11-22T21:43:31.482Z · score: 10 (4 votes) · LW · GW

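Let $x_{n+1} = f(x_n)$ for arbitrary $x_0$, and let $q < 1$ be the contraction constant of $f$. Call $d_0 = d(x_0, x_1)$. Then $d(x_n, x_{n+1}) \le q^n d_0$ by induction, so for $m > n$, $d(x_n, x_m) \le \sum_{k=n}^{m-1} q^k d_0 \le \frac{q^n d_0}{1-q}$ (power series simplification).

Therefore $d(x_n, x_m) \to 0$ as $n, m \to \infty$, i.e. $(x_n)$ is a Cauchy sequence. However $X$ is said to be complete, which by definition means any Cauchy sequence is convergent. So $x_n \to x^*$ and $d(x_n, x^*) \le \frac{q^n d_0}{1-q}$. So $x_n$ converges exponentially quickly.

From part 1, as $f$ is continuous, $f(x^*) = \lim f(x_n) = \lim x_{n+1} = x^*$. So $x^*$ is a fixed point. Suppose $a$ and $b$ are both fixed points of a contraction map. Then $d(a, b) = d(f(a), f(b)) \le q\, d(a, b)$ and so $(1-q)\, d(a, b) \le 0$, therefore $d(a, b) = 0$ so $a = b$. Thus $f$ has a unique fixed point.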

is a metric space. Its the real line with normal distance. Let . Then is a contraction map because is differentiable and has the property . However no fixed point exists as . This works because the sequence generated from repeated applications of will tend to infinity, despite successive terms becoming ever closer.
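
Both behaviours can be seen numerically in a small sketch. The two functions below are assumptions chosen for illustration and are not necessarily the ones meant above: `contraction` is $x/2 + 1$ (contraction constant $1/2$, fixed point 2), and `weak_contraction` is $\sqrt{x^2 + 1}$, whose derivative has absolute value below 1 everywhere but which satisfies $g(x) > x$ for every real $x$.

```python
import math

def iterate(f, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

def contraction(x):
    # Assumed example: |f(x) - f(y)| = 0.5 * |x - y|, so q = 0.5; fixed point is 2.
    return 0.5 * x + 1.0

def weak_contraction(x):
    # Assumed example: |g'(x)| = |x| / sqrt(x^2 + 1) < 1, yet g(x) > x everywhere.
    return math.sqrt(x * x + 1.0)

q = 0.5
xs = iterate(contraction, 0.0, 30)
d0 = abs(xs[1] - xs[0])
# Error bound from the geometric series argument: d(x_n, x*) <= q^n * d0 / (1 - q).
bounds = [q**n * d0 / (1 - q) for n in range(len(xs))]

ys = iterate(weak_contraction, 0.0, 100)
gaps = [b - a for a, b in zip(ys, ys[1:])]

print("contraction iterate:", xs[-1])
print("weak contraction iterate:", ys[-1], "last gap:", gaps[-1])
```

Starting from 0, the second sequence is $x_n = \sqrt{n}$: successive terms get ever closer while the sequence itself grows without bound, so no fixed point is approached.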

Comment by donald-hobson on Clickbait might not be destroying our general Intelligence · 2018-11-19T23:55:23.425Z · score: 4 (4 votes) · LW · GW

My point was that the epistemic correlation between communicators is increasing. Before, everyone was talking to everyone else more. Now experts can talk to experts, and creationists talk to other creationists. Homeopaths talk to other homeopaths.

Are you saying this is good, is bad, or is happening?

## Clickbait might not be destroying our general Intelligence

2018-11-19T00:13:12.674Z · score: 26 (10 votes)

## Stop buttons and causal graphs

2018-10-08T18:28:01.254Z · score: 5 (3 votes)

## The potential exploitability of infinite options

2018-05-18T18:25:39.244Z · score: 2 (3 votes)