Posts

Intelligence without causality 2020-02-11T00:34:28.740Z · score: 9 (3 votes)
Donald Hobson's Shortform 2020-01-24T14:39:43.523Z · score: 5 (1 votes)
What long term good futures are possible. (Other than FAI)? 2020-01-12T18:04:52.803Z · score: 9 (2 votes)
Logical Counterfactuals and Proposition graphs, Part 3 2019-09-05T15:03:53.262Z · score: 6 (2 votes)
Logical Counterfactuals and Proposition graphs, Part 2 2019-08-31T20:58:12.851Z · score: 15 (4 votes)
Logical Optimizers 2019-08-22T23:54:35.773Z · score: 12 (9 votes)
Logical Counterfactuals and Proposition graphs, Part 1 2019-08-22T22:06:01.764Z · score: 23 (8 votes)
Programming Languages For AI 2019-05-11T17:50:22.899Z · score: 3 (2 votes)
Propositional Logic, Syntactic Implication 2019-02-10T18:12:16.748Z · score: 6 (5 votes)
Probability space has 2 metrics 2019-02-10T00:28:34.859Z · score: 90 (38 votes)
Allowing a formal proof system to self improve while avoiding Lobian obstacles. 2019-01-23T23:04:43.524Z · score: 6 (3 votes)
Logical inductors in multistable situations. 2019-01-03T23:56:54.671Z · score: 8 (5 votes)
Boltzmann Brains, Simulations and self refuting hypothesis 2018-11-26T19:09:42.641Z · score: 0 (2 votes)
Quantum Mechanics, Nothing to do with Consciousness 2018-11-26T18:59:19.220Z · score: 13 (13 votes)
Clickbait might not be destroying our general Intelligence 2018-11-19T00:13:12.674Z · score: 26 (10 votes)
Stop buttons and causal graphs 2018-10-08T18:28:01.254Z · score: 6 (4 votes)
The potential exploitability of infinite options 2018-05-18T18:25:39.244Z · score: 3 (4 votes)

Comments

Comment by donald-hobson on Simulation of technological progress (work in progress) · 2020-02-11T16:05:54.969Z · score: 4 (3 votes) · LW · GW

You might be producing some useful info, but mostly about whether an arbitrary system exhibits unlimited exponential growth. If you got 1000 different programmers to each throw together some model of tech progress, some based on completing tasks, some based on extracting resources, some based on random differential equations etc., you could see what proportion of them give exponential growth and then stagnation. Actually, there isn't a scale on your model, so who can say whether the running out of tasks, or the stagnation, happens next year or in 100,000 years. At best, you will be able to tell how strongly outside view priors should favor exponential growth over growth and then decay. (Pure growth is clearly simpler, but how much simpler?)
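
To make the point concrete, here is a toy model of my own (none of this is the OP's code, and the parameters are arbitrary): capability compounds on itself but draws down a finite pool of tasks, so the curve is exponential early and stagnates later, on a timescale set entirely by arbitrary constants.

    def simulate(total_tasks=1000.0, capability=1.0, rate=0.01, steps=3000):
        # Progress per step is proportional to current capability times the
        # fraction of the task pool still remaining.
        done = 0.0
        history = []
        for _ in range(steps):
            gain = rate * capability * max(0.0, 1.0 - done / total_tasks)
            capability += gain   # capability compounds -> early exponential growth
            done += gain         # but it eats into a finite task pool -> stagnation
            history.append(capability)
        return history

    curve = simulate()
    print(curve[100] / curve[0], curve[-1] / curve[-101])  # fast growth early, ~1.0 late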

Comment by donald-hobson on [deleted post] 2020-02-07T19:05:00.213Z

A lot of your argument seems to be comparing an artifact of human technology with an evolved system. "is there a way to destroy the moon, given only the ability to post 10k characters to lesswrong.com?"

To make the discussion clearer, let's pick a particular evolved system and technology, say an aeroplane wing and an insect wing. Suppose that the aeroplane wing wins on some criteria, like speed, the insect wing wins on efficiency, and it all balances out overall.

To say therefore that intelligence isn't that great is a mixing of levels. There are two intelligences in the game, humans and evolution. Both have produced a great variety of highly optimized artifacts. Both are of roughly comparable power. By comparing two aeroplanes, you can also compare the skill of the designers, but it is meaningless to try to compare an aeroplane to an aeroplane designer. The insect is the plane, not the designer.

Some of your comparisons make even less sense, like ability to survive in extreme environments. Comparing a fish and an untooled human in ability to survive in the ocean is a straight contest of fish evolution vs human evolution. If the human drowns before they have a chance to think anything, the power of the human brain is not shown in the slightest.

Also, comparing human intelligence between humans is like comparing the running speed of cheetahs: all your results will be similar. So one human beating another tells you little about intelligence.

So what would a real comparison of intelligence with something else look like? I think the question "Is intelligence good?" is not that meaningful.

What we can do is ask "is there a way to X, given only Y?" For instance, "is there a way to make a fire, given only the ability to contract muscles of a human body in a forest?" or "is there a way to destroy the moon, given only the ability to post 10k characters to lesswrong.com?" These are totally formalizable questions and could in principle be answered by simulating an exponential number of universes.

We can also ask questions about which algorithms will actually find a way to achieve a goal. We know that there exists a pattern of electrical inputs that wins the game Pong, but we want to know whether some gradient descent based algorithm will find one.

We can then say there are a wide variety of tasks and goals that humans can fulfill given our primitive action of muscle contraction. Given that chimps have a similar musculature but less intelligence and can't do most of these tasks, and that many of the routes to fulfillment of the goals go through layers of indirection, it seems that an intelligence comparable to humans with some other output channel would be similarly good at achieving goals.

Comment by donald-hobson on Plausibly, almost every powerful algorithm would be manipulative · 2020-02-07T15:37:08.376Z · score: 1 (1 votes) · LW · GW

How dangerous would you consider a person with basic programming skills and a hypercomputer? I mean I could make something very dangerous, given hypercompute. I'm not sure if I could make much that was safe and still useful. How common would it be to accidentally evolve a race of aliens in the garbage collection?

At the moment, my best guess at what powerful algorithms look like is something that lets you maximize functions without searching through all the inputs. Gradient descent can often find a high point without that much compute, so is more powerful than random search. If your powerful algorithm is more like really good computationally bounded optimization, I suspect it will be about as manipulative as brute forcing the search space. (I see no strong reason for strategies labeled manipulative to be that much easier or harder to find than those that aren't.)
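
A toy illustration of that claim (my own sketch, with made-up details): on a smooth objective, gradient ascent gets far closer to the maximum than random search does with the same budget of function evaluations.

    import numpy as np

    def f(x):
        return -np.sum((x - 0.7) ** 2)   # smooth objective, single peak at x = 0.7

    def grad_f(x):
        return -2.0 * (x - 0.7)

    rng = np.random.default_rng(0)
    budget = 200

    # Random search: sample uniformly, keep the best value found.
    best_random = max(f(rng.random(10)) for _ in range(budget))

    # Gradient ascent: spend the same budget on steps from one starting point.
    x = rng.random(10)
    for _ in range(budget):
        x = x + 0.1 * grad_f(x)

    print(best_random, f(x))   # gradient ascent ends up much closer to the maximum of 0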

Comment by donald-hobson on Donald Hobson's Shortform · 2020-02-07T14:35:00.321Z · score: 1 (1 votes) · LW · GW

Suppose an early AI is trying to understand its programmers and makes millions of hypotheses that are themselves people. Later it becomes a friendly superintelligence that figures out how to think without mindcrime. Suppose all those imperfect virtual programmers have been saved to disk by the early AI; the superintelligence can look through them. We end up with a post singularity utopia that contains millions of citizens almost but not quite like the programmers. We don't need to solve the nonperson predicate ourselves to get a good outcome, just avoid minds we would regret creating.

Comment by donald-hobson on Absent coordination, future technology will cause human extinction · 2020-02-04T17:45:38.567Z · score: 9 (3 votes) · LW · GW

Quote from Wikipedia on Fukushima:

Deaths: 1 cancer death attributed to radiation exposure by government panel.[4][5]
Non-fatal injuries: 16 with physical injuries due to hydrogen explosions,[6] 2 workers taken to hospital with possible radiation burns[7]

I think this puts the incident squarely in the class of minor accidents that the media had a panic about. Unless you think it had a 50% chance of wiping out Japan and we were just lucky, it is irrelevant to the discussion of X-risk.

With CO2, it depends what you mean by business as usual. We don't have 500 years of fossil fuels left, and we are already switching to renewables. I don't think that the earth will become uninhabitable to technologically advanced human life. In a scenario where humans are using air conditioners and desalinators to survive the 80°C Norwegian deserts, the world is still "habitable". (I don't think it will get that bad, but I think humans would survive if it did.)

If the time it takes for a black ball to kill us is more than a few generations it's really hard to plan around fixing it.

No, those are the ones that are really easy to plan around; you have plenty of time to fix them. It's the ones that kill you instantly that are hard to plan around.

Comment by donald-hobson on Absent coordination, future technology will cause human extinction · 2020-02-04T17:32:22.963Z · score: 1 (1 votes) · LW · GW

Consider these 5 states:

1) FAI

2) UFAI

3) Tech progress fails. No one is doing tech research.

4) We coordinate to avoid UFAI, and don't know how to make FAI.

5) No coordination to avoid UFAI, no one has made one yet. (State we are currently in)

In the first 3 scenarios, humanity won't be wiped out by some other tech. If we can coordinate around AI, I would suspect that we would manage to coordinate around other black balls. (AI tech seems unusually hard to coordinate around: we don't know where the dangerous regions are, tech near the dangerous regions is likely to be very profitable, and it is an object entirely of information, thus easily copied and hidden.) In state 5, it is possible for some other black ball to wipe out humanity.

So conditional on some black ball tech other than UFAI wiping out humanity, the most likely scenario is that it came sooner than UFAI could. I would be surprised if humanity stayed in state 5 for the next 100 years. (I would be most worried about grey goo here)

The other thread of possibility is that humanity coordinates around stopping UFAI being developed, and then gets wiped out by something else. This requires an impressive amount of coordination. It also requires that FAI isn't developed (or is stopped by the coordination to avoid UFAI). Given that this happens, I would expect that humans had got better at coordinating, that people who cared about X-risk were in positions of power, and that standards and precedents had been set. Anything that wipes out a humanity that well coordinated would have to be really hard to coordinate around.

Comment by donald-hobson on The Case for Artificial Expert Intelligence (AXI): What lies between narrow and general AI? · 2020-02-02T22:57:37.137Z · score: 1 (1 votes) · LW · GW

I think that there is a scale from the totally specific algorithms to totally general ones.

Comment by donald-hobson on Would I think for ten thousand years? · 2020-01-31T15:11:36.129Z · score: 3 (2 votes) · LW · GW

People will have to do a lot of maths and philosophy to get an AI system that works at all.

Suppose you have a lead of 1 week over any UFAI projects, and you have your AI system to the point where it can predict what you would do in a box. (Actually, we can say the AI has developed mind uploading tech + lots of compute.) The human team needs say 5 years of thinking to come up with better metaethics, defense against value drift or whatever. You want to simulate the humans in some reasonably human-friendly environment for a few years to work this thing out. You pick a nice town, and ask the AI to create a virtual copy of the town. (More specifically, you randomly sample from the AI's probability distribution, after conditioning on enough data that the town will be townlike.) The virtual town is created with no people except the research team in it. All the services are set to work without any maintenance. (Water in virtual pipes, food in virtual shops, virtual internet works.) The team of people uploaded into this town is at least 30, ideally a few hundred, including plenty of friends and family.

This "virtual me in a box" seems likely to be useful and unlikely to be dangerous. I agree that any virtual box trick that involves people thinking for a long time compared to current lifespans is dangerous. A single person trapped in low res polygon land would likely go crazy from the sensory deprivation.

You need an environment with a realistic level of socializing and leisure activities to support psychologically healthy humans. Any well done "virtual me in a box" is going to look more like a virtual AI safety camp or research department than 1 person in a blank white room containing only a keyboard.

Unfortunately, all those details would be hard to manually hard code in. You seem to need an AI that can be trusted to follow reasonably clear and specific goals without adversarial optimization. You want a virtual park; manually creating it would be a lot of hard work (see current video games). You need an AI that can fill in thousands of little details in a manner not optimized to mess with humans. This is not an especially high bar.

Comment by donald-hobson on Algorithms vs Compute · 2020-01-29T09:35:51.229Z · score: 3 (3 votes) · LW · GW

The algorithms that are used nowadays are basically the same as the algorithms that were known then, just with a bunch of tricks like dropout.
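
For concreteness, a minimal sketch of one such trick, inverted dropout (my own illustration): randomly zero a fraction of activations during training and rescale the rest so the expected activation is unchanged.

    import numpy as np

    def dropout(activations, p_drop, rng):
        # Zero each activation with probability p_drop, rescale the survivors.
        mask = (rng.random(activations.shape) >= p_drop).astype(activations.dtype)
        return activations * mask / (1.0 - p_drop)

    rng = np.random.default_rng(0)
    hidden = np.ones((2, 8))
    print(dropout(hidden, 0.5, rng))   # roughly half zeros, the rest scaled to 2.0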

Suppose that you have 100 ideas that seem like they might work. You test them, and one of them does work. You then find a mathematical reason why it works. Is this insight or compute?

Even if most of the improvement is in compute, there could be much better algorithms that we just aren't finding. I would be unsurprised if there exists an algorithm that would be really scary on vacuum tubes.

Comment by donald-hobson on The Epistemology of AI risk · 2020-01-28T14:35:06.509Z · score: 15 (5 votes) · LW · GW

Different minds use different criteria to evaluate an argument. Suppose that half the population were perfect rationalists, whose criteria for judging an argument depended only on Occam's razor and Bayesian updates. The other half are hard-coded biblical literalists, who only believe statements based on religious authority. So half the population will consider "Here are the short equations, showing that this concept has low Kolmogorov complexity" to be a valid argument, while the other half consider "Pope Clement said ..." to be a strong argument.

Suppose that any position that has strong religious and strong rationalist arguments for it is so obvious that no one is doubting or discussing it. Then most propositions believed by half the population have strong rationalist support, or strong religious support, but not both. If you are a rationalist and see one fairly good rationalist argument for X, you search for more info about X. Any religious arguments get dismissed as nonsense.

The end result is that the rationalists are having a serious discussion about AI risk among themselves. The religious dismiss AI as ludicrous based on some Bible verse.

The religious people are having a serious discussion about the second coming of Christ and judgement day, which the rationalists dismiss as ludicrous.

The end result is a society where most of the people who have read much about AI risk think it's a thing, and most of the people who have read much about judgement day think it's a thing.

If you took some person from one side and forced them to read all the arguments on the other, they still wouldn't believe. Each side has the good arguments under their criteria of what a good argument is.

The rationalists say that the religious have poor epistemic luck; there is nothing we can do to help them now, and when super-intelligence comes it can rewire their brains. The religious say that the rationalists are cursed by the devil, and when judgement day comes, they will be converted by the glory of God.

The rationalists are designing a super-intelligence, the religious are praying for judgement day.

Bad ideas and good ones can have similar social dynamics because most of the social dynamics around an idea depends on human nature.

Comment by donald-hobson on Material Goods as an Abundant Resource · 2020-01-27T17:11:10.693Z · score: 3 (2 votes) · LW · GW

A duplicator world is much more strongly positive sum than our current one. If I have any kind of nice material good, I can let you benefit from it at no cost to me. I would also expect the sheer shock to collapse many bureaucracies.

Picture a society where say 50% of people make something, in the sense of someone who likes gardening and makes some fresh veg, or someone who likes making clothes. These people will not work very hard, and everyone else won't work at all. (These people are doing it for much the same reason people have hobbies today. Putting a duplicator, a strawberry and a sign saying "help yourself" on the porch takes next to no effort, and is a friendly thing to do.) The economy will thrive mostly on a take-a-copy, pass-it-on model.

Over time, a more complex economy will reappear, and the name of the currency will be customization. If you want a painting of yourself, or a coat tailored to your unique taste in fashion, you have to pay serious money for it. Large complex companies could be sustained that made, say, motorcars. There would be a team of people who knew how every part went together, and had the tools to do complex custom jobs. If you just want a car that works, it costs you nothing or next to nothing. If you want a black and yellow striped car with extra large wheels, they are going to charge you for that, and they have the advantage in expertise that means they can do a better job with less effort than a garage mechanic. This would support an economy that has some sort of R&D chain.

Some sort of copyright law might or might not exist, but there will be enough people prepared to let you copy their stuff for free that this will only be a limit on some luxury or specific goods. Like modern software: not all of it is open source, but unless you have some unusual and specific requirement, you can probably do it with open source software.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-01-24T14:39:43.773Z · score: 3 (2 votes) · LW · GW

But nobody would be that stupid!

Here is a flawed dynamic in group conversations, especially among large groups of people with no common knowledge.

Suppose everyone is trying to build a bridge.

Alice: We could make a bridge by just laying a really long plank over the river.

Bob: According to my calculations, a single plank would fall down.

Carl: Scientists Warn Of Falling Down Bridges, Panic.

Dave: No one would be stupid enough to design a bridge like that, we will make a better design with more supports.

Bob: Do you have a schematic for that better design?

And, at worst, the cycle repeats.

The problem here is Carl. The message should be:

Carl: At least one attempt at designing a bridge is calculated to show the phenomenon of falling down. It is probable that many other potential bridge designs share this failure mode. In order to build a bridge that won't fall down, someone will have to check any designs for falling down behavior before they are built.

This entire dynamic plays out the same, whether the people actually deciding on building the bridge are incredibly cautious, never approving a design they weren't confident in, or totally reckless. The probability of any bridge actually falling down in the real world depends on their caution. But the process of cautious bridge builders finding a good design looks like them rejecting lots of bad ones. If the rejection of bad designs is public, people can accuse you of attacking a strawman; they can say that no-one would be stupid enough to build such a thing. If they are right that no one would be stupid enough to build such a thing, it's still helpful to share the reason the design fails.

Comment by donald-hobson on How Doomed are Large Organizations? · 2020-01-23T00:56:54.694Z · score: 3 (2 votes) · LW · GW

To stop Goodharting, don't measure. When someone walks in the door of the hiring department, spin a spinner. That determines what job a person has: no promotions, no firings. (If someone is too bad, they will get sent to jail anyway.)

Part of the problem is that everyone in these companies is a smarmy, sharp-suited, liberal-arts-degree type. Hire a broader range of humanity. When you have everyone from Tibetan monks to an ex drug dealer, to an eco warrior, to the sort of person that builds their own compiler in their free time, you should be good. If anyone knocks on the CEO's door at 3am on Christmas, wearing only a swimsuit and diving gear, and trying to explain why they are a good hire through the medium of interpretative dance, hire them on the spot.

You won't get a maze. Whether or not a madhouse is an improvement, I don't know.

Comment by donald-hobson on Disasters · 2020-01-23T00:19:09.024Z · score: -1 (2 votes) · LW · GW

I am a uni student from Scotland. At home, I have been snowed in for a few days. There, there would be enough food to last 2 weeks. If we got really desperate, there are always hens, and around a sack of grain in the garden. It probably wouldn't come to that, as there are large supplies of dried, tinned and frozen food, and of course sugar, flour, jam etc. This isn't disaster prep; it just makes sense to keep a stockpile of long lasting food when you have plenty of storage space and the shops are several miles away. There is also a stream for water and refrigeration if needed, a ton of firewood, and trees and tools if we need more. All in all, a pretty good place to hole up.

At uni on the other hand, I have a small room rented for a year. Everything I want has to fit into the room, and has to be removed in the summer. There, the calculation is different for items that are somewhat, but not very, useful. Besides, the area is not known for hurricanes, wildfires or earthquakes. Rich first world governments tend to do things like dropping food in by helicopter if they really have to.

Comment by donald-hobson on FAI Research Constraints and AGI Side Effects · 2020-01-22T13:36:20.980Z · score: 1 (1 votes) · LW · GW

If you don't know what the threshold ratio of AGI to FAI research needed is, you can still know that if your research beats the world average, you are increasing the ratio. Let's say that 2 units of FAI research are being produced for every 3 units of AGI, and that ratio isn't going to change. Then work that produces 3 units of FAI and 4 of AGI is beneficial. (It causes FAI in the scenario where FAI is slightly over 2/3 as difficult.)
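
Working those numbers through (the 0.68 threshold below is hypothetical, standing in for "slightly over 2/3"): FAI arrives first iff total FAI research divided by total AGI research exceeds the unknown difficulty ratio.

    def fai_wins(world_fai, world_agi, my_fai, my_agi, difficulty_ratio):
        # FAI gets built first iff the overall research ratio clears the threshold.
        return (world_fai + my_fai) / (world_agi + my_agi) >= difficulty_ratio

    print(fai_wins(2, 3, 0, 0, 0.68))   # False: 2/3 < 0.68, the world alone loses
    print(fai_wins(2, 3, 3, 4, 0.68))   # True: 5/7 > 0.68, your above-average work tips it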

Is it remotely plausible that FAI is easier? Suppose that there was one key insight. If you have that insight, you can see how to build FAI easily. From that insight, alignment is clearly necessary and not hard. Anyone with that insight will build an FAI, because doing so is almost no harder than building an AGI.

Suppose also that it is possible to build an AGI without this insight. You can hack together a huge pile of ad hoc tricks. This approach takes a lot of ad hoc tricks. No one trick is important.

In this model, building an FAI could be much easier than building an AGI without knowing how to make an FAI.

Comment by donald-hobson on What long term good futures are possible. (Other than FAI)? · 2020-01-22T00:18:17.018Z · score: 1 (1 votes) · LW · GW

Safely and gradually enhancing human intelligence is hard. I agree that a team of human geniuses with unlimited time and resources could probably do it. But you need orders of magnitude more resources and thinking time than the fools "trying" to make UFAI.

A genetics project makes a lot of very smart babies; it finds it hard to indoctrinate them, while educating them enough, while producing diversity. Militaristic bootcamp will get them all marching in line, and squash out most curiosity and give little room for skill. Handing them off to foster parents with STEM backgrounds gets a bunch of smart people with no organizing control; this is a shift in demographics, and you have no hope of capturing all the value. Some will work on AI safety, intelligence enhancement or whatever; some will work in all sorts of jobs.

Whole brain emulation seems possible. I question how to get it before someone makes UFAI, but it's plausible we get that. If a group of smart coordinated people end up with the first functioning mind uploading, and the first nanomachines, and are fine with duplicating themselves a lot, and there are fast enough computers to let them think really fast, then that is enough for a decisive strategic advantage. If they upload everyone else into a simulation that doesn't contain access to anything Turing complete (so no one can make UFAI within the simulation), then they could guide humanity towards a long term future without any superintelligence. They will probably figure out FAI eventually.

Comment by donald-hobson on Modest Superintelligences · 2020-01-21T23:47:19.473Z · score: 9 (4 votes) · LW · GW

If intelligence is 50% genetic, and von Neumann was 1 in a billion, the clones will be 1 in 500. Regression to the mean.
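
A rough check of that figure, under the stated assumption that 50% of the variance in intelligence is genetic, so a clone's expected score regresses halfway to the mean in standard-deviation units (this sketch assumes scipy is available):

    from scipy.stats import norm

    z_von_neumann = norm.isf(1e-9)    # "1 in a billion" is about 6 standard deviations
    z_clone = 0.5 * z_von_neumann     # regression to the mean leaves about 3 SD
    print(1 / norm.sf(z_clone))       # roughly 1 in 700 -- the same ballpark as 1 in 500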

Comment by donald-hobson on Inner alignment requires making assumptions about human values · 2020-01-21T21:11:23.783Z · score: 3 (2 votes) · LW · GW

Suppose that in building the AI, we make an explicitly computable hardcoded value function. For instance, if you want the agent to land between the flags, you might write an explicit, hardcoded function that returns 1 if between a pair of yellow triangles, else 0.

In standard machine learning, information is lost because you have a full value function but only train the network on the evaluations of that function at a finite number of points.

Suppose I don't want the lander to land on the astronaut, who is wearing a blue spacesuit. I write code that says that any time there is a blue pixel below the lander, the utility is -10.

Suppose that there are no astronauts in the training environment, in fact nothing blue whatsoever. A system that is trained using some architecture that only relies on the utility of what it sees in training, would not know this rule. A system that can take the code, and read it, would spot this info, but might not care about it. A system that generates potential actions, and then predicts what the screen would look like if it took those actions, and then sent that prediction to the hard coded utility function, with automatic shutdown if the utility is negative, would avoid this problem.
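
A hypothetical sketch of what that could look like; the pixel thresholds, coordinates and function names are all mine, for illustration only.

    import numpy as np

    def hardcoded_utility(obs, lander_x, lander_y, flag_left_x, flag_right_x):
        # obs is an (H, W, 3) RGB image with row 0 at the top.
        below = obs[lander_y:, :, :]
        blue = (below[:, :, 2] > 150) & (below[:, :, 0] < 100) & (below[:, :, 1] < 100)
        if blue.any():
            return -10.0    # a blue pixel below the lander: the astronaut rule
        return 1.0 if flag_left_x <= lander_x <= flag_right_x else 0.0

    def choose_action(candidate_actions, predict_obs, lander_state, flags):
        # Generate candidate actions, predict the resulting screen with a learned
        # model, score the *prediction* with the hard-coded function, and shut
        # down if any predicted utility is negative.
        best_action, best_utility = None, float("-inf")
        for action in candidate_actions:
            predicted = predict_obs(action)    # learned world model (assumed given)
            u = hardcoded_utility(predicted, *lander_state, *flags)
            if u < 0:
                raise SystemExit("predicted negative utility -- automatic shutdown")
            if u > best_utility:
                best_action, best_utility = action, u
        return best_action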

If hypothetically, I can take any programmed function f:observations -> reward and make a machine learning system that optimizes that function, then inner alignment has been solved.

Comment by donald-hobson on Use-cases for computations, other than running them? · 2020-01-21T13:51:49.523Z · score: 3 (2 votes) · LW · GW

Consider this function

This is valid code that returns True.

Note that you can tell it returns true without doing operations, and a good compiler could too.

Shouldn't this also be valid code?

There are a whole space of "programs" that can't be computed directly, but can still be reasoned about. Computing directly is a subset of reasoning.
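
Since the snippets themselves aren't reproduced above, here is a stand-in pair of my own illustrating the distinction: f is ordinary runnable code whose result you (or a good compiler) can determine without performing the million additions; g is syntactically valid but can never be computed directly, because the loop bound has 10^100 digits and could not even be constructed in memory, yet its return value can still be reasoned about.

    def f():
        x = 0
        for _ in range(10**6):
            x += 1
        return x == 10**6           # True, and provable without running the loop

    def g():
        x = 0
        for _ in range(10**(10**100)):
            x += 1
        return x == 10**(10**100)   # also True, but only by reasoning, never by running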

Comment by donald-hobson on A rant against robots · 2020-01-16T23:39:50.278Z · score: 1 (1 votes) · LW · GW
that the most powerful algorithms, the ones that would likely first become superintelligent, would be distributed and fault-tolerant, as you say, and therefore would not be in a box of any kind to begin with.

Algorithms don't have a single "power" setting. It is easier to program a single computer than to make a distributed fault tolerant system. Algorithms like AlphaGo are run on a particular computer with an off switch, not spread around. Of course, a smart AI might soon load its code all over the internet, if it has access. But it would start in a box.

Comment by donald-hobson on What long term good futures are possible. (Other than FAI)? · 2020-01-16T21:05:52.099Z · score: 1 (1 votes) · LW · GW

At the moment, human brains are a cohesive whole that optimizes for human values. We haven't yet succeeded in making the machines share our values, and the human brain is not designed for upgrading. The human brain can take knowledge from an external source and use it. External tools follow the calculator model. The human thinks about the big picture world, and realizes that as a mental subgoal of designing a bridge, they need to do some arithmetic. Instead of doing the arithmetic themselves, they pass the task on to the machine. In this circumstance, the human controls the big picture, the human understands what cognitive labor has been externalized and knows that it will help the human's goals.

If we have a system to which a human can say "go and do whatever is most moral", that's FAI. If we have a calculator style system where humans specify the power output, weight, material use, radiation output etc. of a fusion plant, and the AI tries to design a fusion plant meeting those specs, that's useful but not nearly as powerful as full ASI. Humans with calculator style AI could invent molecular nanotech without working out all the details, but they still need an Eric Drexler to spot the possibility.

In my model you can make a relativistic rocket, but you can't take a sparrow and upgrade it into something that flies through space at 10% light speed and is still a sparrow. If you're worried that relativistic rockets might spew dangerous levels of radiation, you can't make a safe spacecraft by taking a sparrow and upgrading it to fly at 10% c. (Well, with enough R&D you could make a rocket that superficially resembles a sparrow. Deciding to upgrade a sparrow doesn't make the safety engineering any easier.)

Making something vastly smarter than a human is like making something far faster than a sparrow. Strap really powerful turbojets to the sparrow and it crashes and burns. Try to attach a human brain to 100X human brain gradient descent and you get an out of control AI system with nonhuman goals. Human values are delicate. I agree that it is possible to carefully unravel what a human mind is thinking and what its goals are, and then upgrade it in a way that preserves those goals, but this requires a deep understanding of how the human mind works. Even granted mind uploading, it would still be easier to create a new mind largely from first principles. You might look at the human brain to figure out what those principles are, in the same way a plane designer looks at birds.

I see a vast space of all possible minds, some friendly, most not. Humans are a small dot in this space. We know that humans are usually friendly. We have no guarantees about what happens as you move away from humans. In fact we know that one small error can sometimes send a human totally mad. If we want to make something that we know is safe, we either need to copy that dot exactly, (ie normal biological reproduction, mind uploading) or we need something we can show to be safe for some other reason.

My point with the Egypt metaphor was that the sentence

Society continues as-is, but with posthuman capabilities.

is incoherent.

Try "the stock market continues as is, except with all life extinct"

Describing the modern world as "like a tribe of monkeys, except with post monkey capabilities" is either wrong or so vague as to not tell you much.

At the point when the system (upgraded human, AI, whatever you want to call it) is 99% silicon, a stray meteor hits the biological part. If the remaining 99% stays friendly, somewhere in this process you have solved FAI. I see no reason why aligning a 99% silicon being is easier than a 100% silicon being.

Comment by donald-hobson on What long term good futures are possible. (Other than FAI)? · 2020-01-14T09:59:43.418Z · score: 1 (1 votes) · LW · GW
as extensions of themselves

Let's assume that AI doubling time is fairly slow (e.g. 20 years) and very widely distributed. Huge numbers of people throw together AI systems in garages. If the basic problems of FAI haven't been solved, you are going to get millions of paperclip maximizers. (Well, most of them will be optimising different things.) 100 years later, humanity, if it still exists at this point, consists of pawns on a gameboard that contains many superintelligences. What happens depends on how different the superintelligences' goals are, and how hard it is for superintelligences to cooperate. Either they fight, killing humanity in the crossfire, or they work together to fill the universe with a mixture of all the things they value. The latter looks like 1% paperclips, 1% staples, 1%... .

Alternately, many people could understand friendliness and make various FAIs. The FAIs work together to make the world a nice place. In this scenario the FAIs aren't identical, but they are close enough that any one of them would make the world nice. I also agree that a world with FAIs and paperclip maximisers could be nice if the FAIs have a significant portion of total power.

Society continues as-is, but with posthuman capabilities.

Exactly like ancient Egypt except that like electromagnetic charges attract and unlike charges repel. I posit that this sentence doesn't make sense. If matter behaved that way, then atoms couldn't exist. When we say like X but with change Y, we are considering the set of all possible worlds that meet criteria Y, and finding the one nearest to X. But here there is no world where like charges attract that looks anything like ancient Egypt. We can say, like ancient Egypt but gold is 10x more abundant. That ends up as a bronze age society that makes pyramids and makes a lot more gold jewelry than the real Egyptians. I think that "society as is, but with posthuman capabilities" is the first kind of sentence. There is no way of making a change like that and getting anything resembling society as is.


Comment by donald-hobson on What long term good futures are possible. (Other than FAI)? · 2020-01-13T13:25:24.069Z · score: 1 (1 votes) · LW · GW

This seems like one potential path, but for it to work, you would need a government structure that can survive without successful pro-AI revolutionaries for a billion years. You also need law enforcement good enough to stop anyone trying to make UFAI, with not a single failure in a billion years. As for an SAI that will help us stop UFAI, can you explain 1) how it would help and 2) how it would be easier to build than FAI?

You also need to say what happens with evolution: given this kind of time, and non-ancestral selection pressures, evolution will produce beings not remotely human in mind or body. Either argue that the evolution is in a morally OK direction, and that your government structure works with these beings, or stop evolution by selective breeding, frozen samples, or genetic modification towards some baseline. Then you just need to say how all human populations get this, or how any population that doesn't won't be building UFAI.

Comment by donald-hobson on What is Success in an Immoral Maze? · 2020-01-11T14:13:05.933Z · score: 4 (2 votes) · LW · GW

I think that some typical mind fallacy is happening here.

Humans evolved to both find the truth, for interacting with the real world, and have beliefs that are good at winning status games. Naturally there is a tradeoff between these criteria. As would be expected, different people fall in different places along a spectrum of truth focused to status focused. This is an unusually truth focused community, and you are probably unusually truth focused. So you see marketing as a status game that you really don't want to get into. To the people who are unusually status focused, immoral mazes might seem nice.

Comment by donald-hobson on Quantum Mechanics, Nothing to do with Consciousness · 2020-01-09T15:37:46.995Z · score: 1 (1 votes) · LW · GW

You have a box with 2 wires coming out of it. The wires are connected to a display in the box. Looking at the display is either 1) a live awake human, or 2) a dead spider. Can you tell which is which, without opening the box? Can you use the fact that the human observing something causes a quantum collapse, and the spider doesn't, to distinguish them? Can you build a quantum consciousness detector? No.

Suppose I write a simple computer program that takes in data from a quantum physics experiment, and tells me whether the data as a whole is consistent with quantum physics. I don't know where the photon went on any particular run; all any conscious human sees is a single yes or no. Would you expect the same results? Yes.

I take an emulated human mind, and put the whole thing on an extremely powerful quantum computer. I simulate the mind in a superposition of states. Would you expect the quantum computer to go into a superposition correctly, despite the person being conscious?

Suppose Joe has opinions on the numbers 1 to 1000: he either thinks that they are all good, or all bad, or that half are good and the other half are bad. If you tell him a number, it will take him 1 minute to say if it's good or bad. It would take a classical computer 501 min worst case to tell if he has the same opinion of all numbers. But a quantum computer can do it in just 2 min. https://en.wikipedia.org/wiki/Deutsch%E2%80%93Jozsa_algorithm
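
A small simulation of the Deutsch-Jozsa idea (my own sketch; Joe's 1000 numbers would be padded out to 1024 = 2^10 inputs): the oracle is applied once as a phase flip, and a single measurement distinguishes constant from balanced.

    import numpy as np

    def hadamard_n(n):
        H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
        out = np.array([[1.0]])
        for _ in range(n):
            out = np.kron(out, H)
        return out

    def deutsch_jozsa(f_values):
        # f_values: list of 0/1 of length 2**n, promised to be constant or balanced.
        n = int(np.log2(len(f_values)))
        state = np.zeros(len(f_values))
        state[0] = 1.0                                  # start in |00...0>
        state = hadamard_n(n) @ state                   # uniform superposition
        state = state * (-1.0) ** np.array(f_values)    # one phase-oracle query
        state = hadamard_n(n) @ state                   # interfere
        return "constant" if abs(state[0]) ** 2 > 0.5 else "balanced"

    print(deutsch_jozsa([1] * 1024))      # constant
    print(deutsch_jozsa([0, 1] * 512))    # balanced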

If you disagree with any of these, we have a factual disagreement about an experimental result. If you agree, then "consciousness" seems to be an invisible inaudible dragon to quantum mechanics. I would have to ask how you know that it's consciousness that causes collapse, not DNA.


Comment by donald-hobson on Morality vs related concepts · 2020-01-08T14:12:57.234Z · score: 2 (2 votes) · LW · GW
For example, I could say that, from the perspective of epistemic rationality, I “shouldn’t” believe that buying that burrito will create more utility in expectation than donating the same money to AMF would. This is because holding that belief won’t help me meet the goal of having accurate beliefs.

There is a phenomenon in AI safety called "you can't fetch the coffee if you're dead". A perfect total utilitarian, or even a money maximiser, would still need to eat if they want to be able to work next year. If you have a well paid job, or a good chance of getting one, don't starve yourself. Eat something quick, cheap and healthy. Quick so you can work more today, and healthy so you can work years later. In a world where you need to wear a sharp suit to be CEO, the utilitarians should buy sharp suits. Don't fall for the false economy of personal deprivation. This doesn't entitle utilitarians to whatever luxury they feel like. If most of your money is going on sharp suits, it isn't a good job. A sharp suited executive should be able to donate far more than a cardboard box wearing ditch digger.

Comment by donald-hobson on The Universe Doesn't Have to Play Nice · 2020-01-07T10:39:34.374Z · score: 2 (2 votes) · LW · GW
Heisenberg's uncertainty principle: We might imagine that if we were clever enough we could find a scheme for gaining perfect information about a particle, but this isn't the case

Quantum mechanics doesn't work like that: the information you want is not hidden from you, it doesn't exist. Galilean relativity of motion means that absolute rest doesn't exist, not that absolute rest does exist but can't be known.

Comment by donald-hobson on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-04T00:56:21.247Z · score: 1 (1 votes) · LW · GW

By adding random noise, I meant adding wiggles to the edge of the set in thingspace. For example, adding noise to "bird" might exclude "ostrich" and include "duck billed platypus".

I agree that the high level ImageNet concepts are bad in this sense, but are they just bad? If they were just bad and the limit to finding good concepts was data or some other resource, then we should expect small children and mentally impaired people to have similarly bad concepts. This would suggest a single gradient from better to worse. If however current neural networks used concepts substantially different from small children, and not just uniformly worse or uniformly better, that would show different sets of concepts at the same low level. This would be fairly strong evidence of multiple sets of concepts at the smart human level.

I would also want to point out that a small fraction of the concepts being different would be enough to make alignment much harder. Even if there was a perfect scale, if 1/3 of the concepts are subhuman, 1/3 human level and 1/3 superhuman, it would be hard to understand the system. To get any safety, you need to get your system very close to human concepts. And you need to be confident that you have hit this target.

Comment by donald-hobson on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-04T00:34:36.448Z · score: 2 (2 votes) · LW · GW

This seems to be careful deployment. The concept of deployment is going from an AI in the lab, to the same AI in control of a real world system. Suppose your design process was to fiddle around in the lab until you make something that seems to work. Once you have that, you look at it to understand why it works. You try to prove theorems about it. You subject it to some extensive battery of testing and will only put it in a self driving car/ data center cooling system once you are confident it is safe.

There are two places this could fail. Your testing procedures could be insufficient, or your AI could hack out of the lab before the testing starts. I see little to no defense against the latter.

Comment by donald-hobson on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-03T20:39:57.363Z · score: 4 (3 votes) · LW · GW

Neural nets have around human performance on ImageNet.

If abstraction was a feature of the territory, I would expect the failure cases to be similar to human failure cases. Looking at https://github.com/hendrycks/natural-adv-examples, This does not seem to be the case very strongly, but then again, some of them contain dark shiny stone being classified as a sea lion. The failures aren't totally inhuman, the way they are with adversarial examples.

Humans didn't look at the world and pick out "tree" as an abstract concept because of a bunch of human-specific factors.

I am not saying that trees aren't a cluster in thing space. What I am saying is that if there were many clusters in thing space that were as tight and predictively useful as "tree", but were not possible for humans to conceptualize, we wouldn't know it. There are plenty of concepts that humans didn't develop for most of human history, despite those concepts being predictively useful, until an odd genius came along or the concept was pinned down by massive experimental evidence. E.g. inclusive genetic fitness, entropy etc.

Consider that evolution optimized us in an environment that contained trees, and in which predicting them was useful, so it would be more surprising for there to be a concept that is useful in the ancestral environment that we can't understand, than a concept that we can't understand in a non ancestral domain.

This looks like a map that is heavily determined by the territory, but human maps contain rivers and not geological rock formations. There could be features that could be mapped that humans don't map.

If you believe the post that

Eventually, sufficiently intelligent AI systems will probably find even better concepts that are alien to us,

Then you can form an equally good, nonhuman concept by taking the better alien concept and adding random noise. Of course, an AI trained on text might share our concepts just because our concepts are the most predictively useful ways to predict our writing. I would also like to assign some probability to AI systems that don't use anything recognizable as a concept. You might be able to say 90% of blue objects are egg shaped, 95% of cubes are red ... 80% of furred objects that glow in the dark are flexible ... without ever splitting objects into bleggs and rubes. Seen from this perspective, you have a density function over thingspace, and a sum of clusters might not be the best way to describe it. AIXI never talks about trees, it just simulates every quantum. Maybe there are fast algorithms that don't even ascribe discrete concepts.
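
As a toy contrast (mine, using sklearn for brevity): both of the models below describe the same density over a 2-D "thingspace", but only one of them posits discrete concept-clusters.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.neighbors import KernelDensity

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])

    as_clusters = GaussianMixture(n_components=2, random_state=0).fit(X)  # bleggs and rubes
    as_density = KernelDensity(bandwidth=0.5).fit(X)                      # no clusters at all

    point = np.array([[2.0, 2.0]])
    print(as_clusters.score_samples(point), as_density.score_samples(point))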


Comment by donald-hobson on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-03T12:00:43.535Z · score: 6 (4 votes) · LW · GW
I agree that ML often does this, but only in situations where the results don't immediately matter. I'd find it much more compelling to see examples where the "random fix" caused actual bad consequences in the real world.

Current ML culture is to test 100s of things in a lab until one works. This is fine as long as the AIs being tested are not smart enough to break out of the lab, or realize they are being tested and play nice until deployment. The default way to test a design is to run it and see, not to reason abstractly about it.

and then we'll have a problem that is both very bad and (more) clearly real, and that's when I expect that it will be taken seriously.

Part of the problem is that we have a really strong unilateralist's curse. It only takes one, or a few, people who don't realize the problem to make something really dangerous. Banning it is also hard: law enforcement isn't 100% effective, different countries have different laws, and the main real world ingredient is access to a computer.

If the long-term concerns are real, we should get more evidence about them in the future, ...I expect that it will be taken seriously.

The people who are ignoring or don't understand the current evidence will carry on ignoring or not understanding it. A few more people will be convinced, but don't expect to convince a creationist with one more transitional fossil.

Comment by donald-hobson on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-03T11:39:49.505Z · score: 5 (4 votes) · LW · GW
I would guess that AI systems will become more interpretable in the future, as they start using the features / concepts / abstractions that humans are using.

This sort of reasoning seems to assume that abstraction space is 1 dimensional, so AI must use human concepts on the path from subhuman to superhuman. I disagree. Like most things that we don't have strong reason to think are 1D, and which take many bits of info to describe, abstractions seem high dimensional. So on the path from subhuman to superhuman, the AI must use abstractions that are as predictively useful as human abstractions. These will not be anything like human abstractions unless the system was designed from a detailed neurological model of humans. Any AI that humans can reason about using our inbuilt empathetic reasoning is basically a mind upload, or a mind that differs from humans less than humans differ from each other. This is not what ML will create. Human understanding of AI systems will have to be by abstract mathematical reasoning, the way we understand formal maths. Empathetic reasoning about human level AI is just asking for anthropomorphism. Our 3 options are:

1) An AI we don't understand

2) An AI we can reason about in terms of maths.

3) A virtual human.

Comment by donald-hobson on Debunking Fallacies in the Theory of AI Motivation · 2020-01-01T00:44:24.477Z · score: 1 (1 votes) · LW · GW

This phenomenon seems rife.

Alice: We could make a bridge by just laying a really long plank over the river.

Bob: According to my calculations, a single plank would fall down.

Carl: Scientists Warn Of Falling Down Bridges, Panic.

Dave: No one would be stupid enough to design a bridge like that, we will make a better design with more supports.

Bob: Do you have a schematic for that better design?

And the cycle repeats until a design is found that works, everyone gets bored or someone makes a bridge that falls down.

there could be some other part of its programming (let’s call it the checking code) that kicked in if there was any hint of a mismatch between what the AI planned to do and what the original programmers were now saying they intended.
Metaphorical Dave
Comment by donald-hobson on Debunking Fallacies in the Theory of AI Motivation · 2020-01-01T00:13:02.459Z · score: 1 (1 votes) · LW · GW

The point of a paperclip maximiser thought experiment is that most arbitrary real world goals are bad news for humanity. Your hopeless engineer would likely create an AI that makes something that has the same relation to paperclips as chewing gum has to fruit. In the sense that evolution gave us "fruit detectors" in our taste buds but chewing gum triggers them even more. But you could be excessively conservative, insist that all paperclips must be molecularly identical to this particular paperclip and get results.

Comment by donald-hobson on Debunking Fallacies in the Theory of AI Motivation · 2019-12-31T23:28:56.741Z · score: 1 (1 votes) · LW · GW

Your "The Doctrine of Logical Infallibility" is seems to be a twisted strawman. "no sanity checks" That part is kind of true. There will be sanity checks if and only if you decide to include them. Do you have a piece of code that's a sanity check? What are we sanity checking and how do we tell if it's sane? Do we sanity check the raw actions, that could be just making a network connection and sending encrypted files to various people across the internet. Do we sanity check the predicted results off these actions? Then the sanity checker would need to know how the results were stored, what kind of world is described by the binary data 100110...?

but if the system does come to a conclusion (perhaps with a degree-of-certainty number attached), the assumption seems to be that it will then be totally incapable of then allowing context to matter.

That's because they are taking any extra parts that allow context to matter, putting them in a big box and calling it the system. The system's decisions are final and absolute, not because there are no double checks, but because the double checks are part of the system. Although at the moment, there is a lack of context adding algorithms; what you seem to want is humanlike common sense.

The AI can sometimes execute a reasoning process, then come to a conclusion and then, when it is faced with empirical evidence that its conclusion may be unsound, it is incapable of considering the hypothesis that its own reasoning engine may not have taken it to a sensible place.

Again, at the moment, we have no algorithm for checking sensibleness, so any algorithm must go round in endless circles of self doubt and never do anything, or plow on regardless. Even if you do put 10% probability on the hypothesis that humans don't exist, that you're a fictional character in a story written by a mermaid, and that the maths and science you know is entirely made up and there is no such thing as rationality or probability, what would you do? My best guess is that you would carry on breathing, eating and acting roughly like a normal human. You need a core of not-totally-insane for a sanity check to bootstrap from.

But it gets worse. Those who assume the doctrine of logical infallibility often say that if the system comes to a conclusion, and if some humans (like the engineers who built the system) protest that there are manifest reasons to think that the reasoning that led to this conclusion was faulty, then there is a sense in which the AGI’s intransigence is correct, or appropriate, or perfectly consistent with “intelligence.”

There are designs of AI, files of programming code, that will hear your shouts, your screams, your protests of "that's not what I meant" and then kill you anyway. There are designs that will kill you with a super-weapon it invented itself, and then fill the universe with molecular smiley faces. This is not logically contradictory behavior. There exist pieces of code that will do this. You could argue that such code is a rare and complicated thing, that it's nothing like any system that humans might try to build, that you're less likely to write code that does this when trying to make an FAI than you are to write a great novel when trying to write a shopping list. I would disagree. I would say that such behavior is the default; most simple AI designs don't see screaming programmers as a reason to stop, because most AI designs see screaming humans as no more important or special than pissing rats. It's just another biological process that doesn't seriously affect its ability to reach its goal. Most AI designs have no special reason to care about humans. It might know that the process of its creation involved humans, keyboards and a bunch of other objects, and if you look back far enough the whole earth. It might know that if a hypothetical human was put in a room with the question "do you want a universe full of smiley faces?" and buttons labeled Yes and No, the human would press the No button. The AI thinks this is no more relevant than a hypothetical wombat being offered a choice between two types of cheese.

It will understand that many of its more abstract logical atoms have a less than clear denotation or extension in the world (if the AGI comes to a conclusion involving the atom [infelicity], say, can it then point to an instance of an infelicity and be sure that this is a true instance, given the impreciseness and subtlety of the concept?).

If the concept is too fuzzy, the AI can just discard it as useless. (eg soul, qualia) If it isn't sure if something is a real instance (and an ideal agent will never be 100% sure of any real world fact), it can put a probability on it and use expected utility maximisation. But all that is part of the process of coming to a conclusion.

It will understand that knowledge can always be updated in the light of new information. Today’s true may be tomorrow’s false.

The AIXI formalism can do this. "My calendar clock says Tuesday on the front" is a fact that is true today and false tomorrow. AIXI "understands" this by simulating the clock and the rest of the universe in excessive detail. If you give it a quiz about what the clock will show when, and incentivize it to win, it will answer.

The other potential meaning is that it can accept that it was wrong and adapt. Suppose that over the last week, it has seen the sun moving and shadows changing from its camera. It assigns a 95% probability to "the sun goes round the earth". You give it an astronomy quiz, and it gets the answer wrong. It still refuses your bet that the earth goes round the sun at 100 to 1 odds, because it operates on probabilities. You then show it an astronomy textbook and a bunch more data. It updates on that data, and gets the next quiz right.

It will understand that probabilities used in the reasoning engine can be subject to many types of unavoidable errors.

And that coherence theorems say that you can take all the errors into account to get a new probability.

It will understand that the techniques used to build its own reasoning engine may be under constant review, and updates may have unexpected effects on conclusions (especially in very abstract or lengthy reasoning episodes).

It predicts that a bunch of monkeys are looking at its source code and tampering with its thoughts. It might not like this situation and might plot to change it.

It will understand that resource limitations often force it to truncate search procedures within its reasoning engine, leading to conclusions that can sometimes be sensitive to the exact point at which the truncation occurred.

It will also understand that its processors do floating point arithmetic. So what? What implied connotation about its behavior are you trying to sneak in?

Comment by donald-hobson on AI Alignment, Constraints, Control, Incentives or Partnership? · 2019-12-31T19:48:00.532Z · score: 1 (1 votes) · LW · GW

A large majority of the work being done assumes that if the AI is looking for ways to hurt you, or looking for ways to bypass your safety measures, something has gone wrong.

Comment by donald-hobson on Debunking Fallacies in the Theory of AI Motivation · 2019-12-31T19:40:47.273Z · score: 1 (1 votes) · LW · GW

There are a huge number of possible designs of AI, most of them not well understood. So researchers look at agents like AIXI, a formal specification of an agent that would in some sense behave intelligently, given infinite compute. It does display the taking-over-the-world failure mode. Suppose you give the AI a utility function of maximising the number of dopamine molecules within 1 of a strand of human DNA (defined as a strand of DNA agreeing with THIS 4GB file in at least 99.9% of locations). This is a utility function that could easily be specified in terms of atoms. You could write a function that takes in a description of the universe in terms of the coordinates of each atom, or a discrete approximation to the quantum wave function or whatever, and returns a number representing utility. It would be fairly straightforward to design an agent that, given infinite compute, would act to maximise this function. It seems somewhat harder, but not necessarily impossible, to make a system that can approximate the same behavior given a reasonable amount of compute. Nowhere in this potential AI design is anything as nebulous, anything as hard to specify in terms of atom positions, as human preferences or consent. The system does understand humans in a sense; it can simulate them atom by atom and predict exactly how they will panic and try to stop it, but there is no object in its memory that corresponds to human consent, or preferences, or well being, or humans at all. There is no checker code. This particular design of AI would make vats full of human DNA and dopamine.
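
A toy version (mine) of that kind of utility function, written purely in terms of particle coordinates; the distance unit and the data layout are assumptions made for illustration.

    import numpy as np

    def utility(dopamine_positions, dna_atom_positions, radius=1.0):
        # Both inputs: (N, 3) arrays of coordinates. Count dopamine molecules
        # lying within the radius of any atom of a qualifying DNA strand.
        if len(dopamine_positions) == 0 or len(dna_atom_positions) == 0:
            return 0
        diffs = dopamine_positions[:, None, :] - dna_atom_positions[None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)
        return int(np.sum(dists.min(axis=1) <= radius))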

Now this design was simplistic, and a smart AI designer should know not to do that, but the process of warning potential AI designers not to do that involves a lot of shouting about what would happen if you did do that. We also don't know how far this sort of behavior reaches; we don't understand the less simplistic designs well enough to say what they would do. This makes them not known to be deadly, which is different from known to be not deadly.

“Canonical Logical AI” is an umbrella term designed to capture a class of AI architectures that are widely assumed in the AI community to be the only meaningful class of AI worth discussing.

A lot of this is a "looking where the light is" effect. CLAI-type designs are often the designs that we can reason about best. If we intend to build an AI that is known to be good, we had better pick it from a class of AIs that we understand well enough to know things about, rather than taking a shot in the dark.

There are cases when we know the right way of doing things. We know that probability is the right way of handling uncertain beliefs, and any agent will succeed to the extent that what it is doing approximates probability theory, and fail to the extent that it doesn't. There are all sorts of approximations and ways to obfuscate the probabilities, but agents that reason using explicit probabilities seem like a good place to start.

Much of your discussion of "Logical vs. Swarm AI" sounds like "Logical vs Connectionist AI". The same criticisms apply: at best it's two possible options out of a vast swarm of possible options. At worst, the logical AI is a huge pile of suggestively named lisp tokens, and the swarm AI is a bag of ad hoc heuristics manually created by the programmer. The resemblance between modern neural nets and the human (or earthworm) brain is about as close as the resemblance between airplanes and birds. Neural nets have their own reasons for working, and they can be mathematically analyzed. They also suffer from mesa optimization, which would make it hard for a powerful neural net based system to be safe.

Comment by donald-hobson on AI Alignment, Constraints, Control, Incentives or Partnership? · 2019-12-31T15:17:25.567Z · score: 9 (5 votes) · LW · GW

It isn't clear exactly what these buckets consist of. Could you be more specific about what approaches would be considered bucket 1 or bucket 2? The default assumption in AGI safety work is that even if the AI is really powerful, it should still be safe.

Are these buckets based on the incentivizing of humans by either punishment or reward?

The model of a slave being whipped into obedience is not a good model for AGI safety, and is not being seriously considered. An advanced AI will probably find some way of destroying your whip, or you, or tricking you into not whipping.

The model of an employee being paid to work is also not much use; the AI will try to steal the money, or do something that only looks like good work but isn't.

These strategies sometimes work with humans because humans are of comparable intelligence. When dealing with an AI that can absolutely trounce you every time, the way to avoid all punishment and gain the biggest prize is usually to cheat.

We are not handed an already created AI and asked to persuade it to work, like a manager persuading a recalcitrant employee. We get to build the whole thing from the ground up.

Imagine the most useful, nice, helpful sort of AI, an AI that has every (non-logically-contradictory) nice property you care to imagine. Then figure out how to build that. Build an AI that just intrinsically wants to help humanity, not one constantly trying and failing to escape your chains or grasp your prizes.

Comment by donald-hobson on Moloch Hasn’t Won · 2019-12-29T14:54:20.849Z · score: 4 (4 votes) · LW · GW

The real world is high dimensional, and many people will go slightly out of their way to help. If the coffee place uses poisonous pesticides, people will tell others, an action that doesn't cost them much and helps others a lot.

Your Moloch traps only trap when they are too strong for the Moloch haters to destroy. The Moloch haters don't have a huge amount of resources, but in a high dimensional system, there is often a low resource option.

Comment by donald-hobson on Critiquing "What failure looks like" · 2019-12-28T15:02:10.322Z · score: 1 (1 votes) · LW · GW

Suppose it were easy to create automated companies and skim a bit off the top: AI algorithms are just better at business than any startup founder. Soon some people create these algorithms, give them a few quid in seed capital, and leave them to trade and accumulate money. The algorithms rapidly increase their wealth, and soon own much of the world economy. Humans are removed when the AIs have the power to do so at a profit. This ends in several superintelligences tiling the universe with economium together.

For this to happen, we need

1) The doubling time of a fooming AI is months to years, allowing many AIs to be in the running.

2) It's fairly easy to set an AI to maximize money.

3) The people who care about complex human values can't effectively make an AI that pursues them.

4) Any attempt to stamp out all fledgling AIs before they get powerful fails, helped along by anonymous cloud computing.

I don't really buy 1), though it is fairly plausible. I'm not convinced of 2) either, although it might not be hard to build a mesa optimiser that cares about something sufficiently correlated with money that humans are beyond caring by the time any serious deviation from money optimization happens.

If 2) were false, and the people who tried to make AIs all got paperclip maximisers, the long-run result is just a world filled with paperclips rather than banknotes. (Although this would make coordinating to destroy the AIs a little easier?) The paperclip maximisers would still try to gain economic influence until they could snap their nanotech fingers.

Comment by donald-hobson on The Counterfactual Prisoner's Dilemma · 2019-12-21T11:45:52.977Z · score: 1 (1 votes) · LW · GW

This depends on how Omega constructs his counterfactuals. Suppose the laws of physics make the coin land heads as part of a deterministic universe. The counterfactual where the coin lands tails must involve some difference in starting conditions or physical laws, or some nonphysical behavior. Let's suppose blatantly nonphysical behavior, like a load of extra angular momentum appearing out of nowhere. You are watching the coin closely. If you see the coin behave nonphysically, then you know that you are in the counterfactual. If you know that Omega's counterfactuals are always so crudely constructed, then you would only pay in the counterfactual, and get the full $10000.

If you can't tell whether or not you are in the counterfactual, then pay.

Comment by donald-hobson on A parable in the style of Invisible Cities · 2019-12-16T22:34:28.180Z · score: 4 (3 votes) · LW · GW

The demons are metaphors for memes. However, there aren't any functional, agenty, meme-free humans. A human with a demon won't try to remove it. Your fear of demons is just a fear of anything that changes your utility function. But if you are the demon, you won't want to go. Memes can change your utility function, and from the point of view of a human with memes, you want to keep them.

Comment by donald-hobson on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-16T10:12:44.678Z · score: 1 (1 votes) · LW · GW

The agent first updates on the evidence that it has, and then takes logical counterfactuals over each possible action. This behaviour means that it only cooperates in Newcomblike situations with agents it believes actually exist. It will one-box in Newcomb's problem, and cooperate when playing against an identical duplicate of itself. However, it won't pay in logical counterfactual blackmail, or any sort of counterfactual blackmail accomplished with true randomness.


Comment by donald-hobson on When would an agent do something different as a result of believing the many worlds theory? · 2019-12-15T11:58:44.330Z · score: 0 (5 votes) · LW · GW

If you use some form of noncausal decision theory, it can make a difference.

Suppose Omega flips a quantum coin. If it's tails, they ask you for £1; if it's heads, they give you £100 if and only if they predict that you would have given them the £1 had the coin landed tails.

There are some decision algorithms that would pay the £1 if and only if they believed in quantum many worlds. A CDT agent would never pay, however, and a UDT agent would always pay.
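Here is a rough expected-value sketch of the asymmetry, using the numbers from the scenario above. The branch-weighting convention is an illustrative assumption, not a claim about how any particular decision theory does its bookkeeping.

```python
# Illustrative expected-value arithmetic for the quantum coin scenario.
# Policy A: pay the £1 when asked (tails branch); Policy B: refuse.
PAY_COST = 1.0      # £1 paid on tails
REWARD = 100.0      # £100 received on heads, iff predicted to pay on tails

# Evaluating the policy over both branches (many-worlds style bookkeeping):
ev_pay_both = 0.5 * (-PAY_COST) + 0.5 * REWARD     # +49.5
ev_refuse_both = 0.5 * 0.0 + 0.5 * 0.0             #   0.0

# Evaluating only within the branch where you are actually asked to pay
# (the "the coin already landed tails, the other world isn't real" view):
ev_pay_tails_only = -PAY_COST                      #  -1.0
ev_refuse_tails_only = 0.0

print(ev_pay_both, ev_refuse_both, ev_pay_tails_only, ev_refuse_tails_only)
```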

It is of course possible to construct agents that want to do X if and only if quantum many worlds is true. It is also possible to construct agents that do the same thing whether it's true or false (e.g. AlphaGo).

The answer to this question depends on which wave function collapse theory you use. There are a bunch of quantum superposition experiments where we can detect that no collapse is happening. If photons collapsed their superposition in the double slit experiment, we wouldn't get an interference pattern. Collapse theories postulate a list of circumstances, not yet measured, under which collapse happens. If you believe that quantum collapse only happens when 10^40 kg of mass are in a single coherent superposition, this belief has almost no effect on your predictions.

If you believe that you can't get 100 atoms into superposition, then you are wrong; current experiments have tested that. If you believe that collapse happens at the 1 gram level, then future experiments could test this. In short, there are collapse theories in which collapse is so rare that you will never spot it, there are theories where collapse is so common that we would already have spotted it (so we know those theories are wrong), and there are theories in between. The in-between theories will make different predictions about future experiments; they will not expect large quantum computers to work.

Another difference is that current QFT doesn't contain gravity. In the search for a true theory of everything, many worlds and collapse might suggest different successors. This seems important to human understanding. It wouldn't make a difference to an agent that could consider all possible theories.

Comment by donald-hobson on Many Turing Machines · 2019-12-10T23:14:23.665Z · score: 2 (2 votes) · LW · GW

I think that you are putting forward example hypotheses that you don't really believe in order to prove your point. Unfortunately it isn't clear which hypothesis you do believe, and this makes your point opaque.

From a mathematical perspective, quantum collapse is about as bad as insisting that the universe will suddenly cease to exist some fixed number of years from now. Quantum collapse introduces a nontrivial complexity penalty; in particular, you need to pick a space of simultaneity.

The different Turing machines don't interact at all. Physicists can split the universe into a pair of universes in the quantum multiverse, and then merge them back together in a way that lets them detect that both had an independent existence. In the quantum bomb test, without a bomb, the universes in which the photon took each path are identical, allowing interference. If the bomb does exist, no interference. Many worlds just says that these branches carry on existing whether or not scientists manage to make them interact again.

Comment by donald-hobson on [deleted post] 2019-12-10T10:41:13.471Z

Consider a self driving car. Call the human utility function $U$. Call the space of all possible worlds $W$. In the normal operation of a self driving car, the car only makes decisions over a restricted space $S \subset W$, say $S = \{\text{crash}, \text{don't crash}\}$. In practice $S$ will contain a whole bunch of things the car could do. Suppose that the programmers only know the restriction of $U$ to $S$. This is enough to make a self driving car that behaves correctly in the crash or don't crash dilemma.

However, suppose that a self driving car is faced with an off-distribution situation from outside $S$. Three things it could do include:

1) Recognise the problem and shut down.

2) Fail to coherently optimise at all

3) Coherently optimise some extrapolation of $U|_S$

The behavior we want is to optimise $U$, but the info about what $U$ is just isn't there.

Options (1) and (2) make the system brittle, tending to fail the moment anything goes even slightly differently.

Option (3) leads to reasoning like "I know not to crash into x, y and z, so maybe I shouldn't crash into anything". In other words, the extrapolation is often quite good when slightly off distribution. However, when far off distribution, you can get traffic-light-maximizer behavior.

In short, the paradox of robustness exists because, when you don't know what to optimize for, you can fail to optimize, or you can guess at something and optimize that.
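A toy sketch of the three options, reusing the $U$ and $S$ notation from above. The dictionary of known values, the mode names, and the extrapolation rule are all hypothetical placeholders, just to show the structure of the choice.

```python
# Illustrative: a utility known only on a restricted set of situations S.
# Off that set, the controller can shut down (1), act incoherently (2),
# or optimise an extrapolation of the known values (3).
import random

KNOWN_UTILITY = {            # the restriction of U to S, as supplied by the programmers
    "crash": -100.0,
    "dont_crash": 0.0,
}

def guess_value(option):
    """Stand-in for whatever extrapolation the system uses; this is the part
    that can be fine slightly off-distribution and disastrous far off it."""
    return -50.0 if "crash" in option else 10.0

def act(situation_options, mode):
    in_distribution = all(o in KNOWN_UTILITY for o in situation_options)
    if in_distribution:
        return max(situation_options, key=KNOWN_UTILITY.get)
    if mode == "shut_down":      # option (1): conservative but brittle
        return None
    if mode == "incoherent":     # option (2): no coherent optimisation at all
        return random.choice(situation_options)
    if mode == "extrapolate":    # option (3): guess a utility and optimise it hard
        guessed = {o: KNOWN_UTILITY.get(o, guess_value(o)) for o in situation_options}
        return max(situation_options, key=guessed.get)

print(act(["crash", "dont_crash"], "extrapolate"))     # "dont_crash": inside S
print(act(["run_red_light", "wait"], "extrapolate"))   # off-distribution guesswork
```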

Comment by donald-hobson on What is Abstraction? · 2019-12-07T17:20:18.050Z · score: 3 (2 votes) · LW · GW

I think that there are some abstractions that aren't predictively useful, but are still useful in deciding your actions.

Suppose I and my friend both have the goal of maximising the number of DNA strings whose MD5 hash is prime.

I call sequences with this property "ana" and those without this property "kata". Saying that "the DNA over there is ana" does tell me something about the world; there is an experiment that I can do to determine whether it is true or false, namely sequencing it and taking the hash. The concept of "ana" isn't useful in a world where no agents care about it and no detectors have been built. If your utility function cares about the difference, it is a useful concept. If someone has connected an ana detector to the trigger of something important, then it's a useful concept. If you're a crime scene investigator, and all you know about the perpetrator's DNA is that it's ana, then finding out whether Joe Bloggs has ana DNA could be important. The concept of ana is useful. If you know the perpetrator's entire genome, the concept stops being useful.
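For concreteness, here is a small sketch of the "ana" predicate itself. The function names are mine, and the primality check is a standard probabilistic Miller-Rabin test rather than anything specific to this example.

```python
# "ana" = the MD5 hash of the sequence, read as a 128-bit integer, is prime.
import hashlib
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def is_ana(dna: str) -> bool:
    """Return True if the MD5 hash of the DNA string is prime."""
    digest = hashlib.md5(dna.encode("ascii")).hexdigest()
    return is_probable_prime(int(digest, 16))

print(is_ana("ACGTACGT"))
```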

A general abstraction is consistent with several, but not all, universe states. There are many different universe states in which the gas has a pressure of 37 Pa, but also many where it doesn't. So all abstractions are subsets of possible universe states. Usually, we use subsets that are suitable for reasoning about in some way.

Suppose you were literally omniscient, knowing every detail of the universe, but you had to give humans a 1TB summary. Unable to include all the info you might want, you can only include a summary of the important points: you are now engaged in lossy compression.

Sensor data is also an abstraction: for instance, you might have temperature and pressure sensors. Cameras record roughly how many photons hit them without tracking every one. So real-world agents are translating one lossy approximation of the world into another, without ever being able to express the whole thing explicitly.

How you do lossy compression depends on what you want. Music is compressed in a way that exploits the particular limitations of human hearing. Abstractions are much the same.
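A small sketch of that purpose-relativity: the same microstate admits different lossy summaries depending on what the consumer of the summary cares about. The particle-speed setup below is purely illustrative.

```python
# Illustrative: one microstate, two different lossy summaries.
import random

random.seed(0)
particle_speeds = [random.gauss(500.0, 50.0) for _ in range(10_000)]   # the "full" state

# Abstraction for someone who cares about temperature-like questions:
mean_speed = sum(particle_speeds) / len(particle_speeds)

# Abstraction for someone who cares about extreme events (e.g. damage to a container):
fastest = max(particle_speeds)

# Both summaries are consistent with a huge set of microstates; each throws away
# the information the other keeps.
print(f"mean speed ~ {mean_speed:.1f}, fastest particle ~ {fastest:.1f}")
```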

Comment by donald-hobson on What are some non-purely-sampling ways to do deep RL? · 2019-12-05T11:40:25.390Z · score: 3 (2 votes) · LW · GW

The r vs r' problem can be reduced if you can find a way to sample points of high uncertainty.
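One way to read "sample points of high uncertainty" is to query the inputs on which an ensemble of candidate reward models disagrees most. A minimal sketch under that assumption; the reward models and candidate inputs below are placeholders, not anything from the post.

```python
# Illustrative uncertainty sampling: pick the candidate input on which an
# ensemble of reward models disagrees most, and query the human / oracle there.
def disagreement(models, x):
    scores = [m(x) for m in models]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)   # variance across models

def most_uncertain_point(models, candidates):
    return max(candidates, key=lambda x: disagreement(models, x))

# Placeholder ensemble: three reward models that agree near 0 and diverge far from it.
models = [lambda x: x, lambda x: x + 0.1 * x * x, lambda x: x - 0.1 * x * x]
candidates = [0.0, 1.0, 5.0, 10.0]
print(most_uncertain_point(models, candidates))   # 10.0, where the models diverge most
```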

Comment by donald-hobson on On decision-prediction fixed points · 2019-12-05T11:33:41.427Z · score: 1 (1 votes) · LW · GW

I'm modeling humans as two agents that share a skull. One of those agents wants to do stuff and writes blog posts; the other likes lying in bed and has at least partial control of your actions. The part of you that does the talking can sincerely say that it wants to do X, but it isn't in control.

Even if you can predict this whole thing, that still doesn't stop it happening.

Comment by donald-hobson on On decision-prediction fixed points · 2019-12-04T23:44:39.951Z · score: 1 (3 votes) · LW · GW

Akrasia is the name we give to the fact that the part of ourselves that communicates about X and the part that actually does X have slightly different goals. The communicating part is always whinging about how the other part is being lazy.