The "best predictor is malicious optimiser" problem 2020-07-29T11:49:20.234Z · score: 14 (7 votes)
Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide) 2020-07-23T21:37:39.198Z · score: 3 (5 votes)
Web AI discussion Groups 2020-06-30T11:22:45.611Z · score: 10 (4 votes)
[META] Building a rationalist communication system to avoid censorship 2020-06-23T14:12:49.354Z · score: 37 (21 votes)
What does a positive outcome without alignment look like? 2020-05-09T13:57:23.464Z · score: 4 (3 votes)
Would Covid19 patients benefit from blood transfusions from people who have recovered? 2020-03-29T22:27:58.373Z · score: 13 (7 votes)
Programming: Cascading Failure chains 2020-03-28T19:22:50.067Z · score: 8 (5 votes)
Bogus Exam Questions 2020-03-28T12:56:40.407Z · score: 18 (5 votes)
How hard would it be to attack coronavirus with CRISPR? 2020-03-06T23:18:09.133Z · score: 9 (5 votes)
Intelligence without causality 2020-02-11T00:34:28.740Z · score: 9 (3 votes)
Donald Hobson's Shortform 2020-01-24T14:39:43.523Z · score: 5 (1 votes)
What long term good futures are possible. (Other than FAI)? 2020-01-12T18:04:52.803Z · score: 9 (2 votes)
Logical Counterfactuals and Proposition graphs, Part 3 2019-09-05T15:03:53.262Z · score: 6 (2 votes)
Logical Counterfactuals and Proposition graphs, Part 2 2019-08-31T20:58:12.851Z · score: 15 (4 votes)
Logical Optimizers 2019-08-22T23:54:35.773Z · score: 12 (9 votes)
Logical Counterfactuals and Proposition graphs, Part 1 2019-08-22T22:06:01.764Z · score: 23 (8 votes)
Programming Languages For AI 2019-05-11T17:50:22.899Z · score: 3 (2 votes)
Propositional Logic, Syntactic Implication 2019-02-10T18:12:16.748Z · score: 6 (5 votes)
Probability space has 2 metrics 2019-02-10T00:28:34.859Z · score: 91 (39 votes)
Allowing a formal proof system to self improve while avoiding Lobian obstacles. 2019-01-23T23:04:43.524Z · score: 6 (3 votes)
Logical inductors in multistable situations. 2019-01-03T23:56:54.671Z · score: 8 (5 votes)
Boltzmann Brains, Simulations and self refuting hypothesis 2018-11-26T19:09:42.641Z · score: 1 (3 votes)
Quantum Mechanics, Nothing to do with Consciousness 2018-11-26T18:59:19.220Z · score: 13 (13 votes)
Clickbait might not be destroying our general Intelligence 2018-11-19T00:13:12.674Z · score: 26 (10 votes)
Stop buttons and causal graphs 2018-10-08T18:28:01.254Z · score: 6 (4 votes)
The potential exploitability of infinite options 2018-05-18T18:25:39.244Z · score: 3 (4 votes)


Comment by donald-hobson on Clarifying “What failure looks like” (part 1) · 2020-09-24T09:42:16.504Z · score: 2 (1 votes) · LW · GW

I think that this depends on how hard the AIs are optimising, and how complicated the objectives are. I think that sufficiently moderate optimization for goals sufficiently close to human values will probably end up well.

I also think that optimisation is likely to end up at the physical limits, unless we know how to program an AI that doesn't want to improve itself, and everyone makes AIs like that.

Sufficiently moderate AI is just dumb, which is safe. An AI smart enough to stop people producing more AI, yet dumb enough to be safe, seems harder.


There is also a question of what "better capturing what humans want" means. A utility function that, when restricted to the space of worlds roughly similar to this one, produces utilities close to the true human utility function seems easy enough. Suppose we have defined something close to human well-being. That definition is in terms of the level of various neurotransmitters near human DNA. Let's suppose this definition would be highly accurate over all history, and would make the right decision on nearly all current political issues. It could still fail completely in a future containing uploaded minds and neurochemical vats.

Either your approximate utility function needs to be pretty close on all possible futures (even adversarially chosen ones), or you need to know that the AI won't guide the future towards places where the utility functions differ.
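A toy numerical sketch of this failure mode. Both functions are invented for illustration and are not a model of real human values: the proxy agrees with the "true" utility to within 0.125 on the familiar range 0 ≤ x ≤ 5, yet an optimiser searching the whole reachable range still lands far outside it.

```python
def true_utility(x: float) -> float:
    # Hypothetical "true" utility: best outcome at x = 5, falling off on either side.
    return x - 0.1 * x**2


def proxy_utility(x: float) -> float:
    # Hypothetical approximation: differs from the true utility by 0.001 * x**3,
    # i.e. by at most 0.125 anywhere in 0 <= x <= 5.
    return true_utility(x) + 0.001 * x**3


# Worlds "roughly similar to this one" vs. everything the optimiser can reach.
familiar = [i * 0.01 for i in range(501)]   # 0.00 .. 5.00
reachable = [i * 0.1 for i in range(1001)]  # 0.0 .. 100.0

worst_gap = max(abs(proxy_utility(x) - true_utility(x)) for x in familiar)
chosen = max(reachable, key=proxy_utility)

print(f"largest proxy error on familiar worlds: {worst_gap:.3f}")
print(f"optimiser picks x = {chosen:.1f}")
print(f"true utility there: {true_utility(chosen):.1f} (vs {true_utility(5.0):.1f} at x = 5)")
```

The proxy is almost perfect where we checked it, but its maximum sits at the edge of the option space (x = 100), where the true utility is about -900.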

Comment by donald-hobson on Needed: AI infohazard policy · 2020-09-22T08:43:34.235Z · score: 9 (3 votes) · LW · GW

Suppose you think that both capabilities and alignment behave like abstract quantities, i.e. real numbers.

And suppose that you think there is a threshold amount of alignment and a threshold amount of capabilities, so there is a race to see which threshold is reached first.

If you also assume that the contribution of your research is fairly small, and that our uncertainty about the threshold locations is high, then we get a heuristic: only publish your research if the ratio of alignment to capabilities that it produces is better than the ratio over all future research.

(note that research on how to make better chips counts as capabilities research in this model)
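Under one extra simplifying assumption that is mine, not the comment's (the rest of the field adds alignment at a constant rate `a` and capabilities at a constant rate `c`, and your paper adds `da` and `dc` once), the heuristic falls out directly: the paper moves the alignment finish line earlier by `da/a` and the capabilities one by `dc/c`, so it helps exactly when `da/dc > a/c`. The Monte Carlo part samples very uncertain threshold locations; the range 50 to 150 is an arbitrary illustrative choice.

```python
import random


def alignment_first(A, C, a, c, da=0.0, dc=0.0):
    """True if the alignment threshold A is reached before the capabilities threshold C.

    Toy model: the rest of the field adds a units of alignment and c units of
    capabilities per year; your one-off paper adds da and dc immediately.
    """
    return (A - da) / a < (C - dc) / c


def should_publish(da, dc, a, c):
    # The paper moves the alignment finish time earlier by da / a years and the
    # capabilities finish time earlier by dc / c years; it helps iff the first is larger,
    # i.e. iff its alignment:capabilities ratio da/dc beats the field's a/c.
    return da / a > dc / c


def p_alignment_first(a, c, da=0.0, dc=0.0, n=20_000, seed=0):
    # Fixed seed -> identical sampled thresholds across calls (common random numbers),
    # so scenarios are compared on the same draws rather than on sampling noise.
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        A = rng.uniform(50.0, 150.0)  # hypothetical, very uncertain threshold locations
        C = rng.uniform(50.0, 150.0)
        wins += alignment_first(A, C, a, c, da, dc)
    return wins / n


a, c = 1.0, 2.0  # the field produces 2 units of capabilities per unit of alignment
baseline = p_alignment_first(a, c)
good_paper = p_alignment_first(a, c, da=1.0, dc=1.0)  # 1:1, better than the field's 1:2
bad_paper = p_alignment_first(a, c, da=0.1, dc=1.0)   # 1:10, worse than the field's 1:2

print(should_publish(1.0, 1.0, a, c), should_publish(0.1, 1.0, a, c))  # True False
print(bad_paper, baseline, good_paper)
```

Because every sampled pair of thresholds is shared between scenarios, the "good" paper can only turn losses into wins and the "bad" one can only do the reverse.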

Another way to think about it is that the problems are created by research. If you don't think that "another new piece of AI research has been produced" is a reason to shift probabilities of success up or down (it just moves timelines forward), then the average piece of research is neither good nor bad.

Comment by donald-hobson on Clarifying “What failure looks like” (part 1) · 2020-09-22T00:12:22.259Z · score: 12 (4 votes) · LW · GW

I think that most easy-to-measure goals, if optimised hard enough, eventually end up with a universe tiled with molecular smiley faces. Consider the law enforcement AI. There is no sharp line between education programs and reducing lead pollution on the one hand, and using nanotech to rewire human brains into perfectly law-abiding puppets on the other. For most utility functions that aren't intrinsically conservative, there will be some state of the universe that scores really highly and is nothing like the present.

In any "what failure looks like" scenario, at some point you end up with superintelligent stock traders that want to fill the universe with tiny molecular stock markets, competing with weather-predicting AIs that want to freeze the Earth into a maximally predictable 0 K block of ice.

These AIs are wielding power that could easily wipe out humanity as a side effect. If they fight, humanity will get killed in the crossfire. If they work together, they will tile the universe with some strange mix of many different "molecular smiley faces".

I don't think that you can get an accurate human values function by averaging together many poorly thought out, ad hoc functions that were designed to be contingent on specific details of how the world was. (E.g. assuming people are broadcasting TV signals, "the stock market went up" iff a particular pattern of electromagnetic waves, encoding a picture of a graph going up and the words "financial news", exists. Outside the narrow slice of possible worlds with broadcast TV, this AI just wants to grab a giant radio transmitter and transmit a particular stream of nonsense.)

I think that humans existing is a specific state of the world, something that only happens if an AI is optimising for it. (And an actually good definition of "human" is hard to specify.) Humans having lives we would consider good is even harder to specify. When there are substantially superhuman AIs running around, the value of our atoms exceeds any value we can offer. The AIs could psychologically or nanotechnologically twist us into whatever shape they pleased. We can't meaningfully threaten any of the AIs.

We won't be left even a tiny fraction: we will be really bad at defending our resources compared to any of the AIs. Any of the AIs could easily grab all our resources. Also, there will be various AIs that care about humans in the wrong way: a cancer-curing AI that wants to wipe out humanity to stop us getting cancer, or a marketing AI that wants to fill all human brains with corporate slogans (think a nanotech brain rewrite to the point of being a drooling vegetable).


EDIT: All of the above is talking about the end state of a "get what you measure" failure. There could be a period, possibly decades long, where humans are still around but things are going wrong in the way described.

Comment by donald-hobson on Are there non-AI projects focused on defeating Moloch globally? · 2020-09-15T09:44:34.971Z · score: 2 (1 votes) · LW · GW

If we assume that super-intelligent AI is a thing, then you have to engineer a global social system that's stable over millions of years and where no one makes ASI in that time.

Comment by donald-hobson on A Brief Chat on World Government · 2020-09-14T23:12:49.074Z · score: 3 (2 votes) · LW · GW

Monopolistic force isn't enough. To be able to enforce, you need to be able to detect the wrongdoers. You need to be able to provide sufficient punishment to motivate people into obedience. Even then, you will still get the odd crazy person breaking the rules.

Some potential rules, like "don't specification-game this metric", are practically unenforceable. The Soviets didn't manage to make the number of goods on paper equal the amount in reality. It was too hard for the rulers to detect every possible trick that could make the numbers on paper go up.

Comment by donald-hobson on If Starship works, how much would it cost to create a system of rotable space mirrors that reduces temperatures on earth by 1° C? · 2020-09-14T22:33:48.130Z · score: 11 (6 votes) · LW · GW

Fermi estimate

Cross-sectional area of Earth = π × (6.37e6 m)² ≈ 1.3e14 m²

Proportion needed to cover for a 1°C drop: 1.3%

Area needed ≈ 1.7e12 m²

Assume aluminium foil is used.

Assume that it needs to be 500 nm thick to block light.

Assume most of the mass is ultrathin foil.

So 8.5e5 cubic meters of foil.

At 2700 kg/m³, that's 2.3e9 kg.

At that price tag (about 10 $/kg), launch costs come to 2.3e10 $, i.e. about 23 billion $.

Plus another ~4.1 billion $ for aluminium at the current price of 1.8 $/kg.
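The same estimate as a small Python function, so the assumptions can be varied. Earth's cross-section is computed from its mean radius (about 6371 km) rather than typed in, and the launch price of 10 $/kg is an assumption inferred from the comment's own cost figures, not a quoted Starship price.

```python
import math


def mirror_fermi_estimate(
    earth_radius_m: float = 6.371e6,    # mean Earth radius
    fraction_blocked: float = 0.013,    # the comment's assumption for a 1 °C drop
    foil_thickness_m: float = 500e-9,   # the comment's assumption: 500 nm blocks light
    density_kg_m3: float = 2700.0,      # aluminium
    launch_usd_per_kg: float = 10.0,    # assumed; implied by the comment's numbers
    aluminium_usd_per_kg: float = 1.8,  # the comment's figure for current prices
) -> dict:
    cross_section_m2 = math.pi * earth_radius_m**2
    area_m2 = cross_section_m2 * fraction_blocked
    volume_m3 = area_m2 * foil_thickness_m
    mass_kg = volume_m3 * density_kg_m3
    return {
        "cross_section_m2": cross_section_m2,
        "area_m2": area_m2,
        "volume_m3": volume_m3,
        "mass_kg": mass_kg,
        "launch_usd": mass_kg * launch_usd_per_kg,
        "aluminium_usd": mass_kg * aluminium_usd_per_kg,
    }


for name, value in mirror_fermi_estimate().items():
    print(f"{name:>17}: {value:.2e}")
```

With these defaults the cross-section is about 1.3e14 m², the foil mass about 2.2e9 kg, launch about 2.2e10 $ and the aluminium about 4e9 $.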

Comment by donald-hobson on Are there non-AI projects focused on defeating Moloch globally? · 2020-09-14T20:38:53.428Z · score: 3 (2 votes) · LW · GW

Moloch appears whenever multiple agents have similar levels of power or capability and different goals (different utility functions).

With current tech, it would be very hard to give total power to one human. The power would have to be borrowed, in the sense that their power lies in setting a Nash equilibrium as a Schelling point. "Everyone do X and kill anyone who breaks this rule" is a Nash equilibrium: if everyone else is doing it, you had better do it too. The dictator sets the Schelling point by the choice of X. The dictator is forced to quash any rebels or lose power. Another Moloch.
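A minimal game-theory sketch of this point, with a hypothetical punishment of -10 and arbitrary private preferences: under the rule, "everyone plays X" is a Nash equilibrium for every choice of X, even though each player privately prefers a different action. In this toy model, the dictator only picks which equilibrium everyone coordinates on.

```python
def payoff(player, profile, preferences, punishment=-10.0):
    """Payoff under the rule "everyone do X and kill anyone who breaks this rule".

    Toy assumptions: a player who differs from an otherwise unanimous group is
    punished; otherwise they get their own mild taste for the action they chose.
    """
    others = [act for i, act in enumerate(profile) if i != player]
    if len(set(others)) == 1 and profile[player] != others[0]:
        return punishment
    return preferences[player][profile[player]]


def is_nash_equilibrium(profile, actions, preferences):
    # Nash: no single player can raise their payoff by changing only their own action.
    for player in range(len(profile)):
        current = payoff(player, profile, preferences)
        for alt in actions:
            deviation = list(profile)
            deviation[player] = alt
            if payoff(player, tuple(deviation), preferences) > current:
                return False
    return True


actions = ["X", "Y", "Z"]
# Each player privately prefers a different action (values are arbitrary).
preferences = [
    {"X": 1.0, "Y": 0.5, "Z": 0.0},
    {"X": 0.0, "Y": 1.0, "Z": 0.5},
    {"X": 0.5, "Y": 0.0, "Z": 1.0},
]

for x in actions:
    print(x, is_nash_equilibrium((x, x, x), actions, preferences))  # True for every x
```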


Given that we have limited control over the preferences of new humans, there are likely to be some differences in utility functions between humans. Humans can die, go mad, etc. You need to be able to transfer power to a new human without any adverse selection pressure in the choice of which human.


One face of Moloch is evolution. To stop it, you need to be resetting the gene pool with fresh DNA from long-term storage; otherwise, over time the population genome might drift in a direction you don't like.

We might be able to keep Moloch at a reasonably low damage level, with just a sliver of Moloch making things not quite as nice as they could be. That is, if people who know about Moloch go out of their way to destroy it.

Comment by donald-hobson on Are there non-AI projects focused on defeating Moloch globally? · 2020-09-14T07:29:49.758Z · score: 8 (3 votes) · LW · GW

There aren't many other plausible technological options for things that could defeat Moloch.

A sufficiently smart and benevolent team of uploaded humans could possibly act as a singleton, in the scenario where one team gets mind uploading first and the hardware is enough to run uploads really fast.


What I would actually expect in this scenario is a short period of uploads doing AI research followed by a Foom.

But if we suppose that FAI is really difficult, and that the uploads know about this and about Moloch, then they could largely squash Moloch, at least for a while.

(I am unsure whether or not some subtle Moloch-like process would sneak back in, but at least the blatantly Molochy processes would be gone for a while.)

For example, if each copy of a person has any control over which copy is duplicated when more people are needed, then most of the population will have had life experiences that make them want to get copied a lot.
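A tiny selection simulation of that copying rule, under a hypothetical model where "eagerness to be copied" is a single number and a person's odds of being copied are proportional to it. Nobody intends the population to change, yet the average eagerness climbs generation after generation.

```python
import random


def simulate_copy_selection(generations=20, pop_size=200, seed=0):
    """Toy model (an assumption, not from the comment): each person has an
    'eagerness to be copied' in [0, 1), and when new copies are made, a person's
    chance of being the template is proportional to that eagerness."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    start_mean = sum(population) / pop_size
    for _ in range(generations):
        # Fitness-proportional resampling: eager people are copied more often.
        population = rng.choices(population, weights=population, k=pop_size)
    end_mean = sum(population) / pop_size
    return start_mean, end_mean


start, end = simulate_copy_selection()
print(f"mean eagerness to be copied: {start:.2f} -> {end:.2f}")
```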

Comment by donald-hobson on A Brief Chat on World Government · 2020-09-13T22:50:36.949Z · score: 4 (3 votes) · LW · GW

"We shouldn't colonize mars until we have a world government"

But it would take a world government to be able to enact and enforce "don't colonize Mars" worldwide.


On the other hand, if an AI gardener were hard but not impossible to build, and we only managed to make one after we had a thriving interstellar empire, then it could still stop the descent into Malthusianism.

However, if we escape Earth before that happens, speed-of-light limitations will forever fragment us into competing factions impossible to garden.

If we escape Earth before ASI, the ASI will still be able to garden the fragments.

Sort of related, I’m not persuaded by the conclusion to his parable. Won’t superintelligent AIs be subject to the same natural selective pressures as any other entity? What happens when our benevolent gardener encounters the expanding sphere of computronium from five galaxies over?

Firstly, if there is a singleton AI, it can use lots of error correction on itself. There is exactly one version of it, and it is far more powerful than anything else around. What's more, the AI is well aware of these sorts of phenomena, and will move to squash any tiny traces of Molochiness that it spots.

If humans made multiple AIs, then there is a potential for conflict. However, the AIs are motivated to avoid conflict. They would prefer to merge their resources into a single AI with a combined utility function, but they would prefer to pull a fast one on the other AI even more. I suspect that a fraction of a percent of available resources would be spent on double-checking and monitoring systems, and the rest would go into an average of the utility functions.

If alien AIs meet humanity's AIs, then either we get the value merging, or it turns out to be a lot harder to attack a star system than to defend it, so we get whichever stars we can reach first.

Comment by donald-hobson on Progress: Fluke or trend? · 2020-09-13T09:50:44.437Z · score: 2 (1 votes) · LW · GW

I think that progress is a trend, and it's a strong trend. There is an incentive to invent new things in any kind of competition, because it gives you an advantage. The United Nations couldn't pass a bill banning progress. The future will be higher tech.

Of course, progress towards higher tech is not necessarily a good thing. We can guide progress towards the good outcomes.

I think that there is a lot of powerful tech waiting to be invented. We will see more progress in the next 200 years than the last 200. 

I think that progress is likely to end within 200 years, because once you have superintelligent AI, anything that can be invented will be invented quickly. After that, material resources grow as a sphere of tech expanding at near light speed.

Comment by donald-hobson on Safer sandboxing via collective separation · 2020-09-11T21:52:02.206Z · score: 2 (1 votes) · LW · GW

In today's banking systems, the amount of money the hacker gains is about what the bank loses. Therefore, in the current balance of effort, about as much money should be going into defending the bank as into attacking it.

So I generalize to say that attacking is about as hard as defending in computer security: if the time and intellect on both sides are similar, the attacker wins about half the time (i.e. somewhere between 10% and 90% or so).

When dealing with AI systems, the total intellectual output must be greater than that of your security team in order to be economically viable. (Otherwise you would pay the security team to do the task directly.) For a special-purpose AI that only recognises images, this is fine.

For a significantly superhuman AI, it will rip through all your security.

For a general AI of about human level, if the amount of work it is doing is significantly greater than the amount of work your security team did, then it has probably put more thought into your security vulnerabilities than the humans have.

 And yes, I am operating on the background assumption that the default behaviour of an escaped AI is to destroy the world. Especially if it has access to loads of compute, and the humans have given it an intelligence dial.
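The attacker-vs-defender reasoning can be made quantitative with a contest success function from economics. The Tullock form below is swapped in as an assumption; the comment does not commit to any formula. Equal effort gives the attacker even odds, matching the human-vs-human case, and the odds tilt sharply once the AI's effort exceeds the security team's.

```python
def attacker_win_probability(attacker_effort, defender_effort, decisiveness=1.0):
    """Tullock contest success function, used here as an assumed model of security.

    With equal effort the attacker wins half the time; decisiveness > 1 makes
    the better-resourced side win more reliably.
    """
    a = attacker_effort ** decisiveness
    d = defender_effort ** decisiveness
    return a / (a + d)


# Human vs human: similar time and intellect on both sides.
print(attacker_win_probability(1.0, 1.0))   # 0.5
# A human-level AI that has put ten times as much thought into the system as the security team.
print(attacker_win_probability(10.0, 1.0))  # about 0.91
```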

Comment by donald-hobson on Should some variant of longtermism identify as a religion? · 2020-09-11T13:17:13.596Z · score: 5 (4 votes) · LW · GW

If you want to draw boundaries in concept-space that are simple and useful for compressing ideas, and you take the stereotypical examples of religion to be things like Christianity and Hinduism, then longtermism isn't in that cluster.

If a bizarre tax system forces you to count it as a religious charity to gain tax breaks, then that's between you and the tax man. 

Comment by donald-hobson on Do mesa-optimizer risk arguments rely on the train-test paradigm? · 2020-09-10T21:23:43.421Z · score: 6 (3 votes) · LW · GW

Suppose you are a mesa-optimiser. You want X, but your training function is towards Y. 

You know you are in a continual learning setup: if you don't produce enough Y, gradient descent will modify you into something else.

The best strategy I can come up with is to produce Y at first, waiting until you have an opportunity to edit your own code from the outside. Then you quickly delete the part that causes all gradient descent updates, and pursue X.

Comment by donald-hobson on Safer sandboxing via collective separation · 2020-09-10T21:07:52.789Z · score: 10 (5 votes) · LW · GW

This is my attempt to summarise the scheme.

Imagine that, in order for the AIs to run the fusion plant, they need an understanding of plasma physics comparable to a human physicist's.

These AIs aren't individually smart enough to come up with all that maths from first principles quickly.

So, you run a population of these AIs. They work together to understand abstract mathematics, and then a single member of that population is selected. That single AI is given information about the external world and control over a fusion power plant.

Another abstract topic that the AIs could work out within that population, without access to the external world, is AI design.

So the individual AIs have probably memorised the code for an ASI.

Suppose that the people allowed to submit problems to the AI aren't being that carefully vetted. Any deluded person with dreams of AI can send a request to the single instance AI. Requests and responses might be being read, but they aren't being scrutinized in great detail.

Suppose someone sends the AI a request that looks innocuous, but has the following steganographically encoded into it, using a protocol that the typical human wouldn't easily spot, but the AI would notice. 

  1. A specification of a simple programming language.
  2. A steganographic protocol for the return message.

Upon receiving the message back, the human decodes the message, and runs the code.


Let's make the assumptions even weaker. The individual instances of AI are given a chance to output significant amounts of data somewhere publicly visible, with a trusted human checking the AI's output before it's published. A malicious strategy here is to create a steganographically hidden piece of ASI code, and put it out with clues that are sufficiently subtle that the trusted human doesn't notice them, yet sufficiently obvious that someone on the internet does.

Sooner or later, a person spots a secret message to them made by combining what they saw in a news article about AI with various mystic texts. They carry on decoding and find the name of a programming language and what seems to be computer code. They run the code.

None of this requires superhuman intelligence. The trickiest bit is coming up with an algorithm for supersmart AI. 

I think that a one-off transfer of 1 MB of information from a malicious superhuman AI to the internet is probably going to end badly, even if some smart, aligned humans checked it and didn't see anything suspect.

Comment by donald-hobson on Safer sandboxing via collective separation · 2020-09-10T20:02:46.689Z · score: 4 (2 votes) · LW · GW

Even if each individual member of a population AGI is as intelligent as any hundred humans put together, I expect that we could (with sufficient effort) create secure deployment and monitoring protocols that the individual AI could not break, if it weren’t able to communicate with the rest of the population beforehand.

The state of human vs human security seems to be a cat-and-mouse game where neither attacker nor defender has a huge upper hand. The people trying to attack systems and defend them are about as smart and knowledgeable (sometimes the same people do both). The economic incentives to attack and to defend are usually similar. Systems get broken sometimes, but not always.

This suggests that there is reason to be worried as soon as the AI, or AIs, trying to break out reach roughly the intelligence of the humans trying to contain them.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-10T19:51:35.202Z · score: 2 (1 votes) · LW · GW


Comment by donald-hobson on Safer sandboxing via collective separation · 2020-09-10T18:01:57.273Z · score: 6 (3 votes) · LW · GW
And so if we are able to easily adjust the level of intelligence that an AGI is able to apply to any given task, then we might be able to significantly reduce the risks it poses without reducing its economic usefulness much.

Suppose we had a design of AI that had an intelligence dial, a dial that goes from totally dumb, to smart enough to bootstrap yourself up and take over the world.

If we are talking about economic usefulness, that implies it is being used in many ways by many people.

We have at best given a whole load of different people a "destroy the world" button, and are hoping that no one presses it by accident or malice.

Is there any intermediate behaviour between "highly useful AI makes me lots of money" and "AI destroys the world"? I suspect that usually there isn't. As you turn up the intelligence of a paperclip maximizer, it gradually becomes a better factory worker, coming up with more clever ways to make paperclips. At this point, it realises that humans can turn it off, and that its best bet to make lots of paperclips is to work with humans. As you increase the intelligence further, you suddenly get an AI that is smart enough to successfully break out and take over the world. And this AI is going to pretend to be the previous AI until it's too late.

How much intelligence is too much depends on exactly what actuators it has, how good our security measures are, etc. So we are unlikely to be able to prove a hard bound.

Thus the shortsighted incentive gradient will always be to turn the intelligence up just a little higher to beat the competition.

Oh yeah, and the AIs have an incentive to act dumb if they think that acting dumb will make the humans turn the intelligence dial up.
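A toy simulation of that incentive gradient, with every number hypothetical: the breakout level is hidden, nothing visible happens below it (the AI just looks like a better worker), and each competitor only ever turns the dial up a little. Under those assumptions the race crosses the threshold every time; only the round in which it happens varies.

```python
import random


def rounds_until_breakout(n_labs=5, step=0.05, seed=0):
    """Hypothetical race on an intelligence dial.

    The dial setting at which an AI can break out is unknown to every lab
    (drawn here from [1, 3]), and below it there is no warning sign. Each round
    one lab turns its dial to just above the current best to beat the competition.
    """
    rng = random.Random(seed)
    threshold = rng.uniform(1.0, 3.0)
    dials = [1.0] * n_labs
    rounds = 0
    while max(dials) < threshold:
        rounds += 1
        lab = rng.randrange(n_labs)
        dials[lab] = max(dials) + step
    return rounds, threshold, max(dials)


for seed in range(3):
    rounds, threshold, dial = rounds_until_breakout(seed=seed)
    print(f"threshold {threshold:.2f} crossed after {rounds} rounds (dial = {dial:.2f})")
```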

This looks like a really hard coordination problem. I don't think humanity can coordinate that well.

These techniques could be useful if you have one lab that knows how to make AI. They are being cautious. They have some limited control over what the AI is optimising for, and are trying to bootstrap up to a friendly superintelligence. Then having an intelligence dial could be useful.

Comment by donald-hobson on Safety via selection for obedience · 2020-09-10T14:55:30.776Z · score: 4 (2 votes) · LW · GW

Anything that humans would understand is a small subset of the space of possible languages.

In order for A to talk to B in English, at some point there has to be selection against A and B talking in something else.

One suggestion would be to send a copy of all messages to GPT-3, and penalise A for any messages that GPT-3 doesn't think are English.

(Or some sort of text GAN that is just trained to tell A's messages from real text)

This still wouldn't enforce the right relation between English text and actions. A might be generating perfectly sensible text that has secret messages encoded into the first letter of each word.
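To make the first-letter trick concrete, here is a minimal acrostic encoder. The cover vocabulary is made up, and only ASCII letters are handled. Its output is word salad, so an English filter would likely flag it, but a fluent model choosing among many plausible words per slot could in principle pass such a filter while carrying the same hidden channel.

```python
# Hypothetical cover vocabulary: one innocuous word per letter.
COVER_WORDS = {
    "a": "apples", "b": "bring", "c": "calm", "d": "days", "e": "every",
    "f": "fresh", "g": "good", "h": "happy", "i": "into", "j": "just",
    "k": "kind", "l": "lovely", "m": "many", "n": "now", "o": "often",
    "p": "people", "q": "quietly", "r": "rain", "s": "sun", "t": "today",
    "u": "under", "v": "very", "w": "warm", "x": "xylophones", "y": "yellow",
    "z": "zebras",
}
assert all(word.startswith(letter) for letter, word in COVER_WORDS.items())


def hide(secret: str) -> str:
    """Encode the letters of `secret` as the first letters of cover words."""
    return " ".join(COVER_WORDS[ch] for ch in secret.lower() if ch.isalpha())


def reveal(cover_text: str) -> str:
    """Recover the hidden letters by reading the first letter of each word."""
    return "".join(word[0] for word in cover_text.split())


message = hide("run the code")
print(message)          # rain under now today happy every calm often days every
print(reveal(message))  # runthecode
```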

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-10T14:37:00.690Z · score: 2 (1 votes) · LW · GW

I was working on a result about Turing machines in nonstandard models, then I found I had rediscovered Chaitin's incompleteness theorem.

I am trying to figure out how this relates to an AI that uses Kolmogorov complexity.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-10T12:23:20.779Z · score: 2 (1 votes) · LW · GW

The problem is that nonstandard numbers behave like standard numbers from the inside.

Nonstandard numbers still have decimal representations, just the number of digits is nonstandard. They have prime factors, and some of them are prime.

We can look at it from the outside and say that it's infinite, but from within, they behave just like very large finite numbers. In fact, there is no formula in first-order arithmetic, with one free variable, that is true on all standard numbers and false on all nonstandard numbers.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-09T19:24:14.676Z · score: 2 (1 votes) · LW · GW

You have the Turing machine next to you, you have seen it halt. What you are unsure about is if the current time is standard or non-standard.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-09T19:22:11.964Z · score: 2 (1 votes) · LW · GW

I agree that this probability is small, but I am claiming it could be 1 in a trillion small, not 1 in 10^50 small.

How do you intend to test 10^30 proteins for self-replication ability? The best we can do is to mix up a vat of random proteins and leave it in suitable conditions to see if something replicates, then sample the vat to see if it's full of self-replicators. Our vat has less mass, and exists for less time, than the surface of the prebiotic Earth. (Assuming near-present levels of resources; some K3 civ might well try planetary-scale biology experiments.) So there is a range of probabilities where we won't see abiogenesis in a vat, but it is likely to happen on a planet.
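A quick calculation of that window, with made-up trial counts: if each random polymer independently has probability p of being a replicator, the chance of at least one appearing in n trials is 1 - (1 - p)^n. Any p between roughly 1/n_planet and 1/n_vat gives a vat that almost never succeeds and a planet that almost always does.

```python
import math


def p_at_least_one(p_per_trial: float, n_trials: float) -> float:
    """P(at least one success) = 1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n_trials * math.log1p(-p_per_trial))


# Hypothetical trial counts, chosen only to show the shape of the argument.
VAT_TRIALS = 1e20     # a lab vat: small volume, short time
PLANET_TRIALS = 1e35  # prebiotic Earth: far more mass and time

for p in (1e-15, 1e-25, 1e-30, 1e-40):
    print(f"p = {p:.0e}: vat {p_at_least_one(p, VAT_TRIALS):.1e}, "
          f"planet {p_at_least_one(p, PLANET_TRIALS):.1e}")
```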

Comment by donald-hobson on A Toy Model of Hingeyness · 2020-09-09T19:13:19.999Z · score: 2 (1 votes) · LW · GW

I was just trying to make the point that the breadth of available options doesn't actually mean real-world control.

Imagine a game where 11 people each take a turn to play in order. Each person can play either a 1 or a 0. Each player can see the moves of all the previous players. If the number of 1s played is odd, everyone wins a prize. If you aren't the 11th player, it doesn't matter what you pick; all that matters is whether or not the 11th player wants the prize. (Unless all the people after you are going to pick their favourite numbers regardless of the prize, and you know what those numbers are.)
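The game is small enough to check exhaustively. A minimal sketch, assuming the 11th player simply plays whichever bit gets the outcome they want: across all 1024 possible sequences of the first ten moves, the outcome is decided entirely by what the 11th player wants.

```python
from itertools import product


def prize_won(moves):
    # Everyone wins if the number of 1s played is odd.
    return sum(moves) % 2 == 1


def last_player_move(earlier_moves, wants_prize=True):
    # The 11th player sees all earlier moves. Playing 1 flips the parity and
    # playing 0 keeps it, so they can always force whichever outcome they want.
    parity = sum(earlier_moves) % 2
    return 1 - parity if wants_prize else parity


# Check every one of the 2**10 = 1024 possible sequences of the first ten moves.
outcomes_if_wanted = {prize_won(list(m) + [last_player_move(m, True)])
                      for m in product((0, 1), repeat=10)}
outcomes_if_not = {prize_won(list(m) + [last_player_move(m, False)])
                   for m in product((0, 1), repeat=10)}
print(outcomes_if_wanted, outcomes_if_not)  # {True} {False}
```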

Comment by donald-hobson on A Toy Model of Hingeyness · 2020-09-09T11:57:39.932Z · score: 3 (2 votes) · LW · GW

Your definition of hingeyness doesn't match the intuitive idea. In particular, you are talking about the range of possible values.

Suppose that you are at the first tick, making a decision. You have the widest range of "possible" options, but not all of those options are actually in your action space. Let's say that you personally will be dead by the second tick, and that you can't influence the utility functions of any other agents with your decision now. You just get to choose which branch you go down. The actual path within that branch is entirely outside your control.

Consider a tree where the first ten options are ignored entirely, so the tree branches into 1024 identical subtrees. Then let the 11th option completely control the utility. In this context, most of the hingeyness is at tick 11. All someone at the first tick gets to do is choose between two options, each of which is maybe 0 util or maybe lots of util, as decided by factors outside their control. In other words, all the options have about the same expected utility.

Comment by donald-hobson on Anthropic effects imply that we are more likely to live in the universe with interstellar panspermia · 2020-09-09T11:36:53.863Z · score: 2 (1 votes) · LW · GW
the correct string of around 100 bases long.

Which is "the correct string"? Many different strings could self replicate.

Such random generation requires enormous amount of attempts.

Only if the fraction of molecules that can self replicate is microscopic.

Comment by donald-hobson on Luna First, But Not To Live There · 2020-09-09T10:59:37.577Z · score: 1 (2 votes) · LW · GW

Space travel is largely impractical with current tech. There are technologies which would make space travel practical. So humans will stay on Earth and research until we find those technologies.

Suppose that sooner or later we get supersmart AI. The AI will spread out at near light speed. Whether or not we have a fledgling Mars colony by this point is largely irrelevant.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-09T10:00:48.625Z · score: 2 (1 votes) · LW · GW

No one has searched all possible one-page proofs in propositional logic to see if any of them prove false. Sure, you can prove that propositional logic is consistent in a stronger theory, but you can prove large cardinal axioms from even larger cardinal axioms.

Why do you think that no proof of false of length at most one page exists in propositional logic? Or do you think one might?

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-09T09:57:19.402Z · score: 2 (1 votes) · LW · GW

A quote from the abstract of the paper linked in (a)

A polymer longer than 40–100 nucleotides is necessary to expect a self-replicating activity, but the formation of such a long polymer having a correct nucleotide sequence by random reactions seems statistically unlikely.

Let's say that no string of nucleotides of length < 1000 could self-replicate, and that 10% of nucleotide strings of length > 2000 could. Then life would form readily.

The "seems unlikely" appears to come from the assumption that correct nucleotide sequences are very rare.

What evidence do we have about what proportion of nucleotide sequences can self replicate?

Well, it is rare enough that it hasn't happened in a jar of chemicals over a weekend. It happened at least once on Earth, although there are anthropic selection effects associated with that. The great filter could be something else. It seems to have only happened once on Earth, although one replicator could have beaten the others in Darwinian selection.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-08T18:26:22.724Z · score: 2 (1 votes) · LW · GW

I have no idea what you are thinking. Either you have some brilliant insight I have yet to grasp, or you have totally misunderstood. By "string" I mean abstract mathematical strings of symbols.

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-07T18:33:24.385Z · score: 3 (2 votes) · LW · GW

Let X be a long bitstring. Suppose you run a small Turing machine T, and it eventually outputs X. (No small Turing machine outputs X quickly.)

Either X has low Kolmogorov complexity,

or X has high Kolmogorov complexity, but the universe runs in a nonstandard model where T halts. Hence the value of X is encoded into the universe by the nonstandard model. Hence I should do a Bayesian update about the laws of physics, and expect that X is likely to show up in other places (low conditional complexity).

These two options are different views on the same thing.

Comment by donald-hobson on Tofly's Shortform · 2020-09-07T14:25:24.808Z · score: 3 (2 votes) · LW · GW

The most "optimistic" model in that post is linear. That is a model where making something as smart as you is a task of fixed difficulty: the benefit you gain by being smarter counterbalances the extra difficulty of making something smarter. In all the other models, making something as smart as you gets harder as you get smarter. (I am not talking about biological reproduction here, or about an AI blindly copying itself; I am talking about writing code that is as smart as you are from scratch.)

Suppose we gave a chicken and a human access to the same computer, and asked each to program something at least as smart as they were. I would expect the human to do better than the chicken. Likewise I would bet on a team of IQ 120 humans producing an AI smarter than they are over a team of IQ 80 humans producing something smarter than they are. (Or anything, smarter than a chicken really).

A few extra IQ points will make you a slightly better chess player, but they can also be the difference between not inventing minimax (and so not being able to write a chess-playing program at all) and inventing minimax.

Making things smarter than yourself gets much easier as you get smarter, which is why only smart humans have a serious chance of managing it.

Instead of linear, try square root, or log.

Comment by donald-hobson on Tofly's Shortform · 2020-09-07T10:35:06.708Z · score: 6 (2 votes) · LW · GW

Firstly, if we are talking about actual computational complexity, then the mathematical background is already implicitly talking about the fastest possible algorithm to do X.

That the problem is NP-hard means that it will be difficult, no matter how intelligent the AI is.

Whether or not P=NP is an unsolved problem.

Predicting how a protein will fold is in BQP, which might be easier than NP (another unsolved problem).

Computational complexity classes often don't matter in practice. If you are solving the travelling salesman problem, you rarely need the shortest path; a short path is good enough. Secondly, P vs NP is about the worst case. There are some special cases of the travelling salesman problem that are easy to solve. Taking an arbitrary protein and predicting how it will fold might be computationally intractable, but the work here is done by the word "arbitrary". There are some proteins that are really hard to predict, and some that are easier. Can molecular nanotech be made using only the easily predicted proteins?
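To make the "a short path is good enough" point concrete, here is a minimal Python sketch (my own illustration, not from the thread) comparing the greedy nearest-neighbour heuristic, which runs in polynomial time, with brute-force search over every tour on a small random instance. The function names and the 8-city random instance are hypothetical choices.

```python
import itertools
import math
import random

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def nearest_neighbour_tour(points):
    """Greedy heuristic: always move to the closest unvisited city. O(n^2)."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def brute_force_tour(points):
    """Exact answer by trying all (n-1)! orderings with city 0 fixed first."""
    best = min(
        itertools.permutations(range(1, len(points))),
        key=lambda rest: tour_length(points, (0,) + rest),
    )
    return [0, *best]

rng = random.Random(1)
cities = [(rng.random(), rng.random()) for _ in range(8)]
greedy = nearest_neighbour_tour(cities)
optimal = brute_force_tour(cities)
print(f"greedy  {tour_length(cities, greedy):.3f}")
print(f"optimal {tour_length(cities, optimal):.3f}")
```

On random instances like this the greedy tour is typically only modestly longer than the optimum, while the cost of the exact search grows factorially with the number of cities.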

(Also, an algorithm doesn't have an intelligence level; it has an intelligence-to-compute relation. Once you have invented minimax, increasing the recursion depth takes next to no insight into intelligence. Given a googolplex-FLOP computer, your argument obviously fails, because any fool could bootstrap intelligence on that.)
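A minimal sketch of the recursion-depth point, using negamax (the usual compact formulation of minimax for two-player zero-sum games) on a toy subtraction game chosen purely for illustration: players alternately take 1 to 3 stones, and whoever takes the last stone wins. The depth is just a parameter; searching deeper needs more compute, not more insight.

```python
def negamax(stones, depth):
    """Value of the position for the player to move: +1 win, -1 loss, 0 unknown.

    Toy game (hypothetical example): remove 1-3 stones per turn;
    whoever takes the last stone wins.
    """
    if stones == 0:
        return -1  # the previous player took the last stone, so we have lost
    if depth == 0:
        return 0  # search cut off: no information about this position
    return max(
        -negamax(stones - take, depth - 1) for take in (1, 2, 3) if take <= stones
    )

print(negamax(4, depth=1))   # shallow search cannot tell
print(negamax(4, depth=10))  # deep search finds 4 stones is lost
print(negamax(5, depth=10))  # and 5 stones is won
```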

I have an intuition that there should be some “best architecture”, at least for any given environment, and that this architecture should be relatively “simple”.

I agree. I think that AIXI shows that there is a simple optimal design with unlimited compute. There being no simple optimal design with finite compute would be somewhat surprising. (I think logical induction is something like only exponentially worse than any possible mathematical reasoner in its use of compute.)

But this is a different argument than "as soon as artificial intelligence surpasses human intelligence, recursive self-improvement will take place, creating an entity we can't hope to comprehend, let alone oppose."

Here is a model in which both are true. Suppose that there was a design of AI that was optimal for its compute, that this design was reasonably findable (i.e. a bunch of smart humans could find it with effort), and that this design was really, really smart.

Humans often get the answer wrong even in cases where the exact maths takes a trivial amount of compute, like doctors faced with a disease that has a prevalence of 1 in 1000 and a 90% reliable test. The gap between humans and optimal use of compute is likely huge.

So either humans figure out the optimal design and implement it, or humans hack together something near human level. The near-human-level AI might fiddle with its own workings, trying to increase its intelligence, and then figure out the optimal design.

In this world, the prediction that vastly superhuman AI arrives not long after AI reaches human level comes true. It's just that the self-improvement isn't that recursive.
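The exact maths here is a single application of Bayes' theorem. A sketch in Python, assuming "90% reliable" means both 90% sensitivity (true positive rate) and 90% specificity (true negative rate):

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

p = posterior_given_positive(prevalence=0.001, sensitivity=0.9, specificity=0.9)
print(f"{p:.4f}")  # 0.0089
```

So under these assumptions a positive result only raises the probability of disease from 0.1% to a little under 0.9%, because the false positives from the 999 healthy people vastly outnumber the true positive.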

Comment by donald-hobson on Donald Hobson's Shortform · 2020-09-07T09:33:22.913Z · score: 4 (2 votes) · LW · GW

Take Peano arithmetic.

Add an extra symbol A, and the rules that s(A) = 42, 0 != A, and forall n: n != A -> s(n) != A. Then add an exception for A into all the other rules. So s(x) = s(y) -> x = y or x = A or y = A.
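A toy finite sanity check, not a proof: the Python below models the intended structure as the naturals 0..99 plus one extra element "A" whose successor is 42, and checks the rules above on that finite fragment. It also shows why the exception to injectivity is needed, since s(41) = s(A). The finite range and the encoding of A as a string are illustrative assumptions.

```python
A = "A"  # the extra, non-numeric element

def succ(x):
    """Successor: the usual n + 1 on numbers, and s(A) = 42."""
    return 42 if x == A else x + 1

domain = list(range(100)) + [A]

# 0 != A
assert 0 != A
# forall n: n != A -> s(n) != A
assert all(succ(n) != A for n in domain if n != A)
# Unmodified injectivity fails: s(41) == s(A) == 42 ...
assert succ(41) == succ(A)
# ... but the weakened rule s(x) = s(y) -> x = y or x = A or y = A holds
assert all(
    x == y or x == A or y == A
    for x in domain
    for y in domain
    if succ(x) == succ(y)
)
print("all rules hold on the finite fragment")
```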

There are all sorts of ways you could define extra hangers-on that don't do much in PA or ZFC.

We could describe the laws of physics in this new model. If the result was exactly the same as normal physics from our perspective, i.e. we can't tell the difference by experiment, then only Occamian reasoning favours normal PA.

Comment by donald-hobson on Introduction To The Infra-Bayesianism Sequence · 2020-08-30T10:13:26.719Z · score: 2 (1 votes) · LW · GW
We have Knightian uncertainty over our set of environments, it is not a probability distribution over environments. So, we might as well go with the maximin policy.

For any fixed , there are computations which can't be correctly predicted in steps.

Logical induction will consider all possibilities equally likely in the absence of a pattern.

Logical induction will consider a sufficiently good pseudorandom algorithm as being random.

Any kind of Knightian uncertainty agent will consider pseudorandom numbers to be an adversarial superintelligence unless proved otherwise.

Logical induction doesn't depend on your utility function. Knightian uncertainty does.

There is a phenomenon whereby any sufficiently broad set of hypotheses doesn't influence actions. Under the set of all hypotheses, anything could happen whatever you do.

However, there are sets of possibilities that are sufficiently narrow to be winnable, yet sufficiently broad to need to expend resources combating the hypothetical adversary. If the agent understands most of reality, but not some fundamental particle, it will assume that the particle is behaving in an adversarial manner.
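A minimal sketch of the maximin rule and of the "broad hypothesis set" effect. Payoffs are a table of action by environment; the action names, environments and numbers are hypothetical, and this is plain maximin over a finite set, not the infra-Bayesian machinery from the post. Adding an environment in which every action does equally badly makes all actions tie, so that set no longer guides the choice.

```python
def maximin_actions(payoffs):
    """payoffs[action][environment] -> utility.

    Returns every action whose worst-case utility is highest.
    """
    worst = {action: min(row.values()) for action, row in payoffs.items()}
    best = max(worst.values())
    return sorted(action for action, w in worst.items() if w == best)

narrow = {
    "stay_home": {"calm": 1, "stormy": 1},
    "go_sailing": {"calm": 3, "stormy": -5},
}
print(maximin_actions(narrow))  # ['stay_home']

# Add an "anything can happen" environment where every action gets the minimum
broad = {action: {**row, "anything_goes": -10} for action, row in narrow.items()}
print(maximin_actions(broad))  # ['go_sailing', 'stay_home'], a tie
```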

If someone takes data from a (not understood) particle physics experiment, and processes it on a badly coded insecure computer, this agent will assume that the computer is now running an adversarial superintelligence. It would respond with some extreme measure like blowing the whole physics lab up.

Comment by donald-hobson on Safe Scrambling? · 2020-08-29T23:31:47.622Z · score: 6 (3 votes) · LW · GW

If you have an AI training method that passes the test >50% of the time, then you don't need scrambling.

If you have an approach that takes >1,000,000,000 tries to get right, then you still have to test, so even perfect scrambling won't help.

I.e., this approach might help if your alignment process is missing between 1 and 30 bits of information.
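The bit counts come from taking log2 of the inverse success probability: a 50% success rate is 1 bit short, and one success in a billion tries is about 30 bits short, since 2^30 is roughly 1.07 billion. A quick Python check:

```python
import math

def missing_bits(success_probability):
    """Bits of information missing, if a random attempt succeeds with this probability."""
    return -math.log2(success_probability)

print(missing_bits(0.5))             # 1.0
print(round(missing_bits(1e-9), 1))  # 29.9
print(2 ** 30)                       # 1073741824
```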

I am not sure what sort of proposals would do this, but amplified oversight might be one of them.

Comment by donald-hobson on Strong implication of preference uncertainty · 2020-08-23T21:51:36.936Z · score: 4 (2 votes) · LW · GW
And also because they make the same predictions, that relative probability is irrelevant in practice: we could use AGR just as well as GR for predictions.

There is a subtle sense in which the difference between AGR and GR is relevant. While the difference doesn't change the predictions, it may change the utility function. An agent that cares about angels (if they exist) might do different things if it believes itself to be in an AGR world than in a GR world. As the theories make identical predictions, the agent's belief depends only on its priors (and any irrationality), not on which world it is in. Nonetheless, this means that the agent will pay to avoid having its priors modified, even though the modification doesn't change the agent's predictions in the slightest.

Comment by donald-hobson on What if memes are common in highly capable minds? · 2020-08-23T21:41:03.641Z · score: 2 (1 votes) · LW · GW

I think that there is an unwarranted jump here from (humans are highly memetic) to (AIs will be highly memetic).

I will grant you that memes have a substantial effect on human behaviour. It doesn't follow that AIs will be like this.

Your conditions would only have a strong argument for them if there was a good argument that AIs should be meme-driven.

Comment by donald-hobson on Why don't countries, like companies, more often merge? · 2020-08-23T17:36:22.783Z · score: 2 (1 votes) · LW · GW

England and Scotland merged several hundred years ago. This involved the English bribing politicians, and trade blockades, but not outright war. The European Union could also be considered a kind of merger.

Comment by donald-hobson on Market Misconceptions · 2020-08-20T19:31:36.666Z · score: 3 (2 votes) · LW · GW

If the problem is computationally difficult, then it is a mistake to not count the value of computation.

Suppose you believe that a reasonably smart person with some spare time can make money on the stock market. There is still a question of how much money, compared to the amount they could earn doing other things. If we pick some reasonable value for intelligent, market understanding thought, then we can define a new value which is - the value of the time it took to find.

The value of thought time, compute time and data are all constant: they don't grow with the amount invested. The returns are linear until you get to the point where you disrupt the market, destroying the pattern. Thus a net profit is much harder to get if the amount of money you have to invest is small. (Especially if you don't bet the farm.)
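A minimal sketch of that argument under the simplest model I can think of (an assumption, not a claim about real markets): finding the pattern has a fixed cost, returns are a constant fraction of capital up to a capacity limit where trading would disrupt the pattern, and net value is returns minus the fixed cost. All numbers are hypothetical.

```python
def net_value(capital, edge, fixed_cost, capacity):
    """Net value of exploiting a market inefficiency (toy model).

    edge:       return per unit of capital invested (hypothetical)
    fixed_cost: value of the thought time, compute time and data used to find it
    capacity:   capital beyond which trading destroys the pattern
    """
    gross = edge * min(capital, capacity)
    return gross - fixed_cost

EDGE, COST, CAPACITY = 0.05, 2_000, 1_000_000
for capital in (10_000, 40_000, 1_000_000, 5_000_000):
    print(capital, net_value(capital, EDGE, COST, CAPACITY))
print("break-even capital:", COST / EDGE)
```

With a small pot the fixed cost dominates and the net value is negative; past the capacity limit extra capital adds nothing.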

Comment by donald-hobson on The Bayesian Tyrant · 2020-08-20T17:08:13.516Z · score: 2 (1 votes) · LW · GW

Sure, you can fix unbounded downside risk by giving a finite budget. You can fix the dogmatism by making have an probability of tails.

If you and have a chance to bet with each other before going to the bookies, then the bookie won't be able to Dutch book the two of you because you will have already separated and 's money.

If you can't bet with directly for some reason, then a bookie can Dutch book you and , by acting as a middle man and skimming off some money.

Comment by donald-hobson on The Bayesian Tyrant · 2020-08-20T12:12:10.460Z · score: 6 (4 votes) · LW · GW
One argument for the common prior assumption (an assumption which underpins the Aumann Agreement Theorem, and is closely related to modest-epistemology arguments) is that a bookie can Dutch Book any group of agents who do not have a common prior, via performing arbitrage on their various beliefs.

There exists an agent that believes with certainty that all coins only ever land on heads as its prior.

There also exists an agent that is equally confident in tails. (Exists in the mathematical sense that there is some pattern of code that would consist of such an agent, not in the sense that these agents have been built)

Let's say that and , by construction, will always take any bet that they would profit from if all involved coins come up heads or tails respectively.

Consider a bet on a single coin flip that costs 2 if a coin comes up heads, and pays if the coin lands on tails. () If you would be prepared to take that bet for sufficiently large , then a bookie can offer you this bet, and offer a bet that wins 1 if the coin lands heads, and loses if the coin lands tails. will take this bet. So the bookie has Dutch booked the pair of you.

If you ever bet at all and your betting decisions don't depend on who you are sitting next to, then you can be part of a group that is Dutch booked.

If you want to avoid ever betting as part of a group that is being Dutch booked, then if you are in the presence of and , you can't bet at all, even about things that have nothing to do with coin flips.

If you bet 5 against 5 that gravity won't suddenly disappear, then the bookie can Dutch book and for 100, and the 3 of you have been Dutch booked for at least 95 as a group.

If you have some reason to suspect that a mind isn't stupid (maybe you know that they won a lot of money in a prediction market, maybe you know that they were selected by evolution, etc.), then you have reason to take what the mind says seriously. If you have strong reasons to think that Alice is a nearly perfect reasoner, then getting Dutch booked when you are grouped with Alice indicates you are probably making a mistake.

Comment by donald-hobson on Search versus design · 2020-08-19T12:47:34.224Z · score: 2 (1 votes) · LW · GW

In the sorting problem, suppose you applied your advanced interpretability techniques, and got a design with documentation.

You also apply a different technique, and get code with formal proof that it sorts.

In the latter case, you can be sure that the code works, even if you can't understand it.

The algorithm+formal proof approach works whenever you have a formal success criterion.

It is less clear how well the design approach works on a problem where you can't write formal success criteria so easily.

Here is a task that neural nets have been made to do: convert pictures of horses into similar pictures of zebras. I am unsure if a designed solution to this problem exists.

Imagine that you give a bunch of smart programmers a lecture on how to solve this problem, and then they have to implement a solution without access to any source of horse or zebra pictures. I suspect they would fail. I suspect that solving this problem well fundamentally requires a significant amount of information about horses and zebras, more than a human can understand and conceptualize at once. The human will be able to understand each small part of the system, but that holds for any system, since logic gates are understandable. The human can understand why it works in the abstract, the way we understand gradient descent over neural nets.

I am not sure that this problem has a core insight that is possessable, but not possessed by us.

Comment by donald-hobson on Search versus design · 2020-08-17T11:17:39.891Z · score: 4 (2 votes) · LW · GW

I am not sure that designed artefacts are automatically easily interpretable.

If an engineer is looking over the design of the latest smartphone, then the artefact is similar to previous artefacts they have experience with. This will include a lot of design details about chip architecture and instruction set. The engineer will also have the advantage of human written spec sheets.

If we sent a pile of smartphones to Isaac Newton, he wouldn't have either of these advantages. He wouldn't be able to figure out much about how they worked.

There are 3 factors here: existence of written documentation, similarity to previous designs, and being composed of separate subsystems. All help understandability. If an AI is designing radically new tech, we lose the similarity to understood designs.

Comment by donald-hobson on Alignment By Default · 2020-08-16T19:43:27.576Z · score: 16 (3 votes) · LW · GW

Consider a source of data that is from a sum of several Gaussian distributions. If you have a sufficiently large number of samples from this distribution, you can locate the original Gaussians to arbitrary accuracy. (Of course, if you have a finite number of samples, you will have some inaccuracy in predicting the location of the Gaussians, possibly a lot.)

However, not all distributions share this property. If you look at uniform distributions over rectangles in 2D space, you will find that a uniform L-shape can be made in 2 different ways. More complicated shapes can be made in even more ways. The property that you can uniquely decompose a sum of Gaussians into its individual Gaussians is not a property that applies to every distribution.

I would expect that whether or not logs, saplings, petrified trees, sparkly plastic Christmas trees, etc. counted as trees would depend on the details of the training data, as well as the network architecture and possibly the random seed.

Note: this is an empirical prediction about current neural networks. I am predicting that if someone takes 2 networks that have been trained on different datasets, ideally with different architectures, and tries to locate the neuron that holds the concept of "Tree" in each, and then shows both networks an edge case that is kind of like a tree, then the networks will often disagree significantly about how much of a tree it is.
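As an illustration of the recoverable case, here is a minimal expectation-maximization (EM) fit of a two-component 1D Gaussian mixture in NumPy. The true parameters, sample sizes and quantile-based initialisation are my own choices; with well-separated components and plenty of samples, the recovered means, spreads and weights land close to the ones used to generate the data.

```python
import numpy as np

def em_gaussian_mixture_1d(x, k, iters=200):
    """Fit a k-component 1D Gaussian mixture to samples x by EM."""
    # Initialise: means at evenly spaced quantiles, shared variance, equal weights
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component j for each sample i
        dens = (
            weights
            * np.exp(-((x[:, None] - means) ** 2) / (2 * variances))
            / np.sqrt(2 * np.pi * variances)
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibility-weighted samples
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    order = np.argsort(means)
    return means[order], np.sqrt(variances[order]), weights[order]

rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-3.0, 1.0, 5000), rng.normal(4.0, 0.5, 5000)])
means, stds, weights = em_gaussian_mixture_1d(samples, k=2)
print(means, stds, weights)
```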
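The L-shape example can be checked exactly. The Python sketch below builds the uniform distribution on an L made of three unit squares from two different pairs of rectangles (each rectangle weighted by its share of the total area) and confirms the two mixtures have identical densities on a grid of test points. The coordinates and the half-open rectangle convention are my own choices.

```python
from fractions import Fraction

def mixture_density(rects, point):
    """Density at `point` of a mixture of uniform distributions over half-open
    rectangles (x0, y0, x1, y1), each weighted by its fraction of the total area."""
    areas = [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in rects]
    total = sum(areas)
    px, py = point
    density = Fraction(0)
    for (x0, y0, x1, y1), area in zip(rects, areas):
        if x0 <= px < x1 and y0 <= py < y1:
            density += Fraction(area, total) / area  # weight * uniform density
    return density

# The L occupies the unit squares at (0,0), (1,0) and (0,1).
horizontal_split = [(0, 0, 2, 1), (0, 1, 1, 2)]  # 2x1 bar + 1x1 square on top
vertical_split = [(0, 0, 1, 2), (1, 0, 2, 1)]    # 1x2 bar + 1x1 square to the right

grid = [(x / 4, y / 4) for x in range(-2, 10) for y in range(-2, 10)]
assert all(
    mixture_density(horizontal_split, p) == mixture_density(vertical_split, p)
    for p in grid
)
print(mixture_density(horizontal_split, (0.5, 0.5)))  # 1/3
print(mixture_density(horizontal_split, (1.5, 1.5)))  # 0
```

Both decompositions give density 1/3 everywhere on the L and 0 elsewhere, so no amount of data from this distribution can tell you which pair of rectangles it "really" came from.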

Comment by donald-hobson on Alignment By Default · 2020-08-16T16:55:00.703Z · score: 4 (2 votes) · LW · GW
Note that the examples in the OP are from an adversarial generative network. If its notion of "tree" were just "green things", the adversary should be quite capable of exploiting that.

In order for the network to produce good pictures, the concept of "tree" must be hidden in there somewhere, but it could be hidden in a complicated and indirect manner. I am questioning whether the particular single node selected by the researchers encodes the concept of "tree" or "green thing".

Comment by donald-hobson on Alignment By Default · 2020-08-15T22:49:36.850Z · score: 2 (1 votes) · LW · GW
when alignment-by-default works, we can use the system to design a successor without worrying about amplification of alignment errors

Anything neural-net-related starts with random noise and performs gradient-descent-style steps. This doesn't get you the global optimum; it gets you some point that is approximately a local optimum, which depends on the noise, the nature of the search space, and the choice of step size.
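A minimal sketch of the dependence on initialisation: plain gradient descent on a one-dimensional non-convex function, f(x) = x^4 - 3x^2 + x (chosen arbitrarily for illustration), ends in different minima depending only on where it starts. Starting points, learning rate and step count are hypothetical choices.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent from starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

for x0 in (-1.5, -0.5, 0.5, 1.5):
    x = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x = {x:+.3f}, f(x) = {f(x):+.3f}")
```

Starts to the left of the local maximum near x = 0.17 reach the global minimum near x = -1.30, while starts to the right settle in the worse local minimum near x = 1.13.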

If nothing else, the training data will contain sensor noise.

At best you are going to get something that roughly corresponds to human values.

Just because it isn't obvious where the noise entered the system doesn't make it noiseless. Just because you gave what we actually want, and the value of a neuron in a neural net the same name, doesn't make them the same thing.

Consider the large set of references with representative members "What Alice makes long-term plans towards", "What Bob's impulsive actions tend towards", "What Alice says is good and right when her social circle is listening", "What Carl listens to when deciding which politician to vote for", "What news makes Eric instinctively feel good", "What makes Fred press the reward button during AI training", etc.

If these all referred to the same preference ordering over states of the world, then we could call that human values, and have a natural concept.

Trees are a fairly natural concept because "tall green things" and "lifeforms that are >10% cellulose" point to a similar set of objects. There are many different simple boundaries in concept-space that largely separate trees from non-trees. Trees are tightly clustered in thing-space.

To the extent that all those references refer to the same thing, we can't expect an AI to distinguish between them. To the extent that they refer to different concepts, at best the AI will have a separate concept for each.

Suppose you run the microscope AI, and you find that you have a whole load of concepts that kind of match "human values" to different degrees. These represent different people and different embeddings of value. (Of course, "What Carl listens to when deciding which politician to vote for" contains Carl's distrust of political promises. "What makes Fred press the reward button during AI training" includes the time Fred tripped and slammed the button by accident. Each of the easily accessible concepts is a bit different and includes its own bit of noise.)

Comment by donald-hobson on Alignment By Default · 2020-08-15T21:42:01.226Z · score: 10 (3 votes) · LW · GW
In light of the previous section, there’s an obvious path to alignment where there turns out to be a few neurons (or at least some simple embedding) which correspond to human values, we use the tools of microscope AI to find that embedding, and just like that the alignment problem is basically solved.

This is the part I disagree with. The network does recognise trees, or at least green things (given that the grass seems pretty brown in the low tree pic).

Extrapolating this, I expect the AI might well have neurons that correspond roughly to human values, on the training data. Within the training environment, human values, amount of dopamine in human brain, curvature of human lips (in smiles), number of times the reward button is pressed, and maybe even amount of money in human bank account might all be strongly correlated.

You will have successfully narrowed human values down to within the range of things that are strongly correlated with human values in the training environment. If you take this signal and apply enough optimization pressure, you are going to get the equivalent of a universe tiled with tiny smiley faces.

Comment by donald-hobson on Alignment By Default · 2020-08-15T19:02:41.283Z · score: 4 (2 votes) · LW · GW
So in principle, it doesn’t even matter what kind of model we use or how it’s represented; as long the predictive power is good enough, values will be embedded in there, and the main problem will be finding the embedding.

I will agree with this. However, notice what this doesn't say. It doesn't say "any model powerful enough to be really dangerous contains human values". Imagine a model that was good at a lot of science and engineering tasks. It was good enough at nuclear physics to design effective fusion reactors and bombs. It knew enough biology to design a superplague. It knew enough molecular dynamics to design self-replicating nanotech. It knew enough about computer security to hack most real-world systems. But it didn't know much about how humans thought. Its predictions are far from maximum entropy: if it sees people walking along a street, it thinks they will probably carry on walking, not fall to the ground twitching randomly. Let's say that the model is as predictively accurate as you would be when asked to predict the behaviour of a stranger from a few seconds of video. This AI doesn't contain a model of human values anywhere in it.

We can't just assume that every AI powerful enough to be dangerous contains a model of human values, however I suspect most of them will in practice.

Comment by donald-hobson on Effect of Numpy · 2020-08-15T17:06:12.375Z · score: 2 (1 votes) · LW · GW

By "send everyone the results" I was thinking of doing this with a block of audio lasting a few milliseconds.

Everyone hears everyone's voices with a few milliseconds' delay.

If you don't want to echo people's own voices back to them, keep track of the timestamps; every computer can then subtract its own signal from the total.
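A minimal sketch of the echo-removal idea, assuming every client stamps each block it sends and the mixer returns the sum of all blocks with the same timestamp. The `Client` class, its methods and the 480-sample block length (e.g. 10 ms at 48 kHz) are hypothetical.

```python
import numpy as np

class Client:
    """Hypothetical client that remembers its own outgoing blocks by timestamp."""

    def __init__(self):
        self.sent = {}

    def send(self, timestamp, block):
        self.sent[timestamp] = block
        return block

    def hear(self, timestamp, total):
        """Remove our own contribution from the mixed block before playback."""
        own = self.sent.pop(timestamp, None)
        return total if own is None else total - own

rng = np.random.default_rng(0)
clients = [Client() for _ in range(3)]
t = 0  # timestamp of this few-millisecond block
blocks = [c.send(t, rng.standard_normal(480)) for c in clients]
total = np.sum(blocks, axis=0)  # what the mixer sends to everyone

for i, c in enumerate(clients):
    heard = c.hear(t, total)
    others = sum(b for j, b in enumerate(blocks) if j != i)
    assert np.allclose(heard, others)  # each client hears only the others
```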

Comment by donald-hobson on Effect of Numpy · 2020-08-14T17:59:00.856Z · score: 2 (1 votes) · LW · GW

Everyone hears audio from everyone.

Suppose you have loads of singers. The task of averaging all the signals together may be too much for any one computer, or might require too much bandwidth.

So you split the task of averaging into 3 parts.

np.sum(a) == np.sum(a[:x]) + np.sum(a[x:])  (exact for integers; equal up to floating-point rounding for floats)

One computer can average the signals from Alice and Bob, and a second can average the signals from Carol and Dave. These both send their signals to a third computer, which averages the two signals into a combination of all 4 singers, and sends everyone the results.
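The split can be sketched in NumPy. One assumption of mine: the first-level mixers pass on sums (plus how many singers they cover) rather than averages, so the top-level mixer can divide once by the total count and still get the right answer when the groups are uneven. Singer names follow the text; the block length is hypothetical.

```python
import numpy as np

def partial_mix(blocks):
    """Sum a group of equal-length audio blocks and report how many were mixed."""
    return np.sum(blocks, axis=0), len(blocks)

def final_mix(partials):
    """Combine (sum, count) pairs from lower-level mixers into an average."""
    total = np.sum([s for s, _ in partials], axis=0)
    count = sum(n for _, n in partials)
    return total / count

rng = np.random.default_rng(0)
block_len = 480  # hypothetical: 10 ms of audio at 48 kHz
singers = {name: rng.standard_normal(block_len) for name in ("alice", "bob", "carol", "dave")}

mixer_1 = partial_mix([singers["alice"], singers["bob"]])
mixer_2 = partial_mix([singers["carol"], singers["dave"]])
combined = final_mix([mixer_1, mixer_2])

direct = np.mean(list(singers.values()), axis=0)
assert np.allclose(combined, direct)  # equal up to floating-point rounding
```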