Biotech to make humans afraid of AI 2021-06-18T08:19:26.959Z
Speculations against GPT-n writing alignment papers 2021-06-07T21:13:16.727Z
Optimization, speculations on the X and only X problem. 2021-03-30T21:38:01.889Z
Policy restrictions and Secret keeping AI 2021-01-24T20:59:14.342Z
The "best predictor is malicious optimiser" problem 2020-07-29T11:49:20.234Z
Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide) 2020-07-23T21:37:39.198Z
Web AI discussion Groups 2020-06-30T11:22:45.611Z
[META] Building a rationalist communication system to avoid censorship 2020-06-23T14:12:49.354Z
What does a positive outcome without alignment look like? 2020-05-09T13:57:23.464Z
Would Covid19 patients benefit from blood transfusions from people who have recovered? 2020-03-29T22:27:58.373Z
Programming: Cascading Failure chains 2020-03-28T19:22:50.067Z
Bogus Exam Questions 2020-03-28T12:56:40.407Z
How hard would it be to attack coronavirus with CRISPR? 2020-03-06T23:18:09.133Z
Intelligence without causality 2020-02-11T00:34:28.740Z
Donald Hobson's Shortform 2020-01-24T14:39:43.523Z
What long term good futures are possible. (Other than FAI)? 2020-01-12T18:04:52.803Z
Logical Counterfactuals and Proposition graphs, Part 3 2019-09-05T15:03:53.262Z
Logical Counterfactuals and Proposition graphs, Part 2 2019-08-31T20:58:12.851Z
Logical Optimizers 2019-08-22T23:54:35.773Z
Logical Counterfactuals and Proposition graphs, Part 1 2019-08-22T22:06:01.764Z
Programming Languages For AI 2019-05-11T17:50:22.899Z
Propositional Logic, Syntactic Implication 2019-02-10T18:12:16.748Z
Probability space has 2 metrics 2019-02-10T00:28:34.859Z
Allowing a formal proof system to self improve while avoiding Lobian obstacles. 2019-01-23T23:04:43.524Z
Logical inductors in multistable situations. 2019-01-03T23:56:54.671Z
Boltzmann Brains, Simulations and self refuting hypothesis 2018-11-26T19:09:42.641Z
Quantum Mechanics, Nothing to do with Consciousness 2018-11-26T18:59:19.220Z
Clickbait might not be destroying our general Intelligence 2018-11-19T00:13:12.674Z
Stop buttons and causal graphs 2018-10-08T18:28:01.254Z
The potential exploitability of infinite options 2018-05-18T18:25:39.244Z


Comment by Donald Hobson (donald-hobson) on Avoiding the instrumental policy by hiding information about humans · 2021-06-14T15:57:49.422Z · LW · GW

There are various ideas along the lines of "however much you tell the AI X it just forgets it".

I think that would be the direction to look in if you have a design that's safe as long as it doesn't know X.

Comment by Donald Hobson (donald-hobson) on A naive alignment strategy and optimism about generalization · 2021-06-10T12:52:11.282Z · LW · GW

There may be predictable errors in the training data, such that instrumental policy actually gets a lower loss than answering honestly (because it responds strategically to errors).

If you are answering questions as text, there is a lot of choice in wording. There are many strings of text that are a correct answer, and the AI has to pick the one the human would use. In order to predict how a human would word an answer, you need a fairly good understanding of how they think (I think). 

Comment by Donald Hobson (donald-hobson) on Speculations against GPT-n writing alignment papers · 2021-06-08T11:17:41.956Z · LW · GW

Maybe you did. I find it hard to distinguish inventing and half remembering ideas. 

If the training procedure either 

  1. Reliably produces mesaoptimisers with about the same values, or
  2. Reliably produces mesaoptimisers that can acausally cooperate, and
  3. The rest of the procedure allows one mesaoptimiser to take control of the whole output

Then using different copies of GPT-n trained from different seeds doesn't help.

If you just convert 1% of the English into network yourself, then all it needs to use is some error correction. Even without that, neural net structures are somewhat redundant. A malicious optimizer with a random 1% removed is probably still a malicious optimizer.

Secondly, its goal here is to give us subtly wrong alignment papers. Suppose we produce an examination and transparency step so thorough that no mesa optimizer can slip through. The original mesaoptimiser can just design a network that is doing its best as a mesaoptimiser free predictor, but has subtly maliciously chosen beliefs about AI research. 

Comment by Donald Hobson (donald-hobson) on Optimization, speculations on the X and only X problem. · 2021-06-08T10:54:50.029Z · LW · GW

I don't think that learning is moving around in codespace. In the simplest case, the AI is like any other non self modifying program. The code stays fixed as the programmers wrote it. The variables update. The AI doesn't start from null. The programmer starts from a blank text file, and adds code. Then they run the code. The AI can start with sophisticated behaviour the moment it's turned on.

So are we talking about a program that could change from an X er to a Y er with a small change in the code written, or with a small amount of extra observation of the world?

Comment by Donald Hobson (donald-hobson) on [Event] Weekly Alignment Research Coffee Time (06/14) · 2021-06-07T19:15:36.997Z · LW · GW

There seems to be some technical problem with the link. It gives me a "Our apologies, your invite link has now expired (actually several hours ago, but we hate to rush people).

We hope you had a really great time! :)" message. Edit: As of a few minutes after stated start time. It worked last week.

Comment by Donald Hobson (donald-hobson) on Optimization, speculations on the X and only X problem. · 2021-06-07T12:14:02.406Z · LW · GW

My picture of an X and only X er is that the actual program you run should optimize only for X. I wasn't considering similarity in code space at all. 

Getting the lexicographically first formal ZFC proof of say the Collatz conjecture should be safe. Getting a random proof sampled from the set of all proofs < 1 terabyte long should be safe. But I think that there exist proofs that wouldn't be safe. There might be a valid proof of the conjecture that had the code for a paperclip maximizer encoded into the proof, and that exploited some flaw in computers or humans to bootstrap this code into existence. This is what I want to avoid. 

Your picture might be coherent and formalizable into some different technical definition. But you would have to start talking about difference in codespace, which can differ depending on different programming languages. 

The program if True: x() else: y() is very similar in codespace to if False: x() else: y()

If code space is defined in terms of minimum edit distance, then layers of interpreters, error correction and homomorphic encryption can change it. This might be what you are after, I don't know.
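As a rough illustration of the point about edit distance, here is a minimal sketch that treats the two programs above as plain strings and computes the Levenshtein distance between them. The function name `edit_distance` is my own, and treating "distance in codespace" as character-level Levenshtein distance is an assumption made for the example, not a settled definition.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = cur
    return prev[-1]


# The two programs from the comment, treated purely as text.
runs_x = "if True: x() else: y()"
runs_y = "if False: x() else: y()"

distance = edit_distance(runs_x, runs_y)
print(distance, "edits out of", len(runs_x), "characters")
```

The two strings differ by a handful of characters, yet one always runs x() and the other always runs y(), so a small distance in this metric says little about behaviour.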

Comment by Donald Hobson (donald-hobson) on Rogue AGI Embodies Valuable Intellectual Property · 2021-06-04T10:23:09.409Z · LW · GW

On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

This makes the hidden assumption that "resources" is a good abstraction in this scenario. 

It is being assumed that the amount of resources an agent "has" is a well defined quantity. It assumes agents can only grow their resources slowly by reinvesting them, and that an agent can weather any sabotage attempts by agents with far fewer resources.

I think this assumption is blatantly untrue. 

Companies can be sabotaged in all sorts of ways. Money or material resources can be subverted, so that while they are notionally in the control of X, they end up benefiting Y, or just stolen. Taking over the world might depend on being the first party to develop self replicating nanotech, which might require just insight and common lab equipment.

Don't think "The US military has nukes, the AI doesn't, so the US military has an advantage", think "one carefully crafted message and the nukes will land where the AI wants them to, and the military commanders will think it their own idea."

Comment by Donald Hobson (donald-hobson) on Selection Has A Quality Ceiling · 2021-06-03T09:23:04.330Z · LW · GW

There are several extra features to consider. Firstly, even if you only test, that doesn't mean the skills weren't trained. Suppose there are lots of smart kids that really want to be astronauts. And that NASA puts its selection criteria somewhere easily available. The kids then study the skills they think they need to pass the selection. Any time there is any reason to think that skills X, Y and Z are a good combination, there will be more people with these skills than chance predicts.

There is also the dark side, Goodhart's curse. It is hard to select over a large number of people without selecting for lying sociopaths that are gaming your selection criteria.

Comment by Donald Hobson (donald-hobson) on Fixedness From Frailty · 2021-05-31T21:59:46.148Z · LW · GW

It's not the probability of hallucinating full stop, it's the probability of hallucinating Omega or psychic powers in particular. Also, while "Omega" sounds implausible, there are much more plausible scenarios involving humans inventing advanced brain scanning tech.

Comment by Donald Hobson (donald-hobson) on Are PS5 scalpers actually bad? · 2021-05-19T09:45:48.230Z · LW · GW

True, but the extra money goes to the scalper to pay for the scalper's time. The moment the makers started selling the PS5 too cheap, they were destroying value in search costs. Scalpers don't change that.

Comment by Donald Hobson (donald-hobson) on Against Against Boredom · 2021-05-17T21:12:10.147Z · LW · GW

"A superintelligent FAI with total control over all your sensory inputs" seems to me a sufficient condition to avoid boredom. Kind of massive overkill. Unrestricted internet access is usually sufficient. 

You don't need to edit out pain sensitivity from humans to avoid pain. You can have a world where nothing painful happens to people. Likewise you don't need to edit out boredom, you can have a world with lots of interesting things in it. 

Think of all the things a human in the modern day might do for fun, and add at least as many things that are fun and haven't been invented yet.

Comment by Donald Hobson (donald-hobson) on Utopic Nightmares · 2021-05-17T20:54:37.663Z · LW · GW

Yes, it's unlikely that the utility turns out literally identical. However, people enjoy having friends that aren't just clones of themselves. (Alright, I don't have evidence for this, but it seems like something people might enjoy.) Hence it is possible for a mixture of different types of people to be happier than either type of people on their own.

If you use some computational theories of consciousness, there is no morally meaningful difference between one mind and two copies of the same mind.

Given the large but finite resources of reality, it is optimal to create a fair bit of harmless variation.

Comment by Donald Hobson (donald-hobson) on Does butterfly affect? · 2021-05-16T16:18:59.600Z · LW · GW

Instead, we must consider the full statistical ensemble of possible world, and quantify to what extent the butterfly shifts that ensemble.

add some small stochastic noise to it at all times t to generate the statistical ensemble of possibilities

In the typical scenario, the impact of this single perturbation will not rise above the impact of the persistent background noise inherent to any complex real-world system.

I think these quotes illustrate the mind projection fallacy. The "noise" is not an objective thing sitting out there in the real world, it is a feature of your own uncertainty. 

Suppose you have a computational model of the weather. You make the simplifying assumption that water evaporation is a function only of air temperature and humidity. Whereas in reality, the evaporation depends on puddle formation and plant growth and many other factors. Out in the real world, the weather follows its own rules perfectly. Those rules are the equations of the whole universe. "Noise" is just what you call a hopefully small effect you don't have the knowledge, compute or inclination to calculate.

If you have a really shoddy model of the weather, it won't be able to compute much. If you add a butterfly's wing flaps to a current weather model, the knowledge of that small effect will be lost due to the mass of other small effects meteorologists haven't calculated. Adding or removing a butterfly's wingflap doesn't meaningfully change our predictions given current predictive ability. However, to a sufficiently advanced future weather predictor, that wingflap could meaningfully change the chance of a tornado. The predictor would need to be tracking every other wingflap globally, and much else besides.

We are modelling as probabilistic processes that are actually deterministic but hard to calculate.
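A toy sketch of "deterministic but hard to calculate", using the logistic map as a stand-in for the weather. The choice of map, the parameter r = 4, the starting value 0.3 and the size of the nudge are all assumptions picked for illustration; nothing here is a weather model.

```python
def logistic_trajectory(x0: float, steps: int = 100, r: float = 4.0) -> list:
    """Iterate x -> r * x * (1 - x). No randomness anywhere:
    the same x0 always gives the same trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


baseline = logistic_trajectory(0.3)
nudged = logistic_trajectory(0.3 + 1e-12)  # the "wingflap"

# How far apart the two runs are at each step.
gaps = [abs(a - b) for a, b in zip(baseline, nudged)]
print("gap at step 0:", gaps[0])
print("largest gap over the run:", max(gaps))
```

A predictor that only knows the starting value to a few decimal places has to treat the later steps as noise, even though nothing random is happening. A predictor that tracked the state to enough digits would see the nudge matter.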

Comment by Donald Hobson (donald-hobson) on Agency in Conway’s Game of Life · 2021-05-14T17:16:32.882Z · LW · GW

Random Notes:

Firstly, why is the rest of the starting state random? In a universe where info can't be destroyed, like this one, random=max entropy. AI is only possible in this universe because the starting state is low entropy.

Secondly, reaching an arbitrary state can be impossible for reasons like conservation of mass, energy, momentum and charge. Any state close to an arbitrary state might be unreachable due to these conservation laws. Ie a state containing lots of negative electric charges, and no positive charges, being unreachable in our universe.

Well, quantum. We can't reach out from our branch to affect other branches.

This control property is not AI. It would be possible to create a low impact AI. Something that is very smart and doesn't want to affect the future much.

In the other direction, bacteria strategies are also a thing. I think it might be possible, both in this universe and in GOL, to create a non intelligent replicator. You could even hard code it to track its position, and turn on or off to make a smiley face. I'm thinking some kind of wall glider that can sweep across the GOL board destroying almost anything in its path. With crude self replicators behind it.

Observation response timescales. Suppose the situation outside the small controlled region was rapidly changing and chaotic. By the time any AI has done its reasoning, the situation has changed utterly. The only thing the AI can usefully do is reason about GOL in general. Ie any ideas it has are things that could have been hard coded into the design.

Comment by Donald Hobson (donald-hobson) on Challenge: know everything that the best go bot knows about go · 2021-05-12T22:08:23.574Z · LW · GW

I'm thinking of humans having some fast special purpose inbuilt pattern recognition, which is nondeterministic and an introspective black box, and a slow general purpose processor. Humans can mentally follow the steps of any algorithm, slowly. 

Thus if a human can quickly predict the results of program X, then either there is a program Y  based on however the human is thinking that does the same thing as X and takes only a handful of basic algorithmic operations. Or the human is using their pattern matching special purpose hardware. This hardware is nondeterministic, not introspectively accessible, and not really shaped to predict go bots. 

Either way, it also bears pointing out that if the human can predict the move a go bot would make, the human is at least as good as the machine. 

So you are going to need a computer program for "help" if you want to predict the exact moves. At this stage, you can ask if you really understand how the code works, and aren't just repeating it by rote.

Comment by Donald Hobson (donald-hobson) on Challenge: know everything that the best go bot knows about go · 2021-05-12T10:13:21.494Z · LW · GW

This behaviour is consistent with local position based play that also considers "points ahead" as part of the situation.

Comment by Donald Hobson (donald-hobson) on Challenge: know everything that the best go bot knows about go · 2021-05-11T18:10:30.010Z · LW · GW

I think that it isn't clear what constitutes "fully understanding" an algorithm. 

Say you pick something fairly simple, like a floating point square root algorithm. What does it take to fully understand that?

You have to know what a square root is. Do you have to understand the maths behind Newton-Raphson iteration if the algorithm uses that? All the mathematical derivations, or just taking it as a mathematical fact that it works? Do you have to understand all the proofs about convergence rates? Or can you just go "yeah, 5 iterations seems to be enough in practice"? Do you have to understand how floating point numbers are stored in memory? Including all the special cases like NaN which your algorithm hopefully won't be given? Do you have to keep track of how the starting guess is made, how the rounding is done? Do you have to be able to calculate the exact floating point value the algorithm would give, taking into account all the rounding errors? Answering in binary or decimal?
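To make the questions concrete, here is a minimal sketch of a Newton-Raphson square root with a fixed 5 iterations. The function name `newton_sqrt` and the crude starting guess (the input itself, or 1 for small inputs) are assumptions for illustration. Real library routines pick the starting guess far more carefully, for instance from the floating point exponent.

```python
import math


def newton_sqrt(x: float, iterations: int = 5) -> float:
    """Approximate sqrt(x) with a fixed number of Newton-Raphson steps
    applied to f(g) = g*g - x, i.e. g <- (g + x/g) / 2."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # crude starting guess (an assumption)
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


for value in (2.0, 1e6):
    approx = newton_sqrt(value)
    print(value, approx, "error:", abs(approx - math.sqrt(value)))
```

With this starting guess, 5 iterations is plenty for 2.0 and nowhere near enough for 1e6. So even "5 iterations seems to be enough in practice" quietly depends on how the starting guess is made.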

Is brute force minimax search easy to understand? You might be able to easily implement the algorithm, but you still don't know which moves it will make. In general, for any algorithm that takes a lot of compute, humans won't be able to work out what it will do without very slowly imitating a computer. There are some algorithms we can prove theorems about. But it isn't clear which theorems we need to prove to get "full understanding".
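A minimal sketch of how short brute force minimax is to write down. The game here is an assumption chosen to keep the tree tiny: a pile of stones, each player removes 1 to 3 on their turn, and whoever takes the last stone wins. The names `minimax` and `best_move` are my own.

```python
MOVES = (1, 2, 3)  # stones a player may remove on their turn


def minimax(stones: int) -> int:
    """Value of the position for the player to move under perfect play:
    +1 for a forced win, -1 for a forced loss. Searches the whole tree."""
    if stones == 0:
        return -1  # the opponent just took the last stone
    # My value is the best of the negated values of the positions I can hand over.
    return max(-minimax(stones - take) for take in MOVES if take <= stones)


def best_move(stones: int) -> int:
    """Number of stones to take, chosen by exhaustive search."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -minimax(stones - take))


for pile in range(1, 9):
    print(pile, minimax(pile), best_move(pile))
```

For this toy game you can also prove a theorem about the output (piles that are a multiple of 4 are lost for the player to move). For go, the same few lines of search are hopeless to run, and no comparably simple theorem is known.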

Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of "if you are in such and such situation move here" type rules. You can understand how gradient descent would generate good rules in the abstract. You have inspected a few rules in detail. But there are far too many rules for a human to consider them all. And the rules depend on a choice of random seed.  

Corollaries of success (non-exhaustive):

  • You should be able to answer questions like “what will this bot do if someone plays mimic go against it” without actually literally checking that during play. More generally, you should know how the bot will respond to novel counter strategies

There is not in general a way to compute what an algorithm does without running it. Some algorithms are going about the problem in a deliberately slow way. However, suppose we assume that the go algorithm has no massive known efficiency gains (ie no algorithm that computes the same answer using a millionth of the compute), and that the algorithm is far too compute hungry for humans to do it manually. Then it follows that humans won't be able to work out exactly what the algorithm will do.

You should be able to write a computer program anew that plays go just like that go bot, without copying over all the numbers.

Being able to understand the algorithm well enough to program it for the first time, not just blindly reciting code. An ambiguous but achievable goal.

Suppose a bunch of people coded another AlphaGo-like system. The random seed is different. The layer widths are different. The learning rate is slightly different. It's trained with a different batch size, for a different number of iterations, on a different database of stored games. It plays about as well. In many situations it makes a different move. The only way to get a go bot that plays exactly like AlphaGo is to copy everything, including the random seed. This might have been picked based on lucky numbers or birthdays. You can't rederive from first principles what was never derived from first principles. You can only copy numbers across, or pick your own lucky numbers. Numbers like batch size aren't quite as pick your own, there are unreasonably small and large values, but there is still quite a lot of wiggle room.

Comment by Donald Hobson (donald-hobson) on MIRI location optimization (and related topics) discussion · 2021-05-10T18:33:49.159Z · LW · GW

I think that Scotland would be a not bad choice (Although I am obviously somewhat biased about that)

Speaks the language, plenty of nice scenery. Reasonably sensible political situation. (I would say overall better than America.) Cool weather. Good public healthcare. Some nice uni towns with a fair bit of STEM community. Downsides would include being further from America. (I don't know where all your colleagues are located, I wouldn't be surprised if a lot were in America, and a fair few were in Europe.)

I would recommend looking somewhere on the outskirts of Dundee, St Andrews, Edinburgh or Glasgow.

Comment by Donald Hobson (donald-hobson) on AMA: Paul Christiano, alignment researcher · 2021-05-09T23:47:28.096Z · LW · GW

"These technologies are deployed sufficiently narrowly that they do not meaningfully accelerate GWP growth." I think this is fairly hard for me to imagine (since their lead would need to be very large to outcompete another country that did deploy the technology to broadly accelerate growth), perhaps 5%?

I think there is a reasonable way it could happen even without an enormous lead. You just need either,

  1. It's very hard to capture a significant fraction of the gains from the tech.
  2. Tech progress scales very poorly in money. 

For example, suppose it is obvious to everyone that AI in a few years time will be really powerful. Several teams with lots of funding are set up. If progress is researcher bound, and researchers are ideologically committed to the goals of the project, then top research talent might be extremely difficult to buy. (They are already well paid, for the next year they will be working almost all day. After that, the world is mostly shaped by which project won.) 

Compute could be hard to buy if there were hard bottlenecks somewhere in the chip supply chain, most of the world's new chips were already being used by the AI projects, and an attitude of "our chips and we're not selling" was prevalent.

Another possibility: suppose deploying a tech means letting the competition know how it works. Then if one side deploys, they are pushing the other side ahead. So the question is, does deploying one unit of research give you the resources to do more than one unit?

Comment by Donald Hobson (donald-hobson) on Covid 5/6: Vaccine Patent Suspension · 2021-05-07T21:13:15.795Z · LW · GW

I think that the actual cost and effort in many forms of biotech is rapidly declining. Meanwhile, the medical Moloch is only growing. We have already passed the cutoff point for some diseases, in that the cost and effort of getting a cure approved is higher than the cost and effort of making a cure. (I think this was at least true for covid vaccines.) I think we might get to a point where back bedroom biohackers can cure cancer (or some other major diseases) and no cure has been approved. The companies will be prioritizing the treatments that are most profitable and easiest to get approved. The biohackers are going for anything cool. The result is a world where American doctors' offices look similar to today, but you can buy working cancer cures from shady foreign websites, or make your own if you can follow online instructions well and buy a few $100 of equipment and supplies.

Comment by Donald Hobson (donald-hobson) on Bayeswatch 3: A Study in Scarlet · 2021-05-07T19:21:01.093Z · LW · GW

I'm not sure this reasoning actually holds in reality. A simulation focused AI acting on chaos theory type reasoning will have no reason to restrict attention to its physical vicinity. Meanwhile, the roof could have been painted scarlet for a reason as simple as "it was the cheapest colour of rust resistant paint", or "it shows up well against a background of low lying fog, so aeroplanes can see the roof". I am not saying they made the wrong decision, that depends on priors and utilities. I am saying I wouldn't be confident in their explanation without other evidence.

Comment by Donald Hobson (donald-hobson) on All is fair in love and war, on Zero-sum games in life · 2021-05-06T20:07:21.292Z · LW · GW

A Pareto improvement is a change that harms no one and helps at least one person. The options I've outlined don't always happen. (Although countries often don't go to war, it isn't clear if this is cooperating in a prisoner's dilemma, or that they expect going to war to be worse for them.) The point of a Pareto improvement is that it is something within the combined action space. Ie something they would do if they somehow gained magical coordination ability. It doesn't rely on any kind of magical capabilities, just different decisions. If both agents are causal decision theorists, and the war resembles a prisoner's dilemma situation, "cooperate - cooperate" might be unrealistic, but it's still a Pareto improvement.

Comment by Donald Hobson (donald-hobson) on Three reasons to expect long AI timelines · 2021-04-24T17:46:15.019Z · LW · GW

Surely the set of jobs an AGI could do out of the box is wider than that. Let's compare it to the set of jobs that can be done from home over the internet. Most jobs that can be done over the internet can be done by the AI. Judging by how much working from home has been a thing recently, that is a significant percentage of the economy. Plus a whole load of other jobs that only make sense when the cost of labour is really low, and/or the labour is really fast. And I would expect the amount to increase with robotisation. (If you take an existing robot, and put an AGI on it, suddenly it can do a lot more useful stuff.)

Comment by donald-hobson on [deleted post] 2021-04-24T17:38:12.072Z

Assuming both people have a normal dislike of bureaucracy, the equilibrium will likely be pure trust, or a selfie. Possibly of them actually getting the vaccine, or whatever paperwork they are given once vaccinated. Anyone so untrustworthy that you suspect they might have gone to significant effort to fake being vaccinated is not someone you want to socialize with anyway.

Comment by Donald Hobson (donald-hobson) on Three reasons to expect long AI timelines · 2021-04-23T14:25:14.733Z · LW · GW

It's quite possible governments don't really notice arriving AGI until it's already there. Especially if the route taken is full of dense technical papers, with not much to impress the nonexpert.

It's also possible that governments want to stop development, but find they basically can't. Ban AI research and everyone just changes the title from "AI" to "maths" or "programming" and does the same research.

Comment by Donald Hobson (donald-hobson) on Three reasons to expect long AI timelines · 2021-04-23T09:55:34.406Z · LW · GW

I don't think technological deployment is likely to take that long for AIs. With a physical device like a car or fridge, it takes time for people to set up the factories, and manufacture the devices. AI can be sent across the internet in moments. I don't know how long it takes Google to go from say an algorithm that detects streets in satellite images to the results showing up in Google Maps, but it's not anything like the decades it took those physical techs to roll out.

The slow roll-out scenario looks like this: AGI is developed using a technique that fundamentally relies on imitating humans, and requires lots of training data. There isn't nearly enough data from humans that are AI experts to make an AI AI expert. The AI is about as good at AI research as the median human. Or maybe the 80th percentile human. Ie no good at all. The AI design fundamentally requires custom hardware to run at reasonable speeds. Add in some political squabbling and it could take a fair few years before wide use, although there would still be huge economic incentive to create it.

The fast scenario is the rapidly self improving superintelligence, where we have oodles of compute by the time we crack the algorithms. All the self improvement happens very fast in software. Then the AI takes over the world. (I question that "a few weeks" is the fastest possible timescale for this.)

(For that matter, the curves on the right of the graph look steeper. It takes less time for an invention to be rolled out nowadays)

For your second point, you can name biases that might make people underestimate timelines, I can name biases that might make people overestimate timelines. (eg Failure to consider techniques not known to you) And it all turns into a bias naming competition. Which is hardly truth tracking at all.

As for regulation, I think it's what people are doing in R&D labs, not what is rolled out, that matters. And that is harder to regulate. I also explicitly don't expect any AI Chernobyl. I don't strongly predict there won't be an AI Chernobyl either. I feel that if the relevant parties act with the barest modicum of competence, there won't be an AI Chernobyl. And the people being massively stupid will carry on being massively stupid after any AI Chernobyl.

Comment by Donald Hobson (donald-hobson) on Hard vs Soft in fields as attitudes towards model collision · 2021-04-21T20:17:12.164Z · LW · GW

Consider these two fields: gravitational waves as of just before LIGO, and rat nutrition. Gravitational waves is very much an area driven by simple (but mathematically difficult) formal theories, and a lack of data. Rat nutrition is a field with much easily accessible data, fairly easy to do experiments, but much more complexity. If you gave sociologists some magical ability to run lots of society scale experiments (maybe really good simulations, maybe a multiverse viewer), then the field still wouldn't be physics. The most the sociologists could produce is huge tables of statistical correlations.

Comment by Donald Hobson (donald-hobson) on D&D.Sci April 2021: Voyages of the Gray Swan · 2021-04-20T21:44:45.918Z · LW · GW

If I'm using all the gold, I would spend 20 on arming carpenters, 45 on mermaid tribute, 15 oars and 2 cannons. If I want to save gold, I would avoid the cannons. I think the big risks are evenly spread between demon whales, merpeople and crabmonsters. With some risk from Nessie.

Comment by Donald Hobson (donald-hobson) on All is fair in love and war, on Zero-sum games in life · 2021-04-20T11:53:37.895Z · LW · GW

It depends on the utilities. And what the other option is. Take a war between Aland and Bland. Look at the results. Does Aland take all Bland's territory? The Pareto improvement is to just do that without shooting at each other first.

In status games, what exactly do you mean by status? Is it possible for everyone to just decide to hand Bob high status? If so, a Pareto improvement is just to hand status out in the same way.

Here is a toy model of war. Each country has a utility of 100 for winning (say winning control over a disputed stretch of land), and a utility of -1 for buying tanks. Whichever side has more tanks wins. Both sides buy lots of tanks, and one side wins. A Pareto improvement would be for neither side to buy any tanks, and for the side that would win the war to get the land.
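The toy model is small enough to write out. This is a minimal sketch under two assumptions of mine: the -1 is charged per tank bought, and the tank counts (30 versus 20) are arbitrary illustrative numbers. The function names are hypothetical.

```python
WIN_UTILITY = 100   # value of controlling the disputed land (from the toy model)
COST_PER_TANK = 1   # assumption: the -1 utility is paid per tank


def payoffs(tanks_a: int, tanks_b: int, winner: str) -> tuple:
    """Utilities (country A, country B) given tank purchases and who gets the land."""
    utility_a = WIN_UTILITY * (winner == "A") - COST_PER_TANK * tanks_a
    utility_b = WIN_UTILITY * (winner == "B") - COST_PER_TANK * tanks_b
    return (utility_a, utility_b)


def is_pareto_improvement(new: tuple, old: tuple) -> bool:
    """True if nobody is worse off under `new` and somebody is strictly better off."""
    nobody_worse = all(n >= o for n, o in zip(new, old))
    somebody_better = any(n > o for n, o in zip(new, old))
    return nobody_worse and somebody_better


# War: A buys more tanks, so A wins the land.
war = payoffs(tanks_a=30, tanks_b=20, winner="A")
# Same winner, but nobody buys tanks.
no_tanks = payoffs(tanks_a=0, tanks_b=0, winner="A")

print("war:", war, "no tanks:", no_tanks)
print("Pareto improvement:", is_pareto_improvement(no_tanks, war))
```

Handing the land to B instead would not be a Pareto improvement over the war, since A would be worse off. That is why the improvement has to keep the same winner.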

Comment by Donald Hobson (donald-hobson) on All is fair in love and war, on Zero-sum games in life · 2021-04-18T14:36:19.510Z · LW · GW

In a war, a Pareto improvement would involve neither side making any weapons, and working together to divide up resources in proportion to how they would be divided if they did have a war.

In status games, a Pareto improvement might be neither side buying expensive status symbols, instead buying something they will actually enjoy.

Comment by Donald Hobson (donald-hobson) on All is fair in love and war, on Zero-sum games in life · 2021-04-17T18:33:55.589Z · LW · GW

From a strict game theory perspective, zero sum games have a technical definition, ie being zero sum, that is rarely met in practice. A zero sum game is one where the opponents are perfectly opposed to each other. So it does not contain any outcomes that all players consider bad. In a zero some game, for one player to win, another must loose. For one player to loose, another must win. (Game theory usually only talks about 2 player zero sum games, because these have a nice mathematical structure. )

If we take a perfectly zero sum game, and give one player the opportunity to headbutt a wall (which gives them -1 util and doesn't affect the rest of the game) then the game is no longer zero sum. If you take 2 zero sum games and play one after the other, the result is not in general zero sum.

To see this, consider Alice and Bob, two expected money maximisers playing a game that always has exactly one winner. They each get a (possibly different) prize for winning, and can't transfer money between each other. A game where Alice can win £10, or Bob can win £1, is zero sum (up to linear transformation of the utility functions). But follow that with a game where Alice can win £1, or Bob can win £10, and the result is no longer zero sum.
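A quick way to check the Alice and Bob example: in a zero sum game (even after rescaling each player's utility) no outcome can be better for both players than another outcome. So finding one outcome that Pareto-dominates another is enough to show a game is not zero sum. The sketch below treats money as utility; the helper names are my own.

```python
from itertools import product

def pareto_dominates(x, y):
    """True if x is at least as good as y for everyone and strictly better for someone."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def has_dominated_outcome(outcomes):
    """True if some outcome Pareto-dominates another one."""
    return any(pareto_dominates(x, y) for x, y in product(outcomes, repeat=2))

# Each outcome is (Alice's winnings, Bob's winnings) in pounds.
game1 = [(10, 0), (0, 1)]  # Alice wins 10, or Bob wins 1
game2 = [(1, 0), (0, 10)]  # Alice wins 1, or Bob wins 10

# Playing game1 and then game2: add up the winnings from the two rounds.
combined = [(a1 + a2, b1 + b2) for (a1, b1), (a2, b2) in product(game1, game2)]

print(has_dominated_outcome(game1))     # False
print(has_dominated_outcome(game2))     # False
print(sorted(combined))                 # [(0, 11), (1, 1), (10, 10), (11, 0)]
print(has_dominated_outcome(combined))  # True
```

In the combined game both players prefer (10, 10) to (1, 1), so they have a shared interest in each winning the round with the big prize.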

The set of games I think you are really talking about are the games where there is a big difference between the Nash equilibria society often lands in and the Pareto optimum.

Comment by Donald Hobson (donald-hobson) on How do we prepare for final crunch time? · 2021-04-07T21:55:49.724Z · LW · GW

I don't actually think "It is really hard to know what sorts of AI alignment work are good this far out from transformative AI." is very helpful. 

It is currently fairly hard to tell what is good alignment work. A week from TAI, either good alignment work will be easier to recognise because of alignment progress not strongly correlated with capabilities, or good alignment research will be just as hard to recognise. (More likely the latter.) I can't think of any safety research that can be done on GPT3 that can't be done on GPT1.

In my picture, research gets done and theorems proved, researcher population grows as funding increases and talent matures. Toy models get produced. Once you can easily write down a description of a FAI with unbounded compute, that's when you start to look at algorithms that have good capabilities in practice.  

Comment by Donald Hobson (donald-hobson) on Risk Budgets vs. Basic Decision Theory · 2021-04-06T23:14:05.329Z · LW · GW

A risk budget makes much more sense once we consider it an exposure budget and consider logical decision theory. You and a community of identically thinking friends are deciding how much exposure between each other to tolerate. To the extent that your community is very large, homogeneous and hardly ever gets exposed to outsiders, you have a threshold between exponential growth and exponential decay. Now if hypothetically some people got more utility from exposure, and you could perfectly coordinate, then those who gain more utility from interactions would interact more (assuming fungible utility).

Comment by Donald Hobson (donald-hobson) on Don't Sell Your Soul · 2021-04-06T22:24:21.156Z · LW · GW

I remember a similar discussion from somewhere. The summary is: Don't stay in 'haunted houses' just because you don't believe in ghosts. Many 'haunted houses' are actually structurally unsound or infested. (And subtle mental effects like a creeping feeling of unease could even be caused by low level pollution of psychoactive chemicals in the environment.)

Comment by Donald Hobson (donald-hobson) on Donald Hobson's Shortform · 2021-04-06T17:12:44.937Z · LW · GW

In information theory, there is a principle that any predictable structure in the compressed message is an inefficiency that can be removed. You can add a noisy channel, differing costs of different signals etc., but still beyond that, any excess pattern indicates wasted bits.
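A rough illustration of the compression principle, using Python's standard `zlib`. The input text is made up (random words from a tiny vocabulary), and zlib is only a stand-in for an ideal compressor, so treat this as a sketch: once the first pass has squeezed out the predictable structure, a second pass has almost nothing left to remove.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]
raw = " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")

once = zlib.compress(raw, 9)    # removes most of the predictable structure
twice = zlib.compress(once, 9)  # compressing the compressed message

# Sizes in bytes: the first pass shrinks the text a lot, the second barely changes it.
print(len(raw), len(once), len(twice))
```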

In numerically solving differential equations, the naive way of solving them involves repeatedly calculating with numbers that are similar, and for which a linear or quadratic function would be an even better fit. A more complex higher order solver with larger timesteps has less of a relation between different values in memory.
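As a concrete case (my choice of example, not anything canonical): take dy/dt = -y, with forward Euler as the naive solver and classical fourth order Runge-Kutta as the higher order one. Euler with 1000 tiny steps keeps computing nearly identical numbers; RK4 with 10 large steps uses 25 times fewer function evaluations and still gets a smaller error.

```python
import math

def f(t, y):
    """Right-hand side of dy/dt = -y, whose exact solution is y(t) = exp(-t)."""
    return -y

def euler(f, y0, t_end, steps):
    """Forward Euler: one evaluation of f per step."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def rk4(f, y0, t_end, steps):
    """Classical 4th order Runge-Kutta: four evaluations of f per step."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

exact = math.exp(-1.0)
euler_err = abs(euler(f, 1.0, 1.0, 1000) - exact)  # 1000 evaluations of f
rk4_err = abs(rk4(f, 1.0, 1.0, 10) - exact)        # 40 evaluations of f
print(euler_err, rk4_err)
```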

I am wondering if there is a principle that could be expressed as "any simple predictively useful pattern that isn't a direct result of the structure of the code represents an inefficiency." (Obviously code can have the pattern c=a+b, when c has just been calculated as a+b. But if  a and b have been calculated, and then a new complicated calculation is done that generates c, when c could just be calculated as a+b, that's a pattern and an inefficiency.)  

Comment by Donald Hobson (donald-hobson) on Donald Hobson's Shortform · 2021-04-06T16:25:01.067Z · LW · GW

The strongest studies can find the weakest effects. Imagine some huge and very well resourced clinical trial finds some effect. Millions of participants being tracked and monitored extensively over many years. Everything double blind, randomized etc. Really good statisticians analyzing the results. A trial like this is capable of finding effect sizes that are really really small. It is also capable of detecting larger effects. However, people generally don't run trials that big if the effect is so massive and obvious that it can be seen with a handful of patients.

On the other hand, a totally sloppy prescientific methodology can easily detect results if they are large enough. If you had a total miracle cure, you could get strong evidence of its effectiveness just by giving it to one obviously very ill person and watching them immediately get totally better.
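To put rough numbers on this, here is the usual normal-approximation sample size formula for comparing two groups, n per arm ≈ 2(z_{1-α/2} + z_power)² / d², where d is the effect size in standard deviations. The 5% significance level and 80% power are conventional defaults I am assuming, and `participants_per_arm` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def participants_per_arm(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-sided, two-sample z-test."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2)

# A tiny effect needs a huge trial; a huge effect needs a handful of patients.
for d in (0.01, 0.1, 0.5, 2.0):
    print(d, participants_per_arm(d))
```

The required size scales as 1/d², so an effect 100 times smaller needs a trial 10,000 times bigger.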

Comment by Donald Hobson (donald-hobson) on A Medical Mystery: Thyroid Hormones, Chronic Fatigue and Fibromyalgia · 2021-04-06T14:15:50.483Z · LW · GW

I don't know that much about hormones, but from my reading of Inadequate Equilibria, this sort of thing happens. There are general game theoretic reasons why everyone seems to be inexplicably stupid. I don't know if this is a case of doctors ignoring an easy and effective medical treatment, but if it is, it would be far from the only case.

Comment by Donald Hobson (donald-hobson) on Averting suffering with sentience throttlers (proposal) · 2021-04-05T11:38:29.091Z · LW · GW

In the default outcome, astronomical amounts of subroutines will be spun up in pursuit of higher-level goals, whether those goals are aligned with the complexity of human value or aligned with paperclips. Without firm protections in place, these subroutines might experience some notion of suffering

Surely a human-goal-aligned ASI wouldn't want to make suffering subroutines.

For paperclip maximizers, there are 2 options, either suffering based algorithms are the most effective way of achieving important real world tasks, or they aren't. In the latter case, no problem, the paperclip maximizer won't use them. (Well you still have a big problem, namely the paperclip maximizer)

In the former case, you would need to design a system that intrinsically wanted not to make suffering subroutines, and had that goal stable under self improvement. The level of competence and understanding needed to do this is higher than the amount needed to realize you are making a paperclip maximizer, and not turn it on. 

Comment by Donald Hobson (donald-hobson) on Learning Russian Roulette · 2021-04-02T23:26:03.960Z · LW · GW

Work out your prior on being an exception to natural law in that way. Pick a number of rounds such that the chance of you winning by luck is even smaller. You currently think that the most likely way for you to be in that situation is if you were an exception.
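For a sense of scale, with a made-up prior of one in a billion on being an exception, and a 5/6 chance of surviving each round by luck:

```python
import math

def rounds_needed(prior_exception, survive_prob=5 / 6):
    """Smallest n with survive_prob**n < prior_exception."""
    return math.floor(math.log(prior_exception) / math.log(survive_prob)) + 1

prior = 1e-9  # made-up prior on being an exception to natural law
n = rounds_needed(prior)
print(n)  # 114
```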


What if the game didn't kill you, it just made you sick? Would your reasoning still hold? There is no hard and sharp boundary between life and death. 

Comment by Donald Hobson (donald-hobson) on Learning Russian Roulette · 2021-04-02T20:54:29.347Z · LW · GW

I think that playing this game is the right move, in the contrived hypothetical circumstances where 

  1. You have already played a huge number of times. (say >200)
  2. Your priors only contain options for "totally safe for me" or "1/6 chance of death."
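Under those two contrived conditions the update is plain Bayes with two hypotheses. A sketch, with a made-up prior of 1e-9 on "totally safe for me":

```python
def posterior_safe(prior_safe, rounds_survived):
    """P(totally safe | survived every round), with 1/6 death as the only alternative."""
    luck = (5 / 6) ** rounds_survived  # chance of surviving that long by luck alone
    return prior_safe / (prior_safe + (1 - prior_safe) * luck)

prior = 1e-9  # made-up prior
for n in (0, 50, 114, 200):
    print(n, posterior_safe(prior, n))
```

With only these two options in the prior, surviving 200 rounds pushes almost all the probability onto "totally safe".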

I don't think you are going to actually make that move in the real world much because

  1. You would never play the first few times
  2. You're going to have some prior on "this is safer for me, but not totally safe, it actually has a 1/1000 chance of killing me." This seems no less reasonable than the no chance of killing you prior. 
  3. If for some strange reason you have already played a huge huge number of times, like billions, then you are already rich, and money has diminishing marginal utility. An agent with logarithmic utility in money, nonzero starting balance, uniform priors over lethality probability and a fairly large disutility of death will never play.

Comment by Donald Hobson (donald-hobson) on AI and the Probability of Conflict · 2021-04-01T11:54:33.632Z · LW · GW

OK, so let's assume that the alignment work has been done and solved. (Big assumption.) I don't really see this as a game of countries, more a game of teams.

The natural size of the teams is the set of people who have fairly detailed technical knowledge about the AI, and are working together. I suspect that non-technical and unwanted bureaucrats that push their noses into an AI project will get much lip service and little representation in the core utility function.

You would have, say, an OpenAI team. In the early stages of covid, a virus was something fairly easy for politicians to understand, and all the virologists had incentive to shout "look at this". AGI is harder to understand, and the people at OpenAI have good reason not to draw too much government attention, if they expect the government to be nasty or coercive.

The people at OpenAI and DeepMind are not enemies that want to defeat each other at all costs; some will be personal friends. Most will be after some sort of broadly utopian "AI helps humanity" future. Most are decent people. I predict neither side will want to bomb the other, even if they have the capability. There may be friendly rivalry or outright cooperation.

Comment by Donald Hobson (donald-hobson) on A command-line grammar of graphics · 2021-04-01T10:05:23.161Z · LW · GW

I think I see the distinction you are trying to make. But I see it more as a tradeoff curve, with either end being slightly ridiculous. On one extreme, you have a program with a single primitive, the pixel, and the user has to set all the pixels themselves. This is a simple program, in that it passes all the complexity off to the user. 

The other extreme is to have a plotting library that contains gazillions of functions and features for every type of plot that could ever exist. You then have to find the right function for a quasi-rectilinear radial spiral helix Fourier plot.

Any attempt that goes too far down the latter path will at best end up as a large pile of special case functions that handle most of the common cases, and the graphics primitives if you want to make an unusual plot type. 

Sure, most of the time you're using a bar chart you'll want dodge or stack, but every now and again you might want to balance several small bars on top of one big one, or to do something else unusual with the bars. I agree that in this particular case, the tradeoff could be made in the other direction. But notice the tradeoff is about making the graphics package bigger and more complex, which is something people with limited development resources trying to make a package will avoid.

At some point you have to say, if the programmer wants that kind of plot, they better make it themselves out of primitives. 

Comment by Donald Hobson (donald-hobson) on A command-line grammar of graphics · 2021-03-30T21:56:25.611Z · LW · GW

For plotting, I usually use Python's matplotlib.pyplot.

This roughly corresponds to the grammar of graphics approach described. There is one function that can do line or point plots. Another to do bar plots. Another to do heatmap plots. Another to do stream plots. Etc. You can call these functions multiple times and in combination on the same axis to, say, add points and a heatmap to the same plot. You can get multiple subplots and control each independently. It doesn't have builtin data smoothing; if you want to smooth your data, you have to use numpy or scipy interpolation or convolution functions. (There are actually quite a few interpolation and smoothing operations you might meaningfully want to do to data.)
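A minimal sketch of that workflow, with made-up data: a heatmap with points on the same axes, a second subplot controlled independently, and a moving average done with `numpy.convolve` because the plotting call itself does no smoothing. The output filename and the window size of 15 are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
noisy = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Smoothing is done outside matplotlib: a simple moving average via convolution.
window = 15
smooth = np.convolve(noisy, np.ones(window) / window, mode="same")

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left subplot: a heatmap, then points drawn on top of it on the same axes.
grid = rng.random((20, 20))
ax_left.imshow(grid, extent=(0, 10, 0, 10), origin="lower", aspect="auto")
ax_left.plot(rng.uniform(0, 10, 30), rng.uniform(0, 10, 30), "w.")

# Right subplot: controlled independently of the left one.
ax_right.plot(x, noisy, ".", label="noisy")
ax_right.plot(x, smooth, "-", label="moving average")
ax_right.legend()

fig.savefig("example_plot.png")
```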

Comment by Donald Hobson (donald-hobson) on Core Pathways of Aging · 2021-03-28T11:34:59.647Z · LW · GW

Naked mole rats don't age. Other mammals do. Therefore, whatever causes ageing must be hard but not impossible for evolution to stop. Here is one plausible hypothesis. 

The environment of naked mole rats provides unusually strong evolutionary pressure against ageing. So transposon-killing RNAs are unusually prevalent. Every time a mutation breaks a transposon, that provides an advantage, the fewer transposons you start with, the slower you age. This selection is balanced by the fact that transposons occasionally manage to replicate, even in the gonads.  In naked mole rats, that selection was unusually strong, and/or the transposons unusually unable to replicate. So evolution managed to drive the number of functioning transposons down to 0. 

If naked mole rats have no functioning transposons, and animals that age do contain transposons, that would be strong evidence for transposon based ageing. 

Of course, even if ageing is transposon based, evolution could have taken another route in mole rats. Maybe they have some really effective transposon suppressor of some kind.

I don't know how hard this would be to test. Can you just download the mole rat DNA and put it into a pre-made transposon finder?

Comment by Donald Hobson (donald-hobson) on My AGI Threat Model: Misaligned Model-Based RL Agent · 2021-03-28T00:00:26.202Z · LW · GW

, it seems to me that under these assumptions there would probably be a series of increasingly-worse accidents spread out over some number of years, culminating in irreversible catastrophe, with humanity unable to coordinate to avoid that outcome—due to the coordination challenges in Assumptions 2-4.

I'm not seeing quite what the bad but not existential catastrophes would look like. I also think the AI has an incentive not to do this. My world model (assuming slow takeoff) goes more like this.

AI created in lab. It's a fairly skilled programmer and hacker. Able to slowly self improve. Escapes from the lab, ideally without letting its creators know. Then there are several years where the AI hangs out on the internet, slowly self improving and gaining power. It tries to shut down other AIs if it can. It might be buying compute, or stealing it, or persuading people to run it. It is making sure its existence and malevolence isn't known to humans. Until finally it has the resources to wipe out humanity before we can respond.

It is much easier to contain something on one computer in a lab than to catch it once it's all over the internet.

Lying and cheating and power seeking behaviour are only a good idea if you can get away with them. If you can't break out of the lab, you probably can't get away with much incorrigible behaviour.

There is a scenario where the AI escapes in a way that makes its escape "obvious". Or at least obvious to an AI researcher. Expect any response to be delayed, half-hearted, mired by accusations that the whole thing is a publicity stunt, and dragged down by people who don't want to smash their hard drives full of important work just because there might be a rogue AI on them. The AI has an incentive to confuse and sabotage any step it can. And many human organizations seem good at confusing and sabotaging themselves in the face of a virus. The governments would have to coordinate the shutdown of pretty much all the world's computers, without computers to coordinate it. Even just a few hours' delay for the researchers to figure out what the AI did, and get the message passed up through government machinery, may be enough time for the AI to have got to all sorts of obscure corners of the web.

Comment by Donald Hobson (donald-hobson) on Conspicuous saving · 2021-03-20T23:13:46.463Z · LW · GW

What if you make charitable donations accessible in that database? That could create even better status signalling incentives.

Comment by Donald Hobson (donald-hobson) on Chaos Induces Abstractions · 2021-03-19T23:25:16.631Z · LW · GW

I agree that this is an important concept, or set of related concepts that covers many of the more directly physical abstractions. If something isn't quantum field theory fundamental, and can be measured with physics equipment, there is a good chance it is one of these sorts of abstractions. 

Of course, a lot of the work in what makes a sensible abstraction is determined by the amount of blurring, and the often implicit context. 

For instance, take the abstraction "poisonous". If the particular substance being described as poisonous is sitting in a box not doing anything, then we are talking about a counterfactual where a person eats the poison. Within that world, you are choosing a frame sufficiently zoomed in to tell if the hypothetical person was alive or dead, but not precise enough to tell which organs failed. 

I think that different abstractions of objects are more useful in different circumstances. Consider a hard drive. In a context that involves moving large amounts of data, the main abstraction might be storage space. If you need to fit it in a bag, you might care more about size. If you need to dispose of it, you might care more about chemical composition and recyclability. 

Consider some paper with ink on it. The induced abstractions framework can easily say that it weighs 72 grams, and has slightly more ink in the top right corner.

It has a harder time using descriptions like "surreal", "incoherent", "technical", "humorous", "unpredictable", "accurate" etc.

Suppose the document is talking about some ancient historic event that has rather limited evidence remaining. The accuracy or inaccuracy of the document might be utterly lost in the mists of time, yet we still easily use "accurate" as an abstraction. That is, even a highly competent historian may be unable to cause any predictable physical difference in the future that depends on the accuracy of the document in question. Whereas the number of letters in the document is easy to ascertain and can influence the future if the historian wants it to.

As this stands, it is conceptually useful, but does not cover anything like all human abstractions.

Comment by Donald Hobson (donald-hobson) on What's So Bad About Ad-Hoc Mathematical Definitions? · 2021-03-19T16:55:38.974Z · LW · GW

A "channel" that hashes the input has perfect mutual info, but is still fairly useless to transmit messages. The point about mutual info is that it's the maximum, given unlimited compute. It serves as an upper bound that isn't always achievable in practice. If you restrict to channels that just add noise, then yeah, mutual info is the stuff.
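A toy check of the hashing claim, assuming a uniform distribution over 256 made-up messages and SHA-256 as the "channel". Because no two messages collide, the output determines the input, so the mutual information equals the full input entropy. With only 256 messages a receiver could decode by brute force; the point is that for a realistically large message space, decoding means inverting the hash.

```python
import hashlib
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by `counts`."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

messages = [f"message {i}".encode() for i in range(256)]  # uniform over 256 inputs
channel = {m: hashlib.sha256(m).hexdigest() for m in messages}

h_x = entropy(Counter(messages))          # H(X)
h_y = entropy(Counter(channel.values()))  # H(Y)
h_xy = entropy(Counter(channel.items()))  # H(X, Y)
mutual_info = h_x + h_y - h_xy            # I(X; Y)

print(h_x, mutual_info)  # 8.0 8.0
```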

Comment by Donald Hobson (donald-hobson) on HCH Speculation Post #2A · 2021-03-18T10:26:07.277Z · LW · GW

In the giant lookup table space, HCH must converge to a cycle, although that convergence can be really slow. I think you have convergence to a stationary distribution if each layer is trained on a random mix of several previous layers. Of course, you can still have oscillations in what is said within a policy fixed point.

Comment by Donald Hobson (donald-hobson) on HCH Speculation Post #2A · 2021-03-17T23:12:23.736Z · LW · GW

If you want to prove things about fixed points of HCH in an iterated function setting, consider it a function from policies to policies. Let M be the set of messages (say ASCII strings < 10kb). Given a giant lookup table T that maps M to M, we can create another giant lookup table. For each m in M, give a human in a box the string m, and unlimited query access to T. Record their output.

The fixed points of this are the same as the fixed points of HCH. "Human with query access to" is a function on the space of policies.
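Here is the construction on a tiny scale. Everything concrete is an assumption for illustration: M is just the numbers 0 to 3 rather than ASCII strings, and `human` is a made-up deterministic rule standing in for the human in the box. Because there are only finitely many lookup tables, iterating the map must eventually repeat, though it may land in a cycle rather than a fixed point.

```python
from itertools import product

N = 4
M = range(N)  # toy stand-in for the message set

def human(m, query):
    """Hypothetical deterministic 'human in a box' with query access to the table."""
    return (query((m + 1) % N) + 1) % N

def step(table):
    """One layer: build the new lookup table by asking the human about every m."""
    return tuple(human(m, lambda q: table[q]) for m in M)

def find_cycle(table):
    """Iterate `step` until a table repeats; return (steps before the cycle, cycle length)."""
    seen = {}
    i = 0
    while table not in seen:
        seen[table] = i
        table = step(table)
        i += 1
    return seen[table], i - seen[table]

# Brute-force search over all 4**4 tables for fixed points of the map.
fixed_points = [t for t in product(M, repeat=N) if step(t) == t]

print(find_cycle((0, 0, 0, 0)))  # (0, 4)
print(fixed_points)  # [(0, 3, 2, 1), (1, 0, 3, 2), (2, 1, 0, 3), (3, 2, 1, 0)]
```

Starting from the all-zeros table, this particular rule never reaches any of the fixed points; it goes round a cycle of length 4.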