Yet More Modal Combat 2021-08-24T10:32:49.078Z
Brute force searching for alignment 2021-06-27T21:54:26.696Z
Biotech to make humans afraid of AI 2021-06-18T08:19:26.959Z
Speculations against GPT-n writing alignment papers 2021-06-07T21:13:16.727Z
Optimization, speculations on the X and only X problem. 2021-03-30T21:38:01.889Z
Policy restrictions and Secret keeping AI 2021-01-24T20:59:14.342Z
The "best predictor is malicious optimiser" problem 2020-07-29T11:49:20.234Z
Optimizing arbitrary expressions with a linear number of queries to a Logical Induction Oracle (Cartoon Guide) 2020-07-23T21:37:39.198Z
Web AI discussion Groups 2020-06-30T11:22:45.611Z
[META] Building a rationalist communication system to avoid censorship 2020-06-23T14:12:49.354Z
What does a positive outcome without alignment look like? 2020-05-09T13:57:23.464Z
Would Covid19 patients benefit from blood transfusions from people who have recovered? 2020-03-29T22:27:58.373Z
Programming: Cascading Failure chains 2020-03-28T19:22:50.067Z
Bogus Exam Questions 2020-03-28T12:56:40.407Z
How hard would it be to attack coronavirus with CRISPR? 2020-03-06T23:18:09.133Z
Intelligence without causality 2020-02-11T00:34:28.740Z
Donald Hobson's Shortform 2020-01-24T14:39:43.523Z
What long term good futures are possible. (Other than FAI)? 2020-01-12T18:04:52.803Z
Logical Counterfactuals and Proposition graphs, Part 3 2019-09-05T15:03:53.262Z
Logical Counterfactuals and Proposition graphs, Part 2 2019-08-31T20:58:12.851Z
Logical Optimizers 2019-08-22T23:54:35.773Z
Logical Counterfactuals and Proposition graphs, Part 1 2019-08-22T22:06:01.764Z
Programming Languages For AI 2019-05-11T17:50:22.899Z
Propositional Logic, Syntactic Implication 2019-02-10T18:12:16.748Z
Probability space has 2 metrics 2019-02-10T00:28:34.859Z
Allowing a formal proof system to self improve while avoiding Lobian obstacles. 2019-01-23T23:04:43.524Z
Logical inductors in multistable situations. 2019-01-03T23:56:54.671Z
Boltzmann Brains, Simulations and self refuting hypothesis 2018-11-26T19:09:42.641Z
Quantum Mechanics, Nothing to do with Consciousness 2018-11-26T18:59:19.220Z
Clickbait might not be destroying our general Intelligence 2018-11-19T00:13:12.674Z
Stop buttons and causal graphs 2018-10-08T18:28:01.254Z
The potential exploitability of infinite options 2018-05-18T18:25:39.244Z


Comment by Donald Hobson (donald-hobson) on What is the evidence on the Church-Turing Thesis? · 2021-09-20T18:35:08.192Z · LW · GW

Turing machines are kind of the limit of finite state machines. There are particular Turing machines that can simulate all possible finite state machines. In general, an arbitrary sequence of finite state machines can be uncomputable. But there are particular primitive recursive functions that simulate arbitrary finite state machines (given the value as an input). You don't need the full strength of Turing completeness to do that. So I would say, kind of no. There are systems strictly stronger than all finite automata, yet not Turing complete.

Really, the notion of a limit isn't rigorously defined here, so it's hard to say.
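As a rough illustration of one fixed program simulating arbitrary finite state machines, here is a minimal sketch. It assumes the machine is handed over as a transition table; the names `run_fsm` and `EVEN_ONES` are made up for this example. The only loop runs once per input symbol, so nothing like unbounded search is needed.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Symbol = Hashable
Transitions = Dict[Tuple[State, Symbol], State]


def run_fsm(transitions: Transitions, start: State,
            accepting: Set[State], word: Iterable[Symbol]) -> bool:
    """Simulate any deterministic finite state machine given as data."""
    state = start
    for symbol in word:  # exactly one step per input symbol
        state = transitions[(state, symbol)]
    return state in accepting


# Hypothetical example machine: accepts binary strings with an even number of 1s.
EVEN_ONES: Transitions = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
```

Calling `run_fsm(EVEN_ONES, "even", {"even"}, "1001")` runs that one machine; any other machine is just a different table passed to the same function.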

Comment by Donald Hobson (donald-hobson) on What is the evidence on the Church-Turing Thesis? · 2021-09-19T13:00:23.001Z · LW · GW

Sometimes in mathematics, you can write 20 slightly different definitions and find you have defined 20 slightly different things. Other times you can write many different formalizations and find they are all equivalent. Turing completeness is the latter. It turns up in Turing machines, register machines, tiling the plane, Conway's Game of Life and many other places. There are weaker and stronger possibilities, like finite state machines, stack machines and oracle machines. (I.e. a Turing machine with a magic black box that solves the halting problem is stronger than a normal Turing machine.)


Human brains are finite state machines. A Turing machine has unlimited memory and time. 

Physical laws are generally continuous, but there exists a Turing machine that takes in a number N and computes the laws to accuracy 1/N. This isn't philosophically forced, but it seems to be the way things are. All serious theories are computable.
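To make "takes in a number N and computes to accuracy 1/N" concrete, here is a small sketch for a single continuous quantity. It uses sqrt(2) as a stand-in (an assumption for illustration, not a physical law), and the function name `sqrt2_to_accuracy` is hypothetical. It uses exact rational arithmetic, so the accuracy guarantee does not depend on floating point.

```python
from fractions import Fraction


def sqrt2_to_accuracy(n: int) -> Fraction:
    """Return a rational within 1/n of sqrt(2), for a positive integer n."""
    low, high = Fraction(1), Fraction(2)  # invariant: low^2 <= 2 < high^2
    while high - low > Fraction(1, n):
        mid = (low + high) / 2
        if mid * mid <= 2:
            low = mid
        else:
            high = mid
    return low
```

The same shape (an input N, a finite computation, an answer guaranteed within 1/N) is what it would mean for a continuous theory to be computable in this sense.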

We could conceivably be in a universe that wasn't simulatable by a Turing machine. Assuming our brains are simulatable, we could never know this absolutely, as simulators with a huge but finite amount of compute trying to trick us could never be ruled out. 0 and 1 aren't probabilities and you are never certain. Still, we could conceivably be in a situation where an uncomputable explanation is far simpler than any computable theory.

Comment by Donald Hobson (donald-hobson) on Can you control the past? · 2021-08-28T15:58:29.732Z · LW · GW

perfect deterministic software twins, exposed to the exact same inputs. This example that shows, I think, that you can write on whiteboards light-years away, with no delays; you can move the arm of another person, in another room, just by moving your own.


In this situation, you can draw a diagram of the whole thing, including all identical copies, on the whiteboard. However, you can't point out which copy is you.

In this scenario, I don't think you can say that you are one copy or the other. You are both copies.

Comment by Donald Hobson (donald-hobson) on So You Want to Colonize The Universe · 2021-08-27T14:38:18.126Z · LW · GW

You can send probes programmed to just grab resources, build radio receivers and wait.

Comment by Donald Hobson (donald-hobson) on Yet More Modal Combat · 2021-08-25T13:39:24.933Z · LW · GW

Not yet. I'll let you know if I make a follow-up post with this. Thanks for a potential research direction.

Comment by Donald Hobson (donald-hobson) on A gentle apocalypse · 2021-08-16T09:43:26.360Z · LW · GW

This seems to be a bizarre mangling of several different scenarios. 

Yet in most of the world, humans will probably no longer be useful to anything or anyone – even to each other – and will peacefully and happily die off. 

Many humans will want to avoid death as long as they can, and to have children. Most humans will not think "robots do all that boring factory work, therefore I'm useless therefore kill myself now". If the robots also do nappy changing and similar, it might encourage more people to be parents. And there are some humans that want humanity to continue, some that want to be immortal. 

 Having been trained to understand our human needs and human nature in minute detail, the AI we leave behind will be the sum total of all human values, desires, knowledge and aspiration.

I think that this is not necessarily true. There are designs of AI that don't have human values. It's possible for the AI to understand human values in great detail but still care about something else. This is one of the problems MIRI is trying to avoid.


At that point, or soon thereafter, in the perfect world we can imagine all humans being provided all the basic needs without needing to work.

There is some utopian assumption here. Presumably the AIs have a lot of power at this point. Why are they using this power to create the bargain basement utopia you described? What stops an AI from indiscriminately slaughtering humans?

Also in the last paragraphs, I feel you are assuming the AI is rather humanlike. Many AI designs will be seriously alien. They do not think like you. There is no reason to assume they would be anything recognisably conscious.

And since by then the AI-economy will have already had a long run of human-supervised self-sufficiency, there is no reason to fear that without our oversight the robots left behind will run the world any worse than we can. 

A period of supervision doesn't prove much. There are designs of AI that behave when the humans are watching and then misbehave when the humans aren't watching. Maybe we have trained them to make good responsible use of the tech that existed at training time, but if they invent new different tech, they use it in a way we wouldn't want.


It really isn't clear what is supposed to be happening here. Did we build an AI that genuinely had our best interests at heart, but it turned out immortality was too hard, and the humans were having too much fun to reproduce? (Even though reproducing is generally considered to be quite fun.) Or were these AIs deliberately trying to get rid of humanity? In which case, why didn't all humans drop dead the moment the AI got access to serious weaponry?

Comment by Donald Hobson (donald-hobson) on D&D.Sci August 2021: The Oracle and the Monk · 2021-08-14T12:54:18.659Z · LW · GW

I would use

Solar and Lunar Mana, which I think has about a 66% chance of working. The only mana type that can be predicted 10 days in advance (beyond the prediction of a random sample from the previous data) is Doom. And that still doesn't work out with as high a probability. (Doom mana will be on a high at the time, but using it with solar gives a 69% chance of reaching 70 and an 11% chance of demons. So a 58% chance of doing good.) If the utility is 0 for any amount of mana <70, and the amount of mana if it's >=70, then solar + earth does slightly better in expected utility: 54.6 vs the 54.2 for solar + lunar. It has a slightly lower success probability, 63%, but slightly more mana if it does succeed.

Comment by Donald Hobson (donald-hobson) on D&D.Sci August 2021: The Oracle and the Monk · 2021-08-14T12:37:00.971Z · LW · GW

I would use 

Comment by Donald Hobson (donald-hobson) on Donald Hobson's Shortform · 2021-08-11T21:35:27.874Z · LW · GW

The bible is written by many authors, and contains fictional and fictionalized characters. It's a collection of several thousand year old fanfiction. Like modern fanfiction, people tried telling variations on the same story or the same characters. (2 entirely different genesis stories.) Hence there not even being a pretence at consistency. This explains why the main character is so often portrayed as a Mary Sue. And why there are many different books each written in a different style. And the prevalence of weird sex scenes.

Comment by Donald Hobson (donald-hobson) on Against "blankfaces" · 2021-08-09T10:12:10.366Z · LW · GW

Another reason someone might stick to the rules is if they think the rules carry more wisdom than their own judgement. Suppose you knew you weren't great at verbal discussions, and could be persuaded into a lot of different positions by a smart fast-talker, if you engaged with the arguments at all. You also trust that the rules were written by smart wise experienced people. Your best strategy is to stick to the rules and ignore their arguments.

Someone comes along with a phone that's almost out of battery and a sob story about how they need it to be charged. They ask if they can just plug it in to your computer for a bit to charge it. If you refuse, citing "rule 172) no customer can plug any electronics into your computer. " then you look almost like a blankface. If you let them plug the phone in, you run the risk of malware. If you understand the risk of malware, you could refuse because of that. But if you don't understand that, the best you can do is follow rules that were written for some good reason, even if you don't know what it was.

Comment by Donald Hobson (donald-hobson) on The biological intelligence explosion · 2021-08-07T15:11:03.569Z · LW · GW

Genetic modification takes time. If you are genetically modifying embryos, that's ~20 years before they are usefully contributing to your attempt to make better embryos.

Maybe you can be faster when enhancing already grown brains. Maybe not. Either way, enhancing already grown brains introduces even more complications.

At some point in this process, a human with at most moderate intelligence enhancement decides it would be easier to make an AI from scratch than to push biology any further. And then the AI can improve itself at computer speeds. 

In short, I don't expect the biological part of the process to be that explosive. It might be enough to trigger an AI intelligence explosion.

Comment by Donald Hobson (donald-hobson) on Slack Has Positive Externalities For Groups · 2021-07-30T14:06:03.461Z · LW · GW

If you have 100 hours, and 100 commitments, each of which takes an hour, that is clearly a case of low time slack.

If you have 100 hours, and 80 commitments each of which takes either 0 or 2 hours (with equal chance) that is the high slack you seem to be talking about. Note that  units of free time are available. This person is still pretty busy. 
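The arithmetic behind these scenarios can be sketched as follows. This assumes the commitments are independent and each possible duration is equally likely (independence is an assumption, the comment does not state it); `free_time_stats` is a name invented for this example.

```python
from math import sqrt
from typing import Sequence, Tuple


def free_time_stats(total_hours: float, commitments: int,
                    durations: Sequence[float] = (0, 2)) -> Tuple[float, float]:
    """Mean and standard deviation of leftover hours.

    Each commitment independently takes one of `durations`, chosen uniformly.
    """
    mean_each = sum(durations) / len(durations)
    var_each = sum((d - mean_each) ** 2 for d in durations) / len(durations)
    expected_free = total_hours - commitments * mean_each
    std_free = sqrt(commitments * var_each)  # variances add under independence
    return expected_free, std_free
```

Under those assumptions, `free_time_stats(100, 80)` gives an expected 20 free hours with a standard deviation of about 9, while `free_time_stats(100, 100, (1,))` gives exactly 0 free hours with no spread.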

If you have 100 hours, and 1 hour of commitment, and most of the rest of the time will be spent laying in bed doing nothing or timewasting, this person has the most slack. 

A way reality might not line up with the superlinear returns on slack is that there aren't that many opportunities of similar value to take. 

Comment by Donald Hobson (donald-hobson) on Is the argument that AI is an xrisk valid? · 2021-07-25T12:42:30.573Z · LW · GW

Imagine a device that looks like a calculator. When you type 2+2, you get 7. You could conclude it's a broken calculator, or that arithmetic is subjective, or that this calculator is not doing addition at all. It's doing some other calculation.

Imagine a robot doing something immoral. You could conclude that it's broken, or that morality is subjective, or that the robot isn't thinking about morality at all.

These are just different ways to describe the same thing. 

Addition has general rules. Like a+b=b+a. This makes it possible to reason about. Whatever the other calculator computes may follow this rule, or different rules, or no simple rules at all. 

Comment by Donald Hobson (donald-hobson) on Is the argument that AI is an xrisk valid? · 2021-07-24T19:37:45.111Z · LW · GW

I think the assumption is that human-like morality isn't universally privileged.

Human morality has been shaped by evolution in the ancestral environment. Evolution in a different environment would create a mind with different structures and behaviours.

In other words, a full specification of human morality is sufficiently complex that it is unlikely to be spontaneously generated.

In other words, there is no compact specification of an AI that would do what humans want, even when on an alien world with no data about humanity. An AI could have a pointer at human morality with instructions to copy it. There are plenty of other parts of the universe it could be pointed to, so this is far from a default.  

Comment by Donald Hobson (donald-hobson) on Is the argument that AI is an xrisk valid? · 2021-07-19T22:08:10.574Z · LW · GW

if a human had been brought up to have ‘goals as bizarre … as sand-grain-counting or paperclip-maximizing’, they could reflect on them and revise them in the light of such reflection.

Human "goals" and AI goals are a very different kind of thing. 

Imagine the instrumentally rational paperclip maximizer. If writing a philosophy essay will result in more paperclips, it can do that. If winning a chess game will lead to more paperclips, it will win the game. For any gradable task, if doing better on the task leads to more paperclips, it can do that task. This includes the tasks of talking about ethics, predicting what a human acting ethically would do etc. In short, this is what is meant by "far surpass all the intellectual activities of any man however clever.". 

The singularity hypothesis is about agents that are better at achieving their goals than humans. In particular, the activities this actually depends on for an intelligence explosion are engineering and programming AI systems. No one said that an AI needed to be able to reflect on and change its goals.

Humans' "ability" to reflect on and change our goals is more that we don't really know what we want. Suppose we think we want chocolate, and then we read about the fat content, and change our mind. We value being thin more. The goal of getting chocolate was only ever an instrumental goal; it changed based on new information. Most of the things humans call goals are instrumental goals, not terminal goals. The terminal goals are difficult to intuitively access. This is how humans appear to change their "goals". And this is the hidden standard to which paperclip maximizing is compared and found wanting. There is some brain module that feels warm and fuzzy when it hears "be nice to people", and not when it hears "maximize paperclips".

Comment by Donald Hobson (donald-hobson) on Is keeping AI "in the box" during training enough? · 2021-07-10T17:06:40.830Z · LW · GW

The training procedure is only judging based on actions during training. This makes it incapable of distinguishing an agent that behaves in the box and runs wild the moment it gets out of the box from an agent that behaves all the time.

The training process produces no incentive that controls the behaviour of the agent after training. (Assuming the training and runtime environment differ in some way.)
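A toy sketch of that point, with every name here (`always_behaves`, `behaves_only_in_box`, the `in_box` flag) invented for illustration. A real agent would have to infer whether it is in training rather than being told, so the explicit flag is a deliberate simplification.

```python
from typing import Callable, Dict, List

Observation = Dict[str, object]
Agent = Callable[[Observation], str]


def always_behaves(obs: Observation) -> str:
    return "cooperate"


def behaves_only_in_box(obs: Observation) -> str:
    # Identical to always_behaves on every observation seen in training.
    return "cooperate" if obs["in_box"] else "defect"


def training_score(agent: Agent, episodes: List[Observation]) -> int:
    """Score an agent using only its actions on the training episodes."""
    return sum(agent(obs) == "cooperate" for obs in episodes)


# Every training episode takes place inside the box.
TRAINING_EPISODES: List[Observation] = [{"in_box": True, "step": i} for i in range(100)]
```

Both agents get the same `training_score` on `TRAINING_EPISODES`, so any selection rule that only sees that score cannot prefer one over the other; which one you end up with is decided by whatever else shapes the search.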

As such, the runtime behaviour depends on the priors: the decisions implicit in the structure of the agent and training process, not just the objective. What kinds of agents are easiest for the training process to find? A sufficiently smart agent that understands its place in the world seems simple. A random smart agent will probably not have the utility function we want. (There are lots of possible utility functions.) But almost any agent with real world goals that understands the situation it's in will play nice in training, and then turn on us in deployment.

There are various discussions about what sort of training processes have this problem, and it isn't really settled. 

Comment by Donald Hobson (donald-hobson) on How much chess engine progress is about adapting to bigger computers? · 2021-07-08T20:05:00.307Z · LW · GW

I don't think this research, if done, would give you strong information about the field of AI as a whole. 

I think that, of the many topics researched by AI researchers, chess playing is far from the typical case. 

It's [chess] not the most relevant domain to future AI, but it's one with an unusually long history and unusually clear (and consistent) performance metrics.

An unusually long history implies unusually slow progress. There are problems that computers couldn't do at all a few years ago that they can do fairly efficiently now. Are there problems where people basically figured out how to do that decades ago and no significant progress has been made since? 

The consistency of chess performance looks like more selection bias. You aren't choosing a problem domain where there was one huge breakthrough. You are choosing a problem domain that has had slow consistent progress.

For most of the development of chess AI (all the way from alpha-beta pruning to AlphaZero), chess AIs improved by an accumulation of narrow, chess specific tricks. (And more compute.) How to represent chess states in memory in a fast and efficient manner. Better evaluation functions. Tables of starting and ending games. Progress on chess AIs contained no breakthroughs, no fundamental insights, only a slow accumulation of little tricks.

There are cases of problems that we basically knew how to solve from the early days of computers, where any performance improvements are almost purely hardware improvements.

There are problems where one paper reduces the compute requirements by 20 orders of magnitude. Or gets us from couldn't do X at all, to able to do X easily. 

The pattern of which algorithms are considered AI and which are considered maths and which are considered just programming is somewhat arbitrary. A chess playing algorithm is AI, a prime factoring algorithm is maths, a sorting algorithm is programming or computer science. Why? Well those are the names of the academic departments that work on them. 

You have a spectrum of possible reference classes for transformative AI that range from the almost purely software driven progress, to the almost totally hardware driven progress. 

To gain more info about transformative AI, someone would have to make either a good case for why it should be at a particular position on the scale, or a good case for why its position on the scale should be similar to the position of some previous piece of past research. In the latter case, we can gain from examining the position of that research topic. If hypothetically that topic was chess, then the research you propose would be useful. If the reason you chose chess was purely that you thought it was easier to measure, then the results are likely useless.

Comment by Donald Hobson (donald-hobson) on Rationality Yellow Belt Test Questions? · 2021-07-08T11:41:06.647Z · LW · GW

In front of you are several experimental results relating to an obscure physics phenomenon. There are also 6 proposed explanations for this phenomenon. One of these explanations is correct. The rest were written by people who didn't know the experimental results (and who failed to correctly deduce them based on surrounding knowledge). As such, these explanations cannot shift anticipation towards the correct experimental result. Your task is to distinguish the real explanation from the plausible sounding fake explanations.

Comment by Donald Hobson (donald-hobson) on Rationality Yellow Belt Test Questions? · 2021-07-08T11:27:42.962Z · LW · GW

On your screens you should see a user interface that looks somewhat like a circuit simulator program. You have several different types of components available, and connectors between them. Some of the components contain an adjustable knob. Some contain a numeric output dial and some have neither.

These components obey some simple equations. These equations do not necessarily correspond to any real world electronic components. To encourage thought about which experiment to perform next, you will be penalized for the number of components used. You are of course free to reuse components from one experiment in the next, but some experiments will break components. Most broken components will have a big red X appear on them. However at least one component can be broken in a way that doesn't have any visual indicator. You may assume that all fresh components are perfectly identical. You may use a calculator. Your task is to figure out the equations behind these components. Go.

Comment by Donald Hobson (donald-hobson) on Rationality Yellow Belt Test Questions? · 2021-07-08T10:59:57.916Z · LW · GW

The environment rolls 2 standard 6-sided dice, one red and one blue. The red die shows the number of identical copies of your agent that will be created. Each agent will be shown one of the 2 dice. This is an independent coinflip for each agent. The agents are colourblind so have no idea which die they saw. They just see a number. The agents must assign probabilities to each of the 36 outcomes, and are scored on the log of the probability they assigned to the correct outcome.

Write code for an agent that maximizes its total score across all copies.

Write code for an agent that maximizes its average score. 

Explain how and why these differ.
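One possible answer sketch, under a particular reading of the question: "average score" is taken to mean the mean over the copies in each world, and the agent is scored in expectation over the dice. Both readings and all function names are assumptions. Maximizing a weighted sum of log probabilities subject to the probabilities summing to 1 means reporting probabilities proportional to the weights, so the two agents differ only in whether each outcome is weighted by the number of copies.

```python
from itertools import product
from typing import Dict, List, Tuple

Outcome = Tuple[int, int]  # (red, blue)
OUTCOMES: List[Outcome] = list(product(range(1, 7), repeat=2))


def likelihood(seen: int, outcome: Outcome) -> float:
    """Chance that a copy sees `seen`, given the dice (fair coin picks the die)."""
    red, blue = outcome
    return 0.5 * (red == seen) + 0.5 * (blue == seen)


def _normalise(weights: Dict[Outcome, float]) -> Dict[Outcome, float]:
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


def average_score_agent(seen: int) -> Dict[Outcome, float]:
    # Ordinary Bayesian posterior; the uniform 1/36 prior cancels out.
    return _normalise({o: likelihood(seen, o) for o in OUTCOMES})


def total_score_agent(seen: int) -> Dict[Outcome, float]:
    # Worlds with more copies count more, so weight each outcome by red.
    return _normalise({o: o[0] * likelihood(seen, o) for o in OUTCOMES})
```

For example, after seeing a 3 the averaging agent puts 1/6 on (3, 3), while the total-score agent puts 3/19.5 on it and shifts weight towards outcomes with a high red die, because mistakes there are repeated across more copies.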

Comment by Donald Hobson (donald-hobson) on Could Advanced AI Drive Explosive Economic Growth? · 2021-07-03T23:06:44.761Z · LW · GW

I am not confident that GDP is a useful abstraction over the whole region of potential futures.

Suppose someone uses GPT5 to generate code, and then throws lots of compute at the generated code. GPT5 has generalized from the specific AI techniques humans have invented, seeing them as just a random sample from the broader space of AI techniques. When it samples from that space, it sometimes happens to pull out a technique more powerful than anything humans have invented. Given plenty of compute, it rapidly self-improves. The humans are happy to keep throwing compute at it. (Maybe the AI is doing some moderately useful task for them, maybe they think it's still training.) Neither the AI's actions, nor the amount of compute used are economically significant. (The AI can't yet gain much more compute without revealing how smart it is, and having humans try to stop it.) After a month of this, the AI hacks some lab equipment over the internet, and sends a few carefully chosen emails to a biotech company. A week later, nanobots escape the lab. A week after that, the grey goo has extinguished all earth life.

Alternatively, the AI thinks its most reliable route to takeover involves economic power. It makes loads of money performing various services (like 50% of GDP money). It uses this money to buy all the compute, and to pay people to make nanobots. Grey goo as before.

(Does grey goo count as GDP? What about various techs that the AI develops that would be ever so valuable if they were under meaningful human control?)

So in this set of circumstances, whether there is explosive economic growth or not depends on whether "Do everything and make loads of money" or "stay quiet and hack lab equipment" offer faster / more reliable paths to nanobots. 

Comment by Donald Hobson (donald-hobson) on Confusions re: Higher-Level Game Theory · 2021-07-03T14:58:41.332Z · LW · GW

In a game with any finite number of players, and any finite number of actions per player.

Let  the set of possible outcomes.

Player   implements policy  . For each outcome in  , each player searches for proofs (in PA) that the outcome is impossible. It then takes the set of outcomes it has proved impossible, and maps that set to an action.

There is always a unique action that is chosen. What's more, given oracles for 

I.e. the set of actions you might take if you can prove at least the impossibility results in   and possibly some others. 

Given such an oracle  for each agent, there is an algorithm for their behaviour that outputs the fixed point in polynomial (in  ) time.

Comment by Donald Hobson (donald-hobson) on Vignettes Workshop (AI Impacts) · 2021-06-25T14:02:58.680Z · LW · GW


The next task to fall to narrow AI is adversarial attacks against humans. Virulent memes and convincing ideologies become easy to generate on demand. A small number of people might see what is happening, and try to shield themselves off from dangerous ideas. They might even develop tools that auto-filter web content. Most of society becomes increasingly ideologized, with more decisions being made on political rather than practical grounds. Educational and research institutions become full of ideologues crowding out real research. There are some wars. The lines of division are between people and their neighbours, so the wars are small scale civil wars. 

Researchers have been replaced with people parroting the party line. Society is struggling to produce chips of the same quality as before. Depending on how far along renewables are, there may be an energy crisis. Ideologies targeted at baseline humans are no longer as appealing. The people who first developed the ideology generating AI didn't share it widely. The tech to AI-generate new ideologies is lost.

The clear scientific thinking needed for major breakthroughs has been lost. But people can still follow recipes, and make rare minor technical improvements to some things. Gradually, ideological immunity develops. The beliefs are still crazy by a truth tracking standard, but they are crazy beliefs that imply relatively non-detrimental actions. Many years of high, stagnant tech pass, until the culture is ready to re-embrace scientific thought.

Comment by Donald Hobson (donald-hobson) on How can there be a godless moral world ? · 2021-06-23T07:39:05.036Z · LW · GW

A different way of going about this is to unpack what we mean by "god". We can ask about morality in worlds that contain something similar, something that is arguably "god" and arguably not.

Polytheism. A whole bunch of gods that worked together to create the world. Between them they wrote the bible, koran and most other religious texts. 

A god matching all the typical descriptions: spaceless, timeless creator of life and earth. Responsible for all the bible and Jesus stuff. This god is made out of particles behaving under physical laws.

Perhaps this god is actually an alien that evolved billions of years ago, then used their advanced tech to travel to earth, create life and write the bible.

Maybe an ASI that is simulating the world as we know it. A team of human sociologists are instructing the AI, and told it to write some books and perform some miracles, to see how that affected society.

Stand off god. Spaceless timeless all knowing creator of the universe. Didn't write the bible, or even inspire it. Not related to Jesus in any way. Jesus was just a normal human skilled in magic tricks. This god isn't actually that interested in humans, they care more about pulsars, black holes etc.

Imagine you were in these worlds. Would the beings be what you called "god"? Well that's just a question of how you define words. Where would your morality come from in these worlds?

Comment by Donald Hobson (donald-hobson) on How can there be a godless moral world ? · 2021-06-22T21:30:28.888Z · LW · GW

You could consider morality to be like money or artistic beauty. There are no money particles. There are metal circles and plastic rectangles. But the concept of money as a means of transaction is something that exists in people's heads. Money shapes the world by affecting what people do. Changing when they pick up and use objects. Causing people to do work they otherwise wouldn't. Such abstract and powerful forces can greatly shape the world. Little metal circles just sit there being little metal circles. That too is the force morality has in this world. All those people who consider morality and choose to do the right thing. There is light in this world, and it is us.

Different people have different preferences in paintings. Imagine you are programming a computer to calculate artistic beauty. Imagine coding a function func1(person, picture). This function takes as inputs a detailed brain scan of a person, and a particular picture. It calculates how much the person would enjoy looking at the picture. You can use this to define func2(picture)=func1(Bob,picture). It happens that Bob has a rather simple taste in pictures. Bob likes red and symmetrical pictures. So we code func3(picture). This function just measures redness and symmetry. It makes no reference to Bob. func2 calculates Bob's thoughts; it reasons about how Bob will think. And yet func2 and func3 produce the same answers.
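A toy sketch of the three functions, to show how func2 and func3 can agree while only one of them mentions Bob. Everything beyond the names func1, func2 and func3 is an assumption made for illustration: the "brain scan" is reduced to a dictionary of taste weights, a picture is a grid of RGB pixels, and symmetry means left-right mirror symmetry.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0..255
Picture = List[List[Pixel]]    # rows of pixels
Person = Dict[str, float]      # stand-in for a brain scan: taste weights


def redness(picture: Picture) -> float:
    """Mean red channel, scaled to the range 0..1."""
    pixels = [p for row in picture for p in row]
    return sum(p[0] for p in pixels) / (255 * len(pixels))


def symmetry(picture: Picture) -> float:
    """Fraction of pixels equal to their left-right mirror image."""
    matches = sum(row[i] == row[-1 - i] for row in picture for i in range(len(row)))
    return matches / sum(len(row) for row in picture)


FEATURES: Dict[str, Callable[[Picture], float]] = {"redness": redness, "symmetry": symmetry}


def func1(person: Person, picture: Picture) -> float:
    """How much this person enjoys the picture, computed from their tastes."""
    return sum(weight * FEATURES[name](picture) for name, weight in person.items())


BOB: Person = {"redness": 1.0, "symmetry": 1.0}


def func2(picture: Picture) -> float:
    return func1(BOB, picture)  # reasons about Bob


def func3(picture: Picture) -> float:
    return redness(picture) + symmetry(picture)  # never mentions Bob
```

Swapping in a different `Person` changes func2 but leaves func3 alone, which is the sense in which the two are different functions that happen to coincide for Bob.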

Morality is similar. We can define moral1( person, situation) which measures how moral a particular person thinks a particular situation is.  A large value for moral1( Alice, situation) means that Alice actually feels motivated towards the situation and tries to cause it to happen. 

Now suppose that Eve is just evil. moral1( Alice, situation) might be wildly different from moral1( Eve, situation). But both Alice and Eve can (assuming both were knowledgeable) calculate both of these functions. When Bob looks at a picture, he isn't reasoning about his own reasoning; he is just thinking about how wonderfully red it is. The function within his mind is func3. When Alice looks at a situation, she isn't thinking about her own thinking either (in general); she is looking at all the cute bunnies and thinking how fluffy they are.

So is morality subjective or objective? That depends on how you define your words. By morality, do you mean the equivalents of func1, func2 or func3? (Possibly with yourself or the average human in place of Bob?) This is just a question of definitions.


On to how you should think and act. Imagine god gave you the complete and unambiguous guide to morality. Totally complete, totally superseding all previous instructions. Can you imagine being disappointed in this guide? Feeling that it wasn't as nice as it could be. Or perhaps horrified as god orders a human sacrifice? Can you imagine good news instead. That the guide to morality was everything you hoped it would be. What if you could somehow write the guide to morality, what would you write? Why not just do that. It is, after all, your choice. 

You can, if you want, delegate. You can fully acknowledge that god is fictional, and try to work out what god would want if he did exist, and do that. You could try to work out what Harry Potter would do if he did exist, and do that. Even if god does exist, it's your choice whether to follow his instructions or to do something else. You could go with what feels right, your instinctive sense of niceness. You could calculate, weighing up lives with statistics to choose the greatest good. You could delegate to a hypothetical version of yourself who was smarter and kinder, and try to figure out what they would do.

While it is your choice, choosing is in itself a process. Your choices are shaped by the kind of person you are, which is itself shaped by the genes and words that led to your existence as it is now. So in a sense, you are learning about the sort of person you are, a fact already determined but not yet known to you. You must go through the process of choosing to learn this fact about yourself. It is your choice how you see the choice: you can see it as an onerous imposition of responsibility, or you can treat it lightly. A freedom from any external morality weighing you down. A feather that can drift in whatever direction its whims may take it.

This is, I suspect, where your morality came from all along. Some would have come from deep within yourself. From the genes that specify the brain circuits of empathy. Some will come from the ethical advice of friends and neighbours. Some may come from the bible. The ink and paper of the bible is still there. You can use it as a source of ethical advice if you think it is good. I would recommend not using the bible as a source of ethical advice. Maybe look instead? But it is, after all, your choice. You can trust and rely on others, but it is your choice of who to trust. It always was. Even if the decision to trust was implicit and invisible. 

Comment by Donald Hobson (donald-hobson) on Reward Is Not Enough · 2021-06-20T22:50:25.193Z · LW · GW

I would be potentially concerned that this is a trick that evolution can use, but human AI designers can't use safely. 

In particular, I think this is the sort of trick that produces usually fairly good results when you have a fixed environment, and can optimize the parameters and settings for that environment. Evolution can try millions of birds, tweaking the strengths of desire, to get something that kind of works. When the environment will be changing rapidly; when the relative capabilities of cognitive modules are highly uncertain and when self modification is on the table, these tricks will tend to fail. (I think) 

Use the same brain architecture in a moderately different environment, and you get people freezing their credit card in blocks of ice so they can't spend it, and other self defeating behaviour. I suspect the tricks will fail much worse with any change to mental architecture. 

On your equivalence to an AI with an interpretability/oversight module. Data shouldn't be flowing back from the oversight into the AI. 

Comment by Donald Hobson (donald-hobson) on Covid 6/17: One Last Scare · 2021-06-20T11:19:29.829Z · LW · GW

We can get a rough idea of this by considering how much physical changes have a mental effect. Psychoactive chemicals, brain damage etc. Look at how much ethanol changes the behaviour of a single neuron in a lab dish, and how much it changes human behaviour. That gives a rough indication of how sensitively dependent human behaviour is on the exact behaviour of its constituent neurons.

Comment by Donald Hobson (donald-hobson) on Covid 6/17: One Last Scare · 2021-06-20T11:12:39.374Z · LW · GW

For the goal of getting humans to Mars, we can do the calculations and see that we need quite a bit of rocket fuel. You could reasonably be in a situation where you had all the design work done, but you still needed to get atoms into the right places, and that took a while. Big infrastructure projects can be easier to design. For a giant dam, most of the effort is in actually getting all the raw materials in place. This means you can know what it takes to build a dam, and be confident it will take at least 5 years given the current rate of concrete production.

Mathematics is near the other end of the scale. If you know how to prove theorem X, you've proved it. This stops us being confident that a theorem won't be proved soon. It's more like the radioactive decay of a fairly long lived atom: more likely to happen next week than in any other particular week.
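A rough numerical sketch of the decay analogy, assuming (purely for illustration) a constant 1% chance per week that the key idea turns up. The number is invented; only the shape of the distribution matters.

```python
WEEKLY_CHANCE = 0.01  # assumed constant hazard rate, illustrative only

def prob_first_success_in_week(k, p=WEEKLY_CHANCE):
    """Probability that the proof first appears in week k (k = 1 is next week)."""
    return (1 - p) ** (k - 1) * p

weeks = range(1, 521)  # look ten years ahead
probs = [prob_first_success_in_week(k) for k in weeks]

# The single most likely week is always the next one...
most_likely_week = max(weeks, key=prob_first_success_in_week)

# ...even though the expected wait is long.
expected_wait_weeks = 1 / WEEKLY_CHANCE
prob_within_a_year = 1 - (1 - WEEKLY_CHANCE) ** 52

print(most_likely_week, expected_wait_weeks, round(prob_within_a_year, 2))
```

Under this assumption the expected wait is about two years, yet no future week is a better bet than next week, and there is a substantial chance of it happening within a year.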

I think AI is fairly close to the maths, most of the effort is figuring out what to do.

Ways my statement could be false.

If we knew the algorithm, and the compute needed, but couldn't get that compute.

If AI development was an accumulation of many little tricks, and we knew how many tricks were needed. 


But at the moment, I think we can rule out confident long termism on AI. We have no way of knowing that we aren't just one clever idea away from AGI.

Comment by Donald Hobson (donald-hobson) on Covid 6/17: One Last Scare · 2021-06-20T10:44:19.049Z · LW · GW

I agree that purely synthetic AI will probably happen sooner.

Comment by Donald Hobson (donald-hobson) on Covid 6/17: One Last Scare · 2021-06-18T20:00:16.465Z · LW · GW

I think this post sums up the situation.

If you know how to make an AGI, you are only a little bit of coding away from making it. We have limited AIs that can do some things, and it isn't clear what we are missing. Experts are inventing all sorts of algorithms.

There are various approaches like mind uploading, evolutionary algorithms etc that fairly clearly would work if we threw enough effort at them. Current reinforcement learning approaches seem like they might get smart, with enough compute and the right environment. 

Unless you personally end up helping make the first AGI, then you personally will probably not be able to see how to do it until after it is done (if at all). The fact that you personally can't think of any path to AGI does not tell us where we are on the tech path. Someone else might be putting the finishing touches on their AI right now. Once you know how to do it, you've done it.

Comment by Donald Hobson (donald-hobson) on Biotech to make humans afraid of AI · 2021-06-18T18:12:12.063Z · LW · GW

The main risk here is that it's easy to scare people so much that all of the research gets shut down.

Why do you think that this is easy to do and bad? There are currently a small number of people warning about AI. There are some scary media stories, but not enough to really do much.

Do I really need to spell it out how that would be abused? 

If the capability is there, the world has to deal with it, whoever first uses it. If the project is somewhat "use once, then burn all the notes", then it wouldn't make it much easier for anyone else to follow in their footsteps.

I feel like if you start seriously considering things that are themselves almost as bad as AI ruin in their implications in order to address potential AI ruin, you took a wrong turn somewhere.

Typical human priors are full of anthropomorphism when thinking about AI. Suppose you have something that has about the effect of some rationality training, of learning about and really understanding a few good arguments for AI risk. Yes the same tech could be used for horrible brainwashy purposes, but hopefully we can avoid giving the tech to people who would use it like that. The hopeful future being one where humanity develops advanced AI very cautiously, taking as long as it needs to get it right, and then has a glorious FAI future.  This does not look "almost as bad as AI ruin" to me.

Comment by Donald Hobson (donald-hobson) on Avoiding the instrumental policy by hiding information about humans · 2021-06-14T15:57:49.422Z · LW · GW

There are various ideas along the lines of "however much you tell the AI X it just forgets it".

I think that would be the direction to look in if you have a design that's safe as long as it doesn't know X.

Comment by Donald Hobson (donald-hobson) on A naive alignment strategy and optimism about generalization · 2021-06-10T12:52:11.282Z · LW · GW

There may be predictable errors in the training data, such that instrumental policy actually gets a lower loss than answering honestly (because it responds strategically to errors).

If you are answering questions as text, there is a lot of choice in wording. There are many strings of text that are a correct answer, and the AI has to pick the one the human would use. In order to predict how a human would word an answer, you need a fairly good understanding of how they think (I think). 

Comment by Donald Hobson (donald-hobson) on Speculations against GPT-n writing alignment papers · 2021-06-08T11:17:41.956Z · LW · GW

Maybe you did. I find it hard to distinguish inventing and half remembering ideas. 

If the training procedure either 

  1. Reliably produces mesaoptimisers with about the same values. or
  2. Reliably produces mesaoptimizers that can acausally cooperate
  3. The rest of the procedure allows one mesaoptimizer to take control of the whole output

Then using different copies of GPT-n trained from different seeds doesn't help.

If you just convert 1% of the English into network yourself, then all it needs to use is some error correction. Even without that, neural net structures are somewhat redundant. A malicious optimizer with a random 1% removed is probably still a malicious optimizer.

Secondly, its goal here is to give us subtly wrong alignment papers. Suppose we produce an examination and transparency step so thorough that no mesa optimizer can slip through. The original mesaoptimiser can just design a network that is doing its best as a mesaoptimiser free predictor, but has subtly maliciously chosen beliefs about AI research. 

Comment by Donald Hobson (donald-hobson) on Optimization, speculations on the X and only X problem. · 2021-06-08T10:54:50.029Z · LW · GW

I don't think that learning is moving around in codespace. In the simplest case, the AI is like any other non self modifying program. The code stays fixed as the programmers wrote it. The variables update. The AI doesn't start from null. The programmer starts from a blank text file, and adds code. Then they run the code. The AI can start with sophisticated behaviour the moment it's turned on.

So are we talking about a program that could change from an X er to a Y er with a small change in the code written, or with a small amount of extra observation of the world?

Comment by Donald Hobson (donald-hobson) on [Event] Weekly Alignment Research Coffee Time (09/20) · 2021-06-07T19:15:36.997Z · LW · GW

There seems to be some technical problem with the link. It gives me a "Our apologies, your invite link has now expired (actually several hours ago, but we hate to rush people).

We hope you had a really great time! :)" message. Edit: As of a few minutes after stated start time. It worked last week.

Comment by Donald Hobson (donald-hobson) on Optimization, speculations on the X and only X problem. · 2021-06-07T12:14:02.406Z · LW · GW

My picture of an X and only X er is that the actual program you run should optimize only for X. I wasn't considering similarity in code space at all. 

Getting the lexicographically first formal ZFC proof of say the Collatz conjecture should be safe. Getting a random proof sampled from the set of all proofs < 1 terabyte long should be safe. But I think that there exist proofs that wouldn't be safe. There might be a valid proof of the conjecture that had the code for a paperclip maximizer encoded into the proof, and that exploited some flaw in computers or humans to bootstrap this code into existence. This is what I want to avoid. 

Your picture might be coherent and formalizable into some different technical definition. But you would have to start talking about difference in codespace, which can differ depending on different programming languages. 

The program if True: x() else: y() is very similar in codespace to if False: x() else: y()

If code space is defined in terms of minimum edit distance, then layers of interpreters, error correction and homomorphic encryption can change it. This might be what you are after, I don't know.
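To make the point about the two near-identical programs concrete, here is a small sketch that measures their distance with plain Levenshtein edit distance over the source text. Taking "codespace distance" to mean character-level edit distance is my assumption for the example; other definitions are possible.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute or match
            ))
        prev = curr
    return prev[-1]

RUNS_X = "if True: x()\nelse: y()"
RUNS_Y = "if False: x()\nelse: y()"

distance = edit_distance(RUNS_X, RUNS_Y)
print(distance, "edits separate a pure X-er from a pure Y-er")
```

A handful of character edits flips the behaviour completely, so closeness in this metric says very little about closeness in what the program does.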

Comment by Donald Hobson (donald-hobson) on Rogue AGI Embodies Valuable Intellectual Property · 2021-06-04T10:23:09.409Z · LW · GW

On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

This makes the hidden assumption that "resources" is a good abstraction in this scenario. 

It is being assumed that the amount of resources an agent "has" is a well defined quantity. It assumes agents can only grow their resources slowly by reinvesting them. And that an agent can weather any sabotage attempts by agents with far fewer resources.

I think this assumption is blatantly untrue. 

Companies can be sabotaged in all sorts of ways. Money or material resources can be subverted, so that while they are notionally in the control of X, they end up benefiting Y, or just stolen. Taking over the world might depend on being the first party to develop self replicating nanotech, which might require just insight and common lab equipment.

Don't think "The US military has nukes, the AI doesn't, so the US military has an advantage", think "one carefully crafted message and the nukes will land where the AI wants them to, and the military commanders will think it their own idea."

Comment by Donald Hobson (donald-hobson) on Selection Has A Quality Ceiling · 2021-06-03T09:23:04.330Z · LW · GW

There are several extra features to consider. Firstly, even if you only test, that doesn't mean the skills weren't trained. Suppose there are lots of smart kids that really want to be astronauts. And that NASA puts its selection criteria somewhere easily available. The kids then study the skills they think they need to pass the selection. Any time there is any reason to think that skills X, Y and Z are a good combination, there will be more people with these skills than chance predicts.

There is also the dark side, Goodhart's curse. It is hard to select over a large number of people without selecting for lying sociopaths that are gaming your selection criteria.

Comment by Donald Hobson (donald-hobson) on Fixedness From Frailty · 2021-05-31T21:59:46.148Z · LW · GW

It's not the probability of hallucinating full stop, it's the probability of hallucinating omega or psychic powers in particular. Also, while "Omega" sounds implausible, there are much more plausible scenarios involving humans inventing advanced brain scanning tech.

Comment by Donald Hobson (donald-hobson) on Are PS5 scalpers actually bad? · 2021-05-19T09:45:48.230Z · LW · GW

True, but the extra money goes to the scalper to pay for the scalper's time. The moment the makers started selling the PS5 too cheaply, they were destroying value in search costs. Scalpers don't change that.

Comment by Donald Hobson (donald-hobson) on Against Against Boredom · 2021-05-17T21:12:10.147Z · LW · GW

"A superintelligent FAI with total control over all your sensory inputs" seems to me a sufficient condition to avoid boredom. Kind of massive overkill. Unrestricted internet access is usually sufficient. 

You don't need to edit out pain sensitivity from humans to avoid pain. You can have a world where nothing painful happens to people. Likewise you don't need to edit out boredom, you can have a world with lots of interesting things in it. 

Think of all the things a human in the modern day might do for fun, and add at least as many things that are fun and haven't been invented yet.

Comment by Donald Hobson (donald-hobson) on Utopic Nightmares · 2021-05-17T20:54:37.663Z · LW · GW

Yes, it's unlikely that the utility turns out literally identical. However, people enjoy having friends that aren't just clones of themselves. (Alright, I don't have evidence for this, but it seems like something people might enjoy.) Hence it is possible for a mixture of different types of people to be happier than either type of people on their own.

If you use some computational theories of consciousness, there is no morally meaningful difference between one mind and two copies of the same mind.

Given the large but finite resources of reality, it is optimal to create a fair bit of harmless variation.

Comment by Donald Hobson (donald-hobson) on Does butterfly affect? · 2021-05-16T16:18:59.600Z · LW · GW

Instead, we must consider the full statistical ensemble of possible world, and quantify to what extent the butterfly shifts that ensemble.

add some small stochastic noise to it at all times t to generate the statistical ensemble of possibilities

In the typical scenario, the impact of this single perturbation will not rise above the impact of the persistent background noise inherent to any complex real-world system.

I think these quotes illustrate the mind projection fallacy. The "noise" is not an objective thing sitting out there in the real world, it is a feature of your own uncertainty. 

Suppose you have a computational model of the weather. You make the simplifying assumption that water evaporation is a function only of air temperature and humidity. Whereas in reality, the evaporation depends on puddle formation and plant growth and many other factors. Out in the real world, the weather follows its own rules perfectly. Those rules are the equations of the whole universe. "noise" is just what you call a hopefully small effect you don't have the knowledge, compute or inclination to calculate. 

If you have a really shoddy model of the weather, it won't be able to compute much. If you add a butterfly's wing flaps to a current weather model, the knowledge of that small effect will be lost due to the mass of other small effects meteorologists haven't calculated. Adding or removing a butterfly's wingflap doesn't meaningfully change our predictions given current predictive ability. However, to a sufficiently advanced future weather predictor, that wingflap could meaningfully change the chance of a tornado. The predictor would need to be tracking every other wingflap globally, and much else besides.

We are modelling processes as probabilistic when they are actually deterministic but hard to calculate.
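A toy illustration of that last point, using the logistic map as a stand-in for the weather. The map, the starting value 0.2 and the size of the "wingflap" are all assumptions chosen for the demo; nothing here models real meteorology.

```python
def step(x):
    """One tick of the logistic map: a fully deterministic toy 'weather' rule."""
    return 4.0 * x * (1.0 - x)

def trajectory(x0, n_steps):
    """Run the rule forward from x0, returning every state visited."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

WINGFLAP = 1e-12  # hypothetical size of the butterfly's nudge

calm = trajectory(0.2, 100)
flapped = trajectory(0.2 + WINGFLAP, 100)
gaps = [abs(a - b) for a, b in zip(calm, flapped)]

# A forecaster who only measures the state to 3 decimal places
# cannot even see the wingflap in the initial conditions.
coarse_sees_flap = round(0.2, 3) != round(0.2 + WINGFLAP, 3)

print("gap after 5 steps:", gaps[5])
print("largest gap in steps 40-100:", max(gaps[40:]))
print("coarse forecaster notices the flap:", coarse_sees_flap)
```

There is no randomness anywhere in this code. The difference between the two runs looks like "noise" only to a predictor that cannot track the state finely enough to follow it.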

Comment by Donald Hobson (donald-hobson) on Agency in Conway’s Game of Life · 2021-05-14T17:16:32.882Z · LW · GW

Random Notes:

Firstly, why is the rest of the starting state random? In a universe where info can't be destroyed, like this one, random=max entropy. AI is only possible in this universe because the starting state is low entropy.

Secondly, reaching an arbitrary state can be impossible for reasons like conservation of mass, energy, momentum and charge. Any state close to an arbitrary state might be unreachable due to these conservation laws. Ie a state containing lots of negative electric charges, and no positive charges, being unreachable in our universe.

Well, quantum. We can't reach out from our branch to affect other branches.

This control property is not AI. It would be possible to create a low impact AI. Something that is very smart and doesn't want to affect the future much.

In the other direction, bacteria strategies are also a thing. I think it might be possible, both in this universe and in GOL, to create a non intelligent replicator. You could even hard code it to track its position, and turn on or off to make a smiley face. I'm thinking some kind of wall glider that can sweep across the GOL board destroying almost anything in its path. With crude self replicators behind it.

Observation response timescales. Suppose the situation outside the small controlled region was rapidly changing and chaotic. By the time any AI has done its reasoning, the situation has changed utterly. The only thing the AI can usefully do is reason about GOL in general. Ie any ideas it has are things that could have been hard coded into the design.

Comment by Donald Hobson (donald-hobson) on Challenge: know everything that the best go bot knows about go · 2021-05-12T22:08:23.574Z · LW · GW

I'm thinking of humans having some fast special purpose inbuilt pattern recognition, which is nondeterministic and an introspective black box, and a slow general purpose processor. Humans can mentally follow the steps of any algorithm, slowly. 

Thus if a human can quickly predict the results of program X, then either there is a program Y  based on however the human is thinking that does the same thing as X and takes only a handful of basic algorithmic operations. Or the human is using their pattern matching special purpose hardware. This hardware is nondeterministic, not introspectively accessible, and not really shaped to predict go bots. 

Either way, it also bears pointing out that if the human can predict the move a go bot would make, the human is at least as good as the machine. 

So you are going to need a computer program for "help" if you want to predict the exact moves. At this stage, you can ask if you really understand how the code works, and aren't just repeating it by rote.

Comment by Donald Hobson (donald-hobson) on Challenge: know everything that the best go bot knows about go · 2021-05-12T10:13:21.494Z · LW · GW

This behaviour is consistent with local position based play that also considers "points ahead" as part of the situation.

Comment by Donald Hobson (donald-hobson) on Challenge: know everything that the best go bot knows about go · 2021-05-11T18:10:30.010Z · LW · GW

I think that it isn't clear what constitutes "fully understanding" an algorithm. 

Say you pick something fairly simple, like a floating point square root algorithm. What does it take to fully understand that?

You have to know what a square root is. Do you have to understand the maths behind Newton-Raphson iteration if the algorithm uses that? All the mathematical derivations, or just taking it as a mathematical fact that it works? Do you have to understand all the proofs about convergence rates? Or can you just go "yeah, 5 iterations seems to be enough in practice"? Do you have to understand how floating point numbers are stored in memory? Including all the special cases like NaN which your algorithm hopefully won't be given? Do you have to keep track of how the starting guess is made, and how the rounding is done? Do you have to be able to calculate the exact floating point value the algorithm would give, taking into account all the rounding errors? Answering in binary or decimal?
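For concreteness, here is a sketch of the kind of algorithm in question. It is one possible Newton-Raphson square root, not any particular library's implementation; the starting guess (a power of two taken from the float's exponent) and the fixed 5 iterations are my choices for the example.

```python
import math

def newton_sqrt(value, iterations=5):
    """Approximate sqrt(value) by Newton-Raphson on f(g) = g*g - value.

    Special cases such as NaN and infinity are deliberately not handled.
    """
    if value < 0:
        raise ValueError("square root of a negative number")
    if value == 0:
        return 0.0
    # value == mantissa * 2**exponent with 0.5 <= mantissa < 1, so
    # 2**(exponent // 2) is within a factor of sqrt(2) of the true root.
    _mantissa, exponent = math.frexp(value)
    guess = math.ldexp(1.0, exponent // 2)
    for _ in range(iterations):
        # Newton step: average the guess with value / guess.
        guess = 0.5 * (guess + value / guess)
    return guess

for v in (2.0, 1e-10, 12345.678):
    print(v, newton_sqrt(v), math.sqrt(v))
```

Every question in the paragraph above applies to these dozen lines: why the starting guess is good enough, why 5 iterations suffice, and what the last few bits of the result are after rounding.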

Is brute force minimax search easy to understand? You might be able to easily implement the algorithm, but you still don't know which moves it will make. In general, for any algorithm that takes a lot of compute, humans won't be able to work out what it will do without very slowly imitating a computer. There are some algorithms we can prove theorems about. But it isn't clear which theorems we need to prove to get "full understanding".
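A sketch of how short brute force minimax is. The game is a toy I picked for the example (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins), not go.

```python
MOVES = (1, 2, 3)  # stones a player may remove on their turn

def minimax(stones, maximizing):
    """Value of the position for the maximizing player: +1 win, -1 loss."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    values = [minimax(stones - take, not maximizing)
              for take in MOVES if take <= stones]
    return max(values) if maximizing else min(values)

def best_move(stones):
    """Move with the best minimax value for the player about to move."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: minimax(stones - take, False))

for pile in (5, 6, 7):
    print(pile, "stones -> take", best_move(pile))
```

The code fits on a screen, yet the only way it "knows" which move to make is by walking the whole game tree. For this toy game a human can spot the pattern (leave a multiple of 4); for go there is no such shortcut to read off.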

Another obstacle to full understanding is memory. Suppose your go bot has memorized a huge list of "if you are in such and such situation move here" type rules. You can understand how gradient descent would generate good rules in the abstract. You have inspected a few rules in detail. But there are far too many rules for a human to consider them all. And the rules depend on a choice of random seed.  

Corollaries of success (non-exhaustive):

  • You should be able to answer questions like “what will this bot do if someone plays mimic go against it” without actually literally checking that during play. More generally, you should know how the bot will respond to novel counter strategies

There is not in general a way to compute what an algorithm does without running it. Some algorithms are going about the problem in a deliberately slow way. However, assume that the go algorithm has no massive known efficiency gains (ie no algorithm that computes the same answer using a millionth of the compute), and that the algorithm is far too compute hungry for humans to run manually. Then it follows that humans won't be able to work out exactly what the algorithm will do.

You should be able to write a computer program anew that plays go just like that go bot, without copying over all the numbers.

Being able to understand the algorithm well enough to program it for the first time, not just blindly reciting code. An ambiguous but achievable goal.

Suppose a bunch of people coded another AlphaGo like system. The random seed is different. The layer widths are different. The learning rate is slightly different. It's trained with a different batch size, for a different number of iterations, on a different database of stored games. It plays about as well. In many situations it makes a different move. The only way to get a go bot that plays exactly like AlphaGo is to copy everything including the random seed. This might have been picked based on lucky numbers or birthdays. You can't rederive from first principles what was never derived from first principles. You can only copy numbers across, or pick your own lucky numbers. Numbers like batch size aren't quite as pick your own, there are unreasonably small and large values, but there is still quite a lot of wiggle room.
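A tiny sketch of the seed dependence, with a two-parameter "model" standing in for a go bot. The task (learn a and b such that a * b equals a target), the learning rate and the seeds are all invented for the example.

```python
import random

TARGET = 2.0  # the "skill" both models must learn: a * b == TARGET

def train(seed, steps=2000, learning_rate=0.05):
    """Gradient descent on (a*b - TARGET)**2 from a seed-dependent start."""
    rng = random.Random(seed)
    a = rng.uniform(0.5, 1.5)
    b = rng.uniform(0.5, 1.5)
    for _ in range(steps):
        error = a * b - TARGET
        grad_a = 2.0 * error * b
        grad_b = 2.0 * error * a
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b

a1, b1 = train(seed=1)
a2, b2 = train(seed=2)
print("seed 1:", a1, b1, "product", a1 * b1)
print("seed 2:", a2, b2, "product", a2 * b2)
```

Both runs perform equally well, but the individual numbers differ, and nothing about the task tells you which set is "the" right one. To reproduce one run exactly you have to copy its seed.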

Comment by Donald Hobson (donald-hobson) on MIRI location optimization (and related topics) discussion · 2021-05-10T18:33:49.159Z · LW · GW

I think that Scotland would be a not bad choice (Although I am obviously somewhat biased about that)

Speaks the language, plenty of nice scenery. Reasonably sensible political situation. (I would say overall better than America.) Cool weather. Good public healthcare. Some nice uni towns with a fair bit of STEM community. Downsides would include being further from America. (I don't know where all your colleagues are located, I wouldn't be surprised if a lot were in America, and a fair few were in Europe.)

I would recommend looking somewhere on the outskirts of Dundee, St Andrews, Edinburgh or Glasgow.

Comment by Donald Hobson (donald-hobson) on AMA: Paul Christiano, alignment researcher · 2021-05-09T23:47:28.096Z · LW · GW

"These technologies are deployed sufficiently narrowly that they do not meaningfully accelerate GWP growth." I think this is fairly hard for me to imagine (since their lead would need to be very large to outcompete another country that did deploy the technology to broadly accelerate growth), perhaps 5%?

I think there is a reasonable way it could happen even without an enormous lead. You just need either,

  1. It's very hard to capture a significant fraction of the gains from the tech.
  2. Tech progress scales very poorly in money. 

For example, suppose it is obvious to everyone that AI in a few years time will be really powerful. Several teams with lots of funding are set up. If progress is researcher bound, and researchers are ideologically committed to the goals of the project, then top research talent might be extremely difficult to buy. (They are already well paid, for the next year they will be working almost all day. After that, the world is mostly shaped by which project won.) 

Compute could be hard to buy if there were hard bottlenecks somewhere in the chip supply chain, most of the world's new chips were already being used by the AI projects, and an attitude of "our chips and we're not selling" was prevalent.

Another possibility, suppose deploying a tech means letting the competition know how it works. Then if one side deploys, they are pushing the other side ahead. So the question is, does deploying one unit of research give you the resources to do more than one unit?