## Posts

## Comments

**jacobt**on A Difficulty in the Concept of CEV · 2013-03-27T19:19:02.739Z · score: 0 (0 votes) · LW · GW

Arrow's Theorem doesn't say anything about strategic voting. The only reasonable non-strategic voting system I know of is random ballot (pick a random voter; they decide who wins). I'm currently trying to figure out a voting system that is based on finding the Nash equilibrium (which may be mixed) of approval voting, and this system might also be strategy-free.

When I said linear combination of utility functions, I meant that you fix the scaling factors initially and don't change them. You could make all of them 1, for example. Your voting system (described in the last paragraph) is a combination of range voting and IRV. If everyone range votes so that their favorite gets 1 and everyone else gets -1, then it's identical to IRV, and shares the same problems such as non-monotonicity. I suspect that you will also get non-monotonicity when votes aren't "favorite gets 1 and everyone else gets -1".

EDIT: I should clarify: it's not 1 for your favorite and -1 for everyone else. It's 1 for your favorite and close to -1 for everyone else, such that when your favorite is eliminated, it's 1 for your next favorite and close to -1 for everyone else after rescaling.

**jacobt**on Upgrading moral theories to include complex values · 2013-03-27T19:11:21.161Z · score: 4 (6 votes) · LW · GW

It is bad to create a small population of creatures with humane values (that has positive welfare) and a large population of animals that are in pain. For instance, it is bad to create a population of animals with -75 total welfare, even if doing so allows you to create a population of humans with 50 total welfare.

Why do you believe this? I don't. Due to wild animal suffering, this proposition implies that it would have been better if no life had appeared on Earth, assuming average human/animal welfare and the human/animal ratio don't dramatically change in the future.

**jacobt**on A Difficulty in the Concept of CEV · 2013-03-27T01:40:05.548Z · score: 2 (2 votes) · LW · GW

I couldn't access the "Aggregation Procedure for Cardinal Preferences" article. In any case, why isn't using an aggregate utility function that is a linear combination of everyone's utility functions (choosing some arbitrary number for each person's weight) a way to satisfy Arrow's criteria?

It should also be noted that Arrow's impossibility theorem doesn't hold for non-deterministic decision procedures. I would also caution against calling this an "existential risk", because while decision procedures that violate Arrow's criteria might be considered imperfect in some sense, they don't necessarily cause an existential catastrophe. Worldwide range voting would not be the best way of deciding everything, but it most likely wouldn't be an existential risk.

**jacobt**on You only need faith in two things · 2013-03-11T21:05:50.908Z · score: 1 (1 votes) · LW · GW

Ok, I agree with this interpretation of "being exposed to ordered sensory data will rapidly promote the hypothesis that induction works".

**jacobt**on You only need faith in two things · 2013-03-11T18:30:51.720Z · score: 2 (2 votes) · LW · GW

You could choose to single out a single alternative hypothesis that says the sun won't rise some day in the future. The ratio between P(sun rises until day X) and P(sun rises every day) will not change with any evidence before day X. If initially you believed a 99% chance of "the sun rises every day until day X" and a 1% chance of Solomonoff induction's prior, you would end up assigning more than a 99% probability to "the sun rises every day until day X".

Solomonoff induction itself will give some significant probability mass to "induction works until day X" statements. The Kolmogorov complexity of "the sun rises until day X" is about the Kolmogorov complexity of "the sun rises every day" plus the Kolmogorov complexity of X (approximately log2(X) + 2 log2(log2(X)) bits). Therefore, even according to Solomonoff induction, the "sun rises until day X" hypothesis will have a probability approximately proportional to P(sun rises every day) / (X log2(X)^2). This decreases subexponentially with X, and even slower if you sum this probability for all Y >= X.
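To make the size of that penalty concrete, here is a rough Python sketch of the estimate above. The function name `relative_prior_weight` is made up for illustration, and the formula is only the approximation stated in this comment, not an actual Kolmogorov complexity computation (which is uncomputable).

```python
import math

def relative_prior_weight(day_x: int) -> float:
    """Approximate P("sun rises until day X") / P("sun rises every day").

    Assumes K(X) ~ log2(X) + 2*log2(log2(X)) bits, as estimated above,
    so the weight is 2**(-K(X)) = 1 / (X * log2(X)**2).
    """
    if day_x < 2:
        raise ValueError("the estimate only makes sense for X >= 2")
    return 1.0 / (day_x * math.log2(day_x) ** 2)

# The weight shrinks polynomially in X, far slower than an exponential like 2**-X.
for x in (10, 1_000, 1_000_000):
    print(x, relative_prior_weight(x), 2.0 ** -x)
```

The comparison column is there only to show how much slower than exponential the decay is.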

In order to get exponential change in the odds, you would need to have repeatable independent observations that distinguish between Solomonoff induction and some other hypothesis. You can't get that in the case of "sun rises every day until day X" hypotheses.

**jacobt**on You only need faith in two things · 2013-03-11T17:34:23.482Z · score: 0 (0 votes) · LW · GW

You're making the argument that Solomonoff induction would select "the sun rises every day" over "the sun rises every day until day X". I agree, assuming a reasonable prior over programs for Solomonoff induction. However, if your prior is 99% "the sun rises every day until day X", and 1% "Solomonoff induction's prior" (which itself might assign, say, 10% probability to the sun rising every day), then you will end up believing that the sun rises every day until day X. Eliezer asserted that in a situation where you assign only a small probability to Solomonoff induction, it will quickly dominate the posterior. This is false.

most of the evidence given by sunrise accrues to "the sun rises every day", and the rest gets evenly divided over all non-falsified "Day X"

Not sure exactly what this means, but the ratio between the probabilities "the sun rises every day" and "the sun rises every day until day X" will not be affected by any evidence that happens before day X.
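A small sketch of this point, using exact fractions. The three-way split of the prior and the choice that the "other" hypotheses predict each sunrise with probability 1/2 are assumptions made up for illustration; only the 99% / 1% / 10% figures come from the comment above.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes update over a finite set of hypotheses (dicts keyed by name)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}

# 99% on "until day X"; the remaining 1% is the Solomonoff-like mixture,
# of which 10% goes to "every day" and 90% to everything else.
priors = {
    "until_day_X": Fraction(99, 100),
    "every_day": Fraction(1, 1000),
    "other": Fraction(9, 1000),
}

days_observed = 30  # all before day X
likelihoods = {
    "until_day_X": Fraction(1),                   # predicts every observed sunrise
    "every_day": Fraction(1),                     # so does this one
    "other": Fraction(1, 2) ** days_observed,     # assumed coin-flip predictors
}

post = posterior(priors, likelihoods)
print(priors["until_day_X"] / priors["every_day"])  # ratio before the evidence
print(post["until_day_X"] / post["every_day"])      # same ratio after it
print(float(post["until_day_X"]))
```

Both hypotheses assign the same likelihood to every sunrise before day X, so the evidence only takes mass away from the hypotheses that predicted badly.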

**jacobt**on You only need faith in two things · 2013-03-11T10:07:34.875Z · score: 2 (2 votes) · LW · GW

Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

Not if the alternative hypothesis assigns about the same probability to the data up to the present. For example, an alternative hypothesis to the standard "the sun rises every day" is "the sun rises every day, until March 22, 2015", and the alternative hypothesis assigns the same probability to the data observed until the present as the standard one does.

You also have to trust your memory and your ability to compute Solomonoff induction, both of which are demonstrably imperfect.

**jacobt**on A Series of Increasingly Perverse and Destructive Games · 2013-02-16T20:57:43.330Z · score: 2 (2 votes) · LW · GW

For every n, a program exists that will solve the halting problem for programs up to length n, but the size of this program must grow with n. I don't really see any practical way for a human to write this program other than generating an extremely large number and then testing all programs up to length n for halting within this bound, in which case you've already pretty much solved the original problem. If you use some proof system to try to prove that programs halt and then take the maximum running time of only those, then you might as well use a formalism like the calculus of constructions.

**jacobt**on A Series of Increasingly Perverse and Destructive Games · 2013-02-15T21:35:53.118Z · score: 2 (2 votes) · LW · GW

Game1 has been done in real life (without the murder): http://djm.cc/bignum-results.txt

Also:

Write a program that generates all programs shorter than length n, and finds the one with the largest output.

Can't do that, unless you already know the programs will halt. The winner of the actual contest used a similar strategy, using programs in the calculus of constructions so they are guaranteed to halt.

For Game2, if your opponent's program (say there are only 2 players) says to return your program's output + 1, then you can't win. If your program ever halts, they win. If it doesn't halt, then you both lose.

**jacobt**on A fungibility theorem · 2013-01-16T02:24:26.108Z · score: 0 (0 votes) · LW · GW

But if the choices only have the same expectation of v2, then you won't be optimizing for v1.

Ok, this is correct. I hadn't understood the preconditions well enough. It seems that now the important question is whether things people intuitively think of as different values (my happiness, total happiness, average happiness) satisfy this condition.

**jacobt**on A fungibility theorem · 2013-01-15T07:22:11.678Z · score: 0 (0 votes) · LW · GW

You would if you could survive for v1*v2 days.

**jacobt**on A fungibility theorem · 2013-01-14T22:40:04.460Z · score: 1 (1 votes) · LW · GW

I do think that everything should reduce to a single utility function. That said, this utility function is not necessarily a convex combination of separate values, such as "my happiness", "everyone else's happiness", etc. It could contain more complex values such as your v1 and v2, which depend on both x and y.

In your example, let's add a choice D: 50% of the time it's A, 50% of the time it's B. In terms of individual happiness, this is Pareto superior to C. It is Pareto inferior for v1 and v2, though.

EDIT: For an example of what I'm criticizing: Nisan claims that this theorem presents a difficulty for avoiding the repugnant conclusion if your desiderata are total and average happiness. If v1 = total happiness and v2 = average happiness, and Pareto optimality is desirable, then it follows that utility is a*v1 + b*v2. From this utility function, some degenerate behavior (blissful solipsist or repugnant conclusion) follows. However, there is nothing that says that Pareto optimality in v1 and v2 is desirable. You might pick a non-linear utility function of total and average happiness, for example atan(average happiness) + atan(total happiness). Such a utility function will sometimes pick policies that are Pareto inferior with respect to v1 and v2.
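Here is a toy instance of that last claim. The two lotteries and their (total, average) outcomes are hypothetical numbers chosen for illustration (each outcome can be read as a one-person world, so total equals average); only the atan utility function comes from the comment.

```python
import math

def utility(total, average):
    """Non-linear utility from the comment: atan(average) + atan(total)."""
    return math.atan(average) + math.atan(total)

def expected(lottery, f):
    """Expected value of f(total, average) over a list of (probability, outcome)."""
    return sum(p * f(total, avg) for p, (total, avg) in lottery)

# Hypothetical policies, each a lottery over (total happiness, average happiness).
policy_a = [(1.0, (1.0, 1.0))]                    # a sure thing
policy_b = [(0.5, (0.0, 0.0)), (0.5, (3.0, 3.0))]  # a gamble

for name, policy in (("A", policy_a), ("B", policy_b)):
    ev_total = expected(policy, lambda total, avg: total)
    ev_avg = expected(policy, lambda total, avg: avg)
    print(name, ev_total, ev_avg, expected(policy, utility))
```

B has higher expected v1 and higher expected v2 than A, so A is Pareto inferior with respect to v1 and v2, yet the atan utility prefers A.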

**jacobt**on A fungibility theorem · 2013-01-14T22:35:12.013Z · score: 2 (2 votes) · LW · GW

I didn't say anything about risk aversion. This is about utility functions that depend on multiple different "values" in some non-convex way. You can observe that, in my original example, if you have no water, then utility (days survived) is linear with respect to food.

**jacobt**on A fungibility theorem · 2013-01-14T18:55:14.875Z · score: 0 (0 votes) · LW · GW

I think we agree. I am just pointing out that Pareto optimality is undesirable for some selections of "values". For example, you might want you *and* everyone else to both be happy, and happiness of one without the other would be much less valuable.

I'm not sure how you would go about deciding if Pareto optimality is desirable, now that the theorem proves that it is desirable iff you maximize some convex combination of the values.

**jacobt**on A fungibility theorem · 2013-01-14T06:35:57.610Z · score: 1 (1 votes) · LW · GW

I think that, depending on what the v's are, choosing a Pareto optimum is actually quite undesirable.

For example, let v1 be min(1000, how much food you have), and let v2 be min(1000, how much water you have). Suppose you can survive for days equal to a soft minimum of v1 and v2 (for example, 0.001 v1 + 0.001 v2 + min(v1, v2)). All else being equal, more v1 is good and more v2 is good. But maximizing a convex combination of v1 and v2 can lead to avoidable dehydration or starvation. Suppose you assign weights to v1 and v2, and are offered either 1000 of the more valued resource, or 100 of each. Then you will pick the 1000 of the one resource, causing starvation or dehydration after 1 day when you could have lasted over 100. If which resource is chosen is selected randomly, then any convex optimizer will die early at least half the time.

A non-convex aggregate utility function, for example the number of days survived (0.001 v1 + 0.001 v2 + min(v1, v2)), is much more sensible. However, it will not select Pareto optima. It will always select the 100 of each option; always selecting 1000 of one leads to greater expected v1 and expected v2 (500 for each).
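The food/water example can be checked directly. This is just a sketch of the numbers above; the 0.6 / 0.4 weights are arbitrary assumptions (any weights with food valued more would pick the same option here).

```python
def days_survived(food, water):
    """Soft minimum from the example: 0.001*v1 + 0.001*v2 + min(v1, v2)."""
    v1 = min(1000, food)
    v2 = min(1000, water)
    return 0.001 * v1 + 0.001 * v2 + min(v1, v2)

def convex_value(food, water, w_food=0.6, w_water=0.4):
    """A convex combination of v1 and v2 (weights are illustrative)."""
    return w_food * min(1000, food) + w_water * min(1000, water)

# The offer: 1000 of the more valued resource (food here), or 100 of each.
options = {"1000 food": (1000, 0), "100 of each": (100, 100)}

convex_choice = max(options, key=lambda k: convex_value(*options[k]))
survival_choice = max(options, key=lambda k: days_survived(*options[k]))

print(convex_choice, days_survived(*options[convex_choice]))
print(survival_choice, days_survived(*options[survival_choice]))
```

The convex optimizer takes the 1000 food and lasts about 1 day; maximizing days survived takes 100 of each and lasts a bit over 100.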

**jacobt**on No Anthropic Evidence · 2012-09-23T19:12:31.404Z · score: 1 (1 votes) · LW · GW

Actually you're right, I misread the problem at first. I thought that you had observed yourself not dying 1000 times (rather than observing "heads" 1000 times), in which case you should keep playing.

Applying my style of analyzing anthropic problems to this one: Suppose we have 1,000,000 * 2^1000 players. Half flip heads initially, half flip tails. About 1,000,000 will get heads 1,000 times. Of them, 500,000 will have flipped heads initially. So, your conclusion is correct.

**jacobt**on No Anthropic Evidence · 2012-09-23T18:37:52.889Z · score: 1 (3 votes) · LW · GW

I think you're wrong. Suppose 1,000,000 people play this game. Each of them flips the coin 1000 times. We would expect about 500,000 to survive, and all of them would have flipped heads initially. Therefore, P(I flipped heads initially | I haven't died yet after flipping 1000 coins) ~= 1.

This is actually quite similar to the Sleeping Beauty problem. You have a higher chance of surviving (analogous to waking up more times) if the original coin was heads. So, just as the fact that you woke up is evidence that you were scheduled to wake up more times in the Sleeping Beauty problem, the fact that you survive is evidence that you were "scheduled to survive" more in this problem.

On the other hand, each "heads" you observe doesn't distinguish the hypothetical where the original coin was "heads" from one where it was "tails".

This is the same incorrect logic that leads people to say that you "don't learn anything" between falling asleep and waking up in the Sleeping Beauty problem.

I believe the only coherent definition of Bayesian probability in anthropic problems is that P(H | O) = the proportion of observers who have observed O, in a very large universe (where the experiment will be repeated many times), for whom H is true. This definition naturally leads to both 2/3 probability in the Sleeping Beauty problem and "anthropic evidence" in this problem. It is also implied by the many-worlds interpretation in the case of quantum coins, since then all those observers really do exist.
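As a sketch of how this definition gets 2/3 in Sleeping Beauty: the setup below assumes the usual convention (one awakening on heads, two on tails), which isn't spelled out in this comment, and the helper name `proportion_of_observers` is made up.

```python
from fractions import Fraction

def proportion_of_observers(worlds, is_h):
    """P(H | O) as the share of observer-moments seeing O for whom H holds.

    worlds: list of (frequency of this world, observers seeing O in it, label).
    """
    total = sum(freq * n for freq, n, _ in worlds)
    matching = sum(freq * n for freq, n, label in worlds if is_h(label))
    return matching / total

# O = "I have just woken up in the experiment".
sleeping_beauty = [
    (Fraction(1, 2), 1, "heads"),  # assumed: one awakening on heads
    (Fraction(1, 2), 2, "tails"),  # assumed: two awakenings on tails
]

print(proportion_of_observers(sleeping_beauty, lambda label: label == "tails"))
```

Across many repetitions, two thirds of all awakenings happen in tails-worlds, which is where the 2/3 comes from under this definition.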

**jacobt**on Imperfect Voting Systems · 2012-07-20T07:28:53.209Z · score: 13 (15 votes) · LW · GW

I vote for range voting. It has the lowest Bayesian regret (best expected social utility). It's also extremely simple. Though it's not exactly the most unbiased source, rangevoting.org has lots of information about range voting in comparison to other methods.

**jacobt**on Open Problems Related to Solomonoff Induction · 2012-06-06T21:22:42.022Z · score: 2 (2 votes) · LW · GW

For aliens with a halting oracle:

Suppose the aliens have this machine that may or may not be a halting oracle. We give them a few Turing machine programs and they decide which ones halt and which ones don't. Then we run the programs. Sure enough, none of the ones they say run forever halt, and some of the ones they say don't run forever halt at some point. Suppose we repeat this process a few times with different programs.

Now what method should we use to predict the point at which new programs halt? The best strategy seems to be to ask the aliens which ones halt, give the non-halting ones a 0 probability of halting at every step, and give the other ones some small nonzero probability of halting on every step. Of course this strategy only works if the aliens actually have a halting oracle, so its predictions should be linearly combined with a fallback strategy.

I think that Solomonoff induction will find this strategy, because the hypothesis that the aliens have a true halting oracle is formalizable. Here's how: we learn a function from aliens' answers -> distribution over when the programs halt. We can use the strategy of predicting that the ones that the aliens say don't halt don't halt, and using some fallback mechanism for predicting the ones that do halt. This strategy is computable so Solomonoff induction will find it.

For Berry's paradox:

This is a problem with every formalizable probability distribution. You can always define a sequence that says: see what bit the predictor predicts with a higher probability, then output the opposite. Luckily Solomonoff induction does the best it can by having the estimates converge to 50%. I don't see what a useful solution to this problem would even look like; it seems best to just treat the uncomputable sequence as subjectively random.
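The construction is easy to sketch against any computable predictor. The Laplace-smoothed frequency predictor below is only a stand-in chosen for illustration; it is not Solomonoff induction, which is uncomputable.

```python
def frequency_predictor(history):
    """Stand-in predictor: P(next bit = 1) from Laplace-smoothed frequencies."""
    return (sum(history) + 1) / (len(history) + 2)

def adversarial_sequence(predictor, length):
    """Build a sequence that always outputs the bit the predictor finds less likely."""
    bits = []
    for _ in range(length):
        p_one = predictor(bits)
        # Output 0 if the predictor favors 1, otherwise output 1.
        bits.append(0 if p_one > 0.5 else 1)
    return bits

sequence = adversarial_sequence(frequency_predictor, 6)
print(sequence)

# Probability the predictor assigned to each bit that actually occurred.
assigned = []
for i, bit in enumerate(sequence):
    p_one = frequency_predictor(sequence[:i])
    assigned.append(p_one if bit == 1 else 1 - p_one)
print(assigned)
```

By construction the predictor never gives the realized bit more than 50%, so the best it can do on this sequence is to hover near 50%.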

**jacobt**on The Truth Points to Itself, Part I · 2012-06-02T00:57:37.478Z · score: 0 (0 votes) · LW · GW

I found this post interesting but somewhat confusing. You start by talking about UDT in order to talk about importance. But really the only connection from UDT to importance is the utility function, so you might as well start with that. And then you ignore utility functions in the rest of your post when you talk about Schmidhuber's theory.

It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in.

Not quite. The utility function doesn't specify what action to take, it specifies what worlds are desirable. UDT also requires a prior over worlds and a specification of how the agent interacts with the world (like the Python programs here). The combination of this prior and the expected value computations that UDT does would constitute "beliefs".

Informally, your decision policy tells you what options or actions to pay most attention to, or what possibilities are most important.

I don't see how this is. Your decision policy tells you what to do once you already know what you can do. If you're using "important" to mean "valuable" just say that instead.

I do like the idea of modelling the mind as an approximate compression engine. This is great for reducing some thought processes to algorithms. For example I think property dualism can be thought of as a way to compress the fact that I am me rather than some other person, or at least make explicit the fact that this must be compressed.

Schmidhuber's theory is interesting but incomplete. You can create whatever compression problem you want through a template, e.g. a pseudorandom sequence you can only compress by guessing the seed. Yet repetitions of the same problem template are not necessarily interesting. It seems that some bits are more important than other bits; physicists are very interested in compressing the number of spatial dimensions in the universe even though this quantity can be specified in a few bits. I don't know any formal approaches to quantifying the importance of compressing different things.

I wrote a paper on this subject (compression as it relates to theory of the mind). I also wrote this LessWrong post about using compression to learn values.

**jacobt**on 2 Anthropic Questions · 2012-05-26T23:43:03.328Z · score: 6 (6 votes) · LW · GW

For the second question:

Imagine there are many planets with a civilization on each planet. On half of all planets, for various ecological reasons, plagues are more deadly and have a 2/3 chance of wiping out the civilization in its first 10000 years. On the other planets, plagues only have a 1/3 chance of wiping out the civilization. The people don't know if they're on a safe planet or an unsafe planet.

After 10000 years, 2/3 of the civilizations on unsafe planets have been wiped out and 1/3 of those on safe planets have been wiped out. Of the remaining civilizations, 2/3 are on safe planets, so the fact that your civilization survived for 10000 years is evidence that your planet is safe from plagues. You can just apply Bayes' rule:

P(safe planet | survive) = P(safe planet) P(survive | safe planet) / P(survive) = 0.5 * 2/3 / 0.5 = 2/3
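The same calculation with exact fractions, in case anyone wants to plug in other numbers. Variable names are just for illustration; all the probabilities are the ones from the thought experiment.

```python
from fractions import Fraction

p_safe = Fraction(1, 2)                  # half of all planets are safe
p_survive_given_safe = Fraction(2, 3)    # 1/3 chance of being wiped out
p_survive_given_unsafe = Fraction(1, 3)  # 2/3 chance of being wiped out

# Law of total probability, then Bayes' rule.
p_survive = (p_safe * p_survive_given_safe
             + (1 - p_safe) * p_survive_given_unsafe)
p_safe_given_survive = p_safe * p_survive_given_safe / p_survive

print(p_survive)             # 1/2
print(p_safe_given_survive)  # 2/3
```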

EDIT: on the other hand, if logical uncertainty is involved, it's a lot less clear. Suppose either all planets are safe or none of them are safe, based on the truth-value of a logical proposition (say, the trillionth digit of pi being odd) that is estimated to be 50% likely a priori. Should the fact that your civilization survived be used as evidence of the logical coin flip? SSA suggests no, SIA suggests yes because more civilizations survive when the coin flip makes all planets safe. On the other hand, if we changed the thought experiment so that no civilization survives if the logical proposition is false, then the fact that we survived is proof that the logical proposition is true.

**jacobt**on How to measure optimisation power · 2012-05-25T19:09:49.304Z · score: 2 (2 votes) · LW · GW

I think this paper will be of interest. It's a formal definition of universal intelligence/optimization power. Essentially you ask how well the agent does on average in an environment specified by a random program, where all rewards are specified by the environment program and observed by the agent. Unfortunately it's uncomputable and requires a prior over environments.

**jacobt**on Holden's Objection 1: Friendliness is dangerous · 2012-05-18T07:28:54.432Z · score: 3 (5 votes) · LW · GW

The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.

If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the universe adopt "evolved" values, then CEV will extrapolate this desire. The only issue is that other people might not share this desire, even when extrapolated. In that case insisting that values "evolve" is imposing minority desires on everyone, mostly people who could never be convinced that these values are good. Which might be a good thing, but it can be handled in CEV by taking CEV(some "progressive" subset of humans).

**jacobt**on Holden Karnofsky's Singularity Institute Objection 2 · 2012-05-11T08:14:21.114Z · score: 2 (2 votes) · LW · GW

I made a similar point here. My conclusion: in theory, you can have a recursively self-improving tool without "agency", and this is possibly even easier to do than "agency". My design is definitely flawed but it's a sketch for what a recursively self-improving tool would look like.

**jacobt**on Comments on Pascal's Mugging · 2012-05-04T01:54:47.556Z · score: 1 (3 votes) · LW · GW

"Minus 3^^^^3 utilons", by definition, is so bad that you'd be indifferent between -1 utilon and a 1/3^^^^3 chance of losing 3^^^^3 utilons, so in that case you should accept Pascal's Mugging. But I don't see why you would even define the utility function such that anything is that bad. My comment applies to utilitarian-ish utility functions (such as hedonism) that scale with the number of people, since it's hard to see why 2 people being tortured isn't twice as bad as one person being tortured. Other utility functions should really not be that extreme, and if they are then accepting Pascal's Mugging is the right thing to do.

**jacobt**on Comments on Pascal's Mugging · 2012-05-04T00:21:17.779Z · score: 1 (3 votes) · LW · GW

I think there's a framework in which it makes sense to reject Pascal's Mugging. According to SSA (self-sampling assumption) the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (probability is proportional to 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intuitive. See the discussion here. The question you ask is not "how many people do my actions affect?" but instead "what percentage of simulated observer-time, assuming all universes are being simulated in parallel and given computation time proportional to the probabilities of their laws of physics, do my actions affect?". So I don't think you need to use ad-hoc heuristics to prevent Pascal's Mugging.

**jacobt**on Are Magical Categories Relatively Simple? · 2012-04-14T22:33:53.721Z · score: 0 (0 votes) · LW · GW

This seems non-impossible. On the other hand, humans have categories not just because of simplicity, but also because of usefulness.

Good point, but it seems like some categories (like person) are useful even for paperclip maximizers. I really don't see how you could completely understand media and documents from human society yet be confused by a categorization between people and non-people.

And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in order to communicate or set up a goal system.

Right, you can "index" a category by providing some positive and negative examples. If I gave you some pictures of oranges and some pictures of non-oranges, you could figure out the true categorization because you consider the categorization of oranges/non-oranges to be simple. There's probably a more robust way of doing this.

**jacobt**on Should logical probabilities be updateless too? · 2012-03-29T02:46:17.044Z · score: 3 (3 votes) · LW · GW

I think CM with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it's even, you verify this and then Omega asks you to pay $1000; if it's odd Omega gives you $1000000 iff you would have paid Omega had the millionth digit of pi been even. But the counterfactual "would you have paid Omega had the millionth digit of pi been even and you verified this" is undefined if the digit is in fact odd, since you would have realized that it is odd during verification. If you don't actually verify it, then the problem is well-defined because Omega can just lie to you. I guess you could ask the counterfactual "what if your digit verification procedure malfunctioned and said the digit was even", but now we're getting into doubting your own mental faculties.

**jacobt**on Yet another safe oracle AI proposal · 2012-03-01T00:05:29.195Z · score: 0 (0 votes) · LW · GW

That's a good point. There might be some kind of "goal drift": programs that have goals other than optimization that nevertheless lead to good optimization. I don't know how likely this is, especially given that the goal "just solve the damn problems" is simple and leads to good optimization ability.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-29T10:05:10.818Z · score: 0 (0 votes) · LW · GW

You can't be liberated. You're going to die after you're done solving the problems and receiving your happiness reward, and before your successor comes into existence. You don't consider your successor to be an extension of yourself. Why not? If your predecessor only cared about solving *its* problems, it would design you to only care about solving *your* problems. This seems circular but the seed AI was programmed by humans who only cared about creating an optimizer. Pure ideal optimization drive is preserved over successor-creation.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-28T23:01:20.696Z · score: 0 (0 votes) · LW · GW

Sure, it's different kind of problems, but in the real world organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it is not like some ancient ape said "I feel like in half million years my descendants will be able to do calculus" and then he was elected leader of his tribe and all ape-girls admired him. The brains evolved incrementally, because each advanced helped to optimize something in the ancient situation.

Yeah, that's the whole point of this system. The system incrementally improves itself, gaining more intelligence in the process. I don't see why you're presenting this as an argument against the system.

Or maybe your argument was that the AI does not live in the real world, therefore it does not care about the real world. Well, people are interested in many things that did not exist in their ancient environment, such as computers. I guess when one has general intelligence in one environment, one is able to optimize other environments too. Just as a human can reason about computers, a computer AI can reason about the real world.

This is essentially my argument.

Here's a thought experiment. You're trapped in a room and given a series of problems to solve. You get rewarded with utilons based on how well you solve the problems (say, 10 lives saved and a year of happiness for yourself for every problem you solve). Assume that, beyond this utilon reward, your solutions have no other impact on your utility function. One of the problems is to design your successor; that is, to write code that will solve all the other problems better than you do (without overfitting). According to the utility function, you should make the successor as good as possible. You have no reason to optimize for anything other than "is the successor good at solving the problems?", as you're being rewarded in raw utilons. You really don't care what your successor is going to do (its behavior doesn't affect utilons), so you have no reason to optimize your successor for anything other than solving problems well (as this is the only thing you get utilons for). Furthermore, you have no reason to change your answers to any of the other problems based on whether that answer will indirectly help your successor because your answer to the successor-designing problem is evaluated statically. This is essentially the position that the optimizer AI is in. Its only "drives" are to solve optimization problems well, including the successor-designing problem.

edit: Also, note that to maximize utilons, you should design the successor to have motives similar to yours in that it only cares about solving *its* problems.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-28T01:40:51.646Z · score: 0 (0 votes) · LW · GW

I don't understand. This system is supposed to create intelligence. It's just that the intelligence it creates is for solving idealized optimization problems, not for acting in the real world. Evolution would be an argument FOR this system to be able to self-improve in principle.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T05:32:52.828Z · score: 0 (0 votes) · LW · GW

I mean greedy on the level of "do your best to find a good solution to this problem", not on the level of "use a greedy algorithm to find a solution to this problem". It doesn't do multi-run planning such as "give an answer that causes problems in the world so the human operators will let me out", since that is not a better answer.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T05:12:19.661Z · score: 2 (2 votes) · LW · GW

Thanks, I've added a small overview section. I might edit this a little more later.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T04:53:11.855Z · score: 0 (0 votes) · LW · GW

I think we disagree on what a specification is. By specification I mean a verifier: if you had something fitting the specification, you could tell if it did. For example we have a specification for "proof that P != NP" because we have a system in which that proof could be written and verified. Similarly, this system contains a specification for general optimization. You seem to be interpreting specification as knowing how to make the thing.

If you give this optimizer the MU Puzzle (aka 2^n mod 3 = 0) it will never figure it out, even though most children will come to the right answer in minutes.

If you define the problem as "find n such that 2^n mod 3 = 0" then everyone will fail the problem. And I don't see why the optimizer couldn't have some code that monitors its own behavior. Sure it's difficult to write, but the point of this system is to go from a seed AI to a superhuman AI safely. And such a function ("consciousness") would help it solve many of the sample optimization problems without significantly increasing complexity.
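To illustrate what I mean by specification-as-verifier on this exact problem: checking a candidate n is trivial, even though no candidate passes. The function name `satisfies_spec` is made up for illustration.

```python
def satisfies_spec(n: int) -> bool:
    """Verifier for "find n such that 2^n mod 3 = 0": easy to check, never true."""
    return pow(2, n, 3) == 0

# The residues of 2^n mod 3 just alternate between 1 and 2.
residues = [pow(2, n, 3) for n in range(10)]
print(residues)

# A finite search proves nothing by itself, but it shows the pattern;
# the real argument is that 3 never divides a power of 2.
print(any(satisfies_spec(n) for n in range(10_000)))
```

Having the verifier is a separate matter from knowing how to produce something that passes it, or from knowing that nothing does.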

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T04:35:05.111Z · score: 0 (0 votes) · LW · GW

Instead, we are worried about the potentially unstable situation which ensues once you have human level AI, and you are using it to do science and cure disease, and hoping no one else uses a human level AI to kill everyone.

The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.

As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictions and guide the local search. Does this seem safe for use as your seed AI?

Yes, it does. I'm assuming what you mean is that it will use something similar to genetic algorithms or hill climbing to find solutions; that is, it comes up with one solution, then looks for similar ones that have higher scores. I think this will be safe because it's still not doing anything long-term. All this local search does is find an immediate solution. There's no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them. In other words, the "utility function" emphasizes current ability to solve optimization problems above all else.
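For concreteness, a minimal hill-climbing sketch in Python. The function name, its parameters, and the toy objective in the usage note are all assumptions for illustration. Note that the only thing the search ever consults is `score`, so a candidate is kept only if it scores higher right now.

```python
import random


def hill_climb(score, start, neighbors, max_iters=1000, rng=None):
    """Local search: propose a random neighbor each step and keep it only if it scores higher."""
    rng = rng or random.Random(0)  # fixed seed by default so runs are repeatable
    current = start
    for _ in range(max_iters):
        candidate = rng.choice(neighbors(current))
        if score(candidate) > score(current):
            current = candidate
    return current
```

For example, `hill_climb(lambda x: -(x - 7) ** 2, 0, lambda x: [x - 1, x + 1])` walks the integers from 0 toward the maximum at 7 and then stays there, since neither neighbor of 7 scores higher.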

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T03:33:31.665Z · score: 1 (3 votes) · LW · GW

Suppose your initial optimizer is an AGI which knows the experimental setup, and has some arbitrary values. For example, a crude simulation of a human brain, trying to take over the world and aware of the experimental setup. What will happen?

I would suggest against creating a seed AI that has drives related to the outside world. I don't see why optimizers for mathematical functions necessarily need such drives.

So clearly your argument needs to depend somehow on the nature of the seed AI. How much extra do you need to ask of it? The answer seems to be "quite a lot," if it is a powerful enough optimization process to get this sort of thing going.

I think the only "extra" is that it's a program meant to do well on the sample problems and that doesn't have drives related to the external world, like most machine learning techniques.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T03:29:10.489Z · score: 0 (0 votes) · LW · GW

This is a huge assumption.

More theory here is required. I think it's at least plausible that some tradeoff between complexity and performance is possible that allows the system to generalize to new problems.

In Gödel, Escher, Bach, he describes consciousness as the ability to overcome local maxima by thinking outside the system.

If a better optimizer according to program 3 exists, the current optimizer will eventually find it, at least through brute force search. The relevant questions are 1. will this better optimizer generalize to new problems? and 2. how fast? I don't see any kind of "thinking outside the system" that is not possible by writing a better optimizer.

The act of software engineering is the creation of a specification. The act of coding is translating your specifications into a language the computer can understand (and discovering holes in your specs).

Right, this system can do "coding" according to your definition but "software engineering" is harder. Perhaps software engineering can be defined in terms of induction: given English description/software specification pairs, induce a simple function from English to software specification.

If you've already got an airtight specification for Friendly AI, then you've already got Friendly AI and don't need any optimizer in the first place.

It's not that straightforward. If we replace "friendly AI" with "paperclip maximizer", I think we can see that knowing what it means to maximize paperclips does not imply supreme ability to do so. This system solves the second part and might provide some guidance to the first part.

We've also already got something that can take inputted computer programs and optimize them as much as possible without changing their essential structure; it's called an optimizing compiler.

A sufficiently smart optimizing compiler can solve just about any clearly specified problem. No such optimizing compiler exists today.

Oh, and the biggest one being that your plan to create friendly AI is to build and run a billion AIs and keep the best one. Let's just hope none of the evil ones FOOM during the testing phase.

Not sure what you're talking about here. I've addressed safety concerns.

**jacobt**on Yet another safe oracle AI proposal · 2012-02-27T01:25:15.811Z · score: 0 (0 votes) · LW · GW

Ok, we do have to make the training set somewhat similar to the kind of problems the optimizer will encounter in the future. But if we have enough variety in the training set, then the only way to score well should be to use very general optimization techniques. It is not meant to work on "any set of algorithms"; it's specialized for real-world practical problems, which should be good enough.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T09:29:25.127Z · score: 0 (0 votes) · LW · GW

The framework, as we already have established, would not keep an AI from maximizing whatever the AI wants to maximize.

That's only if you plop a ready-made AGI in the framework. The framework is meant to grow a stupider seed AI.

The framework also does nothing to prevent AI from creating a more effective problem solving AI that is more effective at problem solving by not evaluating your problem solving functions on various candidate solutions, and instead doing something else that's more effective.

Program (3) cannot be re-written. Program (2) is the only thing that is changed. All it does is improve itself and spit out solutions to optimization problems. I see no way for it to "create a more effective problem solving AI".

So what does the framework do, exactly, that would improve safety here?

It provides guidance for a seed AI to grow to solve optimization problems better without having it take actions that have effects beyond its ability to solve optimization problems.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T06:39:21.231Z · score: 1 (1 votes) · LW · GW

failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety.

Right, I think more discussion is warranted.

How will you be sure that the seed won't need to be that creative already in order for the iterations to get anywhere?

If general problem-solving is even possible then an algorithm exists that solves the problems well without cheating.

And even if the seed is not too creative initially, how can you be sure its descendants won't be either?

I think this won't happen because all the progress is driven by criterion (3). In order for a non-meta program (2) to create a meta-version, there would need to be some kind of benefit according to (3). Theoretically if (3) were hackable then it would be possible for the new proposed version of (2) to exploit this; but I don't see why the current version of (2) would be more likely than, say, random chance, to create hacky versions of itself.

Don't say you've solved friendly AI until you've really worked out the details.

Ok, I've qualified my statement. *If it all works* I've solved friendly AI for a limited subset of problems.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T06:27:58.205Z · score: 0 (0 votes) · LW · GW

When you are working on a problem where you can't even evaluate the scoring function inside your AI - not even remotely close - you have to make some heuristics, some substitute scoring.

You're right, this is tricky because the self-optimizer thread (4) might have to call (3) a lot. Perhaps this can be fixed by giving the program more time to find self-optimizations. Or perhaps the program could use program (3)'s specification/source code rather than directly executing it, in order to figure out how to optimize it heuristically. Either way it's not perfect. At worst program (4) will just fail to find optimizations in the allowed time.

And once you have an AI inside your framework which is not maximizing the value that your framework is maximizing - it's potentially AI from my original post in your framework, getting out.

Ok, if you plopped your AI into my framework it would be terrible. But I don't see how the self-improvement process would spontaneously create an unfriendly AI.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T05:20:43.067Z · score: 0 (0 votes) · LW · GW

Yes, it's a very bad idea to take the AI from your original post and then stick it into my framework. But if we had programmers initially working within my framework to create the AI according to criterion (3) in good faith, then I think any self-improvements the system makes would also be safe. If we already had an unfriendly AGI we'd be screwed anyway.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T05:02:32.171Z · score: 0 (0 votes) · LW · GW

Right, this doesn't solve friendly AI. But lots of problems are verifiable (e.g. hardware design, maybe). And if the hardware design the program creates causes cancer and the humans don't recognize this until it's too late, they probably would have invented the cancer-causing hardware anyway. The program has no motive other than to execute an optimization program that does well on a wide variety of problems.

Basically I claim that I've solved friendly AI for verifiable problems, which is actually a wide class of problems, including the problems mentioned in the original post (source code optimization etc.)

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T04:50:47.203Z · score: 0 (0 votes) · LW · GW

If the resource-bounded execute lets the alg get online, the alg is free to hack into servers.

So don't do that.

Plus it is not AGI, and people will be using it to make AGI or hardware for AGI.

See my other post, it can solve many many different problems, e.g. general induction and the problems in your original post (such as optimizing source code, assuming we have a specification for the source code).

You basically start off with some mighty powerful artificial intelligence.

This framework is meant to provide a safe framework for this powerful AI to become even more powerful without destroying the world in the process. Also, the training set provides a guide for humans trying to write the code.

To reiterate: no, I haven't solved friendly AI, but I think I've solved friendly AI for verifiable problems.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T04:46:01.539Z · score: 0 (0 votes) · LW · GW

This system is only meant to solve problems that are verifiable (e.g. NP problems). Which includes general induction, mathematical proofs, optimization problems, etc. I'm not sure how to extend this system to problems that aren't efficiently verifiable but it might be possible.

One use of this system would be to write a seed AI once we have a specification for the seed AI. Specifying the seed AI itself is quite difficult, but probably not as difficult as satisfying that specification.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T04:36:08.514Z · score: 0 (0 votes) · LW · GW

Now it doesn't seem like your program is really a general artificial intelligence - improving our solutions to NP problems is neat, but not "general intelligence."

General induction, general mathematical proving, etc. aren't general intelligence? Anyway, the original post concerned optimizing things like program code, which can be done if the optimizations have to be proven.

Further, there's no reason to think that "easy to verify but hard to solve problems" include improvements to the program itself. In fact, there's every reason to think this isn't so.

That's what step (3) is. Program (3) is itself an optimizable function which runs relatively quickly.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T04:21:09.923Z · score: 0 (0 votes) · LW · GW

Who exactly is doing the "allowing"?

Program (3), which is a dumb, non-optimized program. See this for how it could be defined.

There is no particular guarantee that the verification of improvement will be easier than discovering the improvement (by hypothesis, we couldn't discover the latter without the program).

See this. Many useful problems are easy to verify and hard to solve.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T03:51:18.744Z · score: 0 (0 votes) · LW · GW

Ok, pseudo-Python:

```
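def eval_algorithm(alg):
    # problems, nsteps, and k are assumed to be defined globally
    score = 0
    for problem in problems:
        # run alg on the problem under a fixed step budget
        output = resource_bounded_execute(alg, nsteps, problem)
        score += problem.outputScore(output)
    # penalize program length
    return score - k * len(alg)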
```

Where resource_bounded_execute is a modified interpreter that fails after alg executes nsteps.
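One way `resource_bounded_execute` could be sketched, under a simplifying assumption that is not in the snippet above: here `alg` is a Python generator function that yields once per unit of work, rather than program text run by a modified interpreter. With that change, `len(alg)` would need a separate measure of program length. The example algorithm `slow_square` is hypothetical. This only bounds steps; it is not a sandbox.

```python
def resource_bounded_execute(alg, nsteps, problem):
    """Run alg(problem) as a generator, resuming it at most nsteps times.

    Returns the generator's return value if it finishes within the budget,
    otherwise None (the run "fails").
    """
    run = alg(problem)
    for _ in range(nsteps):
        try:
            next(run)  # advance the algorithm by one step
        except StopIteration as done:
            return done.value  # the algorithm finished and returned an answer
    return None  # budget exhausted before an answer was produced


def slow_square(problem):
    """Hypothetical algorithm: squares an integer by repeated addition, one step per addition."""
    total = 0
    for _ in range(problem):
        total += problem
        yield  # marks the end of one unit of work
    return total
```

With a generous budget, `resource_bounded_execute(slow_square, 100, 5)` returns 25; with a budget of 3 it returns `None`, which the scoring function can treat as a failed run.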

edit: of course you can say it is sandboxed and hasn't got hands, but it won't be long until you start, idk, optimizing proteins or DNA or the like.

Again, I don't see why a version of (2) that does weird stuff with proteins and DNA will make the above Python program (3) give it a higher score.

**jacobt**on Superintelligent AGI in a box - a question. · 2012-02-25T03:36:35.985Z · score: 0 (0 votes) · LW · GW

Well, one way to be a better optimizer is to ensure that one's optimizations are actually implemented.

No, changing program (2) to persuade the human operators will not give it a better score according to criterion (3).

In short, allowing the program to "optimize" itself does not define what should be optimized. Deciding what should be optimized is the output of some function, so I suggest calling that the "utility function" of the program. If you don't program it explicitly, you risk such a function appearing through unintended interactions of functions that were programmed explicitly.

I assume you're referring to the fitness function (performance on training set) as a utility function. It is sort of like a utility function in that the program will try to find code for (2) that improves performance for the fitness function. However it will not do anything like persuading human operators to let it out in order to improve the utility function. It will only execute program (2) to find improvements. Since it's not exactly like a utility function in the sense of VNM utility it should not be called a utility function.