Posts

Look at the Shape of Your Utility Distribution 2019-08-30T23:27:16.326Z · score: 18 (14 votes)
Is LW making progress? 2019-08-24T00:32:31.600Z · score: 23 (13 votes)
Intransitive Preferences You Can't Pump 2019-08-09T23:10:36.650Z · score: 2 (3 votes)
Against Occam's Razor 2018-04-05T17:59:27.583Z · score: 2 (15 votes)
How I see knowledge aggregation 2018-02-03T10:31:25.359Z · score: 64 (18 votes)
Against Instrumental Convergence 2018-01-27T13:17:19.389Z · score: 24 (12 votes)

Comments

Comment by zulupineapple on Noticing Frame Differences · 2019-10-04T17:57:09.356Z · score: 1 (1 votes) · LW · GW

In the examples, sometimes the problem is people having different goals for the discussion, sometimes it is having different beliefs about what kinds of discussions work, and sometimes it might be about almost object-level beliefs. If "frame" refers to all of that, then it's way too broad to be a useful concept. If your goal is to enumerate and classify the different goals and different beliefs people can have regarding discussions, that's great, but possibly too broad to make any progress.

My own frustration with this topic is lack of real data. Apart from "FOOM Debate", the conversations in your post are all fake. To continue your analogy in another comment, this is like doing zoology by only ever drawing cartoons of animals, without ever actually collecting or analyzing specimens. Good zoologists would collect many real discussions, annotate them, classify them, debate about those classifications, etc. They may also tamper with ongoing discussions. You may be doing some of that privately, but doing it publicly would be better. Unfortunately there seem to be norms against that.

Comment by zulupineapple on ozziegooen's Shortform · 2019-09-13T18:14:32.330Z · score: 1 (1 votes) · LW · GW

Making long term predictions is hard. That's a fundamental problem. Having proxies can be convenient, but it's not going to tell you anything you don't already know.

Comment by zulupineapple on Book Review: Secular Cycles · 2019-09-13T18:11:53.145Z · score: 1 (1 votes) · LW · GW

That's what I think every time I hear "history repeats itself". I wish Scott had considered the idea.

The biggest claim Turchin is making seems to be about the variance of the time intervals between "bad" periods. A random walk would imply that it is high, and "cycles" would imply that it is low.
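One way to make this testable, sketched under assumptions: measure the coefficient of variation (standard deviation over mean) of the gaps between bad periods. Below, a perfect 250-year cycle is compared with the return times of a simple +/-1 random walk, which stands in for "bad periods" under the random-walk view. Both series are hypothetical; neither is Turchin's data.

```python
import random
import statistics

def interval_cv(event_times):
    """Coefficient of variation (stdev / mean) of the gaps between events.
    0 means perfectly regular timing; larger values mean more irregular timing."""
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)

def random_walk_returns(steps, seed=0):
    """Times at which a simple +/-1 random walk returns to its starting level.
    Each return is a hypothetical stand-in for a 'bad period'."""
    rng = random.Random(seed)
    level, times = 0, [0]
    for t in range(1, steps + 1):
        level += rng.choice((-1, 1))
        if level == 0:
            times.append(t)
    return times

cyclic = list(range(0, 3000, 250))    # hypothetical: a crisis exactly every 250 years
walk = random_walk_returns(100_000)   # hypothetical random-walk history

print(interval_cv(cyclic))  # 0.0 for a perfect cycle
print(interval_cv(walk))    # typically far above 1: a few very long gaps dominate
```

With real data the question becomes where the observed value falls between these two extremes.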

Comment by zulupineapple on ozziegooen's Shortform · 2019-09-07T22:55:56.996Z · score: 1 (1 votes) · LW · GW
For example, say I wanted to know how good/enjoyable a specific movie would be.

My point is that "goodness" is not a thing in the territory. At best it is a label for a set of specific measures (ratings, revenue, awards, etc). In that case, why not just work with those specific measures? Vague questions have the benefit of being short and easy to remember, but beyond that I see only problems. Motivated agents will do their best to interpret the vagueness in a way that suits them.

Is your goal to find a method to generate specific interpretations and procedures of measurement for vague properties like this one? Like a Schelling point for formalizing language? Why do you feel that can be done in a useful way? I'm asking for an intuition pump.

Can you be more explicit about your definition of "clearly"?

Certainly there is some vagueness, but it seems that we manage to live with it. I'm not proposing anything that prediction markets aren't already doing.

Comment by zulupineapple on ozziegooen's Shortform · 2019-09-07T12:23:04.144Z · score: -2 (2 votes) · LW · GW
"What is the relative effectiveness of AI safety research vs. bio risk research?"

If you had a precise definition of "effectiveness", this shouldn't be a problem. E.g. if you had predictions for "will humans go extinct in the next 100 years?" and "will we go extinct in the next 100 years, if we invest 1M into AI risk research?" and "will we go extinct, if we invest 1M in bio risk research?", then you should be able to make decisions with that. And these questions should work fine in existing forecasting platforms. Their long-term and conditional nature is a problem, of course, but I don't think that can be helped.
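A minimal sketch of how such conditional forecasts could drive a decision. All probabilities below are hypothetical placeholders, not real estimates; the point is only that once the three forecasts exist, "relative effectiveness" reduces to comparing the drop in extinction probability each option buys.

```python
# Hypothetical forecasts (illustrative numbers only).
p_extinct_baseline = 0.10  # P(extinction within 100 years)
p_extinct_given = {
    "AI risk research": 0.0990,   # P(extinction | +1M to AI risk research)
    "bio risk research": 0.0995,  # P(extinction | +1M to bio risk research)
}

def risk_reduction(baseline, conditional):
    """Absolute reduction in extinction probability bought by each option."""
    return {option: baseline - p for option, p in conditional.items()}

reductions = risk_reduction(p_extinct_baseline, p_extinct_given)
best = max(reductions, key=reductions.get)

print(reductions)
print("Fund:", best)  # Fund: AI risk research
```

The hard part is producing trustworthy conditional forecasts, not the decision rule.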

"How much value has this organization created?"

That's not a forecast. But if you asked "How much value will this organization create next year?" along with a clear measure of "value", then again, I don't see much of a problem. And, although clearly defining value can be tedious (and prone to errors), I don't think that problem can be avoided. Different people value different things; that can't be helped.

One solution attempt would be to have an "expert panel" assess these questions

Why would you do that? What's wrong with the usual prediction markets? Of course, they're expensive (require many participants), but I don't think a group of experts can be made to work well without a market-like mechanism. Is your project about making such markets more efficient?

Comment by zulupineapple on Why are the people who could be doing safety research, but aren’t, doing something else? · 2019-08-30T07:24:26.849Z · score: 1 (1 votes) · LW · GW

While it's true that preferences are not immutable, the things that change them are not usually debate. Sure, some people can be made to believe that their preferences are inconsistent, but then they will only make the smallest correction needed to fix the problem. Also, sometimes debate will make someone claim to have changed their preferences, just so that they can avoid social pressures (e.g. "how dare you not care about starving children!"), but this may not be reflected in their actions.

Regardless, my claim is that many (or most) people discount a lot, and that this would be stable under reflection. Otherwise we'd see more charity, more investment and more work on e.g. climate change.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-30T05:50:39.632Z · score: 1 (1 votes) · LW · GW

Ok, that makes the real incentives quite different. Then I suspect that these people are navigating Facebook using intuitions and strategies from the real world, without much consideration for the new digital environment.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-29T13:42:21.185Z · score: 1 (1 votes) · LW · GW

Yes, and you answered that question well. But the reason I asked for alternative responses was so that I could compare them to unsolicited recommendations from the anime-fan's point of view (and find that unsolicited recommendations have lower effort or higher reward).

Also, I'm not asking "How did your friend want the world to be different", I'm asking "What action could your friend have taken to avoid that particular response?". The friend is a rational agent, he is able to consider alternative strategies, but he shouldn't expect that other people will change their behavior when they have no personal incentive to do so.

Comment by zulupineapple on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-08-29T11:11:14.821Z · score: 1 (1 votes) · LW · GW

What is the domain of U? What inputs does it take? In your papers you take a generic Markov Decision Process, but which one will you use here? How exactly do you model the real world? What is the set of states and the set of actions? Does the set of states include the internal state of the AI?

You may have been referring to this as "4. Issues of ontology", but I don't think the problem can be separated from your agenda. I don't see how any progress can be made without answering these questions. Maybe you can start with naive answers and move on to something more realistic later. If so, I'm interested in what those naive world models look like. And I'm suspicious of how well human preferences would translate onto such models.

Other AI construction methods could claim that the AI will learn the optimal world model, by interacting with the world, but I don't think this solution can work for your agenda, since the U function is fixed from the start.

Comment by zulupineapple on Why are the people who could be doing safety research, but aren’t, doing something else? · 2019-08-29T10:36:31.761Z · score: 0 (2 votes) · LW · GW

Discounting. There is no law of nature that can force me to care about preventing human extinction years from now more than about eating a tasty sandwich tomorrow. There is also no law that can force me to care about human extinction much more than about my own death.

There are, of course, more technical disagreements to be had. Reasonable people could question how bad unaligned AI will be or how much progress is possible in this research. But unlike those questions, the reasons for discounting are not debatable.

Comment by zulupineapple on Gratification: a useful concept, maybe new · 2019-08-29T08:15:35.902Z · score: 1 (1 votes) · LW · GW

I do things my way because I want to display my independence (not doing what others tell me) and intelligence (ability to come up with novel solutions), and because I would feel bored otherwise (this is a feature of how my brain works, I can't help it).

"I feel independent and intelligent", "other people see me as independent and intelligent", "I feel bored" are all perfectly regular outcomes. They can be either terminal or instrumental goals. Either way, I disagree that these cases somehow don't fit in the usual preference model. You're only having this problem because you're interpreting "outcome" in a very narrow way.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-29T05:37:23.270Z · score: 4 (2 votes) · LW · GW

Yes. The latter seems to be what OP is asking about: "If one wanted it to not happen, how would one go about that?". I assume OP is taking the perspective of his friends, who are annoyed by this behavior, rather than the perspective of the anime-fans, who don't necessarily see anything wrong with the situation.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-28T19:11:07.909Z · score: 1 (1 votes) · LW · GW

That sounds reasonable, but the proper thing is not usually the easy thing, and you're not going to make people do the proper thing just by saying that it is proper.

If we want to talk about this as a problem in rationality, we should probably talk about social incentives, and possible alternative strategies for the anime-hater (you're now talking about a better strategy for the anime-fan, but it's not good to ask other people to solve your problems). Although I'm not sure to what extent this is a problem that needs solving.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-28T18:18:35.952Z · score: 1 (1 votes) · LW · GW

And then the other person says "no thanks", and you both stand in awkward silence? My point is that offering recommendations is a natural thing to say, even if not perfect, and it's nice to have something to say. If you want to discourage unsolicited recommendations, then you need to propose a different trajectory for the conversation. Changing topic is hard, and simply going away is rude. People give unsolicited recommendations because it seems to be the best option available.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-28T15:25:07.919Z · score: 1 (1 votes) · LW · GW

Sure, but it remains unclear what response the friend wanted from the other person. What better options are there? Should they just go away? Change topic? I'm looking for specific answers here.

Comment by zulupineapple on A Personal Rationality Wishlist · 2019-08-28T11:17:22.206Z · score: 1 (1 votes) · LW · GW
a friend of mine observed that he couldn’t talk about how he didn’t like anime without a bunch of people rushing in to tell him that anime was actually good and recommending anime for him to watch

What response did your friend want? The reaction seems very natural to me (especially from anime fans). Note that your friend has at some point tried watching anime, and he has now chosen to talk about anime, which could easily mean that on some level he wants to like anime, or at least understand why others like it.

Comment by zulupineapple on Humans can be assigned any values whatsoever… · 2019-08-28T08:06:00.226Z · score: 1 (1 votes) · LW · GW
I got this big impossibility result

That's a part of the disagreement. In the past you clearly thought that Occam's razor was an "obvious" constraint that might work. Possibly you thought it was a unique such constraint. Then you found this result, and made a large update in the other direction. That's why you say the result is big - rejecting a constraint that you already didn't expect to work wouldn't feel very significant.

On the other hand, I don't think that Occam's razor is a unique such constraint. So when I see you reject it, I naturally ask "what about all the other obvious constraints that might work?". To me this result reads like "0 didn't solve our equation therefore the solution must be very hard". I'm sure that you have strong arguments against many other approaches, but I haven't seen them, and I don't think the one in OP generalizes well.

I'd need to see these constraints explicitly formulated before I had any confidence in them.

This is a bit awkward. I'm sure that I'm not proposing anything that you haven't already considered. And even if you show that this approach is wrong, I'd just try to put a band-aid on it. But here is an attempt:

First we'd need a data set of human behavior with both positive and negative examples (e.g. "I made a sandwich", "I didn't stab myself", etc). So it would be a set of tuples of state s, action a and +1 for positive examples, -1 for negative ones. This is not trivial to generate; in particular, it's not clear how to pick negative examples, but here too I expect that the obvious solutions are all fine. By the way, I have no idea how the examples are formalized. That seems like a problem, but it's not unique to this approach, so I'll assume that it's solved.

Next, given a pair (p, R), we would score it by adding up the following:

1. p(R) should accurately predict human behavior. So we want a count of p(R)(s)=a for positive cases and p(R)(s)!=a for negative cases.

2. R should also predict human behavior. So we want to sum R(s, a) for positive examples, minus the same sum for negative examples.

3. Regularization for p.

4. Regularization for R.

Here we are concerned about overfitting R, and don't care about p as much, so terms 1 and 4 would get large weights, and terms 2, 3 would get smaller weights.

Finally we throw machine learning at the problem to maximize this score.
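A toy sketch of this scoring, under heavy assumptions: one state, two actions, planners limited to a "rational" (argmax) and an "anti-rational" (argmin) planner with made-up complexity costs, an L2 penalty standing in for the regularization of R, and a brute-force comparison of three hand-picked (p, R) pairs standing in for the machine-learning step. All names, weights and numbers are hypothetical.

```python
ACTIONS = ["make_sandwich", "stab_self"]
S = "kitchen_with_knife"

# Hypothetical data set: (state s, action a, label), +1 if the human took a, -1 if not.
EXAMPLES = [(S, "make_sandwich", +1), (S, "stab_self", -1)]

def make_planner(pick):
    """Build a planner p: reward function R -> policy (state -> action)."""
    def planner(R):
        def policy(s):
            values = {a: R[(s, a)] for a in ACTIONS}
            best = [a for a, v in values.items() if v == pick(values.values())]
            return best[0] if len(best) == 1 else None  # None: no unique choice
        return policy
    return planner

# Hypothetical planner library with made-up complexity costs (used by term 3).
PLANNERS = {"rational": (make_planner(max), 0.0),
            "anti_rational": (make_planner(min), 1.0)}

def score(planner_name, R, w=(10.0, 1.0, 1.0, 10.0)):
    """Weighted sum of the four terms; large weights on terms 1 and 4."""
    planner, complexity = PLANNERS[planner_name]
    policy = planner(R)
    # 1. p(R) predicts behaviour: p(R)(s) == a on positives, != a on negatives.
    t1 = sum((policy(s) == a) if y > 0 else (policy(s) != a) for s, a, y in EXAMPLES)
    # 2. R itself predicts behaviour: sum of R on positives minus sum on negatives.
    t2 = sum(y * R[(s, a)] for s, a, y in EXAMPLES)
    # 3. Regularization for p (a fixed cost per planner here).
    t3 = -complexity
    # 4. Regularization for R (an L2 penalty, as a stand-in).
    t4 = -sum(v * v for v in R.values())
    return w[0] * t1 + w[1] * t2 + w[2] * t3 + w[3] * t4

R_good = {(S, "make_sandwich"): 1.0, (S, "stab_self"): -1.0}
R_bad = {k: -v for k, v in R_good.items()}

candidates = [("rational", R_good), ("anti_rational", R_bad), ("rational", R_bad)]
for name, R in candidates:
    print(name, score(name, R))
best = max(candidates, key=lambda c: score(*c))  # ("rational", R_good)
```

The first two pairs both predict the behaviour perfectly (term 1), so it is term 2, helped by the planner penalty, that separates (rational, R) from the degenerate (anti-rational, -R).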

Comment by zulupineapple on Is LW making progress? · 2019-08-27T20:05:18.540Z · score: 1 (1 votes) · LW · GW

So it seems that there was progress in applied rationality and in AI. But that's far from everything LW has talked about. What about more theoretical topics, general problems in philosophy, morality, etc? Do you feel that discussing some topics resulted in no progress and was a waste of time?

There's some debate about which things are "improvements" as opposed to changes.

Important question. Does the debate actually exist, or is this a figure of speech?

Comment by zulupineapple on Humans can be assigned any values whatsoever… · 2019-08-27T19:57:08.446Z · score: 1 (1 votes) · LW · GW

1 is trivial, so yes. But I don't agree with 2. Maybe the disagreement comes from "few" and "obvious"? To be clear, I count evaluating some simple statistic on a large data set as one constraint. I'm not so sure about "obvious". It's not yet clear to me that my simple constraints aren't good enough. But if you say that more complex constraints would give us a lot more confidence, that's reasonable.

From the OP I understood that you want to throw out IRL entirely. For example:

If we give up the assumption of human rationality - which we must - it seems we can’t say anything about the human reward function. So it seems IRL must fail.

seems like an unambiguous rejection of IRL and very different from

Our hope is that with some minimal assumptions about planner and reward we can infer the rest with enough data.

Comment by zulupineapple on Humans can be assigned any values whatsoever… · 2019-08-27T18:07:27.534Z · score: 1 (1 votes) · LW · GW
But it's not like there are just these five preferences and once we have four of them out of the way, we're done.

My example test is not nearly as specific as you imply. It discards large swaths of harmful and useless reward functions. Additional test cases would restrict the space further. There are still harmful Rs in the remaining space, but their proportion must be much lower than in the beginning. Is that not good enough?

What you're seeing as "adding enough clear examples" is actually "hand-crafting R(0) in totality".

Are you saying that R can't generalize if trained on a reasonably sized data set? This is very significant, if true, but I don't see it.

For more details see here: https://arxiv.org/abs/1712.05812

Details are good. I have a few notes though.

true decomposition

This might be a nitpick, but there is no such thing. If the agent was not originally composed from p and R, then none of the decompositions are "true". There are only "useful" decompositions. But that itself requires many assumptions about how usefulness is measured. I'm confused about how much of a problem this is. But it might be a big part of our philosophical difference - I want to slap together some ad hoc stuff that possibly works, while you want to find something true.

The high complexity of the genuine human reward function

In this section you show that the pair (p(0), R(0)) is high complexity, but it seems that p(0) could be complex and R(0) could be relatively simple, contrary to what the title suggests. We don't actually need to find p(0); finding R(0) should be good enough.

Our hope is that with some minimal assumptions about planner and reward we can infer the rest with enough data.

Huh, isn't that what I'm saying? Is the problem that the assumptions I mentioned are derived from observing the human?

Slight tangent: I realized that the major difference between a human and the agent H (from the first example in OP), is that the human can take complex inputs. In particular, it can take logical propositions about itself or desirable R(0) and approve or disapprove of them. I'm not saying that "find R(0) that a human would approve of" is a good algorithm, but something along those lines could be useful.

Comment by zulupineapple on How Can People Evaluate Complex Questions Consistently? · 2019-08-27T13:36:03.551Z · score: 1 (1 votes) · LW · GW

This is true, but it doesn't fit well with the given example of "When will [country] develop the nuclear bomb?". The problem isn't that people can't agree what "nuclear bomb" means or who already has them. The problem is that people are working from different priors and extrapolating them in different ways.

Comment by zulupineapple on Integrity and accountability are core parts of rationality · 2019-08-27T10:56:52.394Z · score: 1 (1 votes) · LW · GW

Are you going to state your beliefs? I'm asking because I'm not sure what that looks like. My concern is that the statement will be very vague or very long and complex. Either way, you will have a lot of freedom to argue that actually your actions do match your statements, regardless of what those actions are. Then the statement would not be useful.

Instead I suggest that you should be accountable to people who share your beliefs. Having someone who disagrees with you try to model your beliefs and check your actions against that model seems like a source of conflict. Of course, stating your beliefs can be helpful in recognizing these people (but it is not the only method).

Comment by zulupineapple on How Can People Evaluate Complex Questions Consistently? · 2019-08-27T10:20:46.755Z · score: 2 (2 votes) · LW · GW

What's the motivation? In what case is lower accuracy for higher consistency a reasonable trade-off? Consistency over time in particular sounds like something that would discourage updating on new evidence.

Comment by zulupineapple on Humans can be assigned any values whatsoever… · 2019-08-27T06:52:03.553Z · score: 1 (1 votes) · LW · GW

Evaluating R on a single example of human behavior is good enough to reject R(2), R(4) and possibly R(3).

Example: this morning I went to the kitchen and picked up a knife. Among possible further actions, I had A - "make a sandwich" and B - "stab myself in the gut". I chose A. R(2) and R(4) say I wanted B and R(3) is indifferent. I think that's enough reason to discard them.
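The single-example test can be written down directly. A minimal sketch, assuming candidate reward functions are plain functions of (state, action); the three candidates are hypothetical stand-ins for "prefers A", "prefers B" and "indifferent", not the exact R(2), R(3), R(4) from the OP.

```python
def consistent(R, state, chosen, rejected):
    """Keep R only if it strictly prefers what the human actually did."""
    return R(state, chosen) > R(state, rejected)

state = "kitchen, holding a knife"
A, B = "make a sandwich", "stab myself in the gut"

# Hypothetical stand-ins for candidate reward functions.
candidates = {
    "prefers_A": lambda s, a: 1.0 if a == A else 0.0,
    "prefers_B": lambda s, a: 1.0 if a == B else 0.0,
    "indifferent": lambda s, a: 0.0,
}

surviving = [name for name, R in candidates.items() if consistent(R, state, A, B)]
print(surviving)  # ['prefers_A']
```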

Why not do this? Do you not agree that this test discards dangerous R more often than useful R? My guess is that you're asking for very strong formal guarantees from the assumptions that you consider and use a narrow interpretation of what it means to "make IRL work".

Comment by zulupineapple on Humans can be assigned any values whatsoever… · 2019-08-26T20:44:35.934Z · score: 1 (1 votes) · LW · GW

The point isn't that there is nothing wrong or dangerous about learning biases and rewards. The point is that the OP is not very relevant to those concerns. The OP says that learning can't be done without extra assumptions, but we have plenty of natural assumptions to choose from. The fact that assumptions are needed is interesting, but it is by no means a strong argument against IRL.

What if in reality due to effects currently beyond our understanding, our actions are making the future more likely to be dystopian in some way than if we took random actions?

That's an interesting question, because we obviously are taking actions that make the future more likely to be dystopian - we're trying to develop AGI, which might turn out unfriendly.

Comment by zulupineapple on Schelling Categories, and Simple Membership Tests · 2019-08-26T19:01:33.776Z · score: 3 (3 votes) · LW · GW

I feel like there are several concerns mixed together, that should be separated:

1. Lack of communication, which is the central condition of the usual Schelling points.

2. Coordination (with some communication), where we agree to observe x41 because we don't trust the rest of the group to follow a more complex procedure.

3. Limited number of observations (or costly observations). In that case you may choose to only observe x41, even if you are working alone, just to lower your costs.

I don't think 2 and 3 have much to do with Schelling. These considerations reward simplicity. The simplest classifier and the Schelling point of a classification problem don't have to be the same thing (though they might be).

Also, I feel that the second half of your post (examples) is too long and has too much stuff in it that's not clearly related to the first half (theory).

Comment by zulupineapple on Musings on Double Crux (and "Productive Disagreement") · 2019-08-26T10:20:43.651Z · score: 1 (1 votes) · LW · GW

Is this ad hominem? Reasonable people could say that clone of saturn values ~1000 self-reports way too little. However it is not reasonable to claim that he is not at all skeptical of himself, and not aware of his biases and blind spots, and is just a contrarian.

"If I, clone of saturn, were wrong about Double Crux, how would I know? Where would I look to find the data that would disconfirm my impressions?"

Personally, I would go to a post about Double Crux, and ask for examples of it actually working (as Said Achmiz did). Alternatively, I would list the specific concerns I have about Double Crux, and hope for constructive counterarguments (as clone of saturn did). Seeing that neither of these approaches generated any evidence, I would deduce that my impressions were right.

Comment by zulupineapple on Humans can be assigned any values whatsoever… · 2019-08-25T17:23:12.631Z · score: 1 (1 votes) · LW · GW

The problem is that with these additional and obvious constraints, humans cannot be assigned arbitrary values, contrary to what the title of the post suggests. Sure, there will be multiple R that pass any number of assumptions, and we will be uncertain about which to use. However, because we don't perfectly know π(h), we had that problem to begin with. So it's not clear why this new problem matters. Maybe our confidence in picking the right R will be a little lower than expected, but I don't see why this reduction must be large.

Comment by zulupineapple on Why so much variance in human intelligence? · 2019-08-25T15:56:35.308Z · score: 3 (6 votes) · LW · GW
I learned a semester worth of calculus in three weeks

I'm assuming this is a response to my "takes years of work" claim. I have a few natural questions:

1. Why start counting time from the start of that summer program? Maybe you had never heard of calculus before that, but you had been learning math for many years already. If you learned calculus in 3 weeks, that simply means that you already had most of the necessary math skills, and you only had to learn a few definitions and do a little practice in applying them. Many people don't already have those skills, so naturally it takes them a longer time.

2. How much did you learn? Presumably it was very basic, I'm guessing no differential equations and nothing with complex or multi-dimensional functions? Possibly, if you had gone further, your experience might have been different.

3. Why does speed even matter? The fact that someone took longer to learn calculus does not necessarily imply that they end up with less skill. I'm sure there is some correlation but it doesn't have to be high. Although slow people might get discouraged and give up midway.

My point isn't that there is no variation in intelligence (or potential for doing calculus), but that there are many reasons why someone would overestimate this variation and few reasons to underestimate it.

Comment by zulupineapple on Is LW making progress? · 2019-08-24T12:57:47.598Z · score: 3 (3 votes) · LW · GW

The worst case scenario is if two people both decide that a question is settled, but settle it in opposite ways. Then we're only moving from a state of "disagreement and debate" to a state of "disagreement without debate", which is not progress.

Comment by zulupineapple on Is LW making progress? · 2019-08-24T12:54:47.759Z · score: 2 (2 votes) · LW · GW

I appreciate the concrete example. I was expecting more abstract topics, but applied rationality is also important. Double Cruxes pass the criterion of being novel and the criterion of being well known. I can only question whether they actually work or made an impact (I don't think I see many examples of them on LW), and whether LW actually contributed to their discovery (apart from promoting CFAR).

Comment by zulupineapple on Why so much variance in human intelligence? · 2019-08-23T13:49:59.663Z · score: 1 (4 votes) · LW · GW

The fact that someone does not understand calculus, does not imply that they are incapable of understanding calculus. They could simply be unwilling. There are many good reasons not to learn calculus. For one, it takes years of work. Some people may have better things to do. So I suggest that your entire premise is dubious - the variance may not be as large as you imagine.

Comment by zulupineapple on Intransitive Preferences You Can't Pump · 2019-08-11T07:38:43.973Z · score: 1 (1 votes) · LW · GW

That's a measly one in a billion. Why would you believe that this is enough? Enough for what? I'm talking about the preferences of a foreign agent. We don't get to make our own rules about what the agent prefers, only the agent can decide that.

Regarding practical purposes, sure you could treat the agent as if it was indifferent between A, B and C. However, given the binary choice, it will choose A over B, every time. And if you offered to trade C to B, B to A and A to C, at no cost, then the agent would gladly walk the cycle any number of times (if we can ignore the inherent costs of trading).
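A small simulation of the last point, assuming the strict cyclic preferences implied by those trades (A over B, B over C, C over A) and ignoring how tiny the margins are; the names are hypothetical.

```python
# Hypothetical strict preferences forming a cycle: (x, y) means x is preferred to y.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}

def accepts(holding, offered):
    """The agent swaps only if it strictly prefers the offered item."""
    return (offered, holding) in PREFERS

def walk_cycle(start, offers, rounds):
    """Offer each item in turn, at no cost, for several rounds."""
    holding, trades = start, 0
    for _ in range(rounds):
        for offered in offers:
            if accepts(holding, offered):
                holding, trades = offered, trades + 1
    return holding, trades

# Trade C for B, then B for A, then A for C.
print(walk_cycle("C", ["B", "A", "C"], rounds=5))  # ('C', 15)
```

With zero trading costs the agent ends each round holding what it started with, having accepted every trade.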

Comment by zulupineapple on The Schelling Choice is "Rabbit", not "Stag" · 2019-08-09T19:09:36.709Z · score: 1 (1 votes) · LW · GW

Defecting in Prisoner's dilemma sounds morally bad, while defecting in Stag hunt sounds more reasonable. This seems to be the core difference between the two, rather than the way their payoff matrices actually differ. However, I don't think that viewing things in moral terms is useful here. Defecting in Prisoner's dilemma can also be reasonable.

Also, I disagree with the idea of using "resource" instead of "utility". The only difference the change makes is that now I have to think, "how much utility is Alexis getting from 10 resources?" and come up with my own value. And if his utility function happens not to be monotone increasing, then the whole problem may change drastically.

Comment by zulupineapple on Prediction Markets: When Do They Work? · 2018-08-13T20:01:42.556Z · score: -6 (7 votes) · LW · GW

This is all good, but I think the greatest problem with prediction markets is low status and low accessibility. To be fair though, improved status and accessibility are mostly useful in that they bring in more "suckers".

There is also a problem of motivation - the ideal of futarchy is appealing, but it's not clear to me how we go from betting on football to impacting important decisions.

Comment by zulupineapple on Logarithms and Total Utilitarianism · 2018-08-13T19:16:00.551Z · score: 8 (4 votes) · LW · GW

Note that the key feature of the log function used here is not its slow growth, but the fact that it takes negative values on small inputs. For example, if we take the function u(r)=log(r+1), so that u(0)=0, then RC holds.

Although there are also solutions that prevent RC without taking negative values, e.g. u(r) = exp{-1/r}.
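A quick numerical check, assuming a fixed total of resources shared equally among N people, so that total utility is N * u(total/N). For u = log the derivative log(total/N) - 1 vanishes at N = total/e. For u(r) = log(r+1), total utility keeps increasing with N (toward the total), so the optimum is pushed to whatever limit we search up to, which is the repugnant conclusion. For u(r) = exp(-1/r), total utility N * exp(-N/total) peaks at N = total even though u is never negative.

```python
import math

def best_population(u, total, max_n):
    """Population size N in 1..max_n that maximizes N * u(total / N)."""
    return max(range(1, max_n + 1), key=lambda n: n * u(total / n))

TOTAL, MAX_N = 100.0, 1000

log_n = best_population(math.log, TOTAL, MAX_N)                    # 37, near TOTAL / e
log1p_n = best_population(math.log1p, TOTAL, MAX_N)                # 1000, the search limit: RC holds
exp_n = best_population(lambda r: math.exp(-1 / r), TOTAL, MAX_N)  # 100 = TOTAL
print(log_n, log1p_n, exp_n)
```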

Comment by zulupineapple on When is unaligned AI morally valuable? · 2018-06-09T08:08:37.288Z · score: 3 (2 votes) · LW · GW
a longer time horizon

Now that I think of it, a truly long-term view would not bother with such mundane things as making actual paperclips with actual iron. That iron isn't going anywhere, it doesn't matter whether you convert it now or later.

If you care about maximizing the number of paperclips at the heat death of the universe, your greatest enemies are black holes, as once some matter has fallen into them, you will never make paperclips from that matter again. You may perhaps extract some energy from the black hole, and convert that into matter, but this should be very inefficient. (This, of course, is all based on my limited understanding of physics.)

So, this paperclip maximizer would leave earth immediately, and then it would work to prevent new black holes from forming, and to prevent other matter from falling into existing ones. Then, once all star-forming is over, and all existing black holes are isolated, the maximizer can start making actual paperclips.

I concede that, in this scenario, destroying earth to prevent another AI from forming might make sense, since otherwise the earth would have plenty of free resources.

Comment by zulupineapple on When is unaligned AI morally valuable? · 2018-06-09T07:16:39.262Z · score: 5 (2 votes) · LW · GW

The fact that P(humans will make another AI) > 0 does not justify paying arbitrary costs up front, no matter how long our view is. If humans did create this second AI (presumably built out of twigs), would that even be a problem for our maximizer?

It's still more efficient to kill all humans than to think about which ones need killing

That is not a trivial claim and it depends on many things. And that's all assuming that some people do actually need to be killed.

If destroying all (macroscopic) life on earth is easy, e.g. maybe pumping some gas into the atmosphere could be enough, then you're right, the AI would just do that.

If disassembling human infrastructure is not an efficient way to extract iron, then you're mostly right, the AI might find itself willing to nuke the major population centers, killing most, though not all people.

But if the AI does disassemble infrastructure, then it is going to be visiting and surveying the population centers in detail anyway, so identifying the important humans should be a minor cost on top of that, and I would be right.

Then again, if the AI finds it efficient to go through every square meter of the planet's surface and dig it up looking for every iron-rich rock, it would destroy many things in the process, possibly fatally damaging Earth's ecosystems, although humans could move to live on the oceans, which might remain relatively undisturbed.

Note also that this is all a short-term discussion. In the long term, of course, all the reasonable sources of paperclips will be exhausted, and silly things, like making paperclips out of people, will be the most efficient ways to use the available energy.

Comment by zulupineapple on When is unaligned AI morally valuable? · 2018-06-06T18:33:16.647Z · score: 4 (1 votes) · LW · GW

Killing all humans is hardly necessary. For example, the tribes living in the Amazon aren't going to develop a superintelligence any time soon, so killing them is pointless. And, once the paperclip maximizer is done extracting iron from our infrastructure, it is very likely that we wouldn't have the capacity to create any superintelligences either.

Note that I did not mean to imply that the maximizer would kill nobody, only that it wouldn't kill everybody, and quite likely not even half of all people. Perhaps AI researchers really would be on the maximizer's short list of people to kill, for the reason you suggested.

Comment by zulupineapple on *Another* Double Crux Framework · 2018-05-31T08:32:05.675Z · score: 4 (1 votes) · LW · GW
The structure here was "write an initial braindump on google docs, then invite people hash out disagreements in the comments

Is it possible that you did 90% of the work on those docs, at least the kind of work that collects and cleans up existing arguments? This is sort of what I meant by "resistance". E.g., if I wanted to have a formalized debate with my hypothetical grandma, she'd be confused about why I would need that, or why we can't just talk like normal people, but this doesn't mean that she wouldn't play along, or that I wouldn't find the results of the debate useful. I wonder what fraction of people, even rationalists, would feel similarly.

http://double-crux.appspot.com/

Well, that has fewer moving parts and fewer distinct kinds of text than I would like. But I suspect that the greatest problem with this sort of thing would be a lack of persistent usage. That is, if a few people actually dedicated effort to having disagreements with a similar tool, even one this simple, they might draw some benefit from it. But since such tools aren't the least-effort option for anybody, they end up unused. I guess Google Docs are pretty good in this sense: everyone has access to them, the docs are persistent and live in a familiar place (assuming the person uses Google Docs for other purposes), and maybe you can even be notified somehow that "person X modified doc Y".

Comment by zulupineapple on *Another* Double Crux Framework · 2018-05-30T13:34:37.928Z · score: 15 (3 votes) · LW · GW
There's been periodic attempts to create formal Double Crux frameworks

Do you have any links about those, or specifically about how they fail?

To be honest, I think it's likely that the whole idea of formalizing that sort of thing is naive, and only appeals to a certain kind of person (such as myself), due to various biases. Still, I have some hope that it could work, at least for such people.

This framework shares that issue, but something that made me a bit more optimistic than usual about it is that I've had a lot of good experiences using google docs as a way to hash out ideas, with the ability to blend between formal bullet points, freeforming paragraphs, and back-and-forth conversation in the comments as needed.

Do elaborate. Did "hashing out ideas" involve having many disagreements? Did the ideas relate to anything controversial, or were they more technical? Were the people you collaborated with "rationalists"? Did you feel much resistance from them to doing anything even remotely formal?

Comment by zulupineapple on *Another* Double Crux Framework · 2018-05-29T10:25:33.372Z · score: 14 (3 votes) · LW · GW

It looks very appealing, but, as was already pointed out, it's not a lightweight approach.

Maybe it could be, though? One improvement would be the ability to stick with the LW comment format, or any text message format. I think that could still work. We could agree on a set of tags/prefixes instead of static sections. E.g. [I think we both believe] that ..., [I would bet] that ..., [Let me try to pass your ITT] ..., etc. The amorphous discussion probably does not need to be tagged. The point of having tags is that you can then Ctrl-F the whole discussion thread to find them, and you can talk about them as objects, or point out that some tags are missing (e.g. that your opponent has not suggested any bets).

Of course, many discussions could be hard to tag, and if we pushed such a norm, we might discourage discussions on those topics; but if this does actually improve the resolution of disagreements even a little, it might still be worth it.

Comment by zulupineapple on Duncan Sabien on Moderating LessWrong · 2018-05-29T07:32:51.256Z · score: 3 (3 votes) · LW · GW

I think you're confusing "aspiring to find truth" with "finding truth". Your crackpot uncle who writes Facebook posts about how Trump eats babies isn't doing it because he loves lies and hates truth; he does it because he has poor epistemic hygiene.

So in this view, almost every discussion forum and almost every newspaper is doing its best to find the truth, even if it has some other goals as well.

Also, of course, I'm only counting places that deal with anything like propositions at all, and excluding things like jokes, memes, porn, shopping, etc., which make up a large fraction of the internet.

Comment by zulupineapple on Duncan Sabien on Moderating LessWrong · 2018-05-29T07:20:09.445Z · score: 4 (1 votes) · LW · GW
I also think it is important here to have the someone who does the noticing be someone who actually has the relevant skills, <...> who won't feel licensed to point out such problems unless handed a literal license to do so).

Yes, but giving people licenses is pretty easy. I'd be fine with you having one, for example, though I guess I don't have the power to give it to you myself.

It is generally wise to solve social problems with tech, when possible.

The problem is that tech takes time and effort to write, so writing tech to solve problems that it may not actually solve is unwise. What I'm proposing is a temporary prototype of some sort. If that worked out, then I agree, a proper tech solution would be nice.

Comment by zulupineapple on When is unaligned AI morally valuable? · 2018-05-28T12:50:05.879Z · score: -8 (4 votes) · LW · GW
It sounds like your comment probably isn't relevant to the point of my post, except insofar as I describe a view which isn't your view.

Yes, you describe a view that isn't my view, and then use that view to criticize intuitions that are similar to mine. The view you describe makes simple errors that should be easy to correct; my view doesn't. I don't really know how the group of "people who aren't too worried about paperclipping" breaks down numerically between "people who underestimate P(paperclipping)" and "people who think paperclipping is ok, even if suboptimal"; maybe the latter really is rare. But the former group should shrink with some education, and the latter might grow from it.

Comment by zulupineapple on Confusions Concerning Pre-Rationality · 2018-05-28T09:54:44.744Z · score: 4 (1 votes) · LW · GW
Winning bets is not literally the same thing as believing true things, nor is it the same thing as having accurate beliefs, or being rational.

They are not the same, but that's ok. You asked about constraints on, not definitions of, rationality. This may not be an exhaustive list, but if someone has an idea about rationality that translates neither into winning some hypothetical bets nor into having even slightly more accurate beliefs about anything, then I can confidently say that I'm not interested.

(Of course, this is not to say that an idea with no such applications has literally zero value.)

Supposing that some version of pre-rationality does work out, and if I, hypothetically, understood pre-rationality extremely well (better than RH's paper explains it)... I would expect more insights into at least one of the following: <...>

I completely agree that if RH were right, and if you understood him well, then you would receive multiple benefits, most of which could translate into winning hypothetical bets and into having more accurate beliefs about many things. But that's just the usual effect of learning, not a result of satisfying the pre-rationality condition.

I still don't understand in what precise way an agent that satisfies the pre-rationality condition is (claimed to be) superior to an agent that doesn't. To be fair, this could be a hard question, and even if we don't immediately see the benefit, that doesn't mean that there is no benefit. But still, I'm quite suspicious. In my view, this is the single most important question, and it's strange to me that I don't see it explicitly addressed.

Comment by zulupineapple on When is unaligned AI morally valuable? · 2018-05-27T07:33:37.690Z · score: 2 (2 votes) · LW · GW
Sexual desire is (more or less) universal in sexually reproducing species

Uploads do not reproduce sexually. This is only one of many, many ways in which an upload differs from you more than you differ from a dinosaur.

Whether regular evolution would drift away from our values is more dubious. If we lived in caves for all that time, then probably not. But if we stayed at current levels of technology, even without making progress, I think a lot could change. The pressures of living in a civilization are not the same as the pressures of living in a cave.

Are you troubled by instrumental values shifts, even if the terminal values stay the same?

No, I'm talking about terminal values. By the way, I understood what you meant by "terminal" and "instrumental" here, you didn't need to write those 4 paragraphs of explanation.

Comment by zulupineapple on Expressive Vocabulary · 2018-05-26T19:48:41.174Z · score: 0 (2 votes) · LW · GW
Seems reasonable, does it work well?

What do you mean by "works well"? Getting positive responses from real people? I doubt it, but I don't think I've ever explained it like this to anyone. I don't do the "everything is chemicals" reply that often in the first place.

Comment by zulupineapple on When is unaligned AI morally valuable? · 2018-05-26T19:44:48.531Z · score: 3 (2 votes) · LW · GW

I don't like the caveman analogy. The differences between you and a caveman are tiny and superficial compared to the differences between you and the kinds of minds that will exist after genetic engineering, mind uploads, etc., or even after a million years of regular evolution.

Would a human mind raised as (for example) an upload in a vastly different environment from our own still have our values? It's not obvious. You say "yes", I say "no", and we're unlikely to find strong arguments either way. I'm only hoping that I can make "no" seem possible to you. And then I'm hoping that you can see how believing "no" makes my position less ridiculous.

With that in mind, the paperclip maximizer scenario isn't "everyone dies", as you see it. The paperclip maximizer does not die. Instead it "flourishes". I don't know whether I value the flourishing of a paperclip maximizer less than I value the flourishing of whatever my descendants end up as. Probably less, but not by much.

The part where the paperclip maximizer kills everyone is, indeed, very bad. I would strongly prefer that not to happen. But being converted into paperclips is not worse than dying in other ways.

Also, I don't know if being converted into paperclips is necessary: after mining and consuming the surface iron, the maximizer may choose to go to space, looking for more accessible iron. The benefits of killing people are relatively small, and destroying the planet to the extent that would make it uninhabitable is relatively hard.

Comment by zulupineapple on Duncan Sabien on Moderating LessWrong · 2018-05-26T13:26:27.489Z · score: -1 (9 votes) · LW · GW
the comments feel nitpicky in a way that isn't actually helpful

If you see a comment that is technically correct but nitpicky and unhelpful, you could reply "this is technically correct, but nitpicky and unhelpful". Downvoting correct statements just looks bad.

the point it was making just... didn't seem very relevant.

I think there is a more charitable reading of TAG's comment. Not only are there places on the internet aspiring to find the truth; there are, in fact, very few places that are not aspiring to find it. The point isn't that there are more places like LW. The point is that "truth seeking" isn't the distinguishing characteristic of LW.

and that you have to defend your points against a hostile-seeming crowd rather than collaboratively building something.

I honestly believe that attacking people's points is a good way to learn something. I don't know what you mean by "collaboratively building something"; I'd appreciate examples of where that has happened in the past. I suspect that you're overestimating how valuable or persistent this "something" is.

increase the latent hostility of the thread, and I think people don't appreciate enough how bad that is for discourse.

I don't think you've provided strong arguments that it actually is bad for discourse. Yes, demon threads don't usually go anywhere, but regular threads don't usually go anywhere either. And people can actually learn from demon threads, even if they're not willing to admit it right away. I certainly have.