Posts

Matt Levine spots IRL Paperclip Maximizer in Reddit 2021-09-23T19:10:16.546Z
Why would "necro-ing" be a bad idea? 2019-06-28T02:21:43.537Z

Comments

Comment by Nebu on Doomsday Argument and the False Dilemma of Anthropic Reasoning · 2024-09-10T12:05:15.238Z · LW · GW

Imagine someone named Omega offers to play a game with you. Omega has a bag, and they swear on their life that exactly one of the following statements is true:

  1. They put a single piece of paper in the bag, and it has "1" written on it.
  2. They put 10 trillion pieces of paper in the bag, numbered "1", "2", "3", etc. up to ten trillion.

Omega then has an independent neutral third party reach into the bag and pull out a random piece of paper which they then hand to you. You look at the piece of paper and it says "1" on it. Omega doesn't get to look at the piece of paper, so they don't know what number you saw on that paper.

Now the game Omega proposes to you is: If you can guess which of the two statements was the true one, they'll give you a million dollars. Otherwise, you get nothing.

Which do you guess? Do you guess that the bag had a single piece of paper in it, or do you guess that the bag had 10 trillion pieces of paper in it?
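
For concreteness, here's a minimal sketch of the Bayesian update I have in mind, assuming a 50/50 prior over Omega's two statements (the prior is my own assumption, not part of the setup):

```python
from fractions import Fraction

# Bayesian update for the bag game described above.
N_BIG = 10**13  # ten trillion slips in the second scenario

prior_one_slip = Fraction(1, 2)    # assumed prior, not stated in the setup
prior_many_slips = Fraction(1, 2)

# Likelihood of drawing a slip labeled "1" under each hypothesis.
likelihood_one_slip = Fraction(1, 1)
likelihood_many_slips = Fraction(1, N_BIG)

evidence = prior_one_slip * likelihood_one_slip + prior_many_slips * likelihood_many_slips
posterior_one_slip = prior_one_slip * likelihood_one_slip / evidence

print(float(posterior_one_slip))  # ~0.9999999999999: overwhelmingly favors the single-slip bag
```

Seeing a "1" is ten trillion times more likely under the single-slip hypothesis, so the posterior overwhelmingly favors guessing statement 1.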

Comment by Nebu on Would you like me to debug your math? · 2021-06-11T18:05:38.873Z · LW · GW

as they code I notice nested for loops that could have been one matrix multiplication.


This seems like an odd choice for your primary example.

  • Is the primary concern that a sufficiently smart compiler could take your matrix multiplication and turn it into a vectorized instruction?
    • Is it only applicable in certain languages then? E.g. do JVM languages typically enable vectorized instruction optimizations?
  • Is the primary concern that a single matrix multiplication is more maintainable than nested for loops?
    • Is it only applicable in certain domains then (e.g. machine learning)? Most of my data isn't modelled as matrices, so would I need some nested for loops anyway to populate a matrix to enable this refactoring?

Is it perhaps worth writing a (short?) top level post with a worked-out example of the refactoring you have in mind, and why matrix multiplication would be better than nested for loops?
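
For what it's worth, here's a minimal sketch of the kind of refactoring I'm guessing you mean, using NumPy (my assumption; you don't name a library):

```python
import numpy as np

def matvec_loops(matrix, vector):
    """Nested for loops computing a matrix-vector product."""
    rows, cols = len(matrix), len(matrix[0])
    result = [0.0] * rows
    for i in range(rows):
        for j in range(cols):
            result[i] += matrix[i][j] * vector[j]
    return result

def matvec_numpy(matrix, vector):
    """The same computation expressed as a single (vectorized) matrix multiplication."""
    return np.asarray(matrix) @ np.asarray(vector)

m = [[1.0, 2.0], [3.0, 4.0]]
v = [5.0, 6.0]
print(matvec_loops(m, v))  # [17.0, 39.0]
print(matvec_numpy(m, v))  # [17. 39.]
```

Is the claimed advantage of the second version maintainability, performance via vectorized instructions, or both?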

Comment by Nebu on ‘Maximum’ level of suffering? · 2020-07-21T13:29:29.506Z · LW · GW

For something to experience pain, some information needs to exist (e.g. in the mind of the sufferer, informing them that they are experiencing pain). There are known information limits, e.g. https://en.wikipedia.org/wiki/Bekenstein_bound or https://en.wikipedia.org/wiki/Landauer%27s_principle

These limits are related to entropy, space, energy, etc., so if you further assume the universe is finite (or perhaps equivalently, that the malicious agent can only access a finite portion of the universe due to e.g. speed-of-light limits), then there is an upper bound on the information possible, which implies an upper bound on the pain possible.
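
As a rough illustration, the Bekenstein bound can be written as I ≤ 2πcRm / (ħ ln 2) bits for a system of mass m and radius R; the brain mass and radius below are my own ballpark guesses, purely for illustration:

```python
import math

C = 2.998e8       # speed of light, m/s
HBAR = 1.055e-34  # reduced Planck constant, J*s

def bekenstein_bits(mass_kg, radius_m):
    """Upper bound on the information content of a system of given mass and radius."""
    return 2 * math.pi * C * radius_m * mass_kg / (HBAR * math.log(2))

print(f"{bekenstein_bits(1.4, 0.1):.3e} bits")  # ~3.6e42 bits for a ~1.4 kg, ~10 cm brain
```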

Comment by Nebu on Operationalizing Newcomb's Problem · 2020-01-18T04:15:59.249Z · LW · GW

Yeah, which I interpret to mean you'd "lose" (where getting $10 is losing and getting $200 is winning). Hence this is not a good strategy to adopt.

Comment by Nebu on Operationalizing Newcomb's Problem · 2020-01-18T04:11:02.074Z · LW · GW
99% of the time for me, or for other people?

99% for you (see https://wiki.lesswrong.com/wiki/Least_convenient_possible_world )

More importantly, when the fiction diverges by that much from the actual universe, it takes a LOT more work to show that any lessons are valid or useful in the real universe.

I believe the goal of these thought experiments is not to figure out whether you should, in practice, sit in the waiting room or not (honestly, nobody cares what some rando on the internet would do in some rando waiting room).

Instead, the goal is to provide unit tests for different proposed decision theories as part of research on developing self-modifying superintelligent AI.

Comment by Nebu on 12020: a fine future for these holidays · 2019-12-25T23:55:52.457Z · LW · GW

Any recommendations for companies that can print and ship the calendar to me?

Comment by Nebu on Operationalizing Newcomb's Problem · 2019-12-12T20:24:45.785Z · LW · GW

Okay, but then what would you actually do? Would you leave before the 10 minutes is up?

Comment by Nebu on Operationalizing Newcomb's Problem · 2019-12-12T20:24:02.045Z · LW · GW
why do I believe that it's accuracy for other people (probably mostly psych students) applies to my actions?

Because historically, in this fictional world we're imagining, when psychologists have said that a device's accuracy was X%, it turned out to be within 1% of X%, 99% of the time.

Comment by Nebu on Overcoming Akrasia/Procrastination - Volunteers Wanted · 2019-08-02T03:11:07.197Z · LW · GW

I really should get around to signing up for this, but...

Comment by Nebu on How much background technical knowledge do LW readers have? · 2019-08-02T02:42:22.055Z · LW · GW

Seems like the survey is now closed, so I cannot take the survey at the moment I see the post.

Comment by Nebu on [deleted post] 2019-06-29T19:06:17.994Z
suppose Bob is trying to decide to go left or right at an intersection. In the moments where he is deciding to go either left or right, many nearly identical copies in nearly identical scenarios are created. They are almost entirely all the same, and if one Bob decides to go left, one can assume that 99%+ of Bobs made the same decision.

I don't think this assumption is true (and thus perhaps you need to put more effort into checking/arguing it's true, if the rest of your argument relies on this assumption). In the moments where Bob is trying to decide whether to go either left or right, there is no a priori reason to believe he would choose one side over the other -- he's still deciding.

Bob is composed of particles with quantum properties. For each property, there is no a priori reason to assume that those properties (on average) contribute more strongly to causing Bob to decide to go left vs to go right.

For each quantum property of each particle, an alternate universe is created where that property takes on some value. In a tiny (but still infinite) proportion of these universe, "something weird" happens, like Bob spontaneously disappears, or Bob spontaneously becomes Alice, or the left and right paths disappear leaving Bob stranded, etc. We'll ignore these possibilities for now.

Of the remaining "normal" universes, the properties of the particles have proceeded in such a way as to trigger Bob to think "I should go Left", and in other "normal" universes, the properties of the particles have proceeded in such a way as to trigger Bob to think "I should go Right". There is no a priori reason to think that the proportion of the first type of universe is higher or lower than the proportion of the second type. That is, being maximally ignorant, you'd expect about 50% of Bobs to go left, and 50% to go right.

Going a bit more meta, if MWI is true, then decision theory "doesn't matter" instrumentally to any particular agent. No matter what arguments you (in this universe) provide for one decision theory being better than another, there exists an alternate universe where you argue for a different decision theory instead.

Comment by Nebu on The Cacophony Hypothesis: Simulation (If It is Possible At All) Cannot Call New Consciousnesses Into Existence · 2019-05-02T05:36:13.755Z · LW · GW

I see some comments hinting towards this pseudo-argument, but I don't think I saw anyone make it explicitly:

Say I replace one neuron in my brain with a little chip that replicates what that neuron would have done. Say I replace two, three, and so on, until my brain is now completely artificial. Am I still conscious, or not? If not, was there a sudden cut-off point where I switched from conscious to not-conscious, or is there a spectrum and I was gradually moving towards less and less conscious as this transformation occurred?

If I am still conscious, what if we remove my artificial brain, put it in a PC case, and just let it execute? Is that not a simulation of me? What if we pause the chips, record each of their exact states, and instantiate those same states in another set of chips with an identical architecture?

If consciousness is a spectrum instead of a sudden cut-off point, how confident are we that "simulations" of the type that you're claiming are "not" (as in 0) conscious aren't actually 0.0001 conscious?

Comment by Nebu on [deleted post] 2019-04-22T09:35:16.135Z

I played the game "blind" (i.e. I avoided reading the comments before playing) and was able to figure it out and beat the game without ever losing my ship. I really enjoyed it. The one part that I felt could have been made a lot clearer was that the "shape" of the mind signals how quickly they move towards your ship; I think I only figured that out around level 3 or so.

Comment by Nebu on Has "politics is the mind-killer" been a mind-killer? · 2019-04-06T03:33:01.715Z · LW · GW
I'm not saying this should be discussed on LessWrong or anywhere else.

You might want to lead with that, because there have been some arguments in the last few days that people should repeal the "Don't talk about politics" rule on a rationality-focused Facebook group, and I thought you were trying to argue in favor of repealing those rules.

But I'm saying that the impact of this article and broader norm within the rationalsphere made me think in these terms more broadly. There's a part of me that wishes I'd never read it in the first place.

https://slatestarcodex.com/2013/06/09/all-debates-are-bravery-debates/

https://slatestarcodex.com/2014/03/24/should-you-reverse-any-advice-you-hear/

For some people "talk less about politics" is the right advice, and for other people "talk more about politics" might be the right advice. FWIW, in my experience, a lot of the people I see talking about politics should not be talking about politics (if their goal is to improve their rationality).

Comment by Nebu on [Question] Tracking accuracy of personal forecasts · 2019-03-21T04:57:08.546Z · LW · GW

From a brief skim (e.g. "A Democratic candidate other than Yang to propose UBI before the second debate", "Maduro ousted before end of 2019", "Donald Trump and Xi Jinping to meet in March 2019", etc.), this seems to be focused on "non-personal" (i.e. global) events, whereas my understanding is that the OP is interested in tracking predictions for personal events.

Comment by Nebu on [Question] Tracking accuracy of personal forecasts · 2019-03-21T04:55:55.720Z · LW · GW

Spreadsheet sounds "good enough" if you're not sure you even want to commit to doing this.

That said, I'm "mildly interested" in doing this, but I don't really have inspiration for questions I'd like to make predictions on. I'm not particularly interested in doing predictions about global events and would rather make predictions about personal events. I would like a site that lets me see other people's personal predictions (really, just the questions they're predicting an answer to -- I don't care about their actual answers), so that I can try to make the same predictions about my life. So for example, now that I've seen you've submitted a prediction for "Will my parents die this year?", I don't know what your answer is, but I can come up with my own answer to the question of whether or not *my* parents will die this year.

Comment by Nebu on Pedagogy as Struggle · 2019-03-09T05:26:00.998Z · LW · GW

I think this technique only works for one-on-one (or a small group), live interactions. I.e. it doesn't work well for online writings.

The two components that are important for ensuring this technique is successful are:

1. You should tailor the confusion to the specific person you're trying to teach.

2. You have to be able to detect when the confusion is doing more damage than good, and abort it if necessary.

Comment by Nebu on Alignment Newsletter #43 · 2019-02-19T05:42:44.385Z · LW · GW
Note: I'm not sure if at the beginning of the game, one of the agents [of AlphaStar] is chosen according to the Nash probabilities, or if at each timestep an action is chosen according to the Nash probabilities.

It's the former. During the video demonstration, the pro player remarked how after losing game 1, in game 2 he went for a strategy that would counter the strategy AlphaStar used in game 1, only to find AlphaStar had used a completely different strategy. The AlphaStar representatives responded saying there are actually 5 AlphaStar agents that form the Nash Equilibrium, and he played one of them during game 1, and then played a different one during game 2.

And in fact, they didn't choose the agents by the Nash probabilities. Rather, they did a "best of 5" tournament, and they just had each of the 5 agents play one game. The human player did not know this, and thus could not on the 5th game know ahead of time by process of elimination that there was only 1 remaining agent possible, and thus know what strategy to use to counter it.

Comment by Nebu on The E-Coli Test for AI Alignment · 2019-01-01T05:58:33.937Z · LW · GW

I'm assuming you think wireheading is a disastrous outcome for a super intelligent AI to impose on humans. I'm also assuming you think if bacteria somehow became as intelligent as humans, they would also agree that wireheading would be a disastrous outcome for them, despite the fact that wireheading is probably the best solution that can be done given how unsophisticated their brains are. I.e. the best solution for their simple brains would be considered disastrous by our more complex brains.

This suggests the possibility that the best solution that can be applied to human brains would be considered disastrous by a more complex brain (or by humans, if we imagine them somehow becoming as intelligent as that brain).

Comment by Nebu on [deleted post] 2018-11-06T06:20:16.293Z

I feel like this game has the opposite problem of 2-4-6. In 2-4-6, it's very easy to come up with a hypothesis that appear to work with every set of test cases you come up with, and thus become overconfident in your hypothesis.

In your game, I had trouble coming up with any hypothesis that would fit the test cases.

Comment by Nebu on No Really, Why Aren't Rationalists Winning? · 2018-11-04T23:36:18.470Z · LW · GW

Yeah, but which way is the arrow of causality here? Like, was he already a geeky intellectual, and that's why he's both good at calculus/programming and he reads SSC/OB/LW? Or was he "pretty average", started reading SSC/OB/LW, and then that made him become good at calculus/programming?

Comment by Nebu on What To Do If Nuclear War Seems Imminent · 2018-10-06T04:40:13.738Z · LW · GW

Would any "participants in nuclear war" (for lack of a better term) be interested in killing escaping rich westerners?

Comment by Nebu on Do what we mean vs. do what we say · 2018-09-10T05:43:30.900Z · LW · GW
Just don't ask your AI system to optimize for general and long-term preferences without a way for you to say "actually, stop, I changed my mind".

I believe that reduces to "solve the Friendly AI problem".

Comment by Nebu on Ontological uncertainty and diversifying our quantum portfolio · 2018-08-10T06:37:29.849Z · LW · GW

It's not clear to me that for all observers in our universe, there'd be a distinction between "a surgeon from a parallel universe suddenly appears in our universe, and that surgeon has memories of existing in a universe parallel to the one he now finds himself in." vs "a surgeon, via random quantum fluctuations, suddenly appears in our universe, and that surgeon has memories of existing in a universe parallel to the one he now finds himself in."

In your example, rather than consider all infinitely many parallel universes, you chose to consider 10 specific universes where a surgeon appears and "claims" to have come from a parallel universe, and saves copies of himself.

Even in a multiverse where travel between different quantum parallel universes is impossible, you can still find 10 universes where a surgeon appears and "claims" to have come from a parallel universe, and saves copies of himself. You can, in fact, find infinitely many universes where that happens, without requiring any travel between universes.

Comment by Nebu on GAZP vs. GLUT · 2016-08-11T04:07:52.652Z · LW · GW

So does that mean a GLUT in the zombie world cannot be conscious, but a GLUT in our world (assuming infinite storage space, since apparently we were able to assume that for the zombie world) can be conscious?

Comment by Nebu on JFK was not assassinated: prior probability zero events · 2016-04-28T03:09:04.899Z · LW · GW

suppose that we (or Omega, since we're going to assume nigh omniscience) asked the person whether JFK was murdered by Lee Harvey Oswald or not, and if they get it wrong, then they are killed/tortured/dust-specked into oblivion/whatever.

Okay, but what is the utility function Omega is trying to optimize?

Let's say you walk up to Omega and tell it, "Was JFK murdered by Lee Harvey Oswald or not? And by the way, if you get this wrong, I am going to kill you/torture you/dust-speck you."

Unless we've figured out how to build safe oracles, with very high probability, Omega is not a safe oracle. Via https://arbital.com/p/instrumental_convergence/, even though Omega may or may not care if it gets tortured/dust-specked, we can assume it doesn't want to get killed. So what is it going to do?

Do you think it's going to tell you what it thinks is the true answer? Or do you think it's going to tell you the answer that will minimize the risk of it getting killed?

Comment by Nebu on The Fable of the Burning Branch · 2016-04-09T20:41:37.189Z · LW · GW

I also inferred rape from the story. It was the part about how in desperation, he reached out and grabbed at her ankle. And then he was imprisoned in response to that.

Comment by Nebu on Reply to Holden on 'Tool AI' · 2016-02-17T11:28:40.459Z · LW · GW

But what then makes it recommend a policy that we will actually want to implement?

First of all, I'm assuming that we're taking as axiomatic that the tool "wants" to improve itself (or else why would it have even bothered to consider recommending that it be modified to improve itself?); i.e. improving itself is favorable according to its utility function.

Then: It will recommend a policy that we will actually want to implement, because its model of the universe includes our minds, and it can see that recommending a policy we will actually want to implement leads to a higher-ranked state in its utility function.

Comment by Nebu on Reply to Holden on 'Tool AI' · 2016-02-17T11:24:41.376Z · LW · GW

To steelman the parent argument a bit, a simple policy can be dangerous, but if an agent proposed a simple and dangerous policy to us, we probably would not implement it (since we could see that it was dangerous), and thus the agent itself would not be dangerous to us.

If the agent were to propose a policy that, as far as we could tell, appears safe, but was in fact dangerous, then simultaneously:

  1. We didn't understand the policy.
  2. The agent was dangerous to us.
Comment by Nebu on Reply to Holden on 'Tool AI' · 2016-02-17T11:15:09.957Z · LW · GW

Can you be a bit more specific in your interpretation of AIXI here?

Here are my assumptions, let me know where you have different assumptions:

  • Traditional-AIXI is assumed to exists in the same universe as the human who wants to use AIXI to solve some problem.
  • Traditional-AIXI has a fixed input channel (e.g. it's connected to a webcam, and/or it receives keyboard signals from the human, etc.)
  • Traditional-AIXI has a fixed output channel (e.g. it's connected to a LCD monitor, or it can control a robot servo arm, or whatever).
  • The human has somehow pre-provided Traditional-AIXI with some utility function.
  • Traditional-AIXI operates in discrete time steps.
  • In the first timestep that elapses since Traditional-AIXI is activated, Traditional-AIXI examines the input it receives. It considers all possible programs that take a pair (S, A) and emit an output P, where S is the prior state, A is an action to take, and P is the predicted output of taking the action A in state S. Then it discards all programs that would not have produced the input it received, regardless of what S or A it was given. Then it weights the remaining programs according to their Kolmogorov complexity. This is basically the Solomonoff induction step.
  • Now Traditional-AIXI has to make a decision about an output to generate. It considers all possible outputs it could produce, and feeds each to the programs under consideration, to produce a predicted next time step. Traditional-AIXI then calculates the expected utility of each output (using its pre-programmed utility function), picks the one with the highest utility, and emits that output. Note that it has no idea how any of its outputs would affect the universe, so this is essentially a uniformly random choice.
  • In the next timestep, Traditional-AIXI reads its inputs again, but this time taking into account what output it has generated in the previous step. It can now start to model correlation, and eventually causation, between its inputs and outputs. It has a previous state S and it knows what action A it took in its last step. It can further discard more programs, and narrow the possible models that describe the universe it finds itself in. (A toy sketch of this loop, under simplifying assumptions, follows this list.)
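
Here is a toy sketch of the Traditional-AIXI loop as I've described it above. Two big simplifications are my own: a small finite hypothesis class stands in for "all possible programs", and hand-picked priors stand in for the Kolmogorov-complexity weighting. All names (ToyEnvModel, etc.) are made up for illustration.

```python
ACTIONS = [0, 1]

class ToyEnvModel:
    """A hypothesized environment: maps (state, action) to (next_state, observation)."""
    def __init__(self, name, transition, prior):
        self.name = name
        self.transition = transition  # (state, action) -> (next_state, observation)
        self.prior = prior            # stands in for 2**(-K(program))

# Two competing hypotheses about how observations are generated.
HYPOTHESES = [
    ToyEnvModel("parity",   lambda s, a: ((s + a) % 2, (s + a) % 2), prior=0.7),
    ToyEnvModel("constant", lambda s, a: (s, 1),                     prior=0.3),
]

def utility(observation):
    """Pre-programmed utility function: the agent prefers observing 1."""
    return float(observation == 1)

def consistent(model, history):
    """Discard models that would not have produced the observed history."""
    s = 0
    for action, obs in history:
        s, predicted = model.transition(s, action)
        if predicted != obs:
            return False
    return True

def replay_state(model, history):
    """Run the model over the history to get its current hypothesized state."""
    s = 0
    for action, _ in history:
        s, _ = model.transition(s, action)
    return s

def choose_action(history):
    """One decision step: filter models, renormalize priors, pick the action with highest expected utility."""
    live = [m for m in HYPOTHESES if consistent(m, history)]
    total = sum(m.prior for m in live)
    best_action, best_value = None, float("-inf")
    for a in ACTIONS:
        value = sum((m.prior / total) * utility(m.transition(replay_state(m, history), a)[1])
                    for m in live)
        if value > best_value:
            best_action, best_value = a, value
    return best_action

if __name__ == "__main__":
    # Interact with the "parity" environment for a few timesteps.
    true_env, state, history = HYPOTHESES[0], 0, []
    for t in range(5):
        a = choose_action(history)
        state, obs = true_env.transition(state, a)
        history.append((a, obs))
        print(f"t={t}: action={a}, observation={obs}")
```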

How does Tool-AIXI work in contrast to this? Holden seems to want to avoid having any utility function pre-defined at all. However, presumably Tool-AIXI still receives inputs and still produces outputs (probably Holden intends not to allow Tool-AIXI to control a robot servo arm, but he might intend for Tool-AIXI to be able to control an LCD monitor, or at the very least, produce some sort of text file as output).

Does Tool-AIXI proceed in discrete time steps gathering input? Or do we prevent Tool-AIXI from running until a user is ready to submit a curated input to Tool-AIXI? If the latter, how quickly do we expect Tool-AIXI to be able to formulate a reasonable model of our universe?

How does Tool-AIXI choose what output to produce, if there's no utility function?

If we type in "Tool-AIXI, please give me a cure for cancer" onto a keyboard attached to Tool-AIXI and submit that as an input, do we think that a model that encodes ASCII, the English language, bio-organisms, etc. has a lower Kolmogorov complexity than a model that says "we live in a universe where we receive exactly this hardcoded stream of bytes"?

Does Tool-AIXI model the output it produces (whether that be pixels on a screen, or bytes to a file) as an action, or does it somehow prevent itself from modelling its output as if it were an action that had some effect on the universe that it exists in? If the former, then isn't this just an agenty Oracle AI? If the latter, then what kind of programs does it generate for its model (surely not programs that take (S, A) pairs as inputs, or else what would it use for A when evaluating its plans and predicting the future)?

Comment by Nebu on Reply to Holden on 'Tool AI' · 2016-02-17T10:12:43.456Z · LW · GW

I think LearnFun might be informative here. https://www.youtube.com/watch?v=xOCurBYI_gY

LearnFun watches a human play an arbitrary NES games. It is hardcoded to assume that as time progresses, the game is moving towards a "better and better" state (i.e. it assumes the player's trying to win and is at least somewhat effective at achieving its goals). The key point here is that LearnFun does not know ahead of time what the objective of the game is. It infers what the objective of the game is from watching humans play. (More technically, it observes the entire universe, where the entire universe is defined to be the entire RAM content of the NES).

I think there's some parallels here with your scenario where we don't want to explicitly tell the AI what our utility function is. Instead, we're pointing to a state, and we're saying "This is a good state" (and I guess either we'd explicitly tell the AI "and this other state, it's a bad state" or we assume the AI can somehow infer bad states to contrast the good states from), and then we ask the AI to come up with a plan (and possibly execute the plan) that would lead to "more good" states.

So what happens? Bit of a spoiler, but sometimes the AI seems to make a pretty good inference about what utility function a human would probably have had for a given NES game, but sometimes it makes a terrible inference. It never seems to make a "perfect" inference: even in its best performances, it seems to be optimizing very strange things.

The other part of it is that even if it does have a decent inference for the utility function, it's not always good at coming up with a plan that will optimize that utility function.

Comment by Nebu on Reply to Holden on 'Tool AI' · 2016-02-17T09:35:03.997Z · LW · GW

The analogy with cryptography is an interesting one, because...

In cryptography, even after you've proven that a given encryption scheme is secure, and that proof has been centuply (100 times) checked by different researchers at different institutions, it might still end up being insecure, for many reasons.

Examples of reasons include:

  • The proof assumed mathematical integers/reals, of which computer integers/floating point numbers are just an approximation.
  • The proof assumed that the hardware the algorithm would be running on was reliable (e.g. a reliable source of randomness).
  • The proof assumed operations were mathematical abstractions that exist outside of time, and thus neglected side-channel attacks, which measure how long a physical real-world CPU took to execute the algorithm in order to make inferences about what the algorithm did (and thus recover the private keys); see the toy illustration after this list.
  • The proof assumed the machine executing the algorithm was idealized in various ways, when in fact a CPU emits heat and other electromagnetic waves, which can be detected and from which inferences can be drawn, etc.
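
As a toy illustration of the timing side channel point (my own example, not taken from any particular proof): a naive early-exit comparison leaks, through its runtime, how many leading bytes of a secret were guessed correctly, whereas a constant-time comparison does not.

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: runtime depends on where the first mismatch occurs."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch, leaking information
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose runtime does not depend on where the bytes differ."""
    return hmac.compare_digest(a, b)

secret = b"correct horse battery staple"
print(naive_equal(secret, b"correct guess but wrong tail"))  # False, but timing leaks the shared prefix
print(constant_time_equal(secret, secret))                   # True
```
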
Comment by Nebu on The Pascal's Wager Fallacy Fallacy · 2016-02-01T04:18:06.262Z · LW · GW

Also, if you're going to measure information content, you really need to fix a formal language first, or else "the number of bits needed to express X" is ill-defined.

Basically, learn model theory before trying to wield it.

I don't know model theory, but isn't the crucial detail here whether or not the number of bits needed to express X is finite or infinite? If so, then it seems we can handwave the specific formal language we're using to describe X, in the same way that we can handwave which encoding of Turing Machines we use when talking about Kolmogorov complexity, even though actually getting a concrete integer K(S) representing the Kolmogorov complexity of a string S requires us to fix an encoding of Turing Machines. In practice, we never actually care what the number K(S) is.
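
The reason I think that handwave is safe is the invariance theorem: for any two universal machines U and V there is a constant, not depending on the string, such that

```latex
% Invariance theorem for Kolmogorov complexity (standard result):
% for universal machines U and V there is a constant c_{UV}, independent of S, with
\[
  \lvert K_U(S) - K_V(S) \rvert \le c_{UV} .
\]
```

So whether K(S) is finite (and, by analogy, whether "the number of bits needed to express X" is finite) does not depend on which encoding we fix.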

Comment by Nebu on How to Not Lose an Argument · 2016-01-24T22:47:50.343Z · LW · GW

I can't directly observe Eliezer winning or losing, but I can make (perhaps very weak) inferences about how often he wins/loses given his writing.

As an analogy, I might not have the opportunity to play a given videogame ABC against a given blogger XYZ that I've never met and will never meet. But if I read his blog posts on ABC strategies, and try to apply them when I play ABC, and find that my win-rate vastly improves, I can infer that XYZ also probably wins often (and probably wins more often than I do).

Comment by Nebu on The Number Choosing Game: Against the existence of perfect theoretical rationality · 2016-01-24T22:44:33.390Z · LW · GW

I guess I'm asking "Why would a finite-universe necessarily dictate a finite utility score?"

In other words, why can't my utility function be:

  • 0 if you give me the entire universe minus all the ice cream.
  • 1 if you give me the entire universe minus all the chocolate ice cream.
  • infinity if I get chocolate ice cream, regardless of how much chocolate ice cream I receive, and regardless of whether the rest of the universe is included with it.
Comment by Nebu on Greg Egan disses stand-ins for Overcoming Bias, SIAI in new book · 2016-01-24T22:28:00.540Z · LW · GW

I suspect that if we're willing to say human minds are Turing Complete[1], then we should also be willing to say that an ant's mind is Turing Complete. So when imagining a human with a lot of patience and a very large notebook interacting with a billion-year-old alien, consider an ant with a lot of patience and a very large surface area to record ant-pheromones upon, interacting with a human. Consider how likely it is that the human would be interested in telling the ant things it didn't yet know. Consider what topics the human would focus on telling the ant, and whether it might decide to hold back on some topics because it figures the ant isn't ready to understand those concepts yet. Consider whether it's more important for the patience to lie within the ant or within the human.

1: I generally consider human minds to NOT be Turing Complete, because Turing Machines have infinite memory (via their infinite tape), whereas human minds have finite memory (being composed of a finite amount of matter). I guess Egan is working around this via the "very large notebook", which is why I'll let this particular nitpick slide for now.

Comment by Nebu on Very Basic Model Theory · 2016-01-22T03:58:04.224Z · LW · GW

Why not link to the books or give their ISBNs or something?

There are at least two books on model theory by Hodges: ISBN:9780521587136 and ISBN:9780511551574

Comment by Nebu on Naturalism versus unbounded (or unmaximisable) utility options · 2016-01-05T08:50:07.842Z · LW · GW

Why would we give the AI a utility function that assigns 0 utility to an outcome where we get everything we want but it never turns itself off?

The designer of that AI might have (naively?) thought this was a clever way of solving the friendliness problem. Do the thing I want, and then make sure to never do anything again. Surely that won't lead to the whole universe being tiled with paperclips, etc.

Comment by Nebu on The Number Choosing Game: Against the existence of perfect theoretical rationality · 2016-01-05T07:56:46.568Z · LW · GW

Alternately, letting "utility" back in, in a universe of finite time, matter, and energy, there does exist a maximum finite utility which is the sum total of the time, matter, and energy in the universe.

Why can't my utility function be:

  • 0 if I don't get ice cream
  • 1 if I get vanilla ice cream
  • infinity if I get chocolate ice cream

?

I.e. why should we forbid a utility function that returns infinity for certain scenarios, except insofar that it may lead to the types of problems that the OP is worrying about?

Comment by Nebu on Information cascades · 2015-12-20T06:55:09.131Z · LW · GW

But what about prediction markets?

Comment by Nebu on That Alien Message · 2015-12-18T06:23:45.932Z · LW · GW

Yes, this is a parable about AI safety research, with the humans in the story acting as the AI, and the aliens acting as us.

Comment by Nebu on How to Not Lose an Argument · 2015-12-18T06:10:51.404Z · LW · GW

Right, I suspect just having heard about someone's accomplishments would be an extremely noisy indicator. You'd want to know what they were thinking, for example by reading their blog posts.

Eliezer seems pretty rational, given his writings. But if he repeatedly lost in situations where other people tend to win, I'd update accordingly.

Comment by Nebu on How to Not Lose an Argument · 2015-12-16T08:19:37.346Z · LW · GW

I assume that you accept the claim that it is possible to define what a fair coin is, and thus what an unfair coin is.

If we observe some coin, at first, it may be difficult to tell if it's a fair coin or not. Perhaps the coin comes from a very trustworthy friend who assures you that it's fair. Maybe it's specifically being sold in a novelty store and labelled as an "unfair coin" and you've made many purchases from this store in the past and have never been disappointed. In other words, you have some "prior" probability belief that the coin is fair (or not fair).

As you see the coin flip, you can keep track of its outcomes, and adjust your belief. You can ask yourself "Given the outcomes I've seen, is it more likely that the coin is fair? or unfair?" and update accordingly.
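
A minimal sketch of that updating process, assuming for simplicity that the only alternative hypothesis is a coin biased 75% towards heads (that specific bias is my own assumption, purely for illustration):

```python
def update_fair_coin_belief(prior_fair, flips):
    """Two-hypothesis Bayes update: fair coin (P(heads)=0.5) vs. a coin biased to P(heads)=0.75."""
    p_fair, p_biased = prior_fair, 1 - prior_fair
    for flip in flips:  # each flip is "H" or "T"
        p_fair *= 0.5
        p_biased *= 0.75 if flip == "H" else 0.25
    return p_fair / (p_fair + p_biased)

# A trustworthy friend says it's fair (prior 0.9); then we observe 8 heads in 10 flips.
print(update_fair_coin_belief(0.9, "HHHHHHHHTT"))  # ~0.58: belief in fairness has dropped
```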

I think the same applies for rationalists here. I meet someone new. Eliezer vouches for her as being very rational. I observe her sometimes winning, sometimes not winning. I expend mental effort and try to judge how easy/difficult her situation was and how much effort/skill/rationality/luck/whatever it would have taken her to win in that situation. I try to analyze how it came about that she won when she won, or lost when she lost. I try to dismiss evidence where luck was a big factor. She bought a lottery ticket, and she won. Should I update towards her being a rationalist or not? She switched doors in Monty Hall, but she ended up with a goat. Should I update towards her being a rationalist or not? Etc.

Comment by Nebu on How to Not Lose an Argument · 2015-12-14T05:41:16.571Z · LW · GW

People who win are not necessarily rationalists. A person who is a rationalist is more likely to win than a person who is not.

Consider someone who just happens to win the lottery vs someone who figures out what actions have the highest expected net profit.

Edit: That said, be careful not to succumb to http://rationalwiki.org/wiki/Argument_from_consequences. Maybe Genghis Khan really was one of the greatest rationalists ever; I've never met the guy nor read any of his writings, so I wouldn't know.

Comment by Nebu on How to Not Lose an Argument · 2015-12-13T07:24:00.937Z · LW · GW

Actually, I think "Rationalists should WIN" regardless of what their goals are, even if that includes social wrestling matches.

The "should" here is not intended to be moral prescriptivism. I'm not saying that in a morally/ethically ideal world, rationalists would win. Instead, I'm using "should" to help define what the word "Rationalist" means. If some person is a rationalist, then given equal opportunity, resources, difficulty of goal, etc., they will, on average, probabilistically win more often than someone who is not a rationalist. And if they happen to be an evil rationalist, well, that sucks for the rest of the universe, but that's still what "rationalist" means.

I believe this definitional-sense of "should" is also what the originator of the "Rationalists should WIN" quote intended.

Comment by Nebu on How to Not Lose an Argument · 2015-12-13T07:20:47.485Z · LW · GW

rationalists as people who make optimal plays versus rationalists as people who love truth and hate lies

It's only possible for us to systematically make optimal plays IF we have a sufficient grasp of truth. There's only an equivocation in the minds of people who don't understand that one goal is a necessary precursor for the other.

No, I think there is an equivocation here, though that's probably because of the term "people who love truth and hate lies" instead of "epistemic rationalist".

An epistemic rationalist wants to know truth and to eliminate lies from their mind. An instrumental rationalist wants to win, and one precursor to winning is to know truth and to eliminate lies from one's own mind.

However, someone who "loves truth and hates lies" doesn't merely want their own mind to filled with truth. They want for all minds in the universe to be filled with truth and for lies to be eliminated from all minds. This can be an impediment to "winning" if there are competing minds.

Comment by Nebu on How to Not Lose an Argument · 2015-12-13T06:22:26.361Z · LW · GW

The problem with the horses of one color problem is that you are using sloppy verbal reasoning that hides an unjustified assumption that n > 1.

I'm not sure what you mean. I thought I stated it each time I was assuming n=1 and n=2.

In the induction step, we reason "The first horse is the same colour as the horses in the middle, and the horses in the middle have the same colour as the last horse. Therefore, all n+1 horses must be of the same colour". This reasoning only works if n > 1, because if n = 1, then there are no "horses in the middle", and so "the first horse is the same colour as the horses in the middle" is not true.
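
To make the gap explicit, here's the induction step written out (my own formalization of the standard flawed proof, not something from the original exchange):

```latex
% Induction step of the "all horses are one colour" pseudo-proof.
% From the hypothesis "any n horses share a colour", one considers horses
% h_1, ..., h_{n+1} and the two overlapping subsets
\[
  \{h_1, \dots, h_n\} \quad\text{and}\quad \{h_2, \dots, h_{n+1}\},
\]
% each of which shares a colour by hypothesis, and concludes that all n+1 horses
% share a colour because the subsets overlap in \{h_2, \dots, h_n\}.
% That overlap is empty exactly when n = 1, so the step from n = 1 to n = 2 fails.
```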

Comment by Nebu on To what degree do we have goals? · 2015-12-11T06:43:37.590Z · LW · GW

I think this argument is misleading.

Re "for game theoretical reasons", the paperclipper might take revenge if it predicted that doing so would act as a signal disincentivizing other office-supply-maximizers from stealing paperclips. In other words, the paperclip-maximizer is spending paperclips to take revenge solely because, in its calculation, this actually leads to the expected total number of paperclips going up.

Comment by Nebu on The Blue-Minimizing Robot · 2015-12-11T05:31:04.291Z · LW · GW

What does it mean for a program to have intelligence if it does not have a goal?

This is a very interesting question, thanks for making me think about it.

(Based on your other comments elsewhere in this thread), it seems like you and I are in agreement that intelligence is about having the capability to make better choices. That is, of two agents given an identical problem and identical resources to work with, the more intelligent agent is more likely to make the "better" choice.

What does "better" mean here? We need to define some sort of goal and then compare the outcomes of their choices and how closely those outcomes match those goals. I have a couple of disorganized thoughts here:

  • The goal is just necessary for us, outsiders, to compare the intelligence of the two agents. The goal is not necessary for the existence of intelligence in the agents if no one's interested in measuring their intelligence.
  • Assuming the agents are cooperative, you can temporarily assign subgoals. For example, perhaps you and I would like to know which one of us is smarter. You and I might have many different goals, but we might agree to temporarily take on a similar goal (e.g. win this game of chess, or get the highest amount of correct answers on this IQ test, etc.) so that our intelligence can be compared.
  • The "assigning" of goals to an intelligence strongly implies to me that goals are orthogonal to intelligence. Intelligence is the capability to fulfil any general goal, and it's possible for someone to be intelligent even if they do not (currently, or ever) have any goals. If we come up with a new trait called Sodadrinkability which is the capability to drink a given soda, one can say that I possess Sodadrinkability -- that I am capable of drinking a wide range of possible sodas provided to me -- even if I do not currently (or ever) have any sodas to drink.
Comment by Nebu on Perceptual Entropy and Frozen Estimates · 2015-10-10T03:09:09.396Z · LW · GW

Feedback:

Need an example? Sure! I have two dice, and they can each land on any number, 1-6. I’m assuming they are fair, so each has probability of 1/6, and the logarithm (base 2) of 1/6 is about -2.585. There are 6 states, so the total is 6* (1/6) * 2.585 = 2.585. (With two dice, I have 36 possible combinations, each with probability 1/36, log(1/36) is -5.17, so the entropy is 5.17. You may have notices that I doubled the number of dice involved, and the entropy doubled – because there is exactly twice as much that can happen, but the average entropy is unchanged.) If I only have 2 possible states, such as a fair coin, each has probability of 1/2, and log(1/2)=-1, so for two states, (-0.5*-1)+(-0.5*-1)=1. An unfair coin, with a ¼ probability of tails, and a ¾ probability of heads, has an entropy of 0.81. Of course, this isn’t the lowest possible entropy – a trick coin with both sides having heads only has 1 state, with entropy 0. So unfair coins have lower entropy – because we know more about what will happen.

I've had to calculate information entropy for a data compression course, so I felt like I already knew the concepts you were trying to explain here, but I was not able to follow your explanation at all.

the logarithm (base 2) of 1/6 is about -2.585. There are 6 states, so the total is 6* (1/6) * 2.585 = 2.585.

The total what? Total entropy for the two dice that you have? For just one of those two dice? log(1/6) is a negative number, so why do I not see any negative numbers used in your equation? There are 6 states, so I guess that sort of explains why you're multiplying some figure by 6, but why are you dividing by 6?

If I only have 2 possible states, such as a fair coin, each has probability of 1/2, and log(1/2)=-1, so for two states, (-0.5*-1)+(-0.5*-1)=1.

Why do you suddenly switch from the notation 1/2 to the notation 0.5? Is that significant (are they referring to different concepts that coincidentally happen to have equal values)? If they actually refer to the same value, why do we have the positive value 1/2, but the negative value -0.5?

Suggestion:

  • Do the fair coin first, then the fair die, then the trick coins (a short code sketch of these calculations follows this list).
  • Point out that a fair coin has 2 outcomes when flipped, each with equal probability, so it has entropy [-1/2 log2(1/2)] + [-1/2 log2(1/2)] = (1/2) + (1/2) = 1.
  • Point out that a traditional fair die has 6 outcomes when rolled, each of equal probability, and so it has entropy ∑n=1 to 6 of -1/6 log2(1/6) =~ 6 * -1/6 * -2.585 = 2.585.
  • Point out that a trick coin that always comes up heads has 1 outcome when flipped, so it has entropy -1 log2(1/1) = 0.
  • Point out that a trick coin that comes up heads 75% of the time has entropy [-3/4 log2(3/4)]+[-1/4 log2(1/4)] =~ 0.311 + 0.5 = 0.811.
  • Consistently use the same notation for each example (I sort of got lazy and used ∑ for the dice to avoid writing out a value 6 times). In contrast, do not use 6 * (1/6) * 2.585 = 2.585 for one example (where all the factors are positive) and then (-0.5*-1)+(-0.5*-1)=1 for another example (where we rely on pairs of negative factors to become positive).
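
Here's a short sketch (in Python, my choice of language) of the entropy calculations in the list above, using the one consistent formula H = -∑ p log2(p):

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1/2, 1/2]))  # fair coin: 1.0 bit
print(entropy([1/6] * 6))   # fair die: ~2.585 bits
print(entropy([1.0]))       # two-headed trick coin: 0.0 bits
print(entropy([3/4, 1/4]))  # 75/25 trick coin: ~0.811 bits
```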