Posts

Errors in the Bostrom/Kulczycki Simulation Arguments 2017-03-25T21:48:00.161Z

Comments

Comment by der (DustinWehr) on AI Safety is Dropping the Ball on Clown Attacks · 2023-11-02T15:20:25.369Z · LW · GW

"Clown attack" is a phenomenal term, for a probably real and serious thing. You should be very proud of it.

Comment by der (DustinWehr) on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-05T23:01:34.474Z · LW · GW

This was thought-provoking. While I believe what you said is currently true for the LLMs I've used, a sufficiently expensive decoding strategy would overcome it. It might be neat to try this for the specific case you describe: ask it a question that it would answer correctly with a good prompt style, but use the bad prompt style (asking it to give an answer that starts with Yes or No), and watch how the ratio of the cumulative probabilities of Yes* and No* sequences changes as you explore the token sequence tree.
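A minimal sketch of the kind of measurement I have in mind, assuming a Hugging Face causal LM; the model name, prompt, beam width, and depth are placeholders, and the reported masses are truncated by the beam rather than exact:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM with a compatible tokenizer
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

prompt = ("Question: Is the Eiffel Tower taller than the Statue of Liberty?\n"
          "Answer, starting with Yes or No:")

def branch_mass(prompt, depth=4, beam=32):
    """At each depth of the token tree, report the (beam-truncated) probability
    mass of continuations whose decoded text starts with 'Yes' vs. 'No'."""
    ids = tok(prompt, return_tensors="pt").input_ids
    frontier = [(ids, 0.0)]  # (token ids so far, log prob of the continuation)
    for d in range(1, depth + 1):
        expanded = []
        for seq, lp in frontier:
            with torch.no_grad():
                logits = model(seq).logits[0, -1]
            logprobs = torch.log_softmax(logits, dim=-1)
            top = torch.topk(logprobs, beam)
            for tid, tlp in zip(top.indices.tolist(), top.values.tolist()):
                new_seq = torch.cat([seq, torch.tensor([[tid]])], dim=1)
                expanded.append((new_seq, lp + tlp))
        # Keep only the most probable partial continuations.
        frontier = sorted(expanded, key=lambda x: -x[1])[:beam]
        mass = {"Yes": 0.0, "No": 0.0}
        for seq, lp in frontier:
            cont = tok.decode(seq[0, ids.shape[1]:]).lstrip()
            for prefix in mass:
                if cont.startswith(prefix):
                    mass[prefix] += math.exp(lp)
        print(f"depth {d}: P(Yes*) ~ {mass['Yes']:.3g}, P(No*) ~ {mass['No']:.3g}")

branch_mass(prompt)
```

If the claim holds, the Yes/No ratio should look quite different at depth 1 (where the forced format does the damage) than after a few more levels of the tree.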

Comment by der (DustinWehr) on The mathematical universe: the map that is the territory · 2023-06-27T16:05:37.081Z · LW · GW

Anybody know who the author is? I'm trying to get in contact, but they haven't posted on LW in 12 years, so they might not get message notifications.

Comment by der (DustinWehr) on Policy discussions follow strong contextualizing norms · 2023-04-10T09:46:48.265Z · LW · GW

I see. I guess I hadn't made the connection of attributing benefits to high-contextualizing norms. I only got as far as observing that certain conversations go better with comp lit friends than with comp sci peers. That was the only sentence that gave me a parse failure. I liked the post a lot.

Comment by der (DustinWehr) on Ng and LeCun on the 6-Month Pause (Transcript) · 2023-04-09T19:16:11.701Z · LW · GW

@lc and @Mateusz, keep up that theorizing. This needs a better explanation. 

Comment by der (DustinWehr) on Policy discussions follow strong contextualizing norms · 2023-04-06T16:14:57.735Z · LW · GW

Ah, no line number. Context:

To me it seems analogous to how there are many statements that need to be said very carefully in order to convey the intended message under high-decoupling norms, like claims about how another person's motivations or character traits affect their arguments.

Comment by der (DustinWehr) on Policy discussions follow strong contextualizing norms · 2023-04-06T16:13:35.371Z · LW · GW

high-decoupling

Did you mean high-contextualizing here?

Comment by der (DustinWehr) on AGI will have learnt utility functions · 2023-03-03T14:04:25.214Z · LW · GW

Interestingly, learning a reward model for use in planning has a subtle and pernicious effect we will have to deal with in AGI systems, which AIXI sweeps under the rug: with an imperfect world or reward model, the planner effectively acts as an adversary to the reward model. The planner will try very hard to push the reward model off distribution so as to get it to move into regions where it misgeneralizes and predicts incorrect high reward.

Remix: With an imperfect world... the mind effectively acts as an adversary to the heart.

Think of a person who pursues wealth as an instrumental goal for some combination of doing good, security, comfort, and whatever else their value function ought to be rewarding ("ought" in a personal coherent extrapolated volition sense). They achieve it, but then apparently it's less uncomfortable to go on accumulating more wealth than it is to get back to the thorny question of what their value function ought to be.
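A toy numerical sketch of the quoted effect, with an entirely made-up setup (nothing here is from the post): fit a "reward model" on a narrow slice of states, let a "planner" pick the state with the highest predicted reward over a much wider range, and watch it land exactly where the model is most wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_reward(x):
    # True reward peaks at x = 0 and is zero far away from it.
    return np.maximum(0.0, 1.0 - np.abs(x))

# The reward model is only trained on states in [0, 1].
x_train = rng.uniform(0.0, 1.0, size=100)
y_train = true_reward(x_train) + rng.normal(0.0, 0.01, size=100)

# Learned reward model: an ordinary least-squares line through the training data.
w, b = np.polyfit(x_train, y_train, deg=1)

def reward_model(x):
    return w * x + b

# The "planner": pick the state with the highest *predicted* reward, searching
# far outside the training distribution.
candidates = np.linspace(-10.0, 10.0, 2001)
chosen = candidates[np.argmax(reward_model(candidates))]

print(f"planner's choice: x = {chosen:.1f}")            # far off-distribution
print(f"predicted reward: {reward_model(chosen):.2f}")  # spuriously high
print(f"true reward:      {true_reward(chosen):.2f}")   # essentially zero
```

The linear fit is deliberately silly; the point is only that the argmax of an imperfect model tends to sit wherever the model misgeneralizes upward.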

Comment by der (DustinWehr) on Writeup: Progress on AI Safety via Debate · 2022-09-23T19:28:37.570Z · LW · GW

Is there a more formal statement somewhere of the theorem in "Complexity theory of team games without secrets"? Specifically, one that only uses terms with standard meanings in complexity theory? I find that document hard to parse.

If concreteness is helpful, take "terms with standard meanings in complexity theory" to be any term defined in any textbook on complexity theory.

Comment by der (DustinWehr) on A plausible story about AI risk. · 2022-06-14T16:29:47.382Z · LW · GW

This is awesome! A couple suggestions:

"and quickly starts to replace both Fox News and other news sources among members of all political parties." -- if this is plausible, it's not clear to me, and while I'm not a great predictor of the human race, I'm pretty damn smart. More importantly, your story doesn't need it; what it needs is just that Face News is useful and liked by a strong majority of people, like Google is today.

Murpt is a fun detail, but your story doesn't need him either. Fasecure can become dominant in government systems over a period of years, without needing to assume bad politicians. It's empirically a great system for cybersecurity, after all. It can start at the state level. Few people besides the AI safety nerds (who, I'll grant, by now have realized that nanotech stories don't seem to resonate with the general public) seem to be raising a fuss about it. Elon Musk tweets some concerns, but he's too preoccupied with Mars to do much else. There are plenty of non-billionaire computer scientists who are concerned, but they look at how the careers of earlier computer scientists fared after they spoke out, and at the ridicule from famous academics that followed, and that triggers enough doubt in their own perspective, or simply short-sighted self-interest, that they mostly keep quiet.

Would love to read more stuff like this, and would be happy to help!
 

Comment by der (DustinWehr) on Net Utility and Planetary Biocide · 2017-04-26T16:35:07.827Z · LW · GW

Lukas, I wish you had a bigger role in this community.

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-18T17:29:46.649Z · LW · GW

I've kept fairly up to date on progress in neural nets, less so in reinforcement learning, and I certainly agree about how limited things are now.

What if protecting against the threat of ASI requires huge worldwide political/social progress? That could take generations.

I haven't tried to think of an example of that, but the scenario that concerns me the most so far is not that some researchers will inadvertently unleash a dangerous ASI while racing to be the first, but rather that a dangerous ASI will be unleashed during an arms race between (a) states or criminal organizations intentionally developing a dangerous ASI, and (b) researchers working on ASI-powered defences to protect us against (a).

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-18T16:45:24.137Z · LW · GW

He might be willing to talk off the record. I'll ask. Have you had Darklight on? See http://lesswrong.com/r/discussion/lw/oul/openai_makes_humanity_less_safe/dqm8

Comment by der (DustinWehr) on Brief update on the consequences of my "Two arguments for not thinking about ethics" (2014) article · 2017-04-05T14:46:44.445Z · LW · GW

If my own experience and the experiences of the people I know are indicative of the norm, then thinking about ethics, the horror that is the world at large, etc., tends to encourage depression. And depression, as you've realized yourself, is bad for doing good (but perhaps good for not doing bad?). I'm still working on it myself (with the help of a strong dose of antidepressants, regular exercise, consistently good sleep, etc.). Glad to hear you are on the path to finding a better balance.

Comment by der (DustinWehr) on Naturally solved problems that are easy to verify but that would be hard to compute · 2017-04-04T16:37:45.073Z · LW · GW

For Bostrom's simulation argument to conclude the disjunction of the two interesting propositions (our doom, or we're sims), you need to assume there are simulation runners who are motivated to do very large numbers of ancestor simulations. The simulation runners would be ultrapowerful, probably rich, amoral history/anthropology nerds, because all the other ultrapowerful amoral beings have more interesting things to occupy themselves with. If it's a set-it-and-forget-it simulation, that might be plausible. If the simulation requires monitoring and manual intervention, I think it's very implausible.

Comment by der (DustinWehr) on "On the Impossibility of Supersized Machines" · 2017-04-04T13:44:51.137Z · LW · GW

Working link

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-04T13:29:38.604Z · LW · GW

If my anecdotal evidence is indicative of reality, the attitude in the ML community is that people concerned about superhuman AI should not even be engaged with seriously. Hopefully that, at least, will change soon.

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-04T13:23:20.509Z · LW · GW

I'm not sure either. I'm reassured that there seems to be some move away from public geekiness, like using the word "singularity", but I suspect that should go further, e.g. replace the paperclip maximizer with something less silly (even though, to me, it's an adequate illustration). I suspect getting some famous "cool"/sexy non-scientist people on board would help; I keep coming back to Jon Hamm (who, judging from his cameos on great comedy shows, and his role in the harrowing Black Mirror episode, has plenty of nerd inside).

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-04T01:38:24.636Z · LW · GW

heh, I suppose he would agree

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-03T22:06:59.843Z · LW · GW

A guy I know, who works in one of the top ML groups, is literally less worried about superintelligence than he is about getting murdered by rationalists. That's an extreme POV. Most researchers in ML simply think that people who worry about superintelligence are uneducated cranks addled by sci fi.

I hope everyone is aware of that perception problem.

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-03T21:27:35.339Z · LW · GW

You're right.

Comment by der (DustinWehr) on OpenAI makes humanity less safe · 2017-04-03T21:23:01.852Z · LW · GW

Great post. I even worry about the emphasis on FAI, as it seems to depend on friendly superintelligent AIs effectively defending us against deliberately criminal AIs. Scott Alexander speculated:

For example, it might program a virus that will infect every computer in the world, causing them to fill their empty memory with partial copies of the superintelligence, which when networked together become full copies of the superintelligence.

But way before that, we will have humans looking to get rich programming such a virus, and you better believe they won't be using safeguards. It won't take over every computer in the world - just the ones that aren't defended by a more-powerful superintelligence (i.e. almost all computers) and that aren't interacting with the internet using formally verified software. We'll be attacked by a superintelligence running on billions of smart phones. Might be distributed initially through a compromised build of the hottest new social app for anonymous VR fucking.

Comment by der (DustinWehr) on Against responsibility · 2017-04-03T20:56:47.425Z · LW · GW

Love this. The Rationalist community hasn't made any progress on the problem of controlling, overconfident, non-self-critical people rising to the top in any sufficiently large organization. Reading more of your posts now.

Comment by der (DustinWehr) on Errors in the Bostrom/Kulczycki Simulation Arguments · 2017-03-27T22:03:52.955Z · LW · GW

The positive reviewer agreed with you, though about an earlier version of that section. I stand by it, but admit that the informal and undetailed style clashes with the rest of the paper.

Comment by der (DustinWehr) on Errors in the Bostrom/Kulczycki Simulation Arguments · 2017-03-27T21:59:25.437Z · LW · GW

Ha, actually I agree with your retracted summary.

Comment by der (DustinWehr) on Errors in the Bostrom/Kulczycki Simulation Arguments · 2017-03-27T21:58:05.597Z · LW · GW

I think that was B/K's point of view as well, although in their review they fell back on the Patch 2 argument. The version of my paper they read didn't flesh out the problems with the Patch 2 argument.

I respectfully disagree that the criticism is entirely based on the wording of that one sentence. For one thing, if I remember correctly, I counted at least 6 places in the paper's prose about the Patch 1 argument that need to be corrected. Anywhere "significant number of" appears needs to be changed, for example, since "significant number of" can actually mean, depending on the settings of the parameters, "astronomically large number of". I think presenting the argument without parameters is misleading, and essentially propaganda.

Patch 2 has a similar issue (see Section 2.1), as well as (I think) another, more serious issue (Section 3.1, "Step 3").

Comment by der (DustinWehr) on Errors in the Bostrom/Kulczycki Simulation Arguments · 2017-03-25T21:49:25.411Z · LW · GW

Seeing that there was some interest in Bostrom's simulation argument before (http://lesswrong.com/lw/hgx/paper_on_the_simulation_argument_and_selective/), I wanted to post a link to a paper I wrote on the subject, together with the following text, but I was only able to post into my (private?) Drafts section. I'm sorry, I don't know where the appropriate place for this kind of thing is (if it's welcome here at all). The paper: http://www.cs.toronto.edu/~wehr/rd/simulation_args_crit_extended_with_proofs.pdf

This is a very technical paper, which requires some (or a lot) of familiarity with Bostrom/Kulczycki's "patched" Simulation Argument (www.simulation-argument.com/patch.pdf). I'm choosing to publish it here after experiencing Analysis's depressing version of peer review (they rejected a shorter, more-professional version of the paper based on one very positive review, and one negative review that was almost certainly written by Kulczycki or Bostrom themself).

The positive review (of the earlier shorter, more-professional version of the paper) does a better job of summarizing the contribution than I did, so with the permission of the reviewer I'm including an excerpt here:

Bostrom (2003) argued that at least one of the following three claims is true: (1) the fraction of civilizations that reach a 'post-human' stage is approximately zero; (2) the fraction of post-human civilizations interested in running 'significant numbers' of simulations of their own ancestors is approximately zero; (3) the fraction of observers with human-type experiences that are simulated is approximately one.

The informal argument for this three-part disjunction is that, given what we know about the physical limits of computation, a post-human civilization would be so technologically advanced that it could run 'hugely many' simulations of observers very easily, should it choose to do so, so that the falsity of (1) and (2) implies the truth of (3). However, this informal argument falls short of a formal proof.

Bostrom himself saw that his attempt at a formal proof in the (2003) paper was sloppy, and he attempted to put it right in Bostrom and Kulczycki (2011). The take-home message of Sections 1 and 2 of the manuscript under review is that these (2011) reformulations of the argument are still rather sloppy. For example, the author points out (p. 6) that the main text of B&K inaccurately describes the mathematical argument in the appendix: the appendix uses an assumption much more favourable to B&K's desired conclusion than the assumption stated in the main text. Moreover, B&K's use of vague terms such as 'significant number' and 'astronomically large factor' creates a misleading impression. The author shows, amusingly, that the 'significant number' must be almost 1 million times greater than the 'astronomically large factor' for their argument to work (p. 9).

In Section 3, the author provides a new formulation of the simulation argument that is easily the most rigorous I have seen. This formulation deserves to be the reference point for future discussions of the argument's epistemological consequences.
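To make the review's summary concrete: if I'm recalling Bostrom's original (2003) notation correctly, the fraction of simulated observers he computes is

\[ f_{\mathrm{sim}} = \frac{f_P \, f_I \, \bar{N}_I}{f_P \, f_I \, \bar{N}_I + 1} \]

where f_P is the fraction of human-level civilizations that reach a posthuman stage, f_I is the fraction of posthuman civilizations interested in running ancestor simulations, and \bar{N}_I is the average number of ancestor simulations run by such a civilization. Roughly, the disjunction (1)-(3) falls out of asking what must hold for f_sim to be far from 1, given that an interested posthuman civilization could make \bar{N}_I astronomically large.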

Comment by der (DustinWehr) on The map of ideas how the Universe appeared from nothing · 2017-03-08T00:23:54.664Z · LW · GW

Your note about Gödel's theorem is confusing or doesn't make sense. There is no such thing as an inconsistent math structure, assuming that by "structure" you mean the things used in defining the semantics of first order logic (which is what Tegmark means when he says "structure", unless I'm mistaken).

The incompleteness theorems only give limitations on consistent, recursively enumerable sets of axioms (that are strong enough to interpret basic arithmetic).

Other than that, this looks like a great resource for people wanting to investigate the topic for themselves.

Comment by der (DustinWehr) on Welcome to Less Wrong! (11th thread, January 2017) (Thread B) · 2017-02-23T04:13:50.038Z · LW · GW

For example, the statement of the argument in https://wiki.lesswrong.com/wiki/Simulation_argument definitely needs to be revised.

Comment by der (DustinWehr) on Welcome to Less Wrong! (11th thread, January 2017) (Thread B) · 2017-02-23T03:09:11.639Z · LW · GW

Hey, I've been an anonymous reader off and on over the years.

Seeing that there was some interest in Bostrom's simulation argument before (http://lesswrong.com/lw/hgx/paper_on_the_simulation_argument_and_selective/), I wanted to post a link to a paper I wrote on the subject, together with the following text, but I was only able to post into my (private?) Drafts section. I'm sorry, I don't know where the appropriate place for this kind of thing is (if it's welcome here at all). The paper: http://www.cs.toronto.edu/~wehr/rd/simulation_args_crit_extended_with_proofs.pdf

This is a very technical paper, which requires some (or a lot) of familiarity with Bostrom/Kulczycki's "patched" Simulation Argument (www.simulation-argument.com/patch.pdf). I'm choosing to publish it here after experiencing Analysis's depressing version of peer review (they rejected a shorter, more-professional version of the paper based on one very positive review, and one negative review that was almost certainly written by Kulczycki or Bostrom themself).

The positive review (of the earlier shorter, more-professional version of the paper) does a better job of summarizing the contribution than I did, so with the permission of the reviewer I'm including an excerpt here:

Bostrom (2003) argued that at least one of the following three claims is true: (1) the fraction of civilizations that reach a 'post-human' stage is approximately zero; (2) the fraction of post-human civilizations interested in running 'significant numbers' of simulations of their own ancestors is approximately zero; (3) the fraction of observers with human-type experiences that are simulated is approximately one.

The informal argument for this three-part disjunction is that, given what we know about the physical limits of computation, a post-human civilization would be so technologically advanced that it could run 'hugely many' simulations of observers very easily, should it choose to do so, so that the falsity of (1) and (2) implies the truth of (3). However, this informal argument falls short of a formal proof.

Bostrom himself saw that his attempt at a formal proof in the (2003) paper was sloppy, and he attempted to put it right in Bostrom and Kulczycki (2011). The take-home message of Sections 1 and 2 of the manuscript under review is that these (2011) reformulations of the argument are still rather sloppy. For example, the author points out (p. 6) that the main text of B&K inaccurately describes the mathematical argument in the appendix: the appendix uses an assumption much more favourable to B&K's desired conclusion than the assumption stated in the main text. Moreover, B&K's use of vague terms such as 'significant number' and 'astronomically large factor' creates a misleading impression. The author shows, amusingly, that the 'significant number' must be almost 1 million times greater than the 'astronomically large factor' for their argument to work (p. 9).

In Section 3, the author provides a new formulation of the simulation argument that is easily the most rigorous I have seen. This formulation deserves to be the reference point for future discussions of the argument's epistemological consequences.

Comment by der (DustinWehr) on The Semiotic Fallacy · 2017-02-21T14:23:11.194Z · LW · GW

Love example 2. Maybe there is a name for this already, but you could generalize the semiotic fallacy to arguments that appeal to any motivating idea (whether of a semiotic nature or not) that is exceptionally hard to evaluate from a consequentialist perspective. Example: in my experience, among mathematicians who attempt to justify their work (at least in theoretical computer science, though I'd guess it's the same in other areas), most end up appealing to the idea of unforeseen connections or uses in the future.

Comment by DustinWehr on [deleted post] 2017-02-20T19:44:07.148Z

This is a very technical paper, which requires some (or a lot) of familiarity with Bostrom/Kulczycki's "patched" Simulation Argument (www.simulation-argument.com/patch.pdf). I'm choosing to publish it here after experiencing Analysis's depressing version of peer review (they rejected a shorter, more-professional version of the paper based on one very positive review, and one negative review, based on a superficial reading of the paper, that was almost certainly written by Kulczycki or Bostrom themself).

The positive review (of the earlier shorter, more-professional version of the paper) does a better job of summarizing the contribution than I did, so with the permission of the reviewer I'm including an excerpt here:

Bostrom (2003) argued that at least one of the following three claims is true: (1) the fraction of civilizations that reach a 'post-human' stage is approximately zero; (2) the fraction of post-human civilizations interested in running 'significant numbers' of simulations of their own ancestors is approximately zero; (3) the fraction of observers with human-type experiences that are simulated is approximately one.

The informal argument for this three-part disjunction is that, given what we know about the physical limits of computation, a post-human civilization would be so technologically advanced that it could run 'hugely many' simulations of observers very easily, should it choose to do so, so that the falsity of (1) and (2) implies the truth of (3). However, this informal argument falls short of a formal proof.

Bostrom himself saw that his attempt at a formal proof in the (2003) paper was sloppy, and he attempted to put it right in Bostrom and Kulczycki (2011). The take-home message of Sections 1 and 2 of the manuscript under review is that these (2011) reformulations of the argument are still rather sloppy. For example, the author points out (p. 6) that the main text of B&K inaccurately describes the mathematical argument in the appendix: the appendix uses an assumption much more favourable to B&K's desired conclusion than the assumption stated in the main text. Moreover, B&K's use of vague terms such as 'significant number' and 'astronomically large factor' creates a misleading impression. The author shows, amusingly, that the 'significant number' must be almost 1 million times greater than the 'astronomically large factor' for their argument to work (p. 9).

In Section 3, the author provides a new formulation of the simulation argument that is easily the most rigorous I have seen. This formulation deserves to be the reference point for future discussions of the argument's epistemological consequences.