Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-03T00:34:34.129Z · score: 2 (4 votes) · LW · GW

[nvm]

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-03T00:10:16.321Z · score: -1 (1 votes) · LW · GW

Pessimistic assumption: Voldemort evaded the Mirror, and is watching every trick Harry's coming up with to use against his reflection.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-03T00:07:26.184Z · score: -1 (1 votes) · LW · GW

Semi-pessimistic assumption: Harry is in the Mirror, which has staged this conflict (perhaps on favorable terms) because it's stuck on the problem of figuring out what Tom Riddle's ideal world is.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T13:09:41.008Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Voldemort can reliably give orders to Death Eaters within line-of-sight, and Death Eaters can cast several important spells, without any visible sign or sound.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T13:08:04.352Z · score: 2 (6 votes) · LW · GW

Pessimistic assumption: Voldemort has reasonable cause to be confident that his Horcrux network will not be affected by Harry's death.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T13:00:42.949Z · score: 1 (3 votes) · LW · GW

A necessary condition for a third ending might involve a solution that purposefully violates the criteria in some respect.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:59:13.791Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Voldemort wants Harry to reveal important information as a side effect of using his wand. To get the best ending, Harry must identify what information this would be, and prevent Voldemort from acquiring this information.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:52:40.862Z · score: 3 (5 votes) · LW · GW

Pessimistic assumption: Voldemort wants Harry to defeat him on this occasion. To get the best ending, Harry must defeat Voldemort, and then, before leaving the graveyard, identify a benefit that Voldemort gains by losing and deny him that benefit.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:30:25.614Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Free Transfiguration doesn't work like a superpower from Worm: it does not grant sensory feedback about the object being Transfigured, even if it does interpret the caster's idea of the target.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:24:09.996Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: At least in the limit of unusually thin and long objects, Transfiguration time actually scales as the product of the shortest local dimension with the square of the longest local dimension of the target, rather than the volume. Harry has not detected this because he was always Transfiguring volumes or areas, and McGonagall was mistaken.
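
To make the difference between the two scaling hypotheses concrete, here is a small worked comparison. It assumes, purely for illustration, that both laws share the same proportionality constant $k$; the shape dimensions $a$, $w$, and $L$ are hypothetical symbols not used in the comment.

```latex
% Volume law vs. the pessimistic law (shortest dimension times longest dimension squared)
\begin{align*}
\text{cube, side } a:\quad
  & t_{\text{vol}} = k\,a^{3}, & t_{\text{alt}} &= k\,a\cdot a^{2} = k\,a^{3} \\
\text{sheet, thickness } w,\ \text{side } L:\quad
  & t_{\text{vol}} = k\,w L^{2}, & t_{\text{alt}} &= k\,w L^{2} \\
\text{rod, width } w,\ \text{length } L \gg w:\quad
  & t_{\text{vol}} = k\,w^{2} L, & t_{\text{alt}} &= k\,w L^{2},
  \qquad \frac{t_{\text{alt}}}{t_{\text{vol}}} = \frac{L}{w}
\end{align*}
```

Under these assumptions the two laws agree for compact volumes and for sheets, which would be consistent with Harry never noticing a discrepancy, and diverge only for thread-like targets: a rod 1 m long and 1 mm wide would take about 1000 times longer than the volume law predicts.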

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:20:49.893Z · score: 3 (5 votes) · LW · GW

Pessimistic assumption: An intended solution involves, as a side-effect, Harry suffering a mortal affliction such as Transfiguration sickness or radiation poisoning, and is otherwise highly constrained. The proposed solution is close to this intended solution, and to match the other constraints, it must either include Harry suffering such an affliction with a plan to recover from it, or subject Harry to conditions where he would normally suffer such an affliction except that he has taken unusual measures to prevent it.

(This is one reading of the proviso, "evade immediate death".)

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:10:24.065Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Hermione, once wakened, despite acting normal, will be under Voldemort's control.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:05:35.809Z · score: 1 (3 votes) · LW · GW

Pessimistic assumption: Any plan which causes the occurrence of the vignette from Ch. 1 does not lead to the best ending. (For example, one reading of phenomena in Ch. 89 is that Harry is in a time loop, and the vignette may be associated with the path that leads to a reset of the loop.)

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T12:03:12.150Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Voldemort, and some of the Death Eaters, have witnessed combat uses of the time-skewed Transfiguration featuring in Chapter 104. They will have appropriate reflexes to counter any attacks by partial Transfiguration which they could have countered if the attacks had been made using time-skewed Transfiguration.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:55:17.246Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: It is not possible to Transfigure antimatter.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:53:52.941Z · score: 1 (3 votes) · LW · GW

Pessimistic assumption: Neither partial Transfiguration nor extremely fast Transfiguration (using extremely small volumes) circumvents the limits on Transfiguring air.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:51:50.707Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Plans which depend on the use of partial Transfiguration, or Transfiguration of volumes small enough to complete at timescales smaller than that of mean free paths in air (order of 160 picoseconds?), to circumvent the limitation on Transfiguring air, will only qualify as valid if they contain an experimental test of the ability to Transfigure air, together with a backup plan which is among the best available in case it is not possible to Transfigure air.
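
The "order of 160 picoseconds?" figure can be sanity-checked with a rough kinetic-theory estimate: divide the mean free path by the mean molecular speed. The sketch below is only an order-of-magnitude check. The temperature, the molar mass of air, and the 68 nm mean free path are assumed textbook-style inputs, not values taken from the comment.

```python
import math

# Physical constants (SI units).
BOLTZMANN = 1.380649e-23   # J/K
AVOGADRO = 6.02214076e23   # 1/mol

# Assumed inputs (not from the comment above).
TEMPERATURE_K = 300.0      # roughly room temperature
MOLAR_MASS_AIR = 0.02897   # kg/mol, mean molar mass of dry air
MEAN_FREE_PATH_M = 68e-9   # commonly cited value for air near 1 atm


def mean_molecular_speed(temperature_k: float, molar_mass: float) -> float:
    """Mean speed of a Maxwell-Boltzmann gas: sqrt(8 k T / (pi m))."""
    molecule_mass = molar_mass / AVOGADRO
    return math.sqrt(8.0 * BOLTZMANN * temperature_k / (math.pi * molecule_mass))


def mean_free_time(mean_free_path_m: float, speed_m_per_s: float) -> float:
    """Average time between collisions, in seconds."""
    return mean_free_path_m / speed_m_per_s


speed = mean_molecular_speed(TEMPERATURE_K, MOLAR_MASS_AIR)
tau = mean_free_time(MEAN_FREE_PATH_M, speed)
print(f"mean speed ~ {speed:.0f} m/s, mean free time ~ {tau * 1e12:.0f} ps")
```

With these assumed inputs the estimate lands between 100 and 200 picoseconds, the same order of magnitude as the figure guessed in the comment.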

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:43:11.038Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: Plans which depend on Transfiguring antimatter will only qualify as valid if they contain an experimental test of the ability to Transfigure antimatter, together with a backup plan which is among the best available in case it is not possible to Transfigure antimatter.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:37:38.607Z · score: 1 (3 votes) · LW · GW

Pessimistic assumption: Harry's wand is not already touching a suitable object for Transfiguration. Neither partial Transfiguration nor extremely fast Transfiguration of extremely small volumes lifts the restriction against Transfiguring air; dust specks or surface films would need to be specifically seen; the tip of the wand is not touching his skin; and the definition of "touching the wand" starts at the boundary of the wand material.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:30:51.331Z · score: 0 (2 votes) · LW · GW

Concerning Transfiguration:

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T11:12:05.673Z · score: 1 (3 votes) · LW · GW

Pessimistic assumption: The effect of the Unbreakable Vow depends crucially on the order in which Harry lets himself become aware of arguments about its logical consequences.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T10:54:07.037Z · score: 6 (8 votes) · LW · GW

Pessimistic assumption: Voldemort has made advance preparations which will thwart every potential plan of Harry's based on favorable tactical features or potential features of the situation which might reasonably be obvious to him. These include Harry's access to his wand, the Death Eaters' lack of armor enchantments or prepared shields, the destructive magic resonance, the Time-Turner, Harry's other possessions, Harry's glasses, the London portkey, a concealed Patronus from Hermione's revival, or Hermione's potential purposeful assistance. Any attempt to use these things will fail at least once and will, absent an appropriate counter-strategy, immediately trigger lethal force against Harry.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T10:39:07.554Z · score: 0 (2 votes) · LW · GW

Pessimistic assumption: There are more than two endings. A solution meeting the stated criteria is a necessary but not sufficient condition for the least sad ending.

If a viable solution is posted [...] the story will continue to Ch. 121.

Otherwise you will get a shorter and sadder ending.

Note that the referent of "Ch. 121" is not necessarily fixed in advance.

Counterargument: "I expect that the collective effect of 'everyone with more urgent life issues stays out of the effort' shifts the probabilities very little" suggests that reasonable prior odds of getting each ending are all close to 0 or 1, so any possible hidden difficulty thresholds are either very high or very low.

Counterargument: The challenge in Three Worlds Collide only had two endings.

Counterargument: A third ending would have taken additional writing effort, to no immediately obvious didactic purpose.

Comment by steve_rayhawk on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-03-02T10:20:31.858Z · score: 6 (8 votes) · LW · GW

Pessimistic Assumptions Thread

"Excuse me, I should not have asked that of you, Mr. Potter, I forgot that you are blessed with an unusually pessimistic imagination -"

Ch. 15

Sometimes people called Moody 'paranoid'.

Moody always told them to survive a hundred years of hunting Dark Wizards and then get back to him about that.

Mad-Eye Moody had once worked out how long it had taken him, in retrospect, to achieve what he now considered a decent level of caution - weighed up how much experience it had taken him to get good instead of lucky - and had begun to suspect that most people died before they got there. Moody had once expressed this thought to Lyall, who had done some ciphering and figuring, and told him that a typical Dark Wizard hunter would die, on average, eight and a half times along the way to becoming 'paranoid'. This explained a great deal, assuming Lyall wasn't lying.

Yesterday, Albus Dumbledore had told Mad-Eye Moody that the Dark Lord had used unspeakable dark arts to survive the death of his body, and was now awake and abroad, seeking to regain his power and begin the Wizarding War anew.

Someone else might have reacted with incredulity.

Ch. 63

Under standard literary convention... the enemy wasn't supposed to look over what you'd done, sabotage the magic items you'd handed out, and then send out a troll rendered undetectable by some means the heroes couldn't figure out even after the fact, so that you might as well have not defended yourself at all. In a book, the point-of-view usually stayed on the main characters. Having the enemy just bypass all the protagonists' work, as a result of planning and actions taken out of literary sight, would be a diabolus ex machina, and dramatically unsatisfying.

But in real life the enemy would think that they were the main character, and they would also be clever, and think things through in advance, even if you didn't see them do it. That was why everything about this felt so disjointed, with parts unexplained and seemingly inexplicable.

Ch. 94

"You may think that a grade of Dreadful... is not fair. That Miss Granger was faced with a test... for which her lessons... had not prepared her. That she was not told... that the exam was coming on that day."

The Defense Professor drew in a shaking breath.

"Such is realism," said Professor Quirrell.

Ch. 103

Recalling finewbs's coordinated saturation bombing strategy, if the goal is to maximize the total best-guess probability of the set of scenarios covered by at least one solution, this means crafting and posting diverse solutions which handle as wide a diversity of conjunctions of pessimistic assumptions as possible. This would be helped by having a list of pessimistic assumptions.

(It also may be helped by having a reasonable source of probabilities of scenarios, such as HPMOR predictions on PredictionBook. Also: in an adversarial context, the truth of pessimistic assumptions is correlated.)
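
The coverage goal described above can be sketched as a weighted maximum-coverage problem. The example below is a minimal illustration under stated assumptions, not anyone's actual strategy: scenarios are treated as mutually exclusive (so their probabilities add), each solution is represented by the set of scenarios it handles, and a greedy heuristic picks solutions. Greedy selection is a standard approximation and is not guaranteed optimal. All names and numbers (`scenario_prob`, `solution_covers`, `plan_1`, and so on) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def greedy_cover(
    scenario_prob: Dict[str, float],
    solution_covers: Dict[str, FrozenSet[str]],
    budget: int,
) -> List[str]:
    """Pick up to `budget` solutions, each time adding the one that
    covers the most not-yet-covered probability mass."""
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(budget):
        best_name: Optional[str] = None
        best_gain = 0.0
        for name, scenarios in solution_covers.items():
            if name in chosen:
                continue
            # Only scenarios not already handled count toward the gain.
            gain = sum(scenario_prob[s] for s in scenarios - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining solution adds coverage
            break
        chosen.append(best_name)
        covered |= solution_covers[best_name]
    return chosen


# Hypothetical scenarios (conjunctions of pessimistic assumptions) and plans.
scenario_prob = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
solution_covers = {
    "plan_1": frozenset({"A", "B"}),
    "plan_2": frozenset({"B", "C"}),
    "plan_3": frozenset({"C", "D"}),
}
picks = greedy_cover(scenario_prob, solution_covers, budget=2)
print(picks)
```

In this toy case the second pick is the plan that overlaps least with the first, even though another plan covers more probability in isolation. That is the sense in which diversity of covered assumptions matters more than each solution's standalone strength.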

Comment by steve_rayhawk on Breaking the vicious cycle · 2014-11-24T07:55:30.014Z · score: 12 (13 votes) · LW · GW

there very likely exist misrepresentations. There are many reasons for this, but I can assure you that I never deliberately lied and that I never deliberately tried to misrepresent anyone. The main reason might be that I feel very easily overwhelmed

I think the thing to remember is that, when you've run into contexts where you feel like someone might not care that they're setting you up to be judged unfairly, you've been too overwhelmed to keep track of whether or not your self-defense involves doing things that you'd normally be able to see would set them up to be judged unfairly.

You've been trying to defend a truth about a question -- about what actions you could reasonably be expected to have been sure you should have taken, after having been exposed to existential-risk arguments -- that's made up of many complex implicit emotional and social associations, like the sort of "is X vs. Y the side everyone should be on?" that Scott Alexander discusses in "Ethnic Tension and Meaningless Arguments". But you've never really developed the necessary emotional perspective to fully realize that the only language you've had access to, to do that with, is a different language: that of explicit factual truths. If you try to state truths in one language using the other without accounting for the difference, blinded by pain and driven by the intuitive impulse to escape the pain, you're going to say false things. It only makes sense that you would have screwed up.

written in a tearing hurry, akin to a reflexive retraction from the painful stimulus

Try to progress to having a conscious awareness of your desperation, I mean a conscious understanding of how the desperation works and what it's tied to emotionally. Once you've done that, you should be able to consciously keep in mind better the other ways that the idea of "justice" might also relate to your situation, and so do a lot less unjust damage. (Contrariwise, if you do choose to do damage, a significantly greater fraction of it will be just.)

It might also help to have a stronger deontological proscription against misrepresenting anyone in a way that would cause them to be judged unfairly. That proscription would put you under more pressure to develop this kind of emotional perspective and conscious awareness, although it would do this at the cost of adding extra deontological hoops you have to jump through to escape the pain when it comes. If this leaves you too bound-up to say anything, you can usually go meta and explain how you're too bound-up, at least once you have enough practice at explaining things like that.

I'm sorry. I claim to have some idea what it's like.

(Also, on reflection, I should admit that mostly I'm saying this because I'm afraid of third parties keeping mistakenly unfavorable impressions about your motives; so it's slightly dishonest of me to word some of the above comments as simply directed to you, the way I have. And in the process I've converted an emotional truth, "I think it's important for other people not to believe as-bad things about your motives, because I can see how that amount of badness is likely mistaken", into a factual claim, "your better-looking motives are exactly X".)

Comment by steve_rayhawk on Causal Universes · 2012-11-28T19:16:54.208Z · score: 3 (3 votes) · LW · GW

I know that the idea of "different systems of local consistency constraints on full spacetimes might or might not happen to yield forward-sampleable causality or things close to it" shows up in Wolfram's "A New Kind of Science", for all that he usually refuses to admit the possible relevance of probability or nondeterminism whenever he can avoid doing so; the idea might also be in earlier literature.

that there is in fact a way to finitely Turing-compute a discrete universe with self-consistent Time-Turners in it.

I'd thought about that a long time previously (not about Time-Turners; this was before I'd heard of Harry Potter). I remember noting that it only really works if multiple transitions are allowed from some states, because otherwise there's a much higher chance that the consistency constraints would not leave any histories permitted. ("Histories", because I didn't know model theory at the time. I was using cellular automata as the example system, though.) (I later concluded that Markov graphical models with weights other than 1 and 0 were a less brittle way to formulate that sort of intuition (although, once you start thinking about configuration weights, you notice that you have problems about how to update if different weight schemes would lead to different partition function values).)

I think there might have been an LW comment somewhere that put me on that track

I know we argued briefly at one point about whether Harry could take the existence of his subjective experience as valid anthropic evidence about whether or not he was in a simulation. I think I was trying to make the argument specifically about whether or not Harry could be sure he wasn't in a simulation of a trial timeline that was going to be ruled inconsistent. (Or, implicitly, a timeline that he might be able to control whether or not it would be ruled inconsistent. Or maybe it was about whether or not he could be sure that there hadn't been such simulations.) But I don't remember you agreeing that my position was plausible, and it's possible that that means I didn't convey the information about which scenario I was trying to argue about. In that case, you wouldn't have heard of the idea from me. Or I might have only had enough time to figure out how to halfway defensibly express a lesser idea: that of "trial simulated timelines being iterated until a fixed point".

Comment by steve_rayhawk on What does the world look like, the day before FAI efforts succeed? · 2012-11-17T04:27:37.441Z · score: 37 (39 votes) · LW · GW

The main way complexity of this sort would be addressable is if the intellectual artifact that you tried to prove things about were simpler than the process that you meant the artifact to unfold into. For example, the mathematical specification of AIXI is pretty simple, even though the hypotheses that AIXI would (in principle) invent upon exposure to any given environment would mostly be complex. Or for a more concrete example, the Gallina kernel of the Coq proof engine is small and was verified to be correct using other proof tools, while most of the complexity of Coq is in built-up layers of proof search strategies which don't need to themselves be verified, as the proofs they generate are checked by Gallina.

Isn't that as unbelievable as the idea that you can prove that a particular zygote will never grow up to be an evil dictator? Surely this violates some principles of complexity, chaos [...]

Yes, any physical system could be subverted with a sufficiently unfavorable environment. You wouldn't want to prove perfection. The thing you would want to prove would be more along the lines of, "will this system become at least somewhere around as capable of recovering from any disturbances, and of going on to achieve a good result, as it would be if its designers had thought specifically about what to do in case of each possible disturbance?". (Ideally, this category of "designers" would also sort of bleed over in a principled way into the category of "moral constituency", as in CEV.) Which, in turn, would require a proof of something along the lines of "the process is highly likely to make it to the point where it knows enough about its designers to be able to mostly duplicate their hypothetical reasoning about what it should do, without anything going terribly wrong".

We don't know what an appropriate formalization of something like that would look like. But there is reason for considerable hope that such a formalization could be found, and that this formalization would be sufficiently simple that an implementation of it could be checked. This is because a few other aspects of decision-making which were previously mysterious, and which could only be discussed qualitatively, have had powerful and simple core mathematical descriptions discovered for cases where simplifying modeling assumptions perfectly apply. Shannon information was discovered for the informal notion of surprise (with the assumption of independent identically distributed symbols from a known distribution). Bayesian decision theory was discovered for the informal notion of rationality (with assumptions like perfect deliberation and side-effect-free cognition). And Solomonoff induction was discovered for the informal notion of Occam's razor (with assumptions like a halting oracle and a taken-for-granted choice of universal machine). These simple conceptual cores can then be used to motivate and evaluate less-simple approximations for situations where the assumptions about the decision-maker don't perfectly apply. For the AI safety problem, the informal notions (for which the mathematical core descriptions would need to be discovered) would be a bit more complex -- like the "how to figure out what my designers would want to do in this case" idea above. Also, you'd have to formalize something like our informal notion of how to generate and evaluate approximations, because approximations are more complex than the ideals they approximate, and you wouldn't want to need to directly verify the safety of any more approximations than you had to. (But note that, for reasons related to Rice's theorem, you can't (and therefore shouldn't want to) lay down universally perfect rules for approximation in any finite system.)

Two other related points are discussed in this presentation: the idea that a digital computer is a nearly deterministic environment, which makes safety engineering easier for the stages before the AI is trying to influence the environment outside the computer, and the idea that you can design an AI in such a way that you can tell what goal it will at least try to achieve even if you don't know what it will do to achieve that goal. Presumably, the better your formal understanding of what it would mean to "at least try to achieve a goal", the better you would be at spotting and designing to handle situations that might make a given AI start trying to do something else.

(Also: Can you offer some feedback as to what features of the site would have helped you sooner be aware that there were arguments behind the positions that you felt were being asserted blindly in a vacuum? The "things can be surprisingly formalizable, here are some examples" argument can be found in lukeprog's "Open Problems Related to the Singularity" draft and the later "So You Want to Save the World", though the argument is very short and hard to recognize the significance of if you don't already know most of the mathematical formalisms mentioned. A backup "you shouldn't just assume that there's no way to make this work" argument is in "Artificial Intelligence as a Positive and Negative Factor in Global Risk", pp 12-13.)

what will prevent them from becoming "bad guys" when they wield this much power

That's a problem where successful/practically applicable formalizations are harder to hope for, so it's been harder for people to find things to say about it that pass the threshold of being plausible conceptual progress instead of being noisy verbal flailing. See the related "How can we ensure that a Friendly AI team will be sane enough?". But it's not like people aren't thinking about the problem.

Comment by steve_rayhawk on What does the world look like, the day before FAI efforts succeed? · 2012-11-17T01:40:37.444Z · score: 17 (19 votes) · LW · GW

you know what I mean.

Right, but this is a public-facing post. A lot of readers might not know why you could think it was obvious that "good guys" would imply things like information security, concern for Friendliness so-named, etc., and they might think that the intuition you mean to evoke with a vague affect-laden term like "good guys" is just the same argument-disdaining groupthink that would be implied if they saw it on any other site.

To prevent this impression, if you're going to use the term "good guys", then at or before the place where you first use it, you should probably put an explanation, like

(I.e. people who are familiar with the kind of thinking that can generate arguments like those in "The Detached Lever Fallacy", "Fake Utility Functions" and the posts leading up to it, "Anthropomorphic Optimism" and "Contaminated by Optimism", "Value is Fragile" and the posts leading up to it, and the "Envisioning perfection" and "Beyond the adversarial attitude" discussions in Creating Friendly AI or most of the philosophical discussion in Coherent Extrapolated Volition, and who understand what it means to be dealing with a technology that might be able to bootstrap to the singleton level of power that could truly engineer a "forever" of the "a boot stamping on a human face — forever" kind.)

Comment by steve_rayhawk on Value Loading · 2012-10-23T12:32:22.471Z · score: 5 (5 votes) · LW · GW

See also "Acting Rationally with Incomplete Utility Information" by Urszula Chajewska, 2002.

Comment by steve_rayhawk on Thoughts on the Singularity Institute (SI) · 2012-10-21T10:10:58.703Z · score: 13 (13 votes) · LW · GW

these are all literally from the Nonprofits for Dummies book. [...] The history I've heard is that SI [...]

failed to read Nonprofits for Dummies,

I remember that, when Anna was managing the fellows program, she was reading books of the "for dummies" genre and trying to apply them... it's just that, as it happened, the conceptual labels she accidentally happened to give to the skill deficits she was aware of were "what it takes to manage well" (i.e. "basic management") and "what it takes to be productive", rather than "what it takes to (help) operate a nonprofit according to best practices". So those were the subjects of the books she got. (And read, and practiced.) And then, given everything else the program and the organization were trying to do, there wasn't really any cognitive space left over to effectively notice the possibility that those wouldn't be the skills that other people afterwards would complain that nobody acquired and obviously should have known to. The rest of her budgeted self-improvement effort mostly went toward overcoming self-defeating emotional/social blind spots and motivated cognition. (And I remember Jasen's skill learning focus was similar, except with more of the emphasis on emotional self-awareness and less on management.)

failed to ask advisors for advice,

I remember Anna went out of her way to get advice from people who she already knew, who she knew to be better than her at various aspects of personal or professional functioning. And she had long conversations with supporters who she came into contact with for some other reasons; for those who had executive experience, I expect she would have discussed her understanding of SIAI's current strategies with them and listened to their suggestions. But I don't know how much she went out of her way to find people she didn't already have reasonably reliable positive contact with, to get advice from them.

I don't know much about the reasoning of most people not connected with the fellows program about the skills or knowledge they needed. I think Vassar was mostly relying on skills tested during earlier business experience, and otherwise was mostly preoccupied with the general crisis of figuring out how to quickly-enough get around the various hugely-saliently-discrepant-seeming-to-him psychological barriers that were causing everyone inside and outside the organization to continue unthinkingly shooting themselves in the feet with respect to this outside-evolutionary-context-problem of existential risk mitigation. For the "everyone outside's psychological barriers" side of that, he was at least successful enough to keep SIAI's public image on track to trigger people like David Chalmers and Marcus Hutter into meaningful contributions to and participation in a nascent Singularity-studies academic discourse. I don't have a good idea what else was on his mind as something he needed to put effort into figuring out how to do, in what proportions occupying what kinds of subjective effort budgets, except that in total it was enough to put him on the threshold of burnout. Non-profit best practices apparently wasn't one of those things though.

But the proper approach to retrospective judgement is generally a confusing question.

the kind of thing that makes me want to say [. . .]

The general pattern, at least post-2008, may have been one where the people who could have been aware of problems felt too metacognitively exhausted and distracted by other problems to think about learning what to do about them, and hoped that someone else with more comparative advantage would catch them, or that the consequences wouldn't be bigger than those of the other fires they were trying to put out.

strategic plan [...] SI failed to make these kinds of plans in the first place,

There were also several attempts at building parts of a strategy document or strategic plan, which together took probably 400-1800 hours. In each case, the people involved ended up determining, from how long it was taking, that, despite reasonable-seeming initial expectations, it wasn't on track to possibly become a finished presentable product soon enough to justify the effort. The practical effect of these efforts was instead mostly just a hard-to-communicate cultural shared understanding of the strategic situation and options -- how different immediate projects, forms of investment, or conditions in the world might feed into each other on different timescales.

expenses tracking, funds monitoring [...] some funds monitoring was insisted upon after the large theft

There was an accountant (who herself already cost like $33k/yr as the CFO, despite being split three ways with two other nonprofits) who would have been the one informally expected to have been monitoring for that sort of thing, and to have told someone about it if she saw something, out of the like three paid administrative slots at the time... well, yeah, that didn't happen.

I agree with a paraphrase of John Maxwell's characterization: "I'd rather hear Eliezer say 'thanks for funding us until we stumbled across some employees who are good at defeating their akrasia and [had one of the names of the things they were aware they were supposed to] care about [happen to be "]organizational best practices["]', because this seems like a better depiction of what actually happened." Note that this was most of the purpose of the Fellows program in the first place -- to create an environment where people could be introduced to the necessary arguments/ideas/culture and to help sort/develop those people into useful roles, including replacing existing management, since everyone knew there were people who would be better at their job than they were and wished such a person could be convinced to do it instead.

Comment by steve_rayhawk on A probability question · 2012-10-21T09:46:55.097Z · score: 2 (2 votes) · LW · GW

If you have a lot of experts and a lot of objects, I might try a generative model where each object had unseen values from an n-dimensional feature space, and where experts decided what features to notice using weightings from a dual n-dimensional space, with the weight covectors generated as clustered in some way to represent the experts' structured non-independence. The experts' probability estimates would be something like a logistic function of the product of each object's features with the expert's weights (plus noise), and your output summary probability would be the posterior mean of an estimate based on a special "best" expert weighting, derived using the assumption that the experts' estimates are well-calibrated.

I'm not sure what an appropriate generative model of clustered expert feature weightings would be.
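For concreteness, here is a minimal forward-simulation sketch of the model described above. The clustering mechanism is purely an assumption made for illustration (each expert's weight covector is a randomly chosen cluster centre plus Gaussian jitter), as are the standard-normal features and all of the names and default parameters; it is not a claim about what the right model of clustered weightings is.

```python
import math
import random


def simulate_expert_estimates(n_objects, n_experts, n_features=3,
                              n_clusters=2, cluster_spread=0.3,
                              noise_sd=0.5, seed=0):
    """Forward-simulate expert probability estimates.

    Returns a list with one row per object; each row holds one
    probability estimate per expert.
    """
    rng = random.Random(seed)

    # Unseen feature vector for each object (assumed standard normal).
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in range(n_objects)]

    # Assumed clustering: each expert's weight covector is a cluster
    # centre plus a small Gaussian perturbation.
    centres = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
               for _ in range(n_clusters)]
    weights = []
    for _ in range(n_experts):
        centre = rng.choice(centres)
        weights.append([c + rng.gauss(0.0, cluster_spread) for c in centre])

    estimates = []
    for x in features:
        row = []
        for w in weights:
            # Product of the object's features with the expert's weights,
            # plus noise, passed through a logistic function.
            score = sum(wi * xi for wi, xi in zip(w, x))
            score += rng.gauss(0.0, noise_sd)
            row.append(1.0 / (1.0 + math.exp(-score)))
        estimates.append(row)
    return estimates
```

This only generates data; inferring the special "best" weighting from observed estimates would need a posterior sampler on top of it.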

Actually, I guess the output of this procedure would just end up being a log-linear model of the truth given the experts' confidences. (Some of the coefficients might be negative, to cancel confounding factors.) So maybe a lot easier way to fix this is to sample from the space of such log-linear models directly, using sampled hypothetical imputed truths, while enforcing some constraint that the experts' opinions be reasonably well-calibrated.

You have 2 independent measurements

I ignored this because I wasn't sure what you could have meant by "independent". If you meant that the experts' outputs are fully independent, conditional on the truth, then the problem is straightforward. But this seems unlikely in practice. You probably just meant the informal English connotation "not completely dependent".
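To spell out the straightforward case: if each expert reports a calibrated posterior formed from a shared prior plus evidence that is independent of the other experts' evidence given the truth, the likelihood ratios multiply, so the log-odds add after subtracting the prior's log-odds once per extra expert. A minimal sketch under exactly those assumptions (the function names are hypothetical):

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def combine_independent(expert_probs, prior):
    """Posterior probability of the truth, assuming the experts'
    evidence is conditionally independent given the truth and every
    expert started from the same prior."""
    n = len(expert_probs)
    # Sum the experts' log-odds, then remove the (n - 1) extra copies
    # of the prior log-odds that would otherwise be double-counted.
    log_odds = sum(logit(p) for p in expert_probs) - (n - 1) * logit(prior)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With a prior of 0.5 and two experts each saying 0.8, the odds are 4 × 4 = 16, so the combined probability is 16/17. When the experts' evidence overlaps, this overstates the confidence, which is why the general case needs something more like the model above.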

Comment by steve_rayhawk on Friendly AI and the limits of computational epistemology · 2012-08-16T20:04:05.148Z · score: 2 (2 votes) · LW · GW

It wouldn't just be that some models of reality acknowledge your existence and others don't; it would mean that you are nothing more than a fuzzy heuristic concept in someone else's model, and that if they switched models, you would no longer exist even in that limited sense.

Or in a cascade of your own successive models, including of the cascade.

Or an incentive to keep using that model rather than to switch to another one. The models are made up, but the incentives are real. (To whatever extent the thing subject to the incentives is.)

Not that I'm agreeing, but some clever ways to formulate almost your objection could be built around the wording "The mind is in the mind, not in reality".

Comment by steve_rayhawk on Friendly AI and the limits of computational epistemology · 2012-08-08T22:42:22.456Z · score: 3 (3 votes) · LW · GW

You need to do the impossible one more time, and make your plans bearing in mind that the true ontology [...] something more than your current intellectual tools allow you to represent.

With the "is" removed and replaced by an implied "might be", this seems like a good sentiment...

...well, given scenarios in which there were some other process that could come to represent it, such that there'd be a point in using (necessarily-)current intellectual tools to figure out how to stay out of those processes' way...

...and depending on the relative payoffs, and the other processes' hypothetical robustness against interference.

(To the extent that decomposing the world into processes that separately come to do things, and can be "interfered with" or not, makes sense at all, of course.)

A more intelligible argument than the specific one you have been making is merely "we don't know whether there are any hidden philosophical, contextual, or further-future gotchas in whether or not a seemingly valuable future would actually be valuable". But in that case it seems like you need a general toolset to try to eventually catch the gotcha hypotheses you weren't by historical accident already disposed to turn up, the same way algorithmic probability is supposed to help you organize your efforts to be sure you've covered all the practical implications of hypotheses about non-weird situations. As a corollary: it would be helpful to propose a program of phenomenological investigation that could be expected to cover the same general sort of amount of ground where possible gotchas could be lurking as would designing an AI to approximate a universal computational hypothesis class.

If it matters, the only scenario I can think of specifically relating to quantum mechanics is that there are forms of human communication which somehow are able to transfer qubits, that these matter for something, and that a classical simulation wouldn't preserve them at the input (and/or the other boundaries).

Comment by steve_rayhawk on Friendly AI and the limits of computational epistemology · 2012-08-08T22:28:03.736Z · score: 5 (5 votes) · LW · GW

His expectation that this will work out is based partly on [...]

(It's also based on an intuition I don't understand that says that classical states can't evolve toward something like representational equilibrium the way quantum states can -- e.g. you can't have something that tries to come up with an equilibrium of anticipation/decisions, like neural approximate computation of Nash equilibria, but using something more like representations of starting states of motor programs that, once underway, you've learned will predictably try to search combinatorial spaces of options and/or redo a computation like the current one but with different details -- or that, even if you can get this sort of evolution in classical states, it's still knowably irrelevant. Earlier he invoked bafflingly intense intuitions about the obviously compelling ontological significance of the lack of spatial locality cues attached to subjective consciousness, such as "this quale is experienced in my anterior cingulate cortex, and this one in Wernicke's area", to argue that experience is necessarily nonclassically replicable. (As compared with, what, the spatial cues one would expect a classical simulation of the functional core of a conscious quantum state machine to magically become able to report experiencing?) He's now willing to spontaneously talk about non-conscious classical machines that simulate quantum ones (including not magically manifesting p-zombie subjective reports of spatial cues relating to its computational hardware), so I don't know what the causal role of that earlier intuition is in his present beliefs; but his reference to a "sweet spot", rather than a sweet protected quantum subspace of a space of network states or something, is suggestive, unless that's somehow necessary for the imagined tensor products to be able to stack up high enough.)

Comment by steve_rayhawk on Friendly AI and the limits of computational epistemology · 2012-08-08T22:02:29.529Z · score: 11 (11 votes) · LW · GW

You invoke as granted the assumption that there's anything besides your immediately present self (including your remembered past selves) that has qualia, but then you deny that some anticipatable things will have qualia. Presumably there are some philosophically informed epistemic-ish rules that you have been using, and implicitly endorsing, for the determination of whether any given stimuli you encounter were generated by something with qualia, and there are some other meta-philosophical epistemology-like rules that you are implicitly using and endorsing for determining whether the first set of rules was correct. Can you highlight any suitable past discussion you have given of the epistemology of the problem of other minds?

eta: I guess the discussions here, or here, sort of count, in that they explain how you could think what you do... except they're about something more like priors than like likelihoods.

In retrospect, the rest of your position is like that too, based on sort of metaphysical arguments about what is even coherently postulable, though you treat the conclusions with a certainty I don't see how to justify (e.g. one of your underlying concepts might not be fundamental the way you imagine). So, now that I see that, I guess my question was mostly just a passive-aggressive way to object to your argument procedure. The objectionable feature made more explicit is that the constraint you propose on the priors requires such a gerrymandered-seeming intermediate event -- that consciousness-simulating processes which are not causally (and, therefore, in some sense physically) 'atomic' are not experienced, yet would still manage to generate the only kind of outward evidence about their experiencedness that anyone else could possibly experience without direct brain interactions or measurements -- in order to make the likelihood of the (hypothetical) observations (of the outward evidence of experiencedness, and of the absence of that outward evidence anywhere else) within the gerrymandered event come out favorably.

Comment by steve_rayhawk on Friendly AI and the limits of computational epistemology · 2012-08-08T19:54:43.417Z · score: 6 (8 votes) · LW · GW

Some brief attempted translation for the last part:

A "monad", in Mitchell Porter's usage, is supposed to be a somewhat isolatable quantum state machine, with states and dynamics factorizable somewhat as if it was a quantum analogue of a classical dynamic graphical model such as a dynamic Bayesian network (e.g., in the linked physics paper, a quantum cellular automaton). (I guess, unlike graphical models, it could also be supposed to not necessarily have a uniquely best natural decomposition of its Hilbert space for all purposes, like how with an atomic lattice you can analyze it either in terms of its nuclear positions or its phonons.) For a monad to be a conscious mind, the monad must also at least be complicated and [this is a mistaken guess] capable of certain kinds of evolution toward something like equilibria of tensor-product-related quantum operators having to do with reflective state representation[/mistaken guess]. His expectation that this will work out is based partly on intuitive parallels between some imaginable combinatorially composable structures in the kind of tensor algebra that shows up in quantum mechanics and the known composable grammar-like structures that tend to show up whenever we try to articulate concepts about representation (I guess mostly the operators of modal logic).

(Disclaimer: I know almost only just enough quantum physics to get into trouble.)

A monadic mind would be a state machine, but ontologically it would be different from the same state machine running on a network of a billion monads.

Not all your readers will understand that "network of a billion monads" is supposed to refer to things like classical computing machinery (or quantum computing machinery?).

Comment by steve_rayhawk on Work on Security Instead of Friendliness? · 2012-07-22T23:54:54.387Z · score: 12 (12 votes) · LW · GW

It's not something you can ever come close to competing with by a philosophy invented from scratch.

I don't understand what you mean by this.

A sufficient cause for Nick to claim this would be that he believed that no human-conceivable AI design would be able to incorporate by any means, including by reasoning from first principles or even by reference, anything functionally equivalent to the results of all the various dynamics of updating that have (for instance) made present legal systems as (relatively) robust (against currently engineerable methods of exploitation) as they are.

This seems somewhat strange to you, because you believe humans can conceive of AI designs that could reason some things from first principles (given observations of the world that the reasoning needed to be relevant to, plus reasonably anticipatable advantages of computing power over single humans) or incorporate results by reference.

One possible reason he might believe this would be that he believed that, whenever a human reasons about history or evolved institutions, there are something like two distinct levels of a computational complexity hierarchy at work, and that the powers of the greater level (history and the evolution of institutions) are completely inaccessible to the powers of the lesser level (the human). (The machines representing the two levels in this case might be "the mental states accessible to a single armchair philosophy community", or, alternatively, "fledgling AI which, per a priori economic intuition, has no advantage over a few philosophers", versus "the physical states accessible in human history".)

This belief of his might be charged with a sort of independent half-intuitive aversion to making the sorts of (frequently catastrophic) mistakes that are routinely made by people who think they can metaphorically breach this complexity barrier. One effect of such an aversion would be that he would intuitively anticipate that he would always be, at least in expected value, wrong to agree with such people, no matter what arguments they could turn out to have. That is, it wouldn't increase his expected rightness to check to see if they were right about some proposed procedure to get around the complexity barrier, because, intuitively, the prior probability that they were wrong, the conditional probability that they would still be wrong despite being persuasive by any conventional threshold, and the wrongness of the cost that had empirically been inflicted on the world by mistakes of that sort, would all be so high. (I took his reference to Hayek's Fatal Conceit, and the general indirect and implicitly argued emotional dynamic of this interaction, to be confirmation of this intuitive aversion.) By describing this effect explicitly, I don't mean to completely psychologize here, or make a status move by objectification. Intuitions like the one I'm attributing can (and very much should!), of course, be raised to the level of verbally presented propositions, and argued for explicitly.

(For what it's worth, the most direct counter to the complexity argument expressed this way is: "with enough effort it is almost certainly possible, even from this side of the barrier, to formalize how to set into motion entities that would be on the other side of the barrier". To cover the pragmatics of the argument, one would also need to add: "and agreeing that this amount of effort is possible can even be safe, so long as everyone who heard of your agreement was sufficiently strongly motivated not to attempt shortcuts".)

Another, possibly overlapping reason would have to do with the meta level that people around here normally imagine approaching AI safety problems from -- that being, "don't even bother trying to invent all the required philosophy yourself; instead do your best to try to formalize how to mechanically refer to the process that generated, and could continue to generate, something equivalent to the necessary philosophy, so as to make that process happen better or at least to maximally stay out of its way" ("even if this formalization turns out to be very hard to do, as the alternatives are even worse"). That meta level might be one that he doesn't really think of as even being possible. One possible reason for this would be that he wasn't aware that anyone actually ever meant to refer to a meta level that high, so that he never developed a separate concept for it. Perhaps when he first encountered e.g. Eliezer's account of the AI safety philosophy/engineering problem, the concept he came away with was based on a filled-in assumption about the default mistake that Eliezer must have made and the consequent meta level at which Eliezer meant to propose that the problem should be attacked, and that meta level was far too low for success to be conceivable, and he didn't afterwards ever spontaneously find any reason to suppose you or Eliezer might not have made that mistake. Another possible reason would be that he disbelieved, on the above-mentioned a priori grounds, that the proposed meta level was possible at all. (Or, at least, that it could ever be safe to believe that it were possible, given the horrors perpetrated and threatened by other people who were comparably confident in their reasons for believing similar things.)

Comment by steve_rayhawk on [LINK] Nick Szabo: Beware Pascal's Scams · 2012-07-21T06:05:52.339Z · score: 2 (2 votes) · LW · GW

And who will choose the choosers? No sentient entity at all -- they'll be chosen they way they are today, by a wide variety of markets, except that there too the variety will be far greater.

Such markets and technologies are already far beyond the ability of any single human to comprehend[. . .]

Can you expand on this? The way you say it suggests that it might be your core objection to the thesis of economically explosive strong AI. -- put into words, the way the emotional charge would hook into the argument here would be: "Such a strong AI would have to be at least as smart as the market, and yet it would have been designed by humans, which would mean there had to be a human at least as smart as the market: and belief in this possibility is always hubris, and is characteristically disastrous for its bearer -- something you always want to be on the opposite side of an argument from"? (Where "smart" here is meant to express something metaphorically similar to a proof system's strength: "the system successfully uses unknowably diverse strategies that a lesser system would either never think to invent or never correctly decide how much to trust".)

I guess, for this explanation to work, it also has to be your core objection to Friendly AI as a mitigation strategy: "No human-conceived AI architecture can subsume or substitute for all the lines of innovation that the future of the economy should produce, much less control such an economy to preserve any predicate relating to human values. Any preservation we are going to get is going to have to be built incrementally from empirical experience with incremental software economic threats to those values, each of which we will necessarily be able to overcome if there had ever been any hope for humankind to begin with; and it would be hubris, and throwing away any true hope we have, to cling to a chimerical hope of anything less partial, uncertain, or temporary."

Comment by steve_rayhawk on [LINK] Nick Szabo: Beware Pascal's Scams · 2012-07-21T05:10:22.713Z · score: 1 (1 votes) · LW · GW

(Note that the Uncertain Future software is mostly supposed to be a conceptual demonstration; as mentioned in the accompanying conference paper, a better probabilistic forecasting guide would take historical observations and uncertainty about constant underlying factors into account more directly, with Bayesian model structure. The most important part of this would be stochastic differential equation model components that could account for both parameter and state uncertainty in nonlinear models of future economic development from past observations, especially of technology performance curves and learning curves. Robin Hanson's analysis of the random properties of technological growth modes has something of a similar spirit.)

Comment by steve_rayhawk on [LINK] Nick Szabo: Beware Pascal's Scams · 2012-07-21T04:29:16.159Z · score: 8 (8 votes) · LW · GW

It can be summarized as follows: for basic reasons of economics and computer science, specialized algorithms are generally far superior to general ones.

It would be better to present, as your main reason, "the kinds of general algorithms that humans are likely to develop and implement, even absent impediments caused by AI-existential risk activism, will almost certainly be far inferior to specialized ones". That there exist general-purpose algorithms which subsume the competitive abilities of all existing human-engineered special-purpose algorithms, given sufficient advantages of scale in number of problem domains, is trivial by the existence proof constituted by the human economy.

Put another way: There is some currently-unsubstitutable aspect of the economy which is contained strictly within human cognition and communication. Consider the case where the intellectual difficulties involved in understanding the essence of this unsubstitutable function were overcome, and it were implemented in silico, with an initial level of self-engineering insight already equal to that which was used to create it, and with starting capital and education sufficient to overcome transient learning-curve effects on its initial success. There would then be some fraction of the economy directed by the newly engineered process. Would this fraction of the economy inevitably be at a net competitive advantage, or disadvantage, relative to the fraction of the economy which was directed by humans?

If that fraction of the economy would have an advantage, then this would be an example of a general algorithm ultimately superior to all contemporarily-available specialized algorithms. In that case, what you claim to be the core of your argument would be defeated; the strength of your argument would instead have to come from a focus on the reasons why it were improbable that anyone had a relevant chance of ever achieving this kind of software substitute for human strategy and insight (that is, before everyone else was adequately prepared for it to prevent catastrophe), and that even to the point that supposing otherwise deserves to be tarred with a label of "scam". And if the software-directed economy would have a disadvantage even at steady state, then this would be a peculiar fact about software and computing machinery relative to neural states and brains, and it could not be assumed without argument. Digital software and computing machinery both have properties that have made them, in most respects, much more tractable to large returns to scale from purposeful re-engineering for higher performance than neural states and brains, and this is likely to continue to be true into the future.

Comment by steve_rayhawk on An Intuitive Explanation of Solomonoff Induction · 2012-07-10T16:44:10.025Z · score: 4 (4 votes) · LW · GW

The subtleties I first had in mind were the ones that should have (but didn't) come up in the original earlier discussion of MWI, having to do with the different numbers of bits in different parts of an observation-predicting program based on a physical theory, and which of those parts should have their bits be charged against the prior or likelihood of the physical theory itself and which of the parts should have their bits be taken for granted as intrinsic parts of the anthropic reasoning that any agent would need to be capable of (even if some physical theories didn't use part of that anthropic reasoning "software").

(I didn't specify what the subtleties were, and you seem to have picked a reading of which subtleties I must have been referring to and what I must have meant by "resolve" that together made what I was saying make no sense at all. This might be another example of the proposed tendency of "not looking very hard to see whether other people could have reasonably supposed" etc. (whether or not other people in the same reference class from your point of view as me weren't signaling that they understood the point either).)

Comment by steve_rayhawk on An Intuitive Explanation of Solomonoff Induction · 2012-07-10T16:34:37.905Z · score: 1 (1 votes) · LW · GW

I'm sorry; I was referring to what I had perceived as a general pattern, from seeing snippets of discussions involving you while I was lurking off-and-on. The "pre-emptive" was meant to refer to within single exchanges, not to refer all the way back to (in this case) the original discussion about MWI (which I'm still hunting down). Now that I look more closely at your history, this has only been at all frequent within the past few months.

I don't have any specific recollection of you from before that comment on the "detecting rationalization" post, but looking back through your comment history of that period, I'm mystified by what happened there too. It's possible that someone thought you were Thomas covertly giving a self-justifying speech about a random red-herring explanation he'd invented to tell himself for why other people disagreed with him, and they wished to discourage him from thinking other people agreed with that explanation.

Comment by steve_rayhawk on An Intuitive Explanation of Solomonoff Induction · 2012-07-10T13:57:22.429Z · score: 0 (2 votes) · LW · GW

You discussed this over here too, with greater hostility:

also someone somehow thinks that Solomonoff induction finds probabilities for theories, while it just assigns 2^-length as probability for software code of such length, which is obviously absurd when applied to anything but brute force generated shortest pieces of code,

I'm trying to figure out whether, when you say "someone", you mean someone upthread or one of the original authors of the post. Because if it's the post authors, then I get to accuse you of not caring enough about refraining from heaping abuse on writers who don't deserve it to actually bother to check whether they made the mistake you mocked them for. The post includes discussion of ideals vs. approximations in the recipe metaphor and in the paragraph starting "The actual process above might seem a little underwhelming...":

We just check every single hypothesis? Really? Isn’t that a little mindless and inefficient? This will certainly not be how the first true AI operates. But don’t forget that before this, nobody had any idea how to do ideal induction, even in principle. Developing fundamental theories, like quantum mechanics, might seem abstract and wasteful. [...] such theories and models change the world, as quantum mechanics did with modern electronics. In the future, [...] ways to approximate Solomonoff induction [...] Perhaps they will develop methods to eliminate large numbers of hypotheses all at once. Maybe hypotheses will be broken into distinct classes. Or maybe they’ll use methods to statistically converge toward the right hypotheses.

Everyone knows there are practical difficulties. Some people expect that someone will almost certainly find a way to mitigate them. You tend to treat people as idiots for going ahead with the version of the discussion that presumes that.

With respect to your objections about undiscoverable short programs, presumably people would look for ways to calibrate the relationship between the description lengths of the theories their procedures do discover (or the lengths of the parts of those theories) and the typical probabilities of observations, perhaps using toy domains where everything is computable and Solomonoff probabilities can be computed exactly. Of course, by diagonalization there are limits to the power of any finite such procedure, but it's possible to do better than assuming that the only programs are the ones you can find explicitly. Do you mean to make an implicit claim by your mockery which special-cases to the proposition that what makes the post authors deserve to be mocked is that they didn't point out ways one might reasonably hope to get around this problem, specifically? (Or, alternatively, is the cause of your engaging in mockery something other than a motivation to express propositions (implicit or explicit) that readers might be benefited by believing? For example, is it a carelessly executed side-effect of a drive to hold your own thinking to high standards, and so to avoid looking for ways to make excuses for intellectual constructs you might be biased in favor of?)

Comment by steve_rayhawk on An Intuitive Explanation of Solomonoff Induction · 2012-07-10T09:24:50.514Z · score: 17 (17 votes) · LW · GW

He identifies subtleties, but doesn't look very hard to see whether other people could have reasonably supposed that the subtleties resolve in a different way than he thinks they "obviously" do. Then he starts pre-emptively campaigning viciously for contempt for everyone who draws a different conclusion than the one from his analysis. Very trigger-happy.

This needlessly pollutes discussion... that is to say, "needless" in the moral perspective of everyone who doesn't already believe that most people who first appear wrong by that criterion that way in fact are wrong, and negligently and effectively incorrigibly so, such that there'd be nothing to lose by loosing broadside salvos before the discussion has even really started. (Incidentally, it also disincentivizes the people who could actually explain the alternative treatment of the subtleties from engaging with him, by demonstrating a disinclination to bother to suppose that their position might be reasonable.) This perception of needlessness, together with the usual assumption that he must already be on some level aware of other people's belief in that needlessness but is disregarding that belief, is where most of the negative affect toward him comes from.

Also, his occasional previous lack of concern for solid English grammar didn't help the picture of him as not really caring about the possibility that the people he was talking to might not deserve the contempt for them that third parties would inevitably come away with the impression that he was signaling.

(I wish LW had more people who were capable of explaining their objections understandably like this, instead of being stuck with a tangle of social intuitions which they aren't capable of unpacking in any more sophisticated way than by hitting the "retaliate" button.)

Comment by steve_rayhawk on Thoughts and problems with Eliezer's measure of optimization power · 2012-06-08T22:06:21.904Z · score: 3 (3 votes) · LW · GW

A concept I've played with, coming off of Eliezer's initial take on the problem of formulating optimization power, is: Suppose something generated N options randomly and then chose the best. Given the observed choice, what is the likelihood function for N?

For continuously distributed utilities, this can be computed directly using beta distributions. Beta(N, 1) is the probability density for the highest of N uniformly distributed unit random numbers. This includes numbers which are cumulative probabilities for a continuous distribution at values drawn from that distribution, and therefore numbers which are cumulative probabilities at the goodness of an observed choice. (N doesn't have to be an integer, because beta distributions are defined for non-integer parameters.)

(The second step of this construction, where you attach a beta distribution to another distribution's CDF, I had to work out by myself; it's not directly mentioned in any discussions of extreme value statistics that I could find. The Mathworld page on order statistics, one step of generalization away, uses the formula for a beta CDF transformed by another CDF, but it still doesn't refer to beta distributions by name.)
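As a minimal sketch of the continuous case: if u is the cumulative probability (under the distribution that options are drawn from) at the goodness of the observed choice, the Beta(N, 1) density at u is N·u^(N-1), which, read as a function of N, is the likelihood. The function names below are hypothetical.

```python
import math


def log_likelihood_n(u, n):
    """Log of the Beta(n, 1) density n * u**(n - 1), evaluated at the
    cumulative probability u (0 < u < 1) of the observed choice."""
    return math.log(n) + (n - 1.0) * math.log(u)


def mle_n(u):
    """Maximum-likelihood n: setting the derivative 1/n + ln(u) to
    zero gives n = -1 / ln(u)."""
    return -1.0 / math.log(u)
```

For a choice at the 99th percentile, `mle_n(0.99)` comes out a little under 100. For u below 1/e the maximizer falls below 1, which is only meaningful because N is allowed to be non-integer.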

If the utilities are discretely distributed, you have to integrate the beta density over the interval of cumulative probabilities that invert to the observed utility.
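A sketch of the discrete case, assuming you know the cumulative probability strictly below the observed utility level and the cumulative probability at it: integrating the density N·t^(N-1) over that interval gives a difference of N-th powers. The function and argument names are hypothetical.

```python
def discrete_likelihood_n(cdf_below, cdf_at, n):
    """Probability that the best of n draws lands on the observed
    utility level: the integral of n * t**(n - 1) for t running from
    cdf_below to cdf_at."""
    return cdf_at ** n - cdf_below ** n
```

For example, if options are worth a fair die roll and the observed choice is a 6, the likelihood for N is 1 - (5/6)^N, which is 1/6 at N = 1 and rises toward 1 as N grows.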

To handle choices of mixtures, I guess you could modify this slightly, and ask about the likelihood function for N given the observed outcome, marginalizing over (as User:Kindly also suggests) the (unobserved) choice of option. This requires a distribution over options and a conditional distribution over observations given options. This would also cover situations with composite options where you only observe one of the aspects of the chosen option.

Opposed optimization might be very crudely modeled by increasing the number on the opposite side of the beta distribution from N. Somewhat related to this is Warren Smith's "Fixed Point for Negamaxing Probability Distributions on Regular Trees", which examines the distributions of position values that result when two opponents take turns choosing the worst option for each other.

Alternatively, instead of a likelihood function for N, you could have a likelihood function for an exponential weighting λ on the expected utility of the option:

Pr(A was chosen)/Pr(B was chosen) ∝ exp(λ(U(A)-U(B))).

Higher values of λ would be hypotheses in which better options were more strongly randomly selected over worse ones. (This is something like a logit-response model, for which λ (or "β", or "1/μ") would be the "rationality" parameter. It might be more familiar as a Gibbs distribution.) But this would fail when the expected utility from the null distribution was heavy-tailed, because then for some λ≠0 the distribution of optimized expected utilities would be impossible to normalize. Better would be for the cumulative probability at the expected utility of the chosen option to be what was exponentially weighted by λ. In that case, in the limit λ = (N-1) >> 1, the two models give the same distribution.
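To make the two λ models above a little more concrete, here is a small Python sketch. It makes simplifying assumptions that aren't in the comment itself: the first function treats the options as a finite list that the null distribution weights equally, and the second and third compare the CDF of the "exponentially weighted cumulative probability" model (density proportional to exp(λu) on the unit interval) against the Beta(N, 1) CDF. All names are made up for illustration.

```python
import math

def logit_choice_log_likelihood(utilities, chosen_index, lam):
    """Log-likelihood of lam when Pr(option) is proportional to exp(lam * U).

    Assumes a finite option list with equal null weights.
    Uses the log-sum-exp trick so large lam * U values don't overflow.
    """
    scaled = [lam * u for u in utilities]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[chosen_index] - log_norm

def quantile_weighted_cdf(u, lam):
    """CDF at u of the density proportional to exp(lam * u) on [0, 1], lam > 0.

    Written as (exp(lam*(u-1)) - exp(-lam)) / (1 - exp(-lam)) to avoid
    overflow for large lam.
    """
    return (math.exp(lam * (u - 1.0)) - math.exp(-lam)) / (1.0 - math.exp(-lam))

def best_of_n_cdf(u, n):
    """CDF at u of Beta(n, 1), i.e. of the best of n uniform draws."""
    return u ** n
```

With λ = N - 1 = 1000, the two CDFs at u = 0.999 differ by less than 0.01, which is at least consistent with the claimed agreement in the λ = (N-1) >> 1 limit.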

All of these statistics, as well as Eliezer's original formulation, end up encoding equivalent information in the limit where the utility of each option is a sum of many independent, identically distributed, light-tailed components and you're predicting a marginal distribution of utilities for one of those components. In this limit you can safely convert everything to a statistical mechanics paradigm and back again.

Of course, the real criterion for a good formulation of optimization power is whether it helps people who use it in an argument about things that might be optimizers, or who hear it used in such an argument, to come to truthful conclusions.

In this respect, likelihood functions can have the problem that most people won't want to use them: they're hard to compute with, or communicate, unless they belong to a low-dimensional family. The likelihood functions I suggested won't belong to such a family except under very simple conditions. I'm not sure what the best way would be to simplify them to something lower-dimensional. I guess you could just communicate a maximum-likelihood estimate and precision for the optimization power parameter. Or, if you chose a reference prior over optimization power, you could communicate its posterior mean and variance.

All of this presupposes that the problems with the unoptimized probability measure can be dealt with. Maybe it would work better to describe the optimization power of a system in terms of a series of levels of simpler systems leading up to that system, where each level's new amount of optimization was only characterized approximately, and only relative to something like a distribution of outputs from the previous level. (This would at least patch the problem where, if thermodynamics is somehow involved in the results of an action, that action can count as very powerful relative to the uniform measure over the system's microstates.) If optimizer B is sort of like choosing the best of N options generated by optimizer A, and optimizer C is sort of like choosing the best of M options generated by optimizer B, that might not have to mean that optimizer C is much like choosing the best of N*M options generated by optimizer A.

Comment by steve_rayhawk on AI Risk and Opportunity: A Strategic Analysis · 2012-03-07T19:22:58.718Z · score: 9 (11 votes) · LW · GW

That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone's remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of "something to protect": "something to not be culpable for failure to protect".

If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.

I don't think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of "whatever I did, it was less indefensible than the alternative".

(I feel like there has to be some kind of third alternative I'm missing here, that would derail the ongoing damage from this sort of desperate effort by him to compel someone or something to magically generate a way out for him. I think the underlying phenomenon is worth developing some insight into. Alex wouldn't be the only person with some amount of this kind of psychology going on -- just the most visible.)

Comment by steve_rayhawk on AI Risk and Opportunity: A Strategic Analysis · 2012-03-04T11:44:05.000Z · score: 9 (11 votes) · LW · GW

Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.

Do you think it would be possible to design an intelligence which could do this more reliably?

Comment by steve_rayhawk on People who "don't rationalize"? [Help Rationality Group figure it out] · 2012-03-04T09:49:51.753Z · score: 3 (3 votes) · LW · GW

I wish there were a more standard term for this than "kinesthetic thinking", one that other people would be able to look up and understand what was meant.

(A related term is "motor cognition", but that doesn't denote a thinking style. Motor cognition is a theoretical paradigm in cognitive psychology, according to which most cognition is a kind of higher-order motor control/planning activity, connected in a continuous hierarchy with conventional concrete motor control and based on the same method of neural implementation. (See also: precuneus (reflective cognition?); compare perceptual control theory.) Another problem with the term "motor cognition" is that it doesn't convey the important nuance of "higher-order motor planning except without necessarily any concurrent processing of any represented concrete motions". (And the other would-be closest option, "kinesthetic learning", actively denotes the opposite.)

Plausibly, people could be trained to introspectively attend to the aspect of cognition which was like motor planning by combining TCMS, to inhibit visual and auditory imagery, with cognitive tasks which involved salient constraints and tradeoffs. Maybe the cognitive tasks would also need to have specific positive or negative consequences for apparent execution of recognizable scripts of sequential actions typical of normally learned plans for the task. Some natural tasks with some of these features, which are not intrinsically verbal or visual, would be social reasoning, mathematical proof planning, or software engineering.)

when I am thinking kinesthetically I basically never rationalize as such

I think kinesthetic thinking still has things like rationalization. For example, you might have to commit to regarding a certain planned action a certain way as part of a complex motivational gambit, with the side effect that you commit to pretend that the action will have some other expected value than the one you would normally assign. If this ability to make commitments that affect perceived expected value can be used well, then by default this ability is probably also being used badly.

Could you give more details about the things like rationalization that you were thinking of, and what it feels like deciding not to do them in kinesthetic thinking?

Comment by steve_rayhawk on Q&A with Abram Demski on risks from AI · 2012-01-19T07:27:49.522Z · score: 6 (6 votes) · LW · GW

If the human-level AGI

0) is autonomous (has, or forms, long-term goals)
1) is not socialized

#1 is important because a self-modifying system will tend to respond to negative reinforcement concerning sociopathic behaviors resulting from #3-- though, it must be admitted, this will depend on how deeply the ability to self-modify runs. Not all architectures will be capable of effectively modifying their goals in response to social pressures. (In fact, rigid goal-structure under self-modification will usually be seen as an important design-point.)

Abram: Could you make this more precise?

From the way you used the concept of "negative reinforcement", it sounds like you have a particular family of agent architectures in mind, which is constrained enough that we can predict that the agent will make human-like generalizations about a reward-relevant boundary between "sociopathic" behaviors and "socialized" ones. It also sounds like you have a particular class of possible emergent socially-enforced concepts of "sociopathic" in mind, which is constrained enough that we can predict that the behaviors permitted by that sociopathy concept wouldn't still be an existential catastrophe from our point of view. But you haven't said enough to narrow things down that much.

For example, you could have two initially sociopathic agents, who successfully manipulate each other into committing all their joint resources to a single composite agent having a weighted combination of their previous utility functions. The parts of this combined agent would be completely trustworthy to each other, so that the agents could be said to have modified their goals in response to the "social pressures" of their two-member society. But the overall agent could still be perfectly sociopathic toward other agents who were not powerful enough to manipulate it into making similar concessions.

Comment by steve_rayhawk on Q&A with experts on risks from AI #2 · 2012-01-10T13:22:11.091Z · score: 1 (1 votes) · LW · GW

In fact, I'd prefer it if Q8 started out with the less-shibbolethy "How much have you read about, or used the concepts of..." or something like that, which replaces a dichotomy with a continuum.

Yeah... I wanted to make the suggested question less loaded, but it would have required more words, and I was unthinkingly preoccupied with worry about a limit on the permitted complexity of a single-sentence question. Maybe I should have split the question across more sentences.

The signaling uses of Q8 seem like a bad idea to me, although it seems a worthwhile thing to ask for Steve Rayhawk's reasons.

My reasons for suggesting Q8 were mostly:

  • First, I wanted to make it easier to narrow down hypotheses about the relationship between respondents' opinions about AI risk and their awareness of progress toward formal, machine-representable concepts of optimal AI design (also including, I guess, progress toward practically efficient mechanized application of those concepts, as in Schmidhuber's Speed Prior and AIXI-tl).

  • Second, I was imagining that many respondents would be AI practitioners who thought mostly in terms of architectures with a machine-learning flavor. Those architectures usually have a very specific and limited structure in their hypothesis space or policy space by construction, such that it would be clearly silly to imagine a system with such an architecture self-representing or self-improving. These researchers might have a conceptual myopia by which they imagine "progress in AI" to mean only "creation of more refined machine-learning-style architectures", of a sort which of course wouldn't lead towards passing a threshold of capability of self-improvement anytime soon. I wanted to put in something of a conceptual speed bump to that kind of thinking, to reduce unthinking dismissiveness in the answers, and counter part of the polarizing/consistency effects that merely receiving and thinking about answering the survey might have on the recipients' opinions. (Of course, if this had been a survey that was meant to be scientific and formally reportable, it would be desirable for the presence of such a potentially leading question to be an experimentally controlled variable.)

With those reasons on the table, someone else might be able to come up with a question that fulfills them better. I also agree with paulfchristiano's comment.