davidmanheim feed - LessWrong 2.0 Reader davidmanheim’s posts and comments on the Effective Altruism Forum en-us Comment by Davidmanheim on Probability space has 2 metrics https://www.lesswrong.com/posts/63TcB9j7gv8Ampz8u/probability-space-has-2-metrics#CwNsHxyfnfGv74Bmr <p>Yes - and this is equivalent to saying that evidence about probability provides Bayesian metric evidence - you need to transform it.</p> davidmanheim CwNsHxyfnfGv74Bmr 2019-02-14T10:38:15.499Z Comment by Davidmanheim on The RAIN Framework for Informational Effectiveness https://www.lesswrong.com/posts/s4TrCbCXvvWfkT2o6/the-rain-framework-for-informational-effectiveness#2qW8z7ZAZ5uKnhEZF <p>Minor comment/correction - VoI isn&#x27;t necessarily linked to a single decision, but the way it is typically defined in introductory works, it is implicit that it is limited to one decision. This is mostly because (as I found out when trying to build more generalized VoI models for my dissertation) it&#x27;s usually quickly intractable for multiple decisions.</p> davidmanheim 2qW8z7ZAZ5uKnhEZF 2019-02-14T10:30:12.536Z Comment by Davidmanheim on Why we need a *theory* of human values https://www.lesswrong.com/posts/zvrZi95EHqJPxdgps/why-we-need-a-theory-of-human-values#BXF6ckNYpgQSm9FL3 <p>I agree, and think work in the area is valuable, but would still argue that unless we expect a correct and coherent answer, any single approach is going to be less effective than an average of (contradictory, somewhat unclear) different models.</p><p>As an analogue, I think that effort into improving individual prediction accuracy and calibration is valuable, but for most estimation questions, I&#x27;d bet on an average of 50 untrained idiots over any single superforecaster.</p> davidmanheim BXF6ckNYpgQSm9FL3 2019-02-14T10:25:00.352Z Comment by Davidmanheim on Spaghetti Towers https://www.lesswrong.com/posts/NQgWL7tvAPgN2LTLn/spaghetti-towers#5g3R2S2wSwqhcs2nc <p>Having looked into this, it&#x27;s partly that, but mostly that tax codes are written in legalese. A simple options contract for a call can easily be described in 10 lines of code, or a one-line equation. But the legal terms are actually this 188-page pamphlet: https://www.theocc.com/components/docs/riskstoc.pdf which is (technically, but not enforced as) legally required reading for anyone who wants to purchase an exchange-traded option. And don&#x27;t worry - it explicitly notes that it doesn&#x27;t cover the actual laws governing options, for which you need to read the relevant US code, or the way in which the markets for trading them work, or any of the risks.</p> davidmanheim 5g3R2S2wSwqhcs2nc 2019-02-14T10:20:18.739Z Comment by Davidmanheim on How much can value learning be disentangled? https://www.lesswrong.com/posts/Q7WiHdSSShkNsgDpa/how-much-can-value-learning-be-disentangled#uogDB5cZXpYKX94Wp <p>re: #2, VoI doesn&#x27;t need to be constrained to be positive. If in expectation you think the information will have a net negative impact, you shouldn&#x27;t get the information.</p><p>re: #3, of course VoI is subjective. It MUST be, because value is subjective. Spending 5 minutes to learn about the contents of a box you can buy is obviously more valuable to you than to me. Similarly, if I like chocolate more than you, finding out if a cake has chocolate is more valuable for me than for you. The information is the same, the value differs.</p> davidmanheim uogDB5cZXpYKX94Wp 2019-02-11T08:42:46.557Z Comment by Davidmanheim on How much can value learning be disentangled? 
https://www.lesswrong.com/posts/Q7WiHdSSShkNsgDpa/how-much-can-value-learning-be-disentangled#ChWLRLfvo46CgujxB <blockquote> This matters because if the Less Wrong view of the world is correct, it&#x27;s more likely that there are clean mathematical algorithms for thinking about and sharing truth that are value-neutral (or at least value-orthogonal, e.g. &quot;aim to share facts that the student will think are maximally interesting or surprising&quot;. </blockquote><p>I don&#x27;t think this is correct - it misses the key map-territory distinction in the human mind. Even though there is &quot;truth&quot; in an objective sense, there is no necessity that the human mind can think about or share that truth. Obviously we can say that experientially we have something in our heads that correlates with reality, but that doesn&#x27;t imply that we can think about truth without implicating values. It also says nothing about whether we can discuss truth without manipulating the brain to represent things differently - and all imperfect approximations require trade-offs. If you want to train the brain to do X, you&#x27;re implicitly prioritizing some aspect of the brain&#x27;s approximation of reality over others.</p> davidmanheim ChWLRLfvo46CgujxB 2019-02-11T08:38:43.716Z Comment by Davidmanheim on Why we need a *theory* of human values https://www.lesswrong.com/posts/zvrZi95EHqJPxdgps/why-we-need-a-theory-of-human-values#gFtZhoSLZDAJx6iHi <p>Maybe I&#x27;m reading your post wrong, but it seems that you&#x27;re assuming that a coherent approach is needed in a way that could be counter-productive. I think that a model of an individual&#x27;s preferences is likely to be better represented by taking multiple approaches, where each fails differently. I&#x27;d think that a method that extends or uses revealed preferences would have advantages and disadvantages that none of, say, stated preferences, TD Learning, CEV, or indirect normativity share, and the same would be true for each of that list. I think that we want that type of robust multi-model approach as part of the way we mitigate over-optimization failures, and to limit our downside from model specification errors.</p><p>(I also think that we might be better off building AI to evaluate actions on the basis of some moral congress approach using differently elicited preferences across multiple groups, and where decisions need a super-majority of some sort as a hedge against over-optimization of an incompletely specified version of morality. But it may be over-restrictive, and not allow any actions - so it&#x27;s a weakly held theory, and I haven&#x27;t discussed it with anyone.)</p> davidmanheim gFtZhoSLZDAJx6iHi 2019-02-10T12:21:46.075Z Comment by Davidmanheim on How does Gradient Descent Interact with Goodhart? https://www.lesswrong.com/posts/pcomQ4Fwi7FnfBZBR/how-does-gradient-descent-interact-with-goodhart#fgfufENLh24q8uvpD <p>Having tried to play with this, I&#x27;ll strongly agree that random functions on R^N aren&#x27;t a good place to start. But I&#x27;ve simulated random nodes in the middle of a causal DAG, or selecting ones for high correlation, and realized that they aren&#x27;t particularly useful either; people have some appreciation of causal structure, and they aren&#x27;t picking metrics randomly for high correlation - they are simply making mistakes in their causal reasoning, or missing potential ways that the metric can be intercepted. 
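<p>(For concreteness, here is a minimal illustrative sketch - my own toy code, not the actual code from those simulations - of the kind of setup I mean: a tiny structural causal model where a metric is chosen because it correlates with the goal, and then pushed on directly, severing part of the causal path.)</p>
<pre><code># Illustrative sketch only: effort causes a proxy metric, which causes the goal.
import numpy as np

rng = np.random.default_rng(0)

def world(effort, forced_proxy=None):
    # Tiny structural causal model: effort -> proxy -> goal (plus a direct path).
    proxy = 0.8 * effort + rng.normal(0, 0.2, size=effort.shape)
    if forced_proxy is not None:
        proxy = forced_proxy  # the optimizer intercepts the metric directly
    goal = 0.7 * proxy + 0.3 * effort + rng.normal(0, 0.2, size=effort.shape)
    return proxy, goal

# Observational phase: the proxy looks like an excellent metric for the goal.
effort = rng.normal(0, 1, 10_000)
proxy, goal = world(effort)
print("observed corr(proxy, goal):", round(np.corrcoef(proxy, goal)[0, 1], 3))

# Optimization phase: forcing the metric up without the upstream effort raises
# the goal far less than the observed relationship would have predicted.
high = np.full_like(effort, 3.0)
print("goal naively predicted from the correlation:", round(3.0 * np.polyfit(proxy, goal, 1)[0], 3))
print("goal when effort is actually raised:", round(world(high)[1].mean(), 3))
print("goal when only the metric is forced:", round(world(np.zeros_like(effort), forced_proxy=high)[1].mean(), 3))
</code></pre>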
(But I was looking for specific things about how the failures manifested, and I was not thinking about gradient descent, so maybe I&#x27;m missing your point.)</p> davidmanheim fgfufENLh24q8uvpD 2019-02-03T19:28:21.359Z Comment by Davidmanheim on Fixed Point Discussion https://www.lesswrong.com/posts/mvqmY9MQ3qf88xRuM/fixed-point-discussion#2f5DBmDjQnvFqXtoc <p>&quot;(Each the same size as the original.)&quot;</p><p>I was not expecting to laugh reading this. Well done - I just wish I hadn&#x27;t been in the middle of drinking my coffee.</p> davidmanheim 2f5DBmDjQnvFqXtoc 2018-12-23T08:27:26.733Z Comment by Davidmanheim on Systems Engineering and the META Program https://www.lesswrong.com/posts/q25bajee6DeL9wFqm/systems-engineering-and-the-meta-program#KxqgWMPmpjTHpqQtb <p>Good finds!</p><p>I think they are headed in the right direction, but I&#x27;m skeptical of the usefulness of their work on complexity. The metrics ignore computational complexity of the model, and assume all the variance is modeled based on sources like historical data and expert opinion. It&#x27;s also not at all useful unless we can fully characterize the components of the system, which isn&#x27;t usually viable.</p><p>It also seems to ignore the (in my mind critical) difference between &quot;we know this is evenly distributed in the range 0-1&quot; and &quot;we have no idea what the distribution of this is over the space 0-1.&quot; But I may be asking for too much in a complexity metric.</p> davidmanheim KxqgWMPmpjTHpqQtb 2018-12-23T06:59:38.695Z Comment by Davidmanheim on The Vulnerable World Hypothesis (by Bostrom) https://www.lesswrong.com/posts/Tx6dGzYLtfzzkuGtF/the-vulnerable-world-hypothesis-by-bostrom#tcuFzgeazdPEW3fmE <p>I discuss a different reformulation in my new paper, &quot;<strong><a href="https://philpapers.org/go.pl?id=MANSFA-3&proxyId=&u=https%3A%2F%2Fphilpapers.org%2Farchive%2FMANSFA-3.pdf">Systemic Fragility as a Vulnerable World</a>&quot; </strong>casting this as an explore/exploit tradeoff in a complex space. In the paper, I explicitly discuss the way in which certain subspaces can be safe or beneficial.</p><p>&quot;The push to discover new technologies despite risk can be understood as an explore/exploit tradeoff in a potentially dangerous environment. At each stage, the explore action searches the landscape for new technologies, with some probability of a fatal result, and some probability of discovering a highly rewarding new option. The implicit goal in a broad sense is to find a search strategy that maximizes humanity&#x27;s cosmic endowment - neither so risk-averse that advanced technologies are never explored or developed, nor so risk-accepting that Bostrom&#x27;s postulated Vulnerable World becomes inevitable. Either of these risks astronomical waste. However, until and unless the distribution of black balls in Bostrom&#x27;s technological urn is understood, we cannot specify an optimal strategy. 
The first critical question addressed by Bostrom - &#x60;&#x60;Is there a black ball in the urn of possible inventions?&#x27;&#x27; - is, to reframe the question, about the existence of negative singularities in the fitness landscape.&quot;</p> davidmanheim tcuFzgeazdPEW3fmE 2018-12-16T08:07:31.230Z Comment by Davidmanheim on The Vulnerable World Hypothesis (by Bostrom) https://www.lesswrong.com/posts/Tx6dGzYLtfzzkuGtF/the-vulnerable-world-hypothesis-by-bostrom#JSsDetyp6MkzM8wNn <p>As an extension of Bostrom&#x27;s ideas, I have written a draft entitled &quot; <strong><a href="https://philpapers.org/rec/MANSFA-3">Systemic Fragility as a Vulnerable World</a></strong> &quot; where I introduce the &quot;Fragile World Hypothesis.&quot;</p><p>Abstract:</p><p> The possibility of social and technological collapse has been the focus of science fiction tropes for decades, but more recent focus has been on specific sources of existential and global catastrophic risk. Because these scenarios are simple to understand and envision, they receive more attention than risks due to complex interplay of failures, or risks that cannot be clearly specified. In this paper, we discuss a new hypothesis that complexity of a certain type can itself function as a source of risk. This ”Fragile World Hypothesis” is compared to Bostrom&#x27;s ”Vulnerable World Hypothesis”, and the assumptions and potential mitigations are contrasted. </p> davidmanheim JSsDetyp6MkzM8wNn 2018-12-16T08:03:22.644Z Comment by Davidmanheim on How Old is Smallpox? https://www.lesswrong.com/posts/8EqTiMPbadFRqYHqp/how-old-is-smallpox#e2o7Dj2yRfqKQdDCM <p>Yes.</p><p>But to clarify, I don&#x27;t think the Antonine plague is quite the same as modern ones, for the simple reason that it could only spread over a fairly limited geographic region, and it could not become endemic because of population density constraints. Smallpox evolution is driven by selection pressure in humans, and the &quot;500 years old&quot; claim is about that evolution, not about whether it affected humans at any time in the past. That said, it absolutely matters, because if the original source of smallpox was only 500 years ago, where did it come from?</p><p>The question is how smallpox evolved, and what variant was present prior to the 1500s. It&#x27;s plausible that Horsepox, which was probably the source for the vaccine strain, or Cowpox, spread via intermediate infections in cats, were the source - but these are phylogenetically distant enough that, from my limited understanding, it&#x27;s clearly implausible that it first infected humans and turned into modern smallpox as recently as the 1500s. (But perhaps this is exactly the claim of the paper. I&#x27;m unclear.) Instead, my understanding is that there must have been some other conduit, and it seems very likely that it&#x27;s related to a historically much earlier human pox virus - thousands of years, not hundreds.</p> davidmanheim e2o7Dj2yRfqKQdDCM 2018-12-13T09:56:20.636Z Comment by Davidmanheim on How Old is Smallpox? https://www.lesswrong.com/posts/8EqTiMPbadFRqYHqp/how-old-is-smallpox#v5hG5JXiGoPrK86XD <p> I&#x27;m definitely not the best person to explain this, since I&#x27;m more on the epidemiology side. 
I understand the molecular clock analyses a bit, and they involve mutation rates plus tracking mutations in different variants, and figuring out how long it should take for the various samples collected at different times to have diverged, and what their common ancestors are.</p> davidmanheim v5hG5JXiGoPrK86XD 2018-12-13T07:47:02.733Z Comment by Davidmanheim on Should ethicists be inside or outside a profession? https://www.lesswrong.com/posts/LRKXuxLrnxx3nSESv/should-ethicists-be-inside-or-outside-a-profession#L9Nh6YLJs4AgLugmQ <p>Thank you! This is a point I keep trying to make, less eloquently, in both bioethics and in AI safety.</p><p>We need fewer talking heads making suggestions for how to regulate, and more input from actual experts, and more informed advice going to decision makers. If &quot;professional ethicists&quot; have any role, it should be elicitation, attempting to reconcile or delineate different opinions, and translation of ethical opinions of experts into norms and policies.</p> davidmanheim L9Nh6YLJs4AgLugmQ 2018-12-13T07:43:12.449Z Comment by Davidmanheim on Multi-agent predictive minds and AI alignment https://www.lesswrong.com/posts/3fkBWpE4f9nYbdf7E/multi-agent-predictive-minds-and-ai-alignment#KS4qmJJgCWYWDvgtM <p>I have several short comments about part 3, short not because there is little to say, but because I want to make the points and do not have time to discuss them in depth right now.</p><p>1) If multi-agent systems are more likely to succeed in achieving GAI, we should shut up about why they are important. I&#x27;m concerned about the unilateralist&#x27;s curse, and would ask that someone from MIRI weigh in on this.</p><p>2) I agree that multi-agent systems are critical, but for different (non-contradictory) reasons - I think multi-agent systems are likely to be less safe and harder to understand. See the draft of my forthcoming article here: https://arxiv.org/abs/1810.10862</p><p>3) If this is deemed to be important, the technical research directions pointed to here are under-specified and too vague to be carried out. I think concretizing them would be useful. (I&#x27;d love to chat about this, as I have ideas in this vein. If you are interested in talking, feel free to be in touch - about.me/davidmanheim .)</p> davidmanheim KS4qmJJgCWYWDvgtM 2018-12-13T06:13:22.172Z Comment by Davidmanheim on How Old is Smallpox? https://www.lesswrong.com/posts/8EqTiMPbadFRqYHqp/how-old-is-smallpox#hzgErcFDHc48fe5fR <p>There is genetic evidence discussed in Hopkins&#x27; &quot;Princes and Peasants: Smallpox in History,&quot; which implies ancient existence of variola viruses, as you note from the Wiki article. The newer paper overstates the case in typical academic fashion in order to sound as noteworthy as possible. The issue with saying that earlier emergence is not the &quot;current&quot; disease of smallpox is that we expect significant evolution to occur once there is sufficient population density, and more once there is selection pressure due to vaccination, and so it is very unsurprising that there are more recent changes. (I discuss this in my most recent paper, https://www.liebertpub.com/doi/pdf/10.1089/hs.2018.0039 )</p><p>It&#x27;s very clear that a precursor disease existed in humans for quite a while. It&#x27;s also very clear that these outbreaks in thin populations would have continued spreading, so I&#x27;m unconvinced that the supposed evidence of lack due to Hippocrates&#x27; omission, and the lack of discussion in the old and new testament is meaningful. 
And regarding the old testament, at least, the books aren&#x27;t great with describing &quot;plagues&quot; in detail, and there are plenty of times we hear about some unspecified type of plague or malady as divine punishment.</p><p>So the answer depends on definitions. It&#x27;s unclear that there is anything like a smallpox epidemic as the disease currently occurs in a population that is not concentrated enough for significant person-to-person spread. If that&#x27;s required, we have no really ancient diseases, because we defined them away.</p> davidmanheim hzgErcFDHc48fe5fR 2018-12-11T08:19:35.668Z Comment by Davidmanheim on Is Science Slowing Down? https://www.lesswrong.com/posts/v7c47vjta3mavY3QC/is-science-slowing-down#NCTCGCwEKBeSLgNca <p>The model implies that if funding and prestige increased, this limitation would be reduced. And I would think we don&#x27;t need prestige nearly as much as funding - even if near-top scientists were recruited and paid the way second and third string major league players in most professional sports were paid, we&#x27;d see a significant relaxation of the constraint.</p><p>Instead, the uniform wage for most professors means that even the very top people benefit from supplementing their pay with consulting, running companies on the side, giving popular lectures for money, etc. - all of which compete for time with their research.</p> davidmanheim NCTCGCwEKBeSLgNca 2018-12-03T12:34:40.065Z Comment by Davidmanheim on Is Science Slowing Down? https://www.lesswrong.com/posts/v7c47vjta3mavY3QC/is-science-slowing-down#jrvixPiFT9EcrbxRP <p>Yes, this might help somewhat, but there is an overhead / deduplication tradeoff that is unavoidable. </p><p>I discussed these dynamics in detail (i.e. at great length) on Ribbonfarm <a href="https://www.ribbonfarm.com/2016/03/17/go-corporate-or-go-home/">here</a>.</p><p>The large team benefit would explain why most innovation happens near hubs / at the leading edge companies and universities, but that is explained by the other theories as well.</p> davidmanheim jrvixPiFT9EcrbxRP 2018-11-30T08:34:50.420Z Comment by Davidmanheim on Is Science Slowing Down? https://www.lesswrong.com/posts/v7c47vjta3mavY3QC/is-science-slowing-down#ehRWMXz2Y6yAtbQsa <p>The problem with fracturing is that you lose coordination and increase duplication.</p><p>I have a more general piece that discusses scaling costs and structure for companies that I think applies here as well - https://www.ribbonfarm.com/2016/03/17/go-corporate-or-go-home/</p> davidmanheim ehRWMXz2Y6yAtbQsa 2018-11-29T13:23:34.372Z Comment by Davidmanheim on Is Science Slowing Down? https://www.lesswrong.com/posts/v7c47vjta3mavY3QC/is-science-slowing-down#ZJtWQNidrvyKWtLTb <p>This seems to omit a critical and expected limitation as a process scales up in the number of people involved - communication and coordination overhead.</p><p>If there is low hanging fruit, but everyone is reaching for it simultaneously, then doubling the number of researchers won&#x27;t increase the progress more than very marginally. (People with slightly different capabilities implies that the expected time to success will be the minimum of different people.) 
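<p>(A toy illustration of how small that marginal gain is - my own sketch with made-up numbers, not a calibrated model: if the time to a given discovery is the minimum over researchers whose capabilities differ only slightly, adding researchers barely moves it.)</p>
<pre><code># Toy illustration (not a calibrated model): time to a discovery is the minimum
# over researchers searching in parallel, with only slightly different abilities.
import numpy as np

rng = np.random.default_rng(1)

def expected_time(n_researchers, trials=5_000):
    # Each researcher's time-to-find is an independent draw around a similar mean;
    # the field finds the result as soon as the fastest researcher does.
    times = rng.lognormal(mean=0.0, sigma=0.3, size=(trials, n_researchers))
    return times.min(axis=1).mean()

baseline = expected_time(1)
for n in (1, 8, 64, 512):
    print(f"{n:4d} researchers: about {baseline / expected_time(n):.2f}x faster")
</code></pre>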
But even that will be overwhelmed by the asymptotic costs for everyone to find out that the low-hanging fruit they are looking for has been picked!</p><p>Is there a reason not to think that this dynamic is enough to explain the observed slowdown - even without assuming hypothesis 3, of no more low-hanging fruit?</p> davidmanheim ZJtWQNidrvyKWtLTb 2018-11-29T11:07:58.476Z Comment by Davidmanheim on Oversight of Unsafe Systems via Dynamic Safety Envelopes https://www.lesswrong.com/posts/frMdaZGtpRmEe26Wu/oversight-of-unsafe-systems-via-dynamic-safety-envelopes#8dHvEkM2jTJtjZCoX <p>The paper is now live on Arxiv: https://arxiv.org/abs/1811.09246</p> davidmanheim 8dHvEkM2jTJtjZCoX 2018-11-26T17:14:55.077Z Comment by Davidmanheim on Values Weren't Complex, Once. https://www.lesswrong.com/posts/AMF5cRBjmMeeHB6Rf/values-weren-t-complex-once#AwLoPZJCTP56Lsotj <p>In part, I think the implication of zero-sum versus non-zero sum status is critical. Non-zero sum status is &quot;I&#x27;m the best left-handed minor league pitcher by allowed runs&quot; while zero-sum status is &quot;by total wealth/power, I&#x27;m 1,352,235,363rd in the world.&quot; Saying we only have one positional value for status seemingly assumes the zero-sum model.</p><p>The ability to admit these non-zero sum status signals has huge implications for whether we can fulfill values. If people can mostly find relatively high-position niches, the room for selection on noise and path-dependent value grows. </p><p>This also relates to TAG&#x27;s point about whether we care about &quot;value&quot; or &quot;moral value&quot; - and I&#x27;d suggest there might be moral value in fulfilling preferences only if they are not zero-sum positional ones.</p> davidmanheim AwLoPZJCTP56Lsotj 2018-11-26T13:28:34.779Z Comment by Davidmanheim on Values Weren't Complex, Once. https://www.lesswrong.com/posts/AMF5cRBjmMeeHB6Rf/values-weren-t-complex-once#zxpkbMo7Z5os9ohvh <p>This is a good point, but I&#x27;ll lay out the argument against it.</p><p>To start, I&#x27;m personally skeptical of the claim that preferences and moral values can be clearly distinguished, especially given the variety of value systems that people have preferred over time, or even today.</p><p>Even if this is false, we seem to see the same phenomenon occur with moral values. I think the example of obvious differences in the relative preference for saving dogs, the elderly, or criminals points to actual differences in values - but as I argued above, I think this is a heavily optimized subspace of a moral intuition towards liking life which is now largely selecting on noise. But the difference in moral conclusions that follows from assigning animal lives exactly zero versus smaller-than-human but nonzero value is huge.</p> davidmanheim zxpkbMo7Z5os9ohvh 2018-11-26T13:21:30.643Z Comment by Davidmanheim on Values Weren't Complex, Once. https://www.lesswrong.com/posts/AMF5cRBjmMeeHB6Rf/values-weren-t-complex-once#Le3nxZ3Cby5WGn8ED <p>Yes, and that&#x27;s closely related to the point I made about &quot;we&#x27;re adaptation executioners, not fitness maximizers.&quot; </p><p>My point is a step further, I think - I&#x27;m asking what decides which things we plan to do? It&#x27;s obviously our &quot;preferences,&quot; but if we&#x27;ve already destroyed everything blue, the next priority is very underspecified.</p> davidmanheim Le3nxZ3Cby5WGn8ED 2018-11-25T12:37:00.221Z Comment by Davidmanheim on Values Weren't Complex, Once. 
https://www.lesswrong.com/posts/AMF5cRBjmMeeHB6Rf/values-weren-t-complex-once#cDzAgi3gtgDkm77eN <p>I agree that positional goods are important even in the extreme, but: </p><p>1) I don&#x27;t think that sexual desires or food preferences fit in this mold.</p><p>2) I don&#x27;t think that which things are selected as positional goods (perhaps other than wealth and political power) is dictated by anything other than noise and path dependence - the best tennis player, the best DOTA player, or the most cited researcher are all positional goods, and all can absorb arbitrary levels of effort, but the form they take and the relative prestige they get is based on noise.</p> davidmanheim cDzAgi3gtgDkm77eN 2018-11-25T12:33:19.943Z Values Weren't Complex, Once. https://www.lesswrong.com/posts/AMF5cRBjmMeeHB6Rf/values-weren-t-complex-once <p>The central argument of this post is that human values are only complex because all the obvious constraints and goals are easily fulfilled. The resulting post-optimization world is deeply confusing, and leads to noise as the primary driver of human values. This has worrying implications for any kind of world-optimizing. <em>(This isn&#x27;t a particularly new idea, but I am taking it a bit farther and/or in a different direction than <a href="https://www.lesswrong.com/posts/asmZvCPHcB4SkSCMW/the-tails-coming-apart-as-metaphor-for-life">this post</a> by <a href="https://www.lesswrong.com/users/yvain">Scott Alexander</a>, and I think it is worth making clear, given the previously noted connection to value alignment and effective altruism.)</em></p><p></p><p>First, it seems clear that formerly simple human values are now complex. &quot;Help and protect relatives, babies, and friends&quot; as a way to ensure group fitness and survival is mostly accomplished, so we find complex ethical dilemmas about the relative values of different behavior. &quot;Don&#x27;t hurt other people&quot; as a tool for ensuring reciprocity has turned into compassion for humanity, animals, and perhaps other forms of suffering. These are more complex than they could possibly have been expressed in the ancestral environment, given restricted resources. It&#x27;s worth looking at what changed, and how.</p><p>In the ancestral environment, humans had three basic desires; they wanted food, fighting, and fornication. Food is now relatively abundant, leading to people&#x27;s complex preferences about exactly which flavors they like most. These differ because the base drive for food is overoptimizing. Fighting was competition between people for resources - and since we all have plenty, this turns into status-seeking in ways that aren&#x27;t particularly meaningful outside of human social competition. The varieties of signalling and counter-signalling are the result. And fornication was originally for procreation, but we&#x27;re adaptation executioners, not fitness maximizers, so we&#x27;ve short-cutted that with birth control and pornography, leading to an explosion in seeking sexual variety and individual kinks. </p><p>Past the point where maximizing the function has a meaningful impact on the intended result, we see the <a href="https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart">tails come apart</a>. The goal seeking of human nature, however, needs to find some direction to push the optimization process. The implication from this is that humanity finds diverging goals because they are past the point where the basic desires run out. 
As Randall Munroe points out in an XKCD Comic, this leads to <a href="https://www.xkcd.com/915/">increasingly complex and divergent preferences for ever less meaningful results</a>. And that comic would be funny if it weren&#x27;t a huge problem for aligning group decision making and avoiding longer term problems.</p><p>If this is correct, the key takeaway is that as humans find ever fewer things to need, they inevitably find ever more things to disagree about. Even though we expect convergent goals related to dominating resources, narrowly implying that we want to increase the pool of resources to reduce conflict, human values might be divergent as the pool of such resources grows.</p> davidmanheim AMF5cRBjmMeeHB6Rf 2018-11-25T09:17:02.207Z Comment by Davidmanheim on On MIRI's new research directions https://www.lesswrong.com/posts/SZ2h2Yte9pDi95Hnf/on-miri-s-new-research-directions#3he7yHk2dMcRm8JuC <p>Yes, this very much resonates with me, especially because a parallel issue exists in biosecurity, where we don&#x27;t want to talk publicly about how to work to prevent things that we&#x27;re worried about because it could prompt bad actors to look into those things.</p><p>The issues here are different, but the need to have walls between what you think about and what you discuss imposes a real cost.</p> davidmanheim 3he7yHk2dMcRm8JuC 2018-11-23T08:42:08.999Z Oversight of Unsafe Systems via Dynamic Safety Envelopes https://www.lesswrong.com/posts/frMdaZGtpRmEe26Wu/oversight-of-unsafe-systems-via-dynamic-safety-envelopes <h2>Idea</h2><p>I had an idea for short-term, non-superhuman AI safety that I recently wrote up and ̶w̶i̶l̶l̶ ̶b̶e̶ ̶p̶o̶s̶t̶i̶n̶g̶ <a href="https://arxiv.org/abs/1811.09246">have now posted</a> on Arxiv. This post serves to introduce the idea, and request feedback from a more safety-oriented group than those that I would otherwise present the ideas to. </p><p>In short, the paper tries to adapt <a href="https://arxiv.org/abs/1708.06374">a paradigm that Mobileye has presented</a> for autonomous vehicle safety to a much more general setting. The paradigm is to have a &quot;safety envelope&quot; that is dictated by a separate algorithm from the policy algorithm for driving, setting speed and distance limits for the vehicle based on the position of vehicles around it. </p><p>For self-driving cars, this works well because there is a physics-based model of the system that can be used to find an algorithmic envelope. In arbitrary other systems, it works less well, because we don&#x27;t have good fundamental models for what safe behavior means. For example, in financial markets there are &quot;circuit breakers&quot; that function as an opportunity for the system to take a break when something unexpected happens. The values for the circuit breakers are set via a simple heuristic that doesn&#x27;t relate to the dynamics of the system in question. I propose taking a middle path - dynamically learning a safety envelope.</p><p>In building separate models for safety and for policy, I think the system can address a different problem being discussed in military and other AI contexts, which is that &quot;Human-in-the-Loop&quot; is impossible for normal ML systems, since it slows the reaction time down to the level of human reactions. 
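<p>To make the proposal concrete, here is a minimal sketch (my own illustration; the paper does not commit to a particular implementation) of an envelope that is learned slowly from operation judged safe in hindsight, and that simply clamps whatever the fast policy proposes:</p>
<pre><code># Minimal sketch (illustration only): the envelope adapts slowly, under human or
# regulator review, while the policy it constrains can act at machine speed.
from collections import deque

class DynamicSafetyEnvelope:
    def __init__(self, window=1000, margin=0.1):
        self.history = deque(maxlen=window)  # recent states judged safe in hindsight
        self.margin = margin                 # slack around the observed safe range
        self.low = None
        self.high = None

    def record_safe(self, value):
        # Called slowly, e.g. after human/regulator review of logged operation.
        self.history.append(value)
        span = max(self.history) - min(self.history)
        self.low = min(self.history) - self.margin * span
        self.high = max(self.history) + self.margin * span

    def clamp(self, proposed_action):
        # Called on every fast policy decision; the policy never leaves the envelope.
        if self.low is None:
            return proposed_action           # no envelope learned yet
        return min(max(proposed_action, self.low), self.high)

# Usage: wrap an arbitrary, possibly opaque policy with the learned envelope.
envelope = DynamicSafetyEnvelope()
for observed_speed in (28.0, 31.5, 30.2, 29.8):
    envelope.record_safe(observed_speed)
print(envelope.clamp(55.0))  # an aggressive policy output gets pulled back into range
</code></pre>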
The proposed paradigm of a safety-envelope learning system can be meaningfully controlled by humans, because the adaptive time needed for the system can be slower than the policy system that makes the lower level decisions.</p><h2>Quick Q&amp;A</h2><p>1) How do we build heuristic safety envelopes in practice?</p><p>This depends on the system in question. I would be very interested in identifying domains where this class of solution could be implemented, either in toy models, or in full systems.</p><p>2) Why is this better than a system that optimizes for safety?</p><p>The issues with balancing optimization for goals versus optimization for safety can lead to perverse effects. If the system optimizing for safety is segregated, and the policy-engine is not given access to it, this should not occur. </p><p>This also allows the safety system to be built and monitored by a regulator, instead of by the owners of the system. In the case of Mobileye&#x27;s proposed system, a self-driving car could have the parameters of the safety envelope dictated by traffic authorities, instead of needing to rely on the car manufacturers to implement systems that drive safely as determined by those manufacturers.</p><p>3) Are there any obvious shortcoming to this approach?</p><p>Yes. This does not scale to human- or superhuman- general intelligence, because a system aware of the constraints can attempt to design policies for avoiding them. It is primarily intended to serve as a stop-gap measure to marginally improve the safety of near-term Machine Learning systems.</p> davidmanheim frMdaZGtpRmEe26Wu 2018-11-23T08:37:30.401Z Comment by Davidmanheim on Collaboration-by-Design versus Emergent Collaboration https://www.lesswrong.com/posts/ts3CKRDHgkBasAy4J/collaboration-by-design-versus-emergent-collaboration#3ihi6JGbdj24ByX3g <p>I don&#x27;t think humans have collaboration as a default - it&#x27;s only because evolution was due to social pressure that this occurs at all, and it occurs primarily at the social-structure level, not as an outcome of individual effort.</p><p>Even if this is wrong, however, non-GAI systems can pose existential risks.</p> davidmanheim 3ihi6JGbdj24ByX3g 2018-11-19T09:59:07.223Z Comment by Davidmanheim on Topological Fixed Point Exercises https://www.lesswrong.com/posts/svE3S6NKdPYoGepzq/topological-fixed-point-exercises#bE7H3DYBnj4apCDb7 <p>I&#x27;m stuck part-way through on #4 - I assume there is a way to do this without the exhaustive search I&#x27;m running into needing.</p><p class="spoiler">I&#x27;m going to try (nested) induction. Define triangles by side size, measured in nodes. <br/><br/>Induction base step: For n=2, there must be exactly one trichromatic edge.<br/><br/>Induction step: If there are an odd number of tri-chromatic edges for all triangles n=x, we must show that this implies the same for n=x+1.<br/><br/>We create all possible new triangles by adding x+1 nodes on one of the sides, then allow any of the previous x nodes on that side to change. Without loss of generality, assume we add x+1 edges to the bottom (non-red) side. These must be green or blue. The previous layer can now change any number of node-colors. We now must prove this by induction on color changes of nodes in the second-to-bottom layer to be red. (If they flip color otherwise, it is covered by a different base case.)<br/><br/>First, base step, assume no nodes change color. 
Because the previous triangle had an odd number of trichromatic edges, and the new edge is only green+blue, no new trichromatic edges were created.<br/><br/>Induction step: There is an x+1 triangle with an odd number of trichromatic vertices, and one node in the second-to-bottom layer changes to red. This can only create a new trichromatic triangle in one of the six adjacent triangles. We split this into (lots of) cases, and handle them one at a time. </p><p class="spoiler">(Now I get into WAY too many cases. I started and did most of the edge-node case, but it&#x27;s a huge pain. Is there some other way to do this, presumably using some nifty graph theory I don&#x27;t know, or will I need to list these out? Or should I not be using the nested induction step?)</p><p>Pointers welcome!</p> davidmanheim bE7H3DYBnj4apCDb7 2018-11-18T09:13:36.761Z Comment by Davidmanheim on Topological Fixed Point Exercises https://www.lesswrong.com/posts/svE3S6NKdPYoGepzq/topological-fixed-point-exercises#dhdhWwT6B5ixMBonJ <p>I am having trouble figuring out why #2 needs / benefits from Sperner&#x27;s Lemma. </p><p class="spoiler">But I keep going back to the proof that I&#x27;m comfortable with, which depends on connectedness, so I&#x27;m clearly missing an obvious alternative proof that doesn&#x27;t need topology.</p> davidmanheim dhdhWwT6B5ixMBonJ 2018-11-18T07:35:50.271Z Collaboration-by-Design versus Emergent Collaboration https://www.lesswrong.com/posts/ts3CKRDHgkBasAy4J/collaboration-by-design-versus-emergent-collaboration <h2>Introduction</h2><p>This seems to be a non-trivial problem even for current narrow AI, which is much more problematic for strong NAI, and which I haven&#x27;t seen called out or named explicitly. I provide a quick literature review to explain why I think it&#x27;s ignored in classic multi-agent system design. (But I might be corrected.)</p><p>It is unclear to me whether we can expect even introspective GAI to &quot;just solve it&quot; by noticing that it is a problem and working to fix it, given that people often don&#x27;t seem to manage it.</p><h2>The problem</h2><p>One challenge for safe AI is the intrinsic difficulty of coordination problems. This includes coordination with humans, coordination with other AI systems, and potentially self-coordination when AI uses multiple agents. Unfortunately, the typical system design intends to maximize some fitness function, not to coordinate in order to allow mutually beneficial interaction.</p><p>There is extensive literature on multi-agent coordination for task-based delegation and cooperation, dating back to at least the <a href="http://www.ensc.sfu.ca/research/idea/courses/files/Contract%20Net%20Protocol1.pdf">1980 Contract Net Interaction Protocol</a>, which allows autonomous agents to specify markets for interaction. This is useful, but doesn&#x27;t avoid any of the problems with market failures and inadequate equilibria. (In fact, it probably induces such failures, since individual contracts are the atomic unit of interaction.) Extensive follow-up work on <a href="https://ieeexplore.ieee.org/document/1470239">distributed consensus problems</a> assumes that all agents are built to achieve consensus. This may be important for AI coordination, but requires clearly defined communication channels and well-understood domains. Work on <a href="http://collaborative-intelligence.org/ciq.html">Collaborative Intelligence</a> is also intended to allow collaboration, but it is unclear that there is substantive ongoing work in that area. 
<a href="https://science.energy.gov/~/media/ascr/pdf/research/am/docs/Multiscale_math_workshop_3.pdf">Multiscale decision theory</a> attempts to build multi-scale models for decision making, but is not tied explicitly to multiple agents.</p><p>What most of the literature shares is an assumption that agents will be designed for cooperation and collaboration. Inducing collaboration in agents not explicitly designed for that task is a very different problem, as is finding coordinated goals that can be achieved.</p><h2>&quot;Solutions&quot;?</h2><p>The obvious solution is to expect multi-agent systems to have agents with models of other agents that are sophisticated enough to build strategies that allow collaboration. In situations where multiple equilibria exist, moving from pareto-dominated equilibria to better ones often requires coordination, which requires understanding that initially costly moves towards the better equilibrium will be matched by other players. As <a href="https://www.lesswrong.com/posts/dfZLLEfFvkrMwmiMw/multi-agent-overoptimization-and-embedded-agent-world-models">I argued earlier</a>, there are fundamental limitations on the models of embedded agents that we don&#x27;t have good solutions to. (If we find good ways to build embedded agents, we may also find good ways to design embedded agents for cooperation. This isn&#x27;t obvious.)</p><p>Collaboration-by-design, on the other hand, is much easier. Unfortunately, AI-race dynamics make it seem unlikely. The other alternative is to explicitly design safety parameters, as Mobileye <a href="https://www.mobileye.com/responsibility-sensitive-safety/">has done for self driving cars with &quot;RSS&quot;</a> - limiting the space in which they can make decisions to enforce limits about how cars interact. This seems intractable in domains where safety is ill-defined, and seems to require much better understanding of <a href="https://www.lesswrong.com/posts/SSRuEATNTwR2iraDR/corrigibility">corrigibility</a>, at the very least.</p><h2>Next Steps</h2><p>Perhaps there are approaches I haven&#x27;t considered, or reasons to think this isn&#x27;t a problem. Alternatively, perhaps there is a clearer way to frame the problem that exists ow which I am unaware, or the problem could be framed more clearly in a way I am not seeing. As a first step, progress on identification on either front seems useful.</p> davidmanheim ts3CKRDHgkBasAy4J 2018-11-18T07:22:16.340Z Comment by Davidmanheim on Embedded World-Models https://www.lesswrong.com/posts/efWfvrWLgJmbBAs3m/embedded-world-models#x4StGN9JkSz9nAum4 <p>Some of these issues (obviously) are not limited to AI. Specifically, the problem of how to deal with multi-level models and &quot;composibility&quot; was the subject of an applied research project for military applications by my dissertation chair, Paul Davis, here: https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG101.pdf -</p><p>&quot;The appealing imagery of arbitrary plug-and-play is fatally flawed for complex models... The more-complex [lower level] model components have typically been developed for particular purposes and depend on context-sensitive assumptions, some of which are tacit.&quot;</p><p>This issue has formed the basis of a fair amount of his later work as well, but this work focuses on practical advice, rather than conceptual understanding of the limitations. 
Still, that type of work may be useful as inspiration.</p> davidmanheim x4StGN9JkSz9nAum4 2018-11-15T06:55:20.024Z Comment by Davidmanheim on Multi-Agent Overoptimization, and Embedded Agent World Models https://www.lesswrong.com/posts/dfZLLEfFvkrMwmiMw/multi-agent-overoptimization-and-embedded-agent-world-models#jq88GXxSqYgDw2eSh <p>Yes, there is a ton of work on some of these in certain settings, and I&#x27;m familiar with some of it. </p><p>In fact, the connections are so manifold that I suspect it would be useful to lay out which of these connections seem useful, in another paper, if only to save other people time and energy trying to do the same and finding dead-ends. On reflection, however, I&#x27;m concerned about how big of a project this ends up becoming, and I am unsure how useful it would be to applied work in AI coordination.</p><p>Just as one rabbit hole to go down, there is a tremendous amount of work on cooperation, which spans several very different literatures. The most relevant work, to display my own obvious academic bias, seems to be from public policy and economics, and includes work on participatory decision making and cooperative models for managing resources. Next, you mentioned law - I know there is work on interest-based negotiation, where defining the goals clearly allows better solutions, as well as work on mediation. In business, there is work on team-building that touches on these points, as well as inter-group and inter-firm competition and cooperation, which touch on related work in economics. I know the work on principal-agent problems, as well as game theory applied to more realistic scenarios. (Game theorists I&#x27;ve spoken with have noted the fragility of solutions to very minor changes in the problem, which is why it&#x27;s rarely applied.) There&#x27;s work in evolutionary theory, as well as systems biology, that touches on some of these points. Social psychology, Anthropology, and Sociology all presumably have literatures on the topic as well, but I&#x27;m not at all familiar with them.</p> davidmanheim jq88GXxSqYgDw2eSh 2018-11-14T14:35:27.424Z Comment by Davidmanheim on What is ambitious value learning? https://www.lesswrong.com/posts/5eX8ko7GCxwR5N9mN/what-is-ambitious-value-learning#t6KA9idvksgyMAZ3H <p>That&#x27;s an important question, but it&#x27;s also fundamentally hard, since it&#x27;s almost certainly true that human values are inconsistent - if not individually, then at an aggregate level. (You can&#x27;t reconcile opposite preferences, or maximize each person&#x27;s share of a finite resource.) </p><p>The best answer I have seen is Eric Drexler&#x27;s discussion of Pareto-topia, where he suggests that we can make huge progress and gains in utility according to all value-systems held by humans, despite the fact that they are inconsistent.</p> davidmanheim t6KA9idvksgyMAZ3H 2018-11-09T11:23:37.428Z Multi-Agent Overoptimization, and Embedded Agent World Models https://www.lesswrong.com/posts/dfZLLEfFvkrMwmiMw/multi-agent-overoptimization-and-embedded-agent-world-models <p>I think this expands on the points being made in the recently completed Garrabrant / Demski <a href="https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh">Embedded Agency</a> sequence. It also serves to connect a paper I wrote recently, which discusses mostly non-AI risks from multiple agents and expands on last year&#x27;s work on Goodhart&#x27;s Law, back to the deeper questions that MIRI is considering. 
Lastly, it tries to point out a bit of how all of this connects to some of the other streams of AI safety research.</p><h2>Juggling Models</h2><p>We don&#x27;t know how to make agents contain a complete world model that includes themselves. That&#x27;s a hard enough problem, but the problem could get much harder - and in some applications it already has. When multiple agents need to have world models, the discrepancy between the model and reality can have some nasty feedback effects that relate to Goodhart&#x27;s law, which I am now referring to more generally as overoptimization failures.</p><p>In <a href="https://arxiv.org/abs/1810.10862">my recent paper</a>, I discuss the problem when multiple agents interact, using poker as a motivating example. Each poker-playing agent needs to have a (simplified) model of the game in order to play (somewhat) optimally. Reasonable heuristics and Machine Learning already achieve super-human performance in &quot;heads-up&quot; (2-player) poker. But the general case of multi-player poker is a huge game, so the game gets simplified.</p><p>This is exactly the case where we can transition just a little bit from the world of easy decision theory, which <a href="https://www.lesswrong.com/s/Rm6oQRJJmhGCcLvxh/p/zcPLNNw4wgBX5k8kQ">Abram and Scott point out</a> allows modeling &quot;the agent and the environment as separate units which interact over time through clearly defined i/o channels,&quot; to the world of not embedded agents, but interacting agents. This moves just a little bit in the direction of &quot;we don&#x27;t know how to do this.&quot;</p><p>This partial transition happens because the agent must have some model of the decision process of the other players in order to play strategically. In that model, agents need to represent what those players will do not only in reaction to the cards, but in reaction to the bets the agent places. To do this optimally, they need a model of the other player&#x27;s (perhaps implicit) model of the agent. And building models of other player&#x27;s models seems very closely related to work like Andrew Critch&#x27;s paper on <a href="https://arxiv.org/abs/1602.04184">Lob&#x27;s Theorem and Cooperation</a>.</p><p>That explains why I claim that building models of complex agents that have models of you that then need models of them, etc. is going to be related to some of the same issues that embedded agents face, even without the need to deal with some of the harder parts of self-knowledge of agents that self-modify.</p><h2>Game theory &quot;answers&quot; this, but it cheated.</h2><p>The obvious way to model interaction is with game theory, which makes a couple seemingly-innocuous simplifying assumptions. The problem is that these assumptions are impossible in practice.</p><p>The first is that the agents are rational and Bayesian. But as Chris Sims <a href="http://sims.princeton.edu/yftp/Bayes250/NoRealBayesians.pdf">pointed out</a>, there are no real Bayesians. (&quot; Not that there’s something better out there. &quot;)</p><blockquote>• There are fewer than 2 truly Bayesian chess players (probably none). • We know the optimal form of the decision rule when two such players play each other: Either white resigns, black resigns, or they agree on a draw, all before the first move. 
• But picking which of these three is the right rule requires computations that are not yet complete.</blockquote><p>This is (kind of) a point that Abram and Scott made in the sequence in disguise - that world models are always smaller than the agents.</p><p>The second assumption is that agents have common knowledge of both agents&#x27; objective functions. (Ben Pace points out <a href="https://www.lesswrong.com/posts/9QxnfMYccz9QRgZ5z/the-costly-coordination-mechanism-of-common-knowledge">how hard that assumption is to realize in practice</a>. And yes, you can avoid this assumption by specifying that they have uncertainty of a defined form, but that just kicks the can down the road - how do you know what distributions to use? What happens if the agent&#x27;s true utility is outside the hypothesis space?) If the models of the agents must be small, however, it is possible that they cannot have a complete model of the other agent&#x27;s preferences.</p><p>It&#x27;s a bit of a side-point for the embedded agents discussion, but breaking this second assumption is what allows for a series of overoptimization exploitations <a href="https://arxiv.org/abs/1810.10862">explored in the new paper</a>. Some of these, like accidental steering and coordination failures, are worrying for AI-alignment because they pose challenges even for cooperating agents. Others, like adversarial misalignment, input spoofing and filtering, and goal co-option, are only in the adversarial case, but can still matter if we are concerned about subsystem alignment. And the last category, direct hacking, gets into many of the even harder problems of embedded agents.</p><h2>Embedded agents, exploitation and ending.</h2><p>As I just noted, one class of issues that embedded agents have that traditional dichotomous agents do not is direct interference. If an agents hacks the software another agent is running on, there are many obvious exploits to worry about. This can&#x27;t easily happen with a defined channel. (But to digress, they still do happen in such defined channels. This is because people without security mindset keep building Turing-complete languages into the communication interfaces, instead of doing #LangSec properly.)</p><p>But for embedded agents the types of exploitation we need to worry about are even more general. Decision theory with embedded world models is obviously critical for <a href="https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh">Embedded Agency</a> work, but I think it&#x27;s also critical for value alignment, since &quot;goal inference&quot; in practice requires inferring some baseline shared human value system from incoherent groups. (Whether or not the individual agents are incoherent.) This is in many ways a multi-agent cooperation problem - and even if we want to cooperate and share goals, and we already agreed that we should do so, cooperation can fall prey to accidental steering and coordination failures. </p><p>Lastly, Paul Christiano&#x27;s <a href="https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd">Iterated Amplification</a> approach, which in part relies on small agents cooperating, seems to need to deal with this even more explicitly. But I&#x27;m still thinking about the connections between these problems and the ones his approach takes, and I&#x27;ll wait for his sequence to be finished, and time for me to think about it, to comment about this and get more clarity.</p> davidmanheim dfZLLEfFvkrMwmiMw 2018-11-08T20:33:00.499Z Comment by Davidmanheim on What is ambitious value learning? 
https://www.lesswrong.com/posts/5eX8ko7GCxwR5N9mN/what-is-ambitious-value-learning#HvY6tjZmMhZnPzxNP <p>Sorry, I needed to clarify my thinking and my claim a lot further. This is in addition to the (what I assumed was obvious) claim that correct Bayesian thinkers should be able to converge on beliefs despite potentially having different values. I&#x27;m speculating that if terminal values are initially drawn from a known distribution, AND &quot;if you think that a different set of life experiences means that you are a different person with different values,&quot; but that values change based on experiences in ways that are understandable, then rational humans will act in a coherent way so that we should expect to be able to learn human values and their distribution, despite the existence of shifts. </p><p>Conditional on those speculative thoughts, I disagree with your conclusion that &quot;that&#x27;s a really good reason to assume that the whole framework of getting the true human utility function is doomed.&quot; Instead, I think we should be able to infer the distribution of values that humans actually have - even if they individually change over time from experiences.</p> davidmanheim HvY6tjZmMhZnPzxNP 2018-11-08T20:18:33.002Z Comment by Davidmanheim on No Really, Why Aren't Rationalists Winning? https://www.lesswrong.com/posts/bRGbdG58cJ8RGjS5G/no-really-why-aren-t-rationalists-winning#CcbfgTRN4CajHevd7 <blockquote> I&#x27;m not sure I expect hiring people solely based on their educational expertise to work out well. </blockquote><p>Yes, there needs to be some screening other than pedagogy, but money to find the best people can fix lots of problems. And yes, typical teaching at good universities sucks, but that&#x27;s largely because it optimizes for research. (You&#x27;d likely have had better professors as an undergrad if you went to a worse university - or at least that was my experience.)</p><blockquote>...they can only (do something like) streamline the existing product. </blockquote><p>My thought was that streamlining the existing product and turning it into useable and testably effective modules would be a really huge thing.</p><blockquote>Also: I think you&#x27;re implying that AI is a really huge deal problem and rationality is less. </blockquote><p>If that was the implication, I apologize - I view safe AI as only near-impossible, while making actual humans rational is a problem that is fundamentally impossible. But raising the sanity water-line has some low-hanging fruits - not to get most people to CFAR-expert-levels, but to get high schools to teach some of the basics in ways that potentially has significant leverage in improving social decision-making in general. 
(And if the top 1% of people in High Schools also take those classes, there might be indirect benefits leading to increasing the number of CFAR-expert-level people in a decade.)</p> davidmanheim CcbfgTRN4CajHevd7 2018-11-08T09:52:45.103Z Comment by Davidmanheim on Subsystem Alignment https://www.lesswrong.com/posts/ChierESmenTtCQqZy/subsystem-alignment#xqi4oCm5eZ2vTXcih <p>There is a literature on this, and it&#x27;s not great for the purposes here - principal-agent setups assume we can formalize the goal as a good metric, the complexity of management is a fundamentally hard problem that we don&#x27;t have good answers for (see my essay on scaling companies here: https://www.ribbonfarm.com/2016/03/17/go-corporate-or-go-home/ ), and Goodhart failures due to under-specified goals are fundamentally impossible to avoid (see my essay on this here: https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/ ).</p><p>There is a set of strategies for mitigating the problems, and I have a paper on this that is written but still needs to be submitted somewhere, tentatively titled &quot;Building Less Flawed Metrics: Dodging Goodhart and Campbell’s Laws&quot;; if anyone wants to see it, they can message/email/tweet at me. </p><p>Abstract: Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently by Campbell and Goodhart, and in some forms it is not only common, but unavoidable due to the nature of metrics. There are two distinct but interrelated problems that must be overcome in building better metrics: first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. Finally, it will discuss important desiderata and the trade-offs involved in each approach. </p><p>(I should edit this comment to be a link once I have submitted and have a pre-print or publication.)</p> davidmanheim xqi4oCm5eZ2vTXcih 2018-11-08T09:33:16.681Z Comment by Davidmanheim on Subsystem Alignment https://www.lesswrong.com/posts/ChierESmenTtCQqZy/subsystem-alignment#Qn6bXqwmvJKbRN8TG <blockquote>...we&#x27;re talking about a class of problems that already comes up in all sorts of practical engineering, and which can be satisfactorily handled in many real cases without needing any philosophical advances.</blockquote><p>The explicit assumption of the discussion here is that we can&#x27;t pass the full objective function to the subsystem - so it cannot possibly have the goal fully well defined. This isn&#x27;t going to depend on whether the subsystem is really smart or really dumb, it&#x27;s a fundamental problem if you can&#x27;t tell the subsystem enough to solve it.</p><p>But I don&#x27;t think that&#x27;s a fair characterization of most Goodhart-like problems, even in the limited practical case. Bad models and causal mistakes don&#x27;t get mitigated unless we get the correct model. And adversarial Goodhart is much worse than that. 
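<p>(As a concrete illustration of the regressional case and the Bayes-estimate fix discussed just below - this is my own toy sketch, not anything from the post: selecting on a noisy proxy makes the realized goal fall short of the score, and shrinking toward the prior removes the bias, but only once the goal itself is specified.)</p>
<pre><code># Toy sketch of regressional Goodhart: the proxy is the goal plus noise, so the
# items that score highest on the proxy are systematically worse on the goal
# than their scores suggest; the Bayes estimate (shrinkage) corrects for this,
# but writing it down requires already knowing what the goal is.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
goal = rng.normal(0.0, 1.0, n)           # true quality, standard normal prior
proxy = goal + rng.normal(0.0, 1.0, n)   # measurement with unit noise

top = np.argsort(proxy)[-100:]           # pick the 100 best-looking items
print("mean proxy score of selected:", round(proxy[top].mean(), 2))
print("mean true goal of selected:  ", round(goal[top].mean(), 2))  # tails come apart

# With equal signal and noise variance, E[goal | proxy] = proxy / 2.
print("mean Bayes estimate:         ", round((proxy[top] / 2).mean(), 2))
</code></pre>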
I agree that it describes the &quot;tails diverge&quot; / regressional Goodhart case, and we have solutions for that case (compute the Bayes estimate, as discussed previously), but only once the goal is well-defined. (We have mitigations for other cases, but they have their own drawbacks.) </p> davidmanheim Qn6bXqwmvJKbRN8TG 2018-11-08T09:22:53.354Z Comment by Davidmanheim on No Really, Why Aren't Rationalists Winning? https://www.lesswrong.com/posts/bRGbdG58cJ8RGjS5G/no-really-why-aren-t-rationalists-winning#kvqPqNjiFjrPp9eyo <p>Very much disagree - but this is as someone not in the middle of the Bay area, where the main part of this is happening. Still, I don&#x27;t think rationality works without some community.</p><p>First, I don&#x27;t think that the alternative communities that people engage with are epistemically healthy enough to allow people to do what they need to reinforce good norms for themselves.</p><p>Second, I don&#x27;t think that epistemic rationality is something that a non-community can do a good job with, because there is much too little of the personal reinforcement and positive vibes that people need to stick with it if everyone is going it alone.</p> davidmanheim kvqPqNjiFjrPp9eyo 2018-11-07T12:43:00.882Z Comment by Davidmanheim on What is ambitious value learning? https://www.lesswrong.com/posts/5eX8ko7GCxwR5N9mN/what-is-ambitious-value-learning#3jHbbuAAXZSA28F9M <p>I don&#x27;t think you are correct about the implication of &quot;not up for grabs&quot; - it doesn&#x27;t mean it is not learnable, it means that we don&#x27;t update or change it, and that it is not constrained by rationality. But even that isn&#x27;t quite right - rational behavior certainly requires that we change preferences about intermediate outcomes when we find that our instrumental goals should change in response to new information.</p><p>And if the utility function changes as a result of life experiences, it should be in a way that reflects learnable expectations over how experiences change the utility function - so the argument about needing origin disputes still applies. </p> davidmanheim 3jHbbuAAXZSA28F9M 2018-11-07T07:34:49.901Z Comment by Davidmanheim on Subsystem Alignment https://www.lesswrong.com/posts/ChierESmenTtCQqZy/subsystem-alignment#ZyDt2a3KYS6uJL9vo <p>It&#x27;s easy to find ways of searching for truth in ways that harm the instrumental goal.</p><p>Example 1: I&#x27;m a self-driving car AI, and don&#x27;t know whether hitting pedestrians at 35 MPH is somewhat bad, because it injures them, or very bad, because it kills them. I should not gather data to update my estimate.</p><p>Example 2: I&#x27;m a medical AI. Repeatedly trying a potential treatment that I am highly uncertain about the effects of to get high-confidence estimates isn&#x27;t optimal. I should be trying to maximize something other than knowledge.
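As a toy numerical sketch of that trade-off (every probability and utility below is invented purely for illustration), compare a criterion that only tries to reduce uncertainty with one that weighs expected value:

```python
# Toy comparison for the medical-AI example: an uncertainty-reduction criterion
# picks the treatment it knows least about, while an expected-value criterion
# also weighs the harm of a failed trial. All numbers are made up.
treatments = {
    #              P(works)  utility if it works  utility if it fails
    "known_drug":  (0.70,    +1.0,                -0.1),
    "novel_drug":  (0.50,    +1.2,                -2.0),   # least known, riskiest
}

def uncertainty(p):
    return p * (1 - p)          # Bernoulli variance: largest at p = 0.5

def expected_value(p, u_good, u_bad):
    return p * u_good + (1 - p) * u_bad

for name, (p, u_good, u_bad) in treatments.items():
    print(f"{name:11s} uncertainty={uncertainty(p):.2f} "
          f"expected value={expected_value(p, u_good, u_bad):+.2f}")
# A pure knowledge-maximizer prefers novel_drug (more uncertainty to resolve);
# an expected-value maximizer prefers known_drug despite learning less from it.
```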
Even though I need to know whether treatments work, I should balance the risks and benefits of trying them.</p><p></p> davidmanheim ZyDt2a3KYS6uJL9vo 2018-11-07T07:21:23.176Z Comment by Davidmanheim on The easy goal inference problem is still hard https://www.lesswrong.com/posts/h9DesGT3WT9u2k7Hr/the-easy-goal-inference-problem-is-still-hard#tbhyReiC7Ly5bpKa3 <p>Fair point, but I don&#x27;t think that addresses the final claim, which is that even if you are correct, analyzing the black box isn&#x27;t enough without actually playing out counterfactuals.</p> davidmanheim tbhyReiC7Ly5bpKa3 2018-11-06T09:03:59.600Z Comment by Davidmanheim on Embedded World-Models https://www.lesswrong.com/posts/efWfvrWLgJmbBAs3m/embedded-world-models#HFkAsLzRAD3NPWorQ <blockquote>...the weirdness of the injunction to optimize over a space containing every procedure you could ever do, including all of the <em>optimization </em>procedures you could ever do. </blockquote><p>My most recent preprint discusses multi-agent Goodhart ( https://arxiv.org/abs/1810.10862 ) and uses the example of poker, along with a different argument somewhat related to the embedded agent problem, to say why the optimization over strategies needs to include optimizing over the larger solution space.</p><p>To summarize and try to clarify how I think it relates, strategies for game-playing must at least implicitly include a model of the other player&#x27;s actions, so that an agent can tell which strategies will work against them. We need uncertainty in that model, because if we do something silly like assume they are rational Bayesian agents, we are likely to act non-optimally against their actual strategy. But the model of the other agent itself needs to account for their model of our strategy, including uncertainty about our search procedure for strategies - otherwise the space is clearly much too large to optimize over.</p><p>Does this make sense? (I may need to expand on this and clarify my thinking...)</p> davidmanheim HFkAsLzRAD3NPWorQ 2018-11-06T09:01:49.741Z Comment by Davidmanheim on The Importance of Goodhart's Law https://www.lesswrong.com/posts/YtvZxRpZjcFNwJecS/the-importance-of-goodhart-s-law#fgzyeW8kTEbWhsXJs <blockquote> Anyway, it doesn&#x27;t even seem mathematically obvious to me that optimizing for G* will reduce correlation between G and G*. </blockquote><p>See Greg Lewis&#x27;s post here: https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart and Scott Alexander&#x27;s discussion here: http://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/</p><p>Also see our paper formalizing the other Goodhart&#x27;s Law failure modes: https://arxiv.org/abs/1803.04585</p> davidmanheim fgzyeW8kTEbWhsXJs 2018-11-06T08:52:45.945Z Comment by Davidmanheim on No Really, Why Aren't Rationalists Winning? https://www.lesswrong.com/posts/bRGbdG58cJ8RGjS5G/no-really-why-aren-t-rationalists-winning#rfRwnutqhWJ6s7gFx <blockquote>Agreed that rationality work has not seen much progress, and I&#x27;d personally like to move the needle forward on that. </blockquote><p>Unfortunately, or perhaps fortunately, the really huge deal problems get all the attention from the really motivated smart people who get convinced by the rational arguments. 
</p><p>Perhaps the way forward on the &quot;improve general rationality&quot; front is to try hiring educational experts from outside the rationality community to build curricula and training based on the sequences while getting feedback from CFAR, instead of having CFAR work on building such curricula (which they are no longer doing, AFAICT).</p><p></p> davidmanheim rfRwnutqhWJ6s7gFx 2018-11-05T16:59:20.140Z Comment by Davidmanheim on The easy goal inference problem is still hard https://www.lesswrong.com/posts/h9DesGT3WT9u2k7Hr/the-easy-goal-inference-problem-is-still-hard#BJsPAwmCFbbF7Rtqb <p>You mean that you can ask the agent if it wants just X, and it will say &quot;I want Y also,&quot; but it will never act to do those things? That sounds like what Robin Hanson discusses in Elephant in the Brain - and he largely dismisses the claimed preferences, in favor of caring about the actual desires.</p><p>I&#x27;m confused about why we think this is a case that would occur in a way where Y is a real goal we should pursue, instead of a false pretense. And if it was the case, how would brain inspection (without manipulation) allow us to know it?</p> davidmanheim BJsPAwmCFbbF7Rtqb 2018-11-05T16:53:16.340Z Comment by Davidmanheim on No Really, Why Aren't Rationalists Winning? https://www.lesswrong.com/posts/bRGbdG58cJ8RGjS5G/no-really-why-aren-t-rationalists-winning#E2gGEeQTS6K48rkSh <p>I think you are not looking in the right places, as the groups of rationalists I know are doing incredibly well for themselves - tenure-track positions at major universities, promotions to senior positions in US government agencies, incredibly well paid jobs doing EA-aligned research in machine learning and AI, huge amounts of money being sent to the rationalist-sphere AI risk research agendas that people were routinely dismissing a few years ago, etc.</p><p>To evaluate this more dispassionately, however, I&#x27;d suggest looking at the people who posted high-karma posts in 2009, and seeing what the posters are doing now. I&#x27;ll try that here, but I don&#x27;t know what some of these people are doing now. They seem to be an overall high-achieving group. (But we don&#x27;t have a baseline.)</p><p>https://www.greaterwrong.com/archive/2009 - Page 1: I&#x27;m seeing Eliezer, (he seems to have done well,) Hal Finney (unfortunately deceased, but had he lived a bit longer he would have been a multi-multi millionaire for being an early bitcoin holder / developer,) Scott Alexander (I think his blog is doing well enough,) Phil Goetz - ?, Anna Salamon (helping run CFAR,) &quot;Liron&quot; - (?, but he&#x27;s now running https://relationshiphero.com/ and seems to have done decently as a serial entrepreneur,) Wei Dai (a fairly big name in cryptocurrency,) cousin_it - ?, CarlShulman (doing a bunch of existential risk work with FHI and other organizations,) Alicorn (now a writer and &quot;Immortal bisexual polyamorous superbeing&quot;), HughRistik - ?, Orthonormal (Still around, but ?), jimrandomh (James Babcock - ?), AllanCrossman (http://allancrossman.com/ - ?) and Psychohistorian (Eitan Pechenick, Academia)</p> davidmanheim E2gGEeQTS6K48rkSh 2018-11-05T12:41:31.172Z Comment by Davidmanheim on What is ambitious value learning? https://www.lesswrong.com/posts/5eX8ko7GCxwR5N9mN/what-is-ambitious-value-learning#wL9C5zC4LiWem3SYP <p>The Hansonian discussion of shared priors seems relevant.
(For those not familiar with it: <a href="https://mason.gmu.edu/~rhanson/prior.pdf">https://mason.gmu.edu/~rhanson/prior.pdf</a> ) Basically, we should have convergent posteriors in an Aumann sense unless we have not only different priors and different experiences, but also different origins. </p><p>But what this means is that *to the extent that human values are coherent and based on correct bayesian reasoning* - which, granted, is a big assumption - distributional shifts shouldn&#x27;t exist. (And now, back to reality.)</p> davidmanheim wL9C5zC4LiWem3SYP 2018-11-05T06:42:28.197Z Comment by Davidmanheim on Robust Delegation https://www.lesswrong.com/posts/iTpLAaPamcKyjmbFC/robust-delegation#o6H7yBqPGzvgcstKq <p>I usually think of the non-wireheading preference in terms of multiple values - humans value both freedom and pleasure. We are not willing to fully maximize one fully at the expense of the other. Wireheading is always defined by giving up freedom of action by maximizing &quot;pleasure&quot; defined in some way that does not include choice.</p> davidmanheim o6H7yBqPGzvgcstKq 2018-11-05T06:35:39.198Z Comment by Davidmanheim on Robust Delegation https://www.lesswrong.com/posts/iTpLAaPamcKyjmbFC/robust-delegation#HAvf9DMMDzLnoLWw8 <p>I want to expand a bit on adversarial Goodhart, which this post describes as when another agent actively attempts to make the metric fail, and the paper I wrote with Scott split into several sub-categories, but which I now think of in somewhat simpler terms. There is nothing special happening in the multi-agent setting in terms of metrics or models, it&#x27;s the same three failure modes we see in the single agent case.</p><p>What changes more fundamentally is that there are now coordination problems, resource contention, and game-theoretic dynamics that make the problem potentially much worse in practice. I&#x27;m beginning to think of these multi-agent issues as a problem more closely related to the other parts of embedded agency - needing small models of complex systems, reflexive consistency, and needing self-models, as well as the issues less intrinsically about embedded agency, of coordination problems and game theoretic competition.</p> davidmanheim HAvf9DMMDzLnoLWw8 2018-11-04T18:45:36.455Z Comment by Davidmanheim on The easy goal inference problem is still hard https://www.lesswrong.com/posts/h9DesGT3WT9u2k7Hr/the-easy-goal-inference-problem-is-still-hard#6qAB6SCXR6wiamMT6 <p>I think this is important, but I&#x27;d take it further.</p><p>In addition to computational limits for the class of decision where you need to compute to decide, there are clearly some heuristics that are being used by humans that give implicitly incoherent values. In those cases, you might want to apply the idea of computational limits as well. This would allow you to say that the reason they picked X not Y at time 1 for time 2, but Y not X at time 2, reflects the cost of thinking about what their future self will want.</p><p></p> davidmanheim 6qAB6SCXR6wiamMT6 2018-11-04T13:19:33.665Z Policy Beats Morality https://www.lesswrong.com/posts/RK8koC4Bfg54G7jTk/policy-beats-morality <p>(<a href="https://medium.com/@davidmanheim/policy-beats-morality-2d12b689bde8">Crossposted from Medium</a>)</p><p> </p><p>This is a simple point, but one that gets overlooked, so I think it deserves a clear statement. 
Morality is less effective than incentives at changing behavior, and most of the time, policy is the way incentives get changed.</p><p></p><p>Telling people the right thing to do doesn’t work. Even if they believe you, or understand what you are saying, most people will not change their behavior simply because it’s the right thing to do. What works better is changing the incentives. If this is done right, people who won’t do the right thing on their own often support the change, and their behavior will follow.</p><p></p><p>I remember reading a story that I think was about Martin Gardner’s column (Edit:it was Douglas Hofstadter&#x27;s - thanks <strong><a href="https://www.lesswrong.com/users/gjm">gjm</a></strong>!) in <em>Scientific American</em> in which he asked eminent scientists to write in whether they would cooperate with someone described as being “as intelligent as themselves” in a one-shot prisoner’s dilemma. He was disappointed to find that even many of the smartest people in the world were rational, instead of superrational. Despite his assertion that intelligent enough people should agree that superrationality leads to better outcomes for everyone, those people followed their incentives, and everyone defected. Perhaps we can chalk this up to their lack of awareness of <a href="https://wiki.lesswrong.com/wiki/Decision_theory">newer variants of decision theory</a>, but the simpler explanation is that morality is a weak tool, and people know it. The beneficial nature of the “morality” of non-defection wasn’t enough to convince participants that anyone would go along.</p><p></p><p>Environmentalists spent decades attempting “moral suasion” as a way to get people to recycle. It didn’t work. What worked was curb-side pickup of recycling that made money for municipalities, paired with fines for putting recyclables in the regular garbage. Unsurprisingly, incentives matter. This is well understood, but often ignored. When people are<a href="https://edition.cnn.com/2018/10/08/world/ipcc-climate-change-consumer-actions-intl/index.html"> told the way to curb pollution is to eat less meat or drive less</a>, they don’t listen. The reason their behavior doesn’t change isn’t because it’s <a href="https://twitter.com/adamjohnsonNYC/status/1049519866154242048">“really” the fault of companies</a>, it’s because morality doesn’t change behavior much — but policy will.</p><p></p><p>The reason <a href="http://www.overcomingbias.com/2008/09/politics-isnt-a.html">politics is even related to policy</a> is because politicians like being able to actually change public behavior. The effectiveness of policy in changing behavior is the secondary reason why — after <a href="https://www.propublica.org/article/filing-taxes-could-be-free-simple-hr-block-intuit-lobbying-against-it">donations by Intuit and H&amp;R Block </a>— congress will never simplify the tax code. To <a href="https://slatestarcodex.com/2014/09/10/society-is-fixed-biology-is-mutable/">paraphrase / disagree</a> with Scott Alexander, “Society Is Fixed, Policy Is Mutable.” Public policy can change the incentives in a way that makes otherwise impossible improvements turn into defaults. Punishment mechanisms are (at least sometimes) <a href="https://ideas.repec.org/p/ces/ceswps/_183.html">sufficient to induce cooperation among free-riders</a>.</p><p></p><p>Policy doesn’t change culture directly, but it certainly changes behaviors and outcomes. 
So I’ll say it again: policy beats morality.</p><p></p><p>*) Yes, technological change and innovation can ALSO drive changes in incentives, but predicting the direction of such changes is really hard. This is why I’m skeptical that innovation <strong>alone</strong> is a good target for changing systems. Even when technology lowers the cost of recycling, it’s rarely clear beforehand whether new technology will in fact manage to prompt such changes — electric trolleys were a better technology than early cars, but they lost. Electric cars are still rare. Nuclear power is the lowest carbon alternative, but it’s been regulated into inefficiency.</p> davidmanheim RK8koC4Bfg54G7jTk 2018-10-17T06:39:40.398Z (Some?) Possible Multi-Agent Goodhart Interactions https://www.lesswrong.com/posts/9evYBqHAvKGR3rxMC/some-possible-multi-agent-goodhart-interactions <p>Epistemic Status: I need feedback on these ideas, and I&#x27;ve been delaying because I&#x27;m not sure I&#x27;m on the right track. This is the product of a lot of thinking, but I&#x27;m not sure the list is complete or there isn&#x27;t something important I&#x27;m missing. (Note: This is intended to form a large part of a paper for an article to be submitted to the journal special issue <a href="http://www.mdpi.com/journal/BDCC/special_issues/Artificial_Superintelligence">here</a>.)</p><p>Following up on Scott Garrabrant&#x27;s earlier post on <a href="https://www.lesswrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy">Goodhart&#x27;s Law</a> and the resulting <a href="https://arxiv.org/abs/1803.04585">paper</a>, I wrote a further discussion of <a href="https://www.lesswrong.com/posts/iK2F9QDZvwWinsBYB/non-adversarial-goodhart-and-ai-risks">non-adversarial goodhart</a>, and explicitly deferred discussion of the adversarial case. I&#x27;ve been working on that.</p><p>Also note that these are often reformulations or categorizations of other terms (treacherous turn, faulty reward functions, distributional shift, reward hacking, etc.) It might be good to clarify exactly what went where, but I&#x27;m unsure.</p><p>To (finally) start, here is Scott&#x27;s &quot;Quick Reference&quot; for the initial 4 methods, which is useful for this post as well. I&#x27;ve partly replaced the last one with the equivalent cases from the Arxiv paper.</p><h1>Quick Reference</h1><ul><li>Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.</li><ul><li><u>Model</u>: When U is equal to V+X, where X is some noise, a point with a large U value will likely have a large V value, but also a large X value. Thus, when U is large, you can expect V to be predictably smaller than U.</li><li><em>Example: height is correlated with basketball ability, and does actually directly help, but the best player is only 6&#x27;3&quot;, and a random 7&#x27; person in their 20s would probably not be as good</em></li></ul><li>Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.</li><ul><li><u>Model</u>: If V causes U (or if V and U are both caused by some third thing), then a correlation between V and U may be observed. 
However, when you intervene to increase U through some mechanism that does not involve V, you will fail to also increase V.</li><li><em>Example: someone who wishes to be taller might observe that height is correlated with basketball skill and decide to start practicing basketball.</em></li></ul><li>Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.</li><ul><li><u>Model</u>: Patterns tend to break at simple joints. One simple subset of worlds is those worlds in which U is very large. Thus, a strong correlation between U and V observed for naturally occuring U values may not transfer to worlds in which U is very large. Further, since there may be relatively few naturally occuring worlds in which U is very large, extremely large U may coincide with small V values without breaking the statistical correlation.</li><li><em>Example: the tallest person on record,</em> <em><a href="https://en.wikipedia.org/wiki/Robert_Wadlow">Robert Wadlow</a>, was 8&#x27;11&quot; (2.72m). He grew to that height because of a pituitary disorder, he would have struggled to play basketball because he &quot;required leg braces to walk and had little feeling in his legs and feet.&quot;</em></li></ul><li><em><strong>(See below from the Arxiv paper.)</strong> Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.</em></li><ul><li><u>Model</u>: <strong>Removed. See below.</strong></li><li><em>Example -<strong> Removed.</strong></em></li></ul></ul><p><em><strong>From the Arxiv Paper: (Note - I think this is still incomplete, and focuses far too much on the Agent-Regulator framing. See below.)</strong></em></p><ul><li><strong> Adversarial Misalignment Goodhart</strong> - The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric . The adversarial misalignment failure can occur due to the agent creating extremal Goodhart effects, or by exacerbating always-present regressional Goodhart, or due to causal intervention by the agent which changes the effect of the regulator optimization. </li><li><strong>Campbell’s Law </strong>- Agents select a metric knowing the choice of regulator metric. Agents can correlate their metric with the regulator’s metric, and select on their metric. This further reduces the usefulness of selection using the metric for acheiving the original goal. </li><li><strong>Normal Cobra Effect</strong> - The regulator modifies the agent goal, usually via an incentive, to correlate it with the regulator metric. The agent then acts by changing the observed causal structure due to incompletely aligned goals in a way that creates a Goodhart effect. </li><li><strong>Non-Causal Cobra Effect</strong> - The regulator modifies the agent goal to make agent actions aligned with the regulator’s metric. Under selection pressure from the agent, extremal Goodhart effects occur or regressional Goodhart effects are worsened. </li></ul><h1>New: 5 Ways Multiple Agents Ruin Everything</h1><p>To fix that insufficient bullet point above, here is a list of 5 forms of optimization failures that can occur in multi-agent systems. I intend for the new sub-list to be both exhaustive, and non-overlapping, but I&#x27;m not sure either is true. 
For obvious reasons, the list is mostly human examples, and I haven&#x27;t formalized these into actual system models. (Anyone who would like to help me do so would be welcome!)</p><p>Note that the list is only discussing things that happen due to optimization failure and interactions. Also note that most examples are 2-party. There may be complex and specific 3-party or N-party failure modes that are not captured, but I can&#x27;t find any.</p><p><strong>1) (Accidental) Steering </strong>is when one agent alters the system in ways not anticipated by another agent, creating one of the above-mentioned over-optimization failures for the victim.</p><p>This is particularly worrisome when multiple agents have closely related goals, even if those goals are aligned.</p><p><strong>Example 1.1 </strong>A system may change due to a combination of actors&#x27; otherwise benign influences, either putting the system in an extremal state or triggering a regime change.</p><p><strong>Example 1.2</strong> In the presence of multiple agents without coordination, manipulation of factors not already being manipulated by other agents is likely to be easier and more rewarding, potentially leading to inadvertent steering.</p><p><strong>2) Coordination Failure </strong>occurs when multiple agents clash despite having potentially compatible goals.</p><p>Coordination is an inherently difficult task, and can in general be considered impossible\cite{Gibbard1973}. In practice, coordination is especially difficult when the goals of other agents are incompletely known or understood. Coordination failures such as Yudkowsky&#x27;s Inadequate Equilibria\cite{Yudkowsky2017} are stable, and coordination to escape from such an equilibrium can be problematic even when agents share goals. </p><p><strong>Example 2.1</strong> Conflicting instrumental goals that neither side anticipates may cause wasted resources on contention.
For example, both agents are trying to do the same thing in conflicting ways.</p><p><strong>Example 2.2</strong> Coordination limiting overuse of public goods is only possible when conflicts are anticipated or noticed and where a reliable mechanism can be devised\cite{Ostrom1990}.</p><p><strong>3) Adversarial misalignment</strong> occurs when a victim agent has an incomplete model of how an opponent can influence the system, and the opponent selects for cases where the victim&#x27;s model performs poorly and/or promotes the opponent&#x27;s goal.</p><p><strong>Example 3.1</strong> Chess engines will choose openings for which the victim is weakest.</p><p><strong>Example 3.2</strong> Sophisticated financial actors can dupe victims into buying or selling an asset in order to exploit the resulting price changes.</p><p><strong>4) Input spoofing and filtering </strong>- Filtered evidence can be provided or false evidence can be manufactured and put into the training data stream of a victim agent.</p><p><strong>Example 4.1</strong> Financial actors can filter by performing transactions they don&#x27;t want seen as private transactions or dark pool transactions, or can spoof by creating offsetting transactions with only one half being reported to give a false impression of activity to other agents.</p><p><strong>Example 4.2 </strong>Rating systems can be attacked by inputting false reviews into a system, or by discouraging reviews by those likely to be the least or most satisfied reviewers.</p><p><strong>Example 4.3</strong> Honeypots can be placed or Sybil attacks mounted by opponents in order to fool victims into learning from examples that systematically differ from the true distribution.</p><p><strong>5) Goal co-option</strong> is when an agent directly modifies the victim agent reward function directly, or manipulates variables absent from the victim&#x27;s system model.</p><p>The probability of exploitable reward functions increases with the complexity of both the agent and the system it manipulates\cite{Amodei2016}, and exploitation by other agents seems to follow the same pattern. </p><p><strong>Example 5.1</strong> Attackers can directly target the system on which an agent runs and modify its goals.</p><p><strong>Example 5.2</strong> An attacker can discover exploitable quirks in the goal function to make the second agent optimize for a new goal, as in Manheim and Garrabrant&#x27;s Campbell&#x27;s law example.</p><h1>Conclusion</h1><p>I&#x27;d love feedback. (I have plenty to say about applications and importance, but I&#x27;ll talk about that separately.)</p> davidmanheim 9evYBqHAvKGR3rxMC 2018-09-22T17:48:22.356Z Lotuses and Loot Boxes https://www.lesswrong.com/posts/q9F7w6ux26S6JQo3v/lotuses-and-loot-boxes <p>There has been <a href="https://www.lesswrong.com/posts/KwdcMts8P8hacqwrX/noticing-the-taste-of-lotus#comments">some recent discussion</a> about how to conceptualize how we lose sight of goals, and I think there&#x27;s a critical conceptual tool missing. Simply put, there&#x27;s a difference between addiction and exploitation. Deciding to try heroin and getting hooked is addiction, but the free sample from a dealer given to an unwary kid to &#x27;try it just once&#x27; is exploitation. 
This difference is not about the way the addict acts once they are hooked, it&#x27;s about how they got where they are, and how wary non-addicts need to be when they notice manipulation.</p><p>I&#x27;m going to claim that the modern world is more addictive overall not just because humans have gotten better at wireheading ourselves. The reason is because humanity has gotten better at exploitation at scale. People (correctly) resent exploitation more than addiction, and we should pay more attention to how we are getting manipulated. Finally, I&#x27;ll claim that this all ties back to <a href="https://www.ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/">my pet theory</a> that <a href="https://arxiv.org/abs/1803.04585">some form</a> of Goodhart&#x27;s law can <a href="https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/">explain</a> almost <a href="https://twitter.com/search?q=%40davidmanheim%20%22Goodhart%27s%20Law%22&src=typd">anything that matters</a>. (I&#x27;m kind of joking about that last. But only kind of.)</p><h2>Lotuses</h2><p>Humans are adaptation executioners, not fitness maximizers - so sometimes we figure out how to wirehead using those adaptations in a way that doesn&#x27;t maximize fitness. Sex is great for enhancing natural fitness because it leads to babies, condoms let us wirehead. High-fat foods are great for enhancing natural fitness because food is scarce, modern abundance lets us wirehead. Runner&#x27;s high enhances natural fitness, #punintentional, because it allows people to continue running from a predator, but running marathons for the exhilarating feeling lets us wirehead. These examples are mostly innocuous - they aren&#x27;t (usually) adddictions.</p><p>That doesn&#x27;t mean they can&#x27;t be bad for us. Valentine <a href="https://www.lesswrong.com/posts/KwdcMts8P8hacqwrX/noticing-the-taste-of-lotus">suggested </a>that we try to notice the taste of the lotus when our minds get hijacked by something other than our goal. I think this is exactly right - we get hijacked by our own brain into doing something other than our original goal. Our preferences get modified by doing something, and several hours later it&#x27;s 2 AM and we realize we should stop p̶l̶a̶y̶i̶n̶g̶ ̶p̶u̶z̶z̶l̶e̶s̶ ̶a̶n̶d̶ ̶d̶r̶a̶g̶o̶n̶s̶ r̶e̶a̶d̶i̶n̶g̶ ̶o̶l̶d̶ ̶p̶o̶s̶t̶s̶ ̶o̶n̶ ̶S̶l̶a̶t̶e̶s̶t̶a̶r̶c̶o̶d̶e̶x doing whatever it is we&#x27;re currently getting distracted by. </p><p>In some of those cases natural behavior reinforcement systems are easier to trick than they are to satisfy and wireheading takes over. That&#x27;s probably a bad thing, and most people would prefer not to have it happen. Valentine says: &quot;I claim you can come to notice what lotuses taste like. Then you can choose to break useless addictions.&quot; But this isn&#x27;t addiction. Addiction certainly involves hijacked motivations, but it goes further. It doesn&#x27;t count as addiction unless it breaks the system a lot more than just preference hijacking. </p><blockquote>&quot;Addiction is a primary, chronic disease of brain reward, motivation, memory and related circuitry. Dysfunction in these circuits leads to characteristic biological, psychological, social and spiritual manifestations. 
This is reflected in an individual pathologically pursuing reward and/or relief by substance use and other behaviors.&quot; - <a href="https://www.asam.org/resources/definition-of-addiction">American Society of Addiction Medicine</a></blockquote><p>Addiction is when the system breaks, and breaking is more than being hijacked a bit - but the breaking that occurs in addiction is often passive. No-one broke it, the system just failed.</p><h2>Loot Boxes</h2><p>It&#x27;s not hard to see that companies have figured out how to exploit people&#x27;s natural behavior reinforcement systems. Foods are now engineered to trigger all the right physiological and gustatory triggers that make you want more of them. The engineering of preferences is not an art. (I&#x27;d call it a science, but instead of being ignored, it&#x27;s really well funded.) </p><p>There is big money riding on a company&#x27;s ability to hijack preferences. When Lays chips said “Betcha Can&#x27;t Eat Just One,” they were being literal - they invested money into a product and a marketing campaign in order to bet that consumers would be unable to stop themselves from eating more than would be good for them. <a href="https://www.theatlantic.com/health/archive/2010/03/bet-you-cant-eat-just-one/38181/">Food companies have been making this and similar bets for a few decades now, and each time they tilt the odds even further in their favor</a>.</p><p>Modern video games include loot boxes, a game feature <a href="https://www.pcgamesn.com/loot-boxes-gambling-addiction">explicitly designed</a> to turn otherwise at least kind-of normal people into money pumps. The victims are not money pumps in the classical dutch book sense, though. Instead, they are money pumps because playing the game is addictive, and companies have discovered it&#x27;s possible to hijack people&#x27;s limbic systems by coupling immersive games with addictive gambling. The players preferences are modified by playing the game, and had players been told beforehand that they would end up spending $1,000 and six months of their life playing the game, many would have chosen not to play.</p><p>Writing this post instead of the paper I should be writing on multipandemics is a distraction. Browsing reddit mindlessly until 2AM is an addiction. But it&#x27;s Cheez-its that rewired my mind to make a company money. This last isn&#x27;t addiction in the classical sense, it&#x27;s exploitation of an addictive tendency. And that&#x27;s where I think Goodhart&#x27;s Law comes in.</p><h2>Being Tricked</h2><p>Scott Garrabrant suggested the notion of &quot;Adversarial Goodhart,&quot; and in our paper we started to explore how it works a bit. Basically, Goodhart&#x27;s law is when the metric, which in this case is whatever gets promoted by our natural behavior reinforcement systems, diverges from the goal (of the blind idiot god evolution,) which is evolutionary fitness. This divergence isn&#x27;t inevitably a problem when selection pressure is only moderate, but because people are motivated to munchkin their own limbic system, it gets a bit worse.</p><p>Still, individuals are limited - we have only so much ability to optimize, and when we&#x27;re playing in single player mode, the degree to which metrics and goals get misaligned is limited by the error in our models. It turns out that when you&#x27;re screwing up on your own, you&#x27;re probably safe. No one is there to goad you into further errors, and noticing the taste of the lotus might be enough. 
When the lotus is being sold by someone, however, we end up deep in <a href="https://www.lesserwrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy">adversarial-goodhart </a>territory.</p><p>And speaking of being tricked, if I started this post saying I was going to mention AI safety and multi-agent dynamics, most of you wouldn&#x27;t have read this far. But now that I&#x27;ve got you this far, I&#x27;ll misquote Paul Ehrlich, “To err is human but to really foul things up requires [multi-agent competitive optimizations].” I&#x27;ll get back to psychology and wire-heading in the next paragraph, but the better a system becomes at understanding another system it&#x27;s manipulating, the worse this problem gets. If there is ever any form of superintelligence that wants to manipulate us, we&#x27;re in trouble - people don&#x27;t seem to be hard to exploit.</p><p>As I&#x27;ve discussed above, exploitation doesn&#x27;t require AI, or even being particularly clever - our goals can get hijacked by <a href="https://xkcd.com/356/">simple nerd-snipes</a>, gamification (even if it&#x27;s just magic internet points), or any other idea that eats smart people. So I&#x27;ll conclude by saying that while I agree we should notice the taste of the lotus, it&#x27;s worth differentiating between being distracted, being addicted, and being exploited.</p> davidmanheim q9F7w6ux26S6JQo3v 2018-05-17T00:21:12.583Z Non-Adversarial Goodhart and AI Risks https://www.lesswrong.com/posts/iK2F9QDZvwWinsBYB/non-adversarial-goodhart-and-ai-risks <p>In a <a href="https://arxiv.org/abs/1803.04585">recent paper</a>, Scott Garrabrant and I formalized and extended the categories <a href="https://www.lesserwrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy">Scott proposed</a> for Goodhart-like phenomena. (If you haven&#x27;t read either his post or the new paper, it&#x27;s important background for most of this post.) </p><p>Here, I lay out my further intuitions about how and where the non-adversarial categories matter for AI safety. Specifically, I view these categories as particularly critical in preventing accidental superhuman AI, or near-term paperclipping. This makes them especially important in the short term.</p><p>I do not think that most of the issues highlighted are new, but I think the framing is useful, and hopefully clearly presents why causal mistakes by Agentic AI are harder problems than is normally appreciated.</p><p><em>Epistemic Status: Provisional and open to revision based on new arguments, but arrived at after significant consideration. I believe conclusions 1-4 are restatements of well understood claims in AI safety. I believe conclusions 5 and 6 are less well appreciated.</em></p><p>Side Note: I am deferring discussion of adversarial Goodhart to the other paper and a later post; it is arguably more important, but in very different ways.
The deferred topics include most issues with multiple agentic AIs that interact, and issues with pre-specifying a control scheme for a superhuman AI.</p><h2>Goodhart Effects Review - <a href="https://arxiv.org/pdf/1803.04585.pdf">Read the paper for details!</a> </h2><p><strong>Regressional Goodhart</strong> - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.</p><p><strong>Extremal Goodhart</strong> - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the relationship between the proxy and the goal was observed. This occurs in the form of Model Insufficiency, or Change in Regime.</p><p><strong>Causal Goodhart</strong> - When the causal path between the proxy and the goal is indirect, intervening can change the relationship between the proxy and the goal, and optimizing can then cause perverse effects.</p><p><strong>Adversarial Goodhart</strong> will not be discussed in this post. It occurs in two ways.<strong> Misalignment</strong> - The agent applies selection pressure knowing the regulator will apply different selection pressure on the basis of the metric. This allows the agent to hijack the regulator&#x27;s optimization. <strong>Cobra Effect</strong> - The regulator modifies the agent goal, usually via an incentive, to correlate it with the regulator metric. The agent then either 1) uses selection pressure to create extremal Goodhart effects or make regressional Goodhart effects more severe, or 2) acts by changing the causal structure due to incompletely aligned goals in a way that creates a Goodhart effect. </p><h2><strong>Regressional and Extremal Goodhart</strong></h2><p>The first two categories of Goodhart-like phenomena, regressional and extremal, are over-optimization mistakes, and in my view the mistakes should be avoidable. This is not to say we don&#x27;t need to treat AI as<a href="https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/"> a cryptographic rocket probe</a>, or that we don&#x27;t need to be worried about it - just that we know what to be concerned about already. This seems vaguely related to <a href="http://slatestarcodex.com/2018/01/24/conflict-vs-mistake/">what Scott Alexander calls &quot;Mistake Theory</a>&quot; - the risks would be technically solvable if we could convince people not to do the stupid things that make them happen.</p><p>Regressional Goodhart, as Scott Garrabrant correctly noted, is an unavoidable phenomenon when doing unconstrained optimization using a fixed metric which is imperfectly correlated with a goal. To avoid the problems of overoptimization despite the unavoidable phenomenon, a safe AI system must 1) have limited optimization power to allow robustness to misalignment, perhaps via satisficing, low-impact agents, or suspendable / stoppable agents, and/or 2) involve a metric which is adaptive using techniques like oversight or reinforcement learning. This allows humans to realign the AI, and safe approaches should ensure there are other ways to enforce <a href="https://www.lesserwrong.com/posts/bBdfbWfWxHN9Chjcq/robustness-to-scale">robustness to scaling up</a>. </p><p>Conclusion 1 - <em>Don&#x27;t allow unconstrained optimization using a fixed metric.</em></p><p>In less unavoidable ways, Extremal Goodhart effects are mistakes of overoptimization, and my intuition is that they should be addressable in similar ways.
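As a minimal simulation of the regressional piece of this (the U = V + Gaussian noise model and every distributional choice below are mine, picked only so the Bayes estimate is easy to state), hard selection on the proxy predictably overshoots the true value:

```python
# Minimal regressional-Goodhart simulation: select hard on a noisy proxy
# U = V + noise and the true value V predictably falls short of the proxy
# score, by roughly the amount the Bayes estimate E[V|U] = U/2 predicts here.
import random
random.seed(0)

points = []
for _ in range(100_000):
    v = random.gauss(0, 1)          # true goal value
    u = v + random.gauss(0, 1)      # observed proxy value
    points.append((u, v))

top = sorted(points, reverse=True)[:100]     # unconstrained selection on U
mean_u = sum(u for u, _ in top) / len(top)
mean_v = sum(v for _, v in top) / len(top)

print(f"mean proxy score of selected points: {mean_u:.2f}")
print(f"mean true value of those points:     {mean_v:.2f}")
print(f"Bayes-corrected estimate (U/2):      {mean_u / 2:.2f}")
# The selected points deliver roughly half of what the proxy promises - and
# the harder the selection, the larger the absolute gap between proxy and goal.
```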
We need to be able to detect the regime changes or increasing misalignment of the metric, but the strategies that address regressional effects should be closely related or useful in the same cases. Again, it&#x27;s not easy, but it&#x27;s a well defined problem.</p><p>Conclusion 2 - <em>When exploring the fitness landscape, don&#x27;t jump down the optimization slope too quickly before double checking externally. This is especially true when moving to out-of-sample areas.</em></p><p>Despite the challenges, I think that the divergences between goals and metrics in the first two Goodhart-like effects can be understood and addressed beforehand, and these techniques are being actively explored. In fact, I think this describes at least a large plurality of the current work being done on AI alignment.</p><h2>The Additional Challenge of Causality</h2><p>Causal Goodhart, like the earlier two categories, is always a mistake of understanding. Unlike the first two, it seems less easily avoidable by being cautious. The difficulty of inferring causality correctly means that it&#x27;s potentially easy to accidentally screw up an agentic AI&#x27;s world model in a way that allows causal mistakes to be made. I&#x27;m unsure the approaches being considered for AI Safety are properly careful about this fact. (I am not very familiar with the various threads of AI safety research, so I may be mistaken on that count.)</p><p>Accounting for uncertainty about causal models is critical, but given the multiplicity of possible models, we run into the problems of computation seen in AIXI. (And even AIXI doesn&#x27;t guarantee safety!)</p><p>So inferring causal structure is NP hard. Scott&#x27;s earlier post claims that &quot;you can try to infer the causal structure of the variables using statistical methods, and check that the proxy actually causes the goal before you intervene on the proxy. &quot; The problem is that we can&#x27;t actually infer causal structure well, even given RCTs, without simultaneously testing the full factorial set of cases. (And even then statistics is hard, and can be screwed up accidentally in complex and unanticipated ways.) Humans infer causality partly intuitively, but in more complex systems, badly. They can be <a href="http://homepages.inf.ed.ac.uk/clucas2/docs/kushnirGopnikLucasSchulz2010.pdf">taught to do it better (PDF)</a>, but only in narrow domains. </p><p>Conclusion 3 -<em> Getting causality right is an intrinsically computationally hard and sample-inefficient problem, and building AI won&#x27;t fix that.</em></p><p>As Pearl notes, policy is hard in part because <a href="http://ftp.cs.ucla.edu/pub/stat_ser/r422-reprint.pdf">knowing exactly these complex causal factors is hard</a>. This isn&#x27;t restricted to AI, and it also happens in [insert basically any public policy that you think we should stop already here]. (Politics is the mind-killer, and policy debates often center around claims about causality. No, I won&#x27;t give contemporary examples.)<em> </em></p><p>We don&#x27;t even get causality right in the relatively simpler policy systems we already construct - hence Chesterton&#x27;s fence, <a href="https://twitter.com/davidmanheim/status/758680756222824448">Boustead&#x27;s Iron Law of Intervention</a>, and the fact that intellectuals throughout history routinely start advocating strongly for things that turn out to be bad when actually applied. 
They never actually apologize for <a href="https://en.wikipedia.org/wiki/Great_Leap_Forward">accidentally starving and killing 5% of their population</a>. This, of course, is because their actual idea was good, it was just done badly. Obviously <a href="https://en.wikipedia.org/wiki/Four_Pests_Campaign">real killing of birds to reduce pests in China has never been tried</a>.</p><p>Conclusion 4 - <em>Sometimes the perverse effects of getting it a little bit wrong are really, really bad, especially because perverse effects may only be obvious after long delays. </em></p><p>There are two parts to this issue, the first of which is that mistaken causal structure can lead to regressional or extremal Goodhart. This is not causal Goodhart, and isn&#x27;t more worrisome than those issues, since the earlier mentioned solutions still apply. The second part is that the action taken by the regulator may actually change the causal structure. They think they are doing something simple like removing a crop-eating predator, but the relationship between crop-eating and birds ignores the fact that the birds eat other pests. This is much more worrisome, and harder to avoid.</p><p>This second case is causal Goodhart. The mistake can occur as soon as you allow a regulator - Machine Learning, AI, or otherwise - to interact in arbitrary ways with wider complex systems directly to achieve specified goals, without specifying and restricting the methods to be used. </p><p>These problems don&#x27;t show up in current deployed systems because humans typically choose the action set to be chosen from based on the causal understanding needed. The challenge is also not seen in toy-worlds, since testing domains are usually very well specified, and inferring causality becomes difficult only when the system being manipulated contains complex and not-fully-understood causal dynamics. (A possible counterexample is a story I heard secondhand about OpenAI developing what became the <a href="http://alpha.openai.com/miniwob/">World of Bits</a> system. Giving a RL system access to a random web browser and the mouse led to weird problems including, if I recall correctly, the entire system crashing.) </p><p>Conclusion 5 - <em>This class of causal mistake problem should be expected to show up as proto-AI systems are fully deployed, not beforehand when tested in limited cases.</em></p><p>This class of problem does not seem to be addressed by much of the AI-Risk approaches that are currently being suggested or developed. (The only approach that avoids this is using Oracle-AIs.) It seems there is no alternative to using a tentative causal understanding for decision making if we allow any form of Agentic AI. The problems that it causes are not usually obvious to either the observer or the agent until the decision has been implemented. </p><p>Note that attempting to minimize the impact of a choice is done based on the same mistaken or imperfect causal model that leads to the decision, so it is not avoidable in this way. Humans providing reinforcement learning based on the projected outcomes of the decision are similarly unaware of the perverse effect, and impact minimization assumes that the projected impact is correct.</p><p>Conclusion 6 -<em> Impact minimization strategies do not seem likely to fix the problems of causal mistakes.</em></p><h2>Summary</h2><p>It seems that the class of issues identified in the realm of Goodhart-like phenomena illustrates some potential advantages and issues worth considering in AI safety. 
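To make the crop-and-birds example above concrete, here is a toy causal simulation; the structure and every coefficient in it are invented for illustration, not taken from anywhere:

```python
# Toy causal-Goodhart model of the crop example: the regulator's model says
# "birds eat crops, so fewer birds means a bigger harvest", but it omits the
# arrow from birds to pests. All numbers are invented.
def harvest(bird_population):
    pests = max(0.0, 100 - 2.0 * bird_population)   # birds suppress pests
    eaten_by_birds = 0.2 * bird_population
    eaten_by_pests = 0.8 * pests
    return 100 - eaten_by_birds - eaten_by_pests

for birds in (50, 25, 0):
    print(f"birds={birds:3d}  harvest={harvest(birds):5.1f}")
# Under the full causal structure, intervening to cull the birds lowers the
# harvest; the naive model only counted the crops the birds themselves ate.
```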
The problems identified in part simply restate problems that are already understood, but the framework seems worth further consideration. Most critically, a better understanding of causal mistakes and causal Goodhart effects would potentially be valuable. If the conclusions here are incorrect, understanding why also seems useful for understanding the way in which AI risk can and cannot manifest. </p> davidmanheim iK2F9QDZvwWinsBYB 2018-03-27T01:39:30.539Z Evidence as Rhetoric — Normative or Positive? https://www.lesswrong.com/posts/3megyBeFyYqBZbzMy/evidence-as-rhetoric-normative-or-positive <p>A (very brief) response to a 2006 paper noting that rationality in policy needs to admit that evidence is about rhetoric.</p> davidmanheim 3megyBeFyYqBZbzMy 2017-12-06T17:38:05.033Z A Short Explanation of Blame and Causation https://www.lesswrong.com/posts/RS4hPDqdXL8TCuG6a/a-short-explanation-of-blame-and-causation davidmanheim RS4hPDqdXL8TCuG6a 2017-09-18T17:43:34.571Z Prescientific Organizational Theory (Ribbonfarm) https://www.lesswrong.com/posts/N4k737KfkB38MjHpe/prescientific-organizational-theory-ribbonfarm davidmanheim N4k737KfkB38MjHpe 2017-02-22T23:00:41.273Z A Quick Confidence Heuristic; Implicitly Leveraging "The Wisdom of Crowds" https://www.lesswrong.com/posts/ePcsMfkHFp9sCGFpw/a-quick-confidence-heuristic-implicitly-leveraging-the <p class="graf graf--p">Let&rsquo;s say you have well-informed opinions on a variety of topics. Without information about your long term accuracy in each given area, how confident should you be in those opinions?</p> <p class="graf graf--p">Here&rsquo;s a quick heuristic, for any area where other people have well-informed opinions about the same topics; your confidence should be a function of the distance of your estimate from the average opinion, and the standard deviation of those opinions. I&rsquo;ll call this the wisdom-of-crowds-confidence level, because it can be justified based on the empirical observation that the average of even uninformed guesses is typically a better predictor than most individual predictions.</p> <p class="graf graf--p">Why does this make sense?</p> <p class="graf graf--p">The Aumann agreement theorem implies that rational discussants can, given enough patience and introspection, pass messages about their justifications until they eventually converge. Given that informed opinions share most evidence, the differential between the opinions is likely due to specific unshared assumptions or evidence. If that evidence were shared, unless the vast majority of the non-shared assumptions were piled up on the same side, the answer would land somewhere near the middle. (This is why I was going to call the heuristic Aumann-confidence, but I don&rsquo;t think it quite fits.)</p> <p class="graf graf--p">Unless you have a strong reason to assume you are a privileged observer, trading on inside information or much better calibrated than other observers, there is no reason to expect this nonshared evidence will be biased. And while this appears to contradict the conservation of expected evidence theorem, it&rsquo;s actually kind-of a consequence of it, because we need to update on the knowledge that there is unshared evidence leading the other person to make their own claim.</p> <p class="graf graf--p">This is where things get tricky &mdash; we need to make assumptions about joint distributions on unshared evidence. 
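As an aside, the heuristic from the top of this post can be written down in a few lines; the particular fall-off with distance used below is my own arbitrary choice, since the argument only says confidence should shrink as your estimate gets further from the crowd:

```python
# Sketch of the wisdom-of-crowds-confidence heuristic: confidence in your own
# estimate falls with its distance from the crowd mean, measured in crowd
# standard deviations. The Gaussian-style fall-off is an arbitrary choice.
import math
import statistics

def crowd_confidence(my_estimate, crowd_estimates):
    mean = statistics.mean(crowd_estimates)
    sd = statistics.stdev(crowd_estimates)
    z = abs(my_estimate - mean) / sd       # how far out on a limb you are
    return math.exp(-0.5 * z * z)          # 1.0 at the crowd mean, falling with z

crowd = [12, 15, 14, 16, 13, 15, 14]
print(crowd_confidence(14, crowd))         # near the crowd mean: close to 1
print(crowd_confidence(25, crowd))         # far from the crowd: close to 0
```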
Suffice it to say that unless we have reason to believe our unshared evidence or assumptions is much stronger than theirs, we should end up near the middle. And that goes back to a different, earlier assumption - that others are also well informed.</p> <p class="graf graf--p">Now that we&rsquo;ve laid out the framework, though, we can sketch the argument.</p> <ol class="postList"> <li class="graf graf--li">We can expect that our opinion should shift towards the average, once we know what the average is, even without exploring the other people&rsquo;s unshared assumptions and data. The distance it should shift depends on how good our assumptions and data are compared to theirs.</li> <li class="graf graf--li">Even if we have strong reasons for thinking that we understand why others hold the assumptions they do, they presumably feel the same way about us.</li> <li class="graf graf--li">And why do you think your unshared evidence and assumptions are so great anyways, huh? Are you special or something?</li> </ol> <p class="graf graf--p">Anyways, those are my thoughts.</p> <p class="graf graf--p">Comments?</p> davidmanheim ePcsMfkHFp9sCGFpw 2017-02-10T00:54:41.394Z Most empirical questions are unresolveable; The good, the bad, and the appropriately under-powered https://www.lesswrong.com/posts/88DaMxZQoEWJYCsHo/most-empirical-questions-are-unresolveable-the-good-the-bad davidmanheim 88DaMxZQoEWJYCsHo 2017-01-23T20:35:29.054Z A Cruciverbalist’s Introduction to Bayesian reasoning https://www.lesswrong.com/posts/LuzekfX96Zgf2qMPJ/a-cruciverbalist-s-introduction-to-bayesian-reasoning davidmanheim LuzekfX96Zgf2qMPJ 2017-01-12T20:43:48.928Z Map:Territory::Uncertainty::Randomness – but that doesn’t matter, value of information does. https://www.lesswrong.com/posts/CnwSsKuKCg8fLC4vr/map-territory-uncertainty-randomness-but-that-doesn-t-matter <p class="MsoNormal">In risk modeling, there is a well-known distinction between <a href="https://en.wikipedia.org/wiki/Uncertainty_quantification#Aleatoric_and_epistemic_uncertainty">aleatory and epistemic uncertainty</a>, which is sometimes referred to, or thought of, as irreducible versus reducible uncertainty. Epistemic uncertainty exists in our map; as Eliezer <a href="/lw/oj/probability_is_in_the_mind/">put it</a>, &ldquo;The Bayesian says, &lsquo;Uncertainty exists in the map, not in the territory.&rsquo;&rdquo; Aleatory uncertainty, however, exists in the territory. (Well, at least according to our map that uses quantum mechanics, according to Bells Theorem &ndash; like, say, the time at which a radioactive atom decays.) This is what people call quantum uncertainty, indeterminism, true randomness, or recently (and somewhat confusingly to myself) ontological randomness &ndash; referring to the fact that our ontology allows randomness, not that the ontology itself is in any way random. It may be better, in Lesswrong terms, to think of uncertainty versus randomness &ndash; while being aware that the wider world refers to both as uncertainty. But does the distinction matter?</p> <p class="MsoNormal">To clarify a key point, many facts are treated as random, such as dice rolls, are actually mostly uncertain &ndash; in that with enough physics modeling and inputs, we could predict them. On the other hand, in chaotic systems, there is the possibility that the &ldquo;true&rdquo; quantum randomness can propagate upwards into macro-level uncertainty. 
For example, a sphere of highly refined and shaped uranium that is *exactly* at the critical mass will set off a nuclear chain reaction, or not, based on the quantum physics of whether the neutrons from one of the first set of decays sets off a chain reaction &ndash; after enough of them decay, it will be reduced beyond the critical mass, and become increasingly unlikely to set off a nuclear chain reaction. Of course, the question of whether the nuclear sphere is above or below the critical mass (given its geometry, etc.) can be a difficult to measure uncertainty, but it&rsquo;s not aleatory &ndash; though some part of the question of whether it kills the guy trying to measure whether it&rsquo;s just above or just below the critical mass will be random &ndash; so maybe it&rsquo;s not worth finding out. And that brings me to the key point.</p> <p class="MsoNormal">In a large class of risk problems, there are factors treated as aleatory &ndash; but they may be epistemic, just at a level where finding the &ldquo;true&rdquo; factors and outcomes is prohibitively expensive. Potentially, the timing of an earthquake that would happen at some point in the future could be determined exactly via a simulation of the relevant data. Why is it considered aleatory by most risk analysts? Well, doing it might require a destructive, currently technologically impossible deconstruction of the entire earth &ndash; making the earthquake irrelevant. We would start with measurement of the position, density, and stress of each relatively macroscopic structure, and the perform a very large physics simulation of the earth as it had existed beforehand. (We have lots of silicon from deconstructing the earth, so I&rsquo;ll just assume we can now build a big enough computer to simulate this.) Of course, this is not worthwhile &ndash; but doing so would potentially show that the actual aleatory uncertainty involved is negligible. Or it could show that we need to model the macroscopically chaotic system to such a high fidelity that microscopic, fundamentally indeterminate factors actually matter &ndash; and it was truly aleatory uncertainty. (So we have epistemic uncertainty about whether it&rsquo;s aleatory; if our map was of high enough fidelity, and was computable, we would know.)</p> <p class="MsoNormal">It turns out that most of the time, for the types of problems being discussed, this distinction is irrelevant. If we know that the value of information to determine whether something is aleatory or epistemic is negative, we can treat the uncertainty as randomness. (And usually, we can figure this out via a quick order of magnitude calculation; Value of Perfect information is estimated to be worth $100 to figure out which side the dice lands on in this game, and building and testing / validating any model for predicting it would take me at least 10 hours, my time is worth at least $25/hour, it&rsquo;s negative.) But sometimes, slightly improved models, and slightly better data, are feasible &ndash; and then worth checking whether there is some epistemic uncertainty that we can pay to reduce. In fact, for earthquakes, we&rsquo;re doing that &ndash; we have monitoring systems that can give several minutes of warning, and geological models that can predict to some degree of accuracy the relative likelihood of different sized quakes.</p> <p class="MsoNormal">So, in conclusion; most uncertainty is lack of resolution in our map, which we can call epistemic uncertainty. 
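As an aside, the quick order-of-magnitude check from the dice example two paragraphs up can be written out explicitly (the $100, 10 hour, and $25/hour figures are just the ones given there):

```python
# Back-of-the-envelope value-of-information check: if even perfect information
# is worth less than the cost of getting it, treat the uncertainty as randomness.
value_of_perfect_information = 100   # dollars at stake on knowing the outcome
hours_to_build_and_validate = 10
value_per_hour = 25                  # dollars per hour of my time

net = value_of_perfect_information - hours_to_build_and_validate * value_per_hour
print(f"net value of resolving the uncertainty: ${net}")
# -150 dollars: negative, so treat the dice roll as randomness and move on.
```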
<p class="MsoNormal">So, in conclusion; most uncertainty is lack of resolution in our map, which we can call epistemic uncertainty. This is true even if lots of people call it &ldquo;truly random&rdquo; or irreducibly uncertain &ndash; or, if they are fancy, aleatory uncertainty. Some of what we assume is uncertainty is really randomness. But lots of the epistemic uncertainty can be safely treated as aleatory randomness, and value of information is what actually makes a difference. And knowing the terminology used elsewhere can be helpful.</p> davidmanheim CnwSsKuKCg8fLC4vr 2016-01-22T19:12:17.946Z Meetup : Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup https://www.lesswrong.com/posts/N54BG43KnCEbNDDk9/meetup-finding-effective-altruism-with-biased-inputs-on <h2>Discussion article for the meetup : <a href='/meetups/1kl'>Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup</a></h2> <div class="meetup-meta"> <p> <strong>WHEN:</strong>&#32; <span class="date">20 January 2016 07:00:00PM (-0800)</span><br> </p> <p> <strong>WHERE:</strong>&#32; <span class="address">10850 West Pico Boulevard, Los Angeles, CA 90064, Westside Pavilion - Upstairs Wine Bar (Next to the movie theater)</span> </p> </div><!-- .meta --> <div id="" class="content"> <div class="md"><p>We're going to be discussing the general question of how to use biased information to make rational decisions, but talk about the specific context of how to be an Effective Altruist doing so.</p> <p>The various EA nonprofits each have a claim to effective altruism, and there is lots of uncertainty about which will end up being the most effective; we can give to AMF and save lives in the near future for around $1,000 a life, or try policy interventions, with unknown effects, or perhaps we should try to prevent one of several potential tail risks that could destroy humanity in the near or far future. The experts in each area argue for their cause, and we'd love a clearer way to think about the options. Come join us as we try to find one!</p></div> </div><!-- .content --> <h2>Discussion article for the meetup : <a href='/meetups/1kl'>Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup</a></h2> davidmanheim N54BG43KnCEbNDDk9 2016-01-14T05:31:20.472Z Perceptual Entropy and Frozen Estimates https://www.lesswrong.com/posts/uEukcuvnW8s6Kvh4Y/perceptual-entropy-and-frozen-estimates <h2>A Preface</h2> <p class="MsoNormal">During the 1990s, a significant stream of research existed around how people process information, which combined very different streams in psychology and related areas with explicit predictive models about how actual cognitive processes differ from the theoretical ideal. This is not only the literature by Kahneman and Tversky about cognitive biases, but includes research about memory, perception, scope insensitivity, and other areas.
The rationalist community is very familiar with some of this literature, but fewer are familiar with a masterful synthesis produced by Richards Heuer for the intelligence community in 1999<a href="#ftn1">[1]</a>, which was intended to start combating these problems, a goal we share. I&rsquo;m hoping to put together a stream of posts based on that work, potentially expanding on it, or giving my own spin &ndash; but I encourage reading <a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/PsychofIntelNew.pdf">the book itself</a>&nbsp;(PDF) as well<a href="#ftn2">[2]</a>.&nbsp;(This essay is based on Chapter 3.)</p> <p class="MsoNormal">This will hopefully be the first in a set of posts, so feedback is especially welcome, both to help me refine the ideas, and to refine my presentation.</p> <div> <h2>Entropy, Pressure, and Metaphorical States of Matter</h2> <p class="MsoNormal">Eliezer recommends <a href="/lw/ij/update_yourself_incrementally/">updating incrementally</a> but has noted that it&rsquo;s hard. The central point, that it is hard to do so, is one that some in our community have experienced and explicated, but there is deeper theory, which I&rsquo;ll attempt to outline via an analogy, that I think explains how and why it occurs. The problem is that we are quick to form opinions and build models, because humans are good at pattern finding. We are less quick to discard them, due to limited mental energy. This is especially true when the pressure of evidence doesn&rsquo;t shift overwhelmingly and suddenly.</p> <p class="MsoNormal">I&rsquo;ll attempt to answer the question of how this is true by stretching a metaphor, creating an intuition pump for thinking about how our minds might perform some of this thinking under uncertainty.</p> <h2>Frozen Perception</h2> <p class="MsoNormal">Heuer notes a stream of research about perception, and notes that &ldquo;once an observer has formed an image &ndash; that is, once he or she has developed a mind set or expectation concerning the phenomenon being observed &ndash; this conditions future perceptions of that phenomenon.&rdquo; This seems to follow a standard Bayesian practice, but in fact, as Eliezer noted, people fail to update.
The following set of images, which Heuer reproduced from a 1976 book by Robert Jervis, shows exactly this point:</p> <img src="http://www.internetactu.net/wp-content/uploads/2009/01/ciaimage.gif" alt="Impressions Resist Change - Series of line drawings transitioning between a face and a crouching woman." width="675" height="538" /><br /> <p class="MsoNormal">Looking at each picture, starting on the left, and moving to the right, you see a face slowly change. At what point does the face no longer seem to appear? (Try it!) For me, it&rsquo;s at about the seventh image that it&rsquo;s clear it has morphed into a sitting, bowed figure. But what if you start at the other end? The woman is still clearly there long past the point where we see a face, starting in the other direction. What&rsquo;s going on?</p> <p class="MsoNormal">We seem to attach too strongly to our first approach, decision, or idea. Specifically, our decision seems to &ldquo;freeze&rdquo; once it gets to one place, and needs much more evidence to start moving again. This has an analogue in physics, the notion of freezing, which I think is more important than it first appears.</p> <h2>Entropy</h2> <p class="MsoNormal">To analyze this, I&rsquo;ll drop into some basic probability theory, and physics, before (hopefully) we come out on the other side with a conceptually clearer picture. First, I will note that our cognitive architecture has some way of representing theories, and implicitly assigns probabilities to various working theories. This is some sort of probability distribution over sample theories. Any probability distribution has a quantity called entropy<a href="#ftn3">[3]</a>, which is simply the probability of each state, multiplied by the logarithm of that probability, summed over all the states. (The probability is less than 1, so the logarithm is negative, but we traditionally flip the sign so entropy is a positive quantity.)</p> <p class="MsoNormal">Need an example? Sure! I have two dice, and they can each land on any number, 1-6. I&rsquo;m assuming they are fair, so each has a probability of 1/6, and the logarithm (base 2) of 1/6 is about -2.585. There are 6 states, so the total is 6 * (1/6) * 2.585 = 2.585. (With two dice, I have 36 possible combinations, each with probability 1/36, log(1/36) is -5.17, so the entropy is 5.17. You may have noticed that I doubled the number of dice involved, and the entropy doubled &ndash; because there is exactly twice as much that can happen, but the average entropy per die is unchanged.) If I only have 2 possible states, such as a fair coin, each has probability of 1/2, and log(1/2)=-1, so for two states, (-0.5 * -1) + (-0.5 * -1) = 1. An unfair coin, with a &frac14; probability of tails, and a &frac34; probability of heads, has an entropy of about 0.81. Of course, this isn&rsquo;t the lowest possible entropy &ndash; a trick coin with both sides having heads only has 1 state, with entropy 0. So unfair coins have lower entropy &ndash; because we know more about what will happen.</p>
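<p class="MsoNormal">(As a quick check on that arithmetic, here is a minimal sketch in Python; the <code>entropy</code> helper is just for illustration, and the distributions are the dice and coins above.)</p> <pre><code>from math import log2

def entropy(probs):
    """Shannon entropy in bits: minus the sum of p * log2(p) over all states."""
    h = -sum(p * log2(p) for p in probs if p > 0)
    return h if h > 0 else 0.0

print(entropy([1/6] * 6))      # one fair die: ~2.585 bits
print(entropy([1/36] * 36))    # two fair dice: ~5.17 bits, exactly double
print(entropy([0.5, 0.5]))     # fair coin: 1 bit
print(entropy([0.25, 0.75]))   # unfair coin (3/4 heads): ~0.81 bits
print(entropy([1.0]))          # trick coin, both sides heads: 0 bits
</code></pre>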
<h2>Freezing, Melting, and Ideal Gases under Pressure</h2> <p class="MsoNormal">In physics, this has a deeply related concept, also called entropy, which, in the form we see it on a macroscopic scale, is related to temperature. If you remember your high school science classes, temperature is a description of how much molecules move around. I&rsquo;m not a physicist, and this is a bit simplified<a href="#ftn4">[4]</a>, but the entropy of an object is how uncertain we are about its state &ndash; gases expand to fill their container, and the molecules could be anywhere, so they have higher entropy than a liquid, which stays in its container, which still has higher entropy than a solid, where the molecules don&rsquo;t move much, which still has higher entropy than a crystal, where the molecules are sort of locked into place.</p> <p class="MsoNormal">This partially lends intuition to the third law of thermodynamics: &ldquo;the entropy of a perfect crystal at absolute zero is exactly equal to zero.&rdquo; In our terms above, it&rsquo;s like that trick coin &ndash; we know exactly where everything is in the crystal, and it doesn&rsquo;t move. Interestingly, a perfect crystal at 0 Kelvin cannot exist in nature; no finite process can reduce entropy to that point; like infinite certainty, infinitely exact crystals are impossible to arrive at, unless you started there. So far, we could build a clever analogy between temperature and certainty, telling us that &ldquo;you&rsquo;re getting warmer&rdquo; means exactly the opposite of what it does in common usage &ndash; but I think this is misleading<a href="#ftn5">[5]</a>.</p> <p class="MsoNormal">In fact, I think that information in our analogy doesn&rsquo;t change the temperature; instead, it reduces the volume! In the analogy, gases can become liquids or solids either by lowering temperature, or by increasing pressure &ndash; which is what evidence does. Specifically, evidence constrains the set of possibilities, squeezing our hypothesis space. The phrase &ldquo;weight of evidence&rdquo; is now metaphorically correct; it will actually constrain the space by applying pressure.</p>
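<p class="MsoNormal">(A minimal sketch of that squeezing, again in Python with the same illustrative <code>entropy</code> helper; the particular piece of evidence, learning that a die roll came up even, is purely an assumption for the example.)</p> <pre><code>from math import log2

def entropy(probs):
    """Shannon entropy in bits (same helper as in the sketch above)."""
    h = -sum(p * log2(p) for p in probs if p > 0)
    return h if h > 0 else 0.0

# Before any evidence: six equally likely faces of a die.
prior = {face: 1/6 for face in range(1, 7)}
print(entropy(prior.values()))       # ~2.585 bits

# Evidence ("the roll came up even") rules out half of the hypothesis space;
# renormalize what remains, and the entropy drops.
allowed = {face: p for face, p in prior.items() if face % 2 == 0}
total = sum(allowed.values())
posterior = {face: p / total for face, p in allowed.items()}
print(entropy(posterior.values()))   # ~1.585 bits: a smaller space to spread over
</code></pre>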
<p class="MsoNormal">I think that by analogy, this explains the phenomenon we see with perception. While we are uncertain, information increases pressure, and our conceptual estimate can condense from uncertain to a relatively contained liquid state &ndash; not because we have less probability to distribute, but because the evidence has constrained the space over which we can distribute it. Alternatively, we can settle on a lower energy state on our own, unassisted by evidence. If our minds too quickly settle on a theory or idea, the gas settles into a corner of the available space, and if we fail to apply enough energy to the problem, our unchallenged opinion can even freeze into place.</p> <p class="MsoNormal">Our mental models can be liquid, gaseous, or frozen in place &ndash; either by our prior certainty, our lack of energy required to update, or an immense amount of evidential pressure. When we look at those faces, our minds settle into a model quickly, and once there, fail to apply enough energy to re-evaporate our decision until the pressure of the new pictures is relatively immense. If we had started at picture 3 or 6, we could much more easily update away from our estimates; our minds are less willing to let the cloud settle into a puddle of probable answers, much less freeze into place. We can easily see the face, or the woman, moving between just these two images.</p> <p class="MsoNormal">When we begin to search for a mental model to describe some phenomenon, whether it be patterns of black and white on a page, or the way in which our actions will affect a friend, I am suggesting we settle into a puddle of likely options, and when not actively investing energy into the question, we are likely to freeze into a specific model.</p> <h2>What does this approach retrodict, or better, forbid?</h2> <p class="MsoNormal">Because our minds have limited energy, the process of maintaining an uncertain stance should be difficult. This seems to be borne out by personal and anecdotal experience, but I have not yet searched the academic literature to find more specific validation.</p> <p class="MsoNormal">We should have more trouble updating away from a current model than we do arriving at that new model from the beginning.
As Heuer puts it, &ldquo;Initial exposure to&hellip; ambiguous stimuli interferes with accurate perception even after more and better information becomes available.&rdquo; He notes that this was shown in Bruner and Potter&rsquo;s 1964 &ldquo;Interference in Visual Recognition,&rdquo; and that &ldquo;the early but incorrect impression tends to persist because the amount of information necessary to invalidate a hypothesis is considerably greater than the amount of information required to make an initial interpretation.&rdquo;</p> <h2>Potential avenues of further thought</h2> <p class="MsoNormal">The pressure of evidence should reduce the mental effort needed to switch models, but &ldquo;leaky&rdquo; hypothesis sets, where a class of model is not initially considered, should allow the pressure to metaphorically escape into the larger hypothesis space.</p> <p class="MsoNormal">There is a potential for making this analogy more exact by discussing entropy in graphical models (Bayesian networks), especially in sets of graphical models with explicit uncertainty attached. I don&rsquo;t have the math needed for this, but would be interested in hearing from those who do.</p> <div><br /> <hr /> <div id="ftn1"> <p class="MsoFootnoteText">[1]&nbsp;I would like to thank both Abram Demski (<a href="/lw/9fz/qa_with_abram_demski_on_risks_from_ai/">interviewed here</a>) for providing a link to this material, and my dissertation chair, Paul Davis, who was able to point me towards how this has been used and extended in the intelligence community.</p> </div> <div id="ftn2"> <p class="MsoFootnoteText">[2]&nbsp;There is a follow-up book and training course which is also available, but I&rsquo;ve not read it nor seen it online.
A shorter version of the main points of that book is <a href="https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/Tradecraft%20Primer-apr09.pdf">here</a>&nbsp;(PDF), which I have only glanced through.</p> </div> <div id="ftn3"> <p class="MsoFootnoteText">[3] Eliezer discusses this idea in <a href="/lw/o1/entropy_and_short_codes/">Entropy and short codes</a>, but I&rsquo;m heading in a slightly different direction.</p> </div> <div id="ftn4"> <p class="MsoFootnoteText">[4]&nbsp;We have a LW post, <a href="/lw/ldr/entropy_and_temperature/">Entropy and Temperature</a>, that explains this a bit. For a different, simplified explanation, try this: <a href="http://www.nmsea.org/Curriculum/Primer/what_is_entropy.htm">http://www.nmsea.org/Curriculum/Primer/what_is_entropy.htm</a>. For a slightly more complete version, try Wikipedia: <a href="https://en.wikipedia.org/wiki/Introduction_to_entropy">https://en.wikipedia.org/wiki/Introduction_to_entropy</a>. For a much more complete version, learn the math, talk to a PhD in thermodynamics, then read some textbooks yourself.</p> </div> <div id="ftn5"> <p class="MsoFootnoteText">[5] I think this, of course, because I was initially heading in that direction. Instead, I realized there was a better analogy &ndash; but if we wanted to develop it in this direction instead, I&rsquo;d point to the phase change energy required to change phases of matter as a reason that our minds have trouble moving from their initial estimate.
On reflection, I think this should be a small part of the story, if not entirely negligible.</p> </div> </div> </div> davidmanheim uEukcuvnW8s6Kvh4Y 2015-06-03T19:27:31.074Z Meetup : Complex problems, limited information, and rationality; How should we make decisions in real life? https://www.lesswrong.com/posts/RqMJ2jTBbRBCCDw5D/meetup-complex-problems-limited-information-and-rationality <h2>Discussion article for the meetup : <a href='/meetups/s4'>Complex problems, limited information, and rationality; How should we make decisions in real life?</a></h2> <div class="meetup-meta"> <p> <strong>WHEN:</strong>&#32; <span class="date">16 October 2013 07:00:45PM (-0700)</span><br> </p> <p> <strong>WHERE:</strong>&#32; <span class="address">West Los Angeles (At the Westside Tavern Upstairs Wine Bar)</span> </p> </div><!-- .meta --> <div id="" class="content"> <div class="md"><p>Most decisions we make involve complex, poorly understood systems. We'd like to be rational anyways, but how?</p> <p>Example time: I am going to pre-commit here to biking to the meetup. Why? I believe that more exercise would increase my physical fitness in ways that are beneficial.</p> <p>But... I haven't done the research into the benefits of physical fitness, and haven't done a tradeoff analysis of time costs versus benefits; I don't know how likely dangerous biking accidents are in LA; and I don't know enough about my body to be sure that biking is safe, or a useful way for me to get in shape. Should I spend the week until the meetup researching these factors and building a model, or should I spend time getting work and homework done, playing with my kid, and sleeping? Through a combination of trusting experts, laziness, and other things to do, I'm not going to do the research.</p> <p>And that's where we are with most decisions. What should we do, if we want to be rational? I have some ideas, some questions, and some willingness to shut up and listen to others, and I might even update my beliefs if others have ideas I like.</p></div> </div><!-- .content --> <h2>Discussion article for the meetup : <a href='/meetups/s4'>Complex problems, limited information, and rationality; How should we make decisions in real life?</a></h2> davidmanheim RqMJ2jTBbRBCCDw5D 2013-10-09T21:44:19.773Z Meetup : Group Decision Making (the good, the bad, and the confusion of welfare economics) https://www.lesswrong.com/posts/pCHScZFWgNcMLJdaS/meetup-group-decision-making-the-good-the-bad-and-the <h2>Discussion article for the meetup : <a href='/meetups/m9'> Group Decision Making (the good, the bad, and the confusion of welfare economics)</a></h2> <div class="meetup-meta"> <p> <strong>WHEN:</strong>&#32; <span class="date">08 May 2013 07:00:00PM (-0700)</span><br> </p> <p> <strong>WHERE:</strong>&#32; <span class="address">West Los Angeles (At the Westside Tavern Upstairs Wine Bar)</span> </p> </div><!-- .meta --> <div id="" class="content"> <div class="md"><p>Where: The Westside Tavern in the upstairs Wine Bar (all ages welcome), located inside the Westside Pavilion on the second floor, right by the movie theaters. The entrance sign says "Lounge".</p> <p>Parking is free for 3 hours.</p> <p>Or you can take public transit! A Trip Planner can be found here: <a href="http://socaltransport.org/tm_pub_start.php" rel="nofollow">http://socaltransport.org/tm_pub_start.php</a> &lt;- So you can try to avoid multiple-hour trips!
(We appreciate your attendance despite length of commute!)</p> <p>We will hang out for 30 minutes or so, then I'll spend 10-15 minutes presenting: Group decision making. AKA Why voting can be a stupid way to make utility decisions, AKA Adding utility between people is stupid, this is an ordinal scale, AKA Didn't Arrow win a Nobel prize for telling you people to stop?</p> <p>Then we'll talk about what math and economics can say about making collective decisions in a way that isn't ill-defined, and continue a hopefully interesting discussion. (Bonus points if it leads to a publishable idea for me!)</p> <p>This will be a great break for me from... writing papers and taking tests about the same subject.</p> <p>No foreknowledge or exposure to Less Wrong is necessary; this will be generally accessible and useful to anyone who values thinking for themselves. That said, it might help to read <a href="http://lesswrong.com/lw/ggm/pinpointing_utility/" rel="nofollow">http://lesswrong.com/lw/ggm/pinpointing_utility/</a> so we can avoid type errors and radiation poisoning while we talk. (Not real radiation poisoning!)</p></div> </div><!-- .content --> <h2>Discussion article for the meetup : <a href='/meetups/m9'> Group Decision Making (the good, the bad, and the confusion of welfare economics)</a></h2> davidmanheim pCHScZFWgNcMLJdaS 2013-04-30T16:18:04.955Z