Comment by davidmanheim on Did the recent blackmail discussion change your beliefs? · 2019-03-25T13:30:13.348Z · score: 4 (3 votes) · LW · GW

It's not politics in disguise, but it's hard to discuss rationally for similar reasons. Politics is hard-mode for rationality because it is a subcategory of identity and morals. The moral rightness of a concrete action seems likely to trigger all of the same self-justification that any politics discussion will, albeit along different lines. Plausibly making this worse, the discussion of morality here cannot be tied to disagreements about predicted outcomes as easily as it can in politics.

Comment by davidmanheim on Understanding information cascades · 2019-03-25T09:20:14.511Z · score: 3 (2 votes) · LW · GW

As I replied to Pablo below, "It's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing."

Comment by davidmanheim on Understanding information cascades · 2019-03-25T09:19:47.975Z · score: 9 (3 votes) · LW · GW

You don't need the data - it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.
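The arithmetic is easy to check by simulation. A quick sketch, using the Brier score as the accuracy measure (my choice for illustration, not something specified in the thread):

```python
import random

def brier(p, outcome):
    """Brier score (lower is better): squared error of the forecast."""
    return (p - outcome) ** 2

random.seed(0)
trials = 100_000
wins = 0          # trials where the extremized forecast scores better
total_gap = 0.0   # cumulative score difference (extremized - honest)

for _ in range(trials):
    outcome = 1 if random.random() < 0.9 else 0  # true probability is 90%
    honest, extreme = brier(0.90, outcome), brier(0.95, outcome)
    wins += extreme < honest
    total_gap += extreme - honest

print(f"extremizing wins {wins / trials:.0%} of the time")
print(f"mean Brier penalty for extremizing: {total_gap / trials:+.4f}")
```

The extremized forecast wins roughly 9 times in 10, even though its expected Brier score is slightly worse, which is exactly the first-principles point.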

Comment by davidmanheim on How can we respond to info-cascades? [Info-cascade series] · 2019-03-17T10:43:09.219Z · score: 11 (3 votes) · LW · GW

The system dynamics "Beer Game" seems like a useful example of how something like (but not the same as) an info-cascade happens. - "The beer distribution game (also known as the beer game) is an experiential learning business simulation game created by a group of professors at MIT Sloan School of Management in early 1960s to demonstrate a number of key principles of supply chain management. The game is played by teams of at least four players, often in heated competition, and takes at least one hour to complete... The purpose of the game is to understand the distribution side dynamics of a multi-echelon supply chain used to distribute a single item, in this case, cases of beer."

Basically, passing information through a system with delays means everyone screws up wildly as the system responds in a nonlinear fashion to a linear change. In that case, Forrester and others suggest that changing viewpoints and using systems thinking is critical in preventing the cascades, and this seems to have worked in some cases.

(Please respond if you'd like more discussion.)

Comment by davidmanheim on Understanding information cascades · 2019-03-17T10:36:30.211Z · score: 7 (2 votes) · LW · GW

That's a great point. I'm uncertain if the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt the accuracy, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made in a tweetstorm about why proper scoring rules are not incentive compatible in tournaments.

Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can in expectation outscore everyone else slightly, and minimize my risk of doing very poorly, by putting my predictions a bit to the extreme of the current predictions. It's almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right - you don't need to be close, you just need to beat the other people's scores to win. But if I report my best strategic answer instead of my true guess, it seems that it could cascade if others are unaware I am doing this.
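A toy simulation of that Price Is Right dynamic (the scoring rule and parameters are my own assumptions, not anything from the thread): one forecaster reports the true probability, the other shades a bit to the extreme, and we see how often the shader comes out ahead.

```python
import random

def brier(p, outcome):
    """Brier score (lower is better)."""
    return (p - outcome) ** 2

def season_win_rate(n_questions, trials=20_000, p_true=0.9, push=0.05):
    """How often the extremizer beats an honest forecaster over a season."""
    wins = 0
    for _ in range(trials):
        honest = extreme = 0.0
        for _ in range(n_questions):
            outcome = 1 if random.random() < p_true else 0
            honest += brier(p_true, outcome)
            extreme += brier(p_true + push, outcome)
        wins += extreme < honest
    return wins / trials

random.seed(1)
for q in (1, 10, 100):
    print(f"{q:>3} questions: extremizer wins {season_win_rate(q):.0%} of seasons")
```

The strategy usually wins a single question, but the edge washes out over longer seasons as the expected-score penalty accumulates, which is part of why this looks like a tournament-specific incentive problem.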

Comment by davidmanheim on Understanding information cascades · 2019-03-14T10:57:02.174Z · score: 9 (5 votes) · LW · GW

There are better, simpler results that I recall but cannot locate right now, on doing local updating that is algebraic rather than deep-learning-based. I did find this, which is related in that it models this type of information flow and shows it works even without fully Bayesian reasoning: Jadbabaie, A., Molavi, P., Sandroni, A., & Tahbaz-Salehi, A. (2012). Non-Bayesian social learning. Games and Economic Behavior, 76(1), 210–225.

Given those types of results, the fact that RL agents can learn to do this should be obvious. (Though the social game dynamic result in the paper is cool, and relevant to other things I'm working on, so thanks!)

Comment by davidmanheim on Understanding information cascades · 2019-03-14T10:44:31.314Z · score: 23 (5 votes) · LW · GW

I'm unfortunately swamped right now, because I'd love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.

First, the need for extremizing forecasts (see Jonathan Baron, Barbara A. Mellers, Philip E. Tetlock, Eric Stone, Lyle H. Ungar (2014). Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis 11(2):133-145) seems like evidence that this isn't typically the dominant factor in forecasting. However, cf. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for (Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., ... & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25(5), 1106-1115).

Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point - in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say "My model says 25%, but I'm giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%")
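The parenthetical forecast-blending example is just a credence-weighted linear opinion pool; as a sketch (the function name is mine):

```python
def pooled_forecast(own_estimate, own_credence, consensus):
    """Linear opinion pool: weight your own model by the credence you give
    it, and allocate the remaining weight to the consensus value."""
    return own_credence * own_estimate + (1 - own_credence) * consensus

# "My model says 25%, but I'm giving that only 50% credence and allocating
# the rest to the consensus value of 90%":
print(pooled_forecast(0.25, 0.5, 0.90))  # 0.575, matching the example
```

The point is that reporting the components (model output, credence, consensus) rather than only the pooled 57.5% is what lets others avoid double-counting the consensus.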

Third, I'd be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven't thought much about how to do it other than to note that it's not as easy as it sounded at first.

Comment by davidmanheim on The RAIN Framework for Informational Effectiveness · 2019-02-26T10:26:46.340Z · score: 1 (1 votes) · LW · GW

The works on decision theory tend to be general, but I need my textbooks to find better resources - I'll see if I have the right ones at home. Until then, Andrew Gelman's BDA3 explicitly formulates VoI as a multi-stage decision tree in section 9.3, thereby making it clear that the same procedure is generalizable. And Jaynes doesn't call it VoI in PT:LoS, but his discussion in the chapter on simple applications of decision theory leaves the number of decisions implicitly open.

Comment by davidmanheim on Probability space has 2 metrics · 2019-02-14T10:38:15.499Z · score: 5 (1 votes) · LW · GW

Yes - and this is equivalent to saying that evidence about probability arrives in the Bayesian (log-odds) metric - you need to transform it.

Comment by davidmanheim on The RAIN Framework for Informational Effectiveness · 2019-02-14T10:30:12.536Z · score: 3 (2 votes) · LW · GW

Minor comment/correction - VoI isn't necessarily linked to a single decision, but the way it is typically defined in introductory works makes it implicit that it is limited to one decision. This is mostly because (as I found out when trying to build more generalized VoI models for my dissertation) it's usually quickly intractable for multiple decisions.

Comment by davidmanheim on Why we need a *theory* of human values · 2019-02-14T10:25:00.352Z · score: 1 (1 votes) · LW · GW

I agree, and think work in the area is valuable, but would still argue that unless we expect a correct and coherent answer, any single approach is going to be less effective than an average of (contradictory, somewhat unclear) different models.

As an analogue, I think that effort into improving individual prediction accuracy and calibration is valuable, but for most estimation questions, I'd bet on an average of 50 untrained idiots over any single superforecaster.

Comment by davidmanheim on Spaghetti Towers · 2019-02-14T10:20:18.739Z · score: 2 (2 votes) · LW · GW

Having looked into this, it's partly that, but mostly that tax codes are written in legalese. A simple options contract for a call can easily be described in 10 lines of code, or a one-line equation. But the legal terms are actually this 188-page pamphlet, which is (technically, though this isn't enforced) legally required reading for anyone who wants to purchase an exchange-traded option. And don't worry - it explicitly notes that it doesn't cover the actual laws governing options, for which you need to read the relevant US code, or the way in which the markets for trading them work, or any of the risks.
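For contrast, here is a sketch of those few lines - the standard textbook payoff definition for a European call, not anything drawn from the pamphlet:

```python
def european_call_payoff(spot: float, strike: float) -> float:
    """Payoff of a call option at expiry: the one-line equation
    max(S - K, 0)."""
    return max(spot - strike, 0.0)

# In the money: the holder exercises and pockets the difference.
print(european_call_payoff(spot=110, strike=100))  # 10.0
# Out of the money: the option expires worthless.
print(european_call_payoff(spot=90, strike=100))   # 0.0
```

The entire economic content of the contract fits in one expression; everything else in the pamphlet is legal machinery around it.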

Comment by davidmanheim on How much can value learning be disentangled? · 2019-02-11T08:42:46.557Z · score: 1 (1 votes) · LW · GW

re: #2, VoI doesn't need to be constrained to be positive. If in expectation you think the information will have a net negative impact, you shouldn't get the information.

re: #3, of course VoI is subjective. It MUST be, because value is subjective. Spending 5 minutes to learn about the contents of a box you can buy is obviously more valuable to you than to me. Similarly, if I like chocolate more than you, finding out if a cake has chocolate is more valuable for me than for you. The information is the same, the value differs.

Comment by davidmanheim on How much can value learning be disentangled? · 2019-02-11T08:38:43.716Z · score: 3 (2 votes) · LW · GW
This matters because if the Less Wrong view of the world is correct, it's more likely that there are clean mathematical algorithms for thinking about and sharing truth that are value-neutral (or at least value-orthogonal, e.g. "aim to share facts that the student will think are maximally interesting or surprising").

I don't think this is correct - it misses the key map-territory distinction in the human mind. Even though there is "truth" in an objective sense, there is no necessity that the human mind can think about or share that truth. Obviously we can say that experientially we have something in our heads that correlates with reality, but that doesn't imply that we can think about truth without implicating values. It also says nothing about whether we can discuss truth without manipulating the brain to represent things differently - and all imperfect approximations require trade-offs. If you want to train the brain to do X, you're implicitly prioritizing some aspect of the brain's approximation of reality over others.

Comment by davidmanheim on Why we need a *theory* of human values · 2019-02-10T12:21:46.075Z · score: 5 (3 votes) · LW · GW

Maybe I'm reading your post wrong, but it seems that you're assuming that a coherent approach is needed in a way that could be counter-productive. I think that a model of an individual's preferences is likely to be better represented by taking multiple approaches, where each fails differently. I'd think that a method that extends or uses revealed preferences would have advantages and disadvantages that none of, say, stated preferences, TD Learning, CEV, or indirect normativity share, and the same would be true for each of that list. I think that we want that type of robust multi-model approach as part of the way we mitigate over-optimization failures, and to limit our downside from model specification errors.

(I also think that we might be better off building AI to evaluate actions on the basis of some moral congress approach using differently elicited preferences across multiple groups, and where decisions need a super-majority of some sort as a hedge against over-optimization of an incompletely specified version of morality. But it may be over-restrictive, and not allow any actions - so it's a weakly held theory, and I haven't discussed it with anyone.)

Comment by davidmanheim on How does Gradient Descent Interact with Goodhart? · 2019-02-03T19:28:21.359Z · score: 3 (2 votes) · LW · GW

Having tried to play with this, I'll strongly agree that random functions on R^N aren't a good place to start. But I've simulated random nodes in the middle of a causal DAG, or selecting ones for high correlation, and realized that they aren't particularly useful either; people have some appreciation of causal structure, and they aren't picking metrics randomly for high correlation - they are simply making mistakes in their causal reasoning, or missing potential ways that the metric can be intercepted. (But I was looking for specific things about how the failures manifested, and I was not thinking about gradient descent, so maybe I'm missing your point.)

Comment by davidmanheim on Fixed Point Discussion · 2018-12-23T08:27:26.733Z · score: 11 (3 votes) · LW · GW

"(Each the same size as the original.)"

I was not expecting to laugh reading this. Well done - I just wish I hadn't been in the middle of drinking my coffee.

Comment by davidmanheim on Systems Engineering and the META Program · 2018-12-23T06:59:38.695Z · score: 3 (2 votes) · LW · GW

Good finds!

I think they are headed in the right direction, but I'm skeptical of the usefulness of their work on complexity. The metrics ignore the computational complexity of the model, and assume all the variance is modeled based on sources like historical data and expert opinion. It's also not at all useful unless we can fully characterize the components of the system, which isn't usually viable.

It also seems to ignore the (in my mind critical) difference between "we know this is evenly distributed in the range 0-1" and "we have no idea what the distribution of this is over the space 0-1." But I may be asking for too much in a complexity metric.

Comment by davidmanheim on The Vulnerable World Hypothesis (by Bostrom) · 2018-12-16T08:07:31.230Z · score: 2 (2 votes) · LW · GW

I discuss a different reformulation in my new paper, "Systemic Fragility as a Vulnerable World," casting this as an explore/exploit tradeoff in a complex space. In the paper, I explicitly discuss the way in which certain subspaces can be safe or beneficial.

"The push to discover new technologies despite risk can be understood as an explore/exploit tradeoff in a potentially dangerous environment. At each stage, the explore action searches the landscape for new technologies, with some probability of a fatal result, and some probability of discovering a highly rewarding new option. The implicit goal in a broad sense is to find a search strategy that maximize humanity's cosmic endowment - neither so risk-averse that advanced technologies are never explored or developed, nor so risk-accepting that Bostrom's postulated Vulnerable World becomes inevitable. Either of these risks astronomical waste. However, until and unless the distribution of black balls in Bostrom's technological urn is understood, we cannot specify an optimal strategy. The first critical question addressed by Bostrom - ``Is there a black ball in the urn of possible inventions?'' is, to reframe the question, about the existence of negative singularities in the fitness landscape."

Comment by davidmanheim on The Vulnerable World Hypothesis (by Bostrom) · 2018-12-16T08:03:22.644Z · score: 1 (1 votes) · LW · GW

As an extension of Bostrom's ideas, I have written a draft entitled "Systemic Fragility as a Vulnerable World" where I introduce the "Fragile World Hypothesis."


The possibility of social and technological collapse has been the focus of science fiction tropes for decades, but more recent focus has been on specific sources of existential and global catastrophic risk. Because these scenarios are simple to understand and envision, they receive more attention than risks due to complex interplay of failures, or risks that cannot be clearly specified. In this paper, we discuss a new hypothesis that complexity of a certain type can itself function as a source of risk. This "Fragile World Hypothesis" is compared to Bostrom's "Vulnerable World Hypothesis", and the assumptions and potential mitigations are contrasted.

Comment by davidmanheim on How Old is Smallpox? · 2018-12-13T09:56:20.636Z · score: 4 (3 votes) · LW · GW


But to clarify, I don't think the Antonine plague is quite the same as modern ones, for the simple reason that it could only spread over a fairly limited geographic region, and it could not become endemic because of population density constraints. Smallpox evolution is driven by selection pressure in humans, and the "500 years old" claim is about that evolution, not about whether it affected humans at any time in the past. That said, it absolutely matters, because if the original source of smallpox was only 500 years ago, where did it come from?

The question is how smallpox evolved, and what variant was present prior to the 1500s. It's plausible that horsepox, which was probably the source for the vaccine strain, or cowpox, spread via intermediate infections in cats, was the source - but these are phylogenetically distant enough that, from my limited understanding, it's clearly implausible that it first infected humans and turned into modern smallpox as recently as the 1500s. (But perhaps this is exactly the claim of the paper. I'm unclear.) Instead, my understanding is that there must have been some other conduit, and it seems very likely that it's related to a historically much earlier human pox virus - thousands of years old, not hundreds.

Comment by davidmanheim on How Old is Smallpox? · 2018-12-13T07:47:02.733Z · score: 2 (2 votes) · LW · GW

I'm definitely not the best person to explain this, since I'm more on the epidemiology side. I understand the molecular clock analyses a bit, and they involve mutation rates plus tracking mutations in different variants, and figuring out how long it should take for the various samples collected at different times to have diverged, and what their common ancestors are.

Comment by davidmanheim on Should ethicists be inside or outside a profession? · 2018-12-13T07:43:12.449Z · score: 10 (3 votes) · LW · GW

Thank you! This is a point I keep trying to make, less eloquently, in both bioethics and in AI safety.

We need fewer talking heads making suggestions for how to regulate, and more input from actual experts, and more informed advice going to decision makers. If "professional ethicists" have any role, it should be elicitation, attempting to reconcile or delineate different opinions, and translation of ethical opinions of experts into norms and policies.

Comment by davidmanheim on Multi-agent predictive minds and AI alignment · 2018-12-13T06:13:22.172Z · score: 5 (5 votes) · LW · GW

I have several short comments about part 3, short not because there is little to say, but because I want to make the points and do not have time to discuss them in depth right now.

1) If multi-agent systems are more likely to succeed in achieving GAI, we should shut up about why they are important. I'm concerned about the unilateralist's curse, and would ask that someone from MIRI weigh in on this.

2) I agree that multi-agent systems are critical, but for different (non-contradictory) reasons - I think multi-agent systems are likely to be less safe and harder to understand. See draft of my forthcoming article here:

3) If this is deemed to be important, the technical research directions pointed to here are under-specified and too vague to be carried out. I think concretizing them would be useful. (I'd love to chat about this, as I have ideas in this vein. If you are interested in talking, feel free to be in touch - .)

Comment by davidmanheim on How Old is Smallpox? · 2018-12-11T08:19:35.668Z · score: 28 (11 votes) · LW · GW

There is genetic evidence discussed in Hopkins' "Princes and Peasants: Smallpox in History," which implies ancient existence of variola viruses, as you note from the Wiki article. The newer paper overstates the case in typical academic fashion in order to sound as noteworthy as possible. The issue with saying that earlier emergence is not the "current" disease of smallpox is that we expect significant evolution to occur once there is sufficient population density, and more once there is selection pressure due to vaccination, and so it is very unsurprising that there are more recent changes. (I discuss this in my most recent paper, )

It's very clear that a precursor disease existed in humans for quite a while. It's also very clear that these outbreaks in thin populations would have continued spreading, so I'm unconvinced that the supposed evidence of absence due to Hippocrates' omission, and the lack of discussion in the Old and New Testaments, is meaningful. And regarding the Old Testament, at least, the books aren't great at describing "plagues" in detail, and there are plenty of times we hear about some unspecified type of plague or malady as divine punishment.

So the answer depends on definitions. It's unclear that there can be anything like a smallpox epidemic, as the disease currently occurs, in a population that is not concentrated enough for significant person-to-person spread. If that's required, we have no really ancient diseases, because we defined them away.

Comment by davidmanheim on Is Science Slowing Down? · 2018-12-03T12:34:40.065Z · score: 3 (3 votes) · LW · GW

The model implies that if funding and prestige increased, this limitation would be reduced. And I would think we don't need prestige nearly as much as funding - even if near-top scientists were recruited and paid the way second- and third-string major league players in most professional sports are paid, we'd see a significant relaxation of the constraint.

Instead, the uniform wage for most professors means that even the very top people benefit from supplementing their pay with consulting, running companies on the side, giving popular lectures for money, etc. - all of which compete for time with their research.

Comment by davidmanheim on Is Science Slowing Down? · 2018-11-30T08:34:50.420Z · score: 6 (2 votes) · LW · GW

Yes, this might help somewhat, but there is an overhead / deduplication tradeoff that is unavoidable.

I discussed these dynamics in detail (i.e. at great length) on Ribbonfarm here.

The large team benefit would explain why most innovation happens near hubs / at the leading edge companies and universities, but that is explained by the other theories as well.

Comment by davidmanheim on Is Science Slowing Down? · 2018-11-29T13:23:34.372Z · score: 6 (4 votes) · LW · GW

The problem with fracturing is that you lose coordination and increase duplication.

I have a more general piece that discusses scaling costs and structure for companies that I think applies here as well -

Comment by davidmanheim on Is Science Slowing Down? · 2018-11-29T11:07:58.476Z · score: 9 (4 votes) · LW · GW

This seems to omit a critical and expected limitation as a process scales up in the number of people involved - communication and coordination overhead.

If there is low-hanging fruit, but everyone is reaching for it simultaneously, then doubling the number of researchers won't increase progress more than very marginally. (People having slightly different capabilities implies that the expected time to success will be the minimum across different people.) But even that will be overwhelmed by the asymptotic costs for everyone to find out that the low-hanging fruit they are looking for has been picked!
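A toy model of the duplication cost (the numbers are arbitrary assumptions): if researchers can't cheaply coordinate on who attacks which problem, doubling the workforce yields far less than double the discoveries.

```python
import random

def expected_discoveries(n_researchers, n_problems=100, trials=2_000):
    """Each researcher independently attacks a random tractable problem;
    duplicated effort on the same problem yields no extra discoveries."""
    total = 0
    for _ in range(trials):
        picked = {random.randrange(n_problems) for _ in range(n_researchers)}
        total += len(picked)
    return total / trials

random.seed(0)
for n in (25, 50, 100, 200):
    print(f"{n:>3} researchers -> {expected_discoveries(n):6.1f} discoveries")
```

With 100 problems, going from 100 to 200 researchers raises expected discoveries by well under 2x, and that is before charging anything for the communication overhead of discovering the duplication.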

Is there a reason not to think that this dynamic is enough to explain the observed slowdown - even without assuming hypothesis 3, of no more low-hanging fruit?

Comment by davidmanheim on Oversight of Unsafe Systems via Dynamic Safety Envelopes · 2018-11-26T17:14:55.077Z · score: 1 (1 votes) · LW · GW

The paper is now live on arXiv:

Comment by davidmanheim on Values Weren't Complex, Once. · 2018-11-26T13:28:34.779Z · score: 1 (1 votes) · LW · GW

In part, I think the implication of zero-sum versus non-zero-sum status is critical. Non-zero-sum status is "I'm the best left-handed minor league pitcher by allowed runs," while zero-sum status is "by total wealth/power, I'm 1,352,235,363rd in the world." Saying we only have one positional value for status seemingly assumes the zero-sum model.

The ability to admit these non-zero-sum status signals has huge implications for whether we can fulfill values. If people can mostly find relatively high-position niches, the room for selection on noise and path-dependent values grows.

This also relates to TAG's point about whether we care about "value" or "moral value" - and I'd suggest there might be moral value in fulfilling preferences only if they are not zero-sum positional ones.

Comment by davidmanheim on Values Weren't Complex, Once. · 2018-11-26T13:21:30.643Z · score: 3 (2 votes) · LW · GW

This is a good point, but I'll lay out the argument against it.

To start, I'm personally skeptical of the claim that preferences and moral values can be clearly distinguished, especially given the variety of value systems that people have preferred over time, or even today.

Even if this is false, we seem to see the same phenomenon occur with moral values. I think the example of obvious differences in the relative preference for saving dogs, the elderly, or criminals points to actual differences in values - but as I argued above, I think this is a heavily optimized subspace of a moral intuition towards liking life which is now largely selecting on noise. But the difference in moral conclusions that follows from assigning animal lives exactly zero versus smaller-than-human but nonzero value is huge.

Comment by davidmanheim on Values Weren't Complex, Once. · 2018-11-25T12:37:00.221Z · score: 3 (3 votes) · LW · GW

Yes, and that's closely related to the point I made about "we're adaptation executers, not fitness maximizers."

My point is a step further, I think - I'm asking what decides which things we plan to do? It's obviously our "preferences," but if we've already destroyed everything blue, the next priority is very underspecified.

Comment by davidmanheim on Values Weren't Complex, Once. · 2018-11-25T12:33:19.943Z · score: 1 (1 votes) · LW · GW

I agree that positional goods are important even in the extreme, but:

1) I don't think that sexual desires or food preferences fit in this mold.

2) I don't think that which things are selected as positional goods (perhaps other than wealth and political power) is dictated by anything other than noise and path dependence - the best tennis player, the best DOTA player, or the most cited researcher are all positional goods, and all can absorb arbitrary levels of effort, but the form they take and the relative prestige they get is based on noise.

Values Weren't Complex, Once.

2018-11-25T09:17:02.207Z · score: 34 (15 votes)
Comment by davidmanheim on On MIRI's new research directions · 2018-11-23T08:42:08.999Z · score: 19 (7 votes) · LW · GW

Yes, this very much resonates with me, especially because a parallel issue exists in biosecurity, where we don't want to talk publicly about how to work to prevent things that we're worried about because it could prompt bad actors to look into those things.

The issues here are different, but the need to have walls between what you think about and what you discuss imposes a real cost.

Oversight of Unsafe Systems via Dynamic Safety Envelopes

2018-11-23T08:37:30.401Z · score: 11 (5 votes)
Comment by davidmanheim on Collaboration-by-Design versus Emergent Collaboration · 2018-11-19T09:59:07.223Z · score: 0 (2 votes) · LW · GW

I don't think humans have collaboration as a default - it occurs at all only because our evolution was driven by social pressure, and it occurs primarily at the social-structure level, not as an outcome of individual effort.

Even if this is wrong, however, non-GAI systems can pose existential risks.

Comment by davidmanheim on Topological Fixed Point Exercises · 2018-11-18T09:13:36.761Z · score: 11 (3 votes) · LW · GW

I'm stuck part-way through on #4 - I assume there is a way to do this without the exhaustive search that I seem to need.

I'm going to try (nested) induction. Define triangles by side size, measured in nodes.

Induction base step: For n=2, there must be exactly one trichromatic triangle.

Induction step: If there is an odd number of trichromatic triangles for all triangles of size n=x, we must show that this implies the same for n=x+1.

We create all possible new triangles by adding x+1 nodes on one of the sides, then allowing any of the previous x nodes on that side to change. Without loss of generality, assume we add the x+1 nodes to the bottom (non-red) side. These must be green or blue. The previous layer can now change any number of node-colors. We now must prove this by induction on color changes of nodes in the second-to-bottom layer to red. (If they flip color otherwise, it is covered by a different base case.)

First, the base step: assume no nodes change color. Because the previous triangle had an odd number of trichromatic triangles, and the new bottom row contains only green and blue nodes, no new trichromatic triangles were created.

Induction step: There is an x+1 triangle with an odd number of trichromatic triangles, and one node in the second-to-bottom layer changes to red. This can only create a new trichromatic triangle in one of the six adjacent triangles. We split this into (lots of) cases, and handle them one at a time.

(Now I get into WAY too many cases. I started and did most of the edge-node case, but it's a huge pain. Is there some other way to do this, presumably using some nifty graph theory I don't know, or will I need to list these out? Or should I not be using the nested induction step?)

Pointers welcome!
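Not a proof, but a brute-force sanity check of the parity claim for small triangles. Here n counts edges per side, so n=1 matches the base case above; the coordinates and boundary-condition encoding are my own assumptions.

```python
from itertools import product

def small_triangles(n):
    """All unit triangles in a triangular grid with nodes (i, j), i + j <= n."""
    tris = []
    for i in range(n):
        for j in range(n - i):
            tris.append(((i, j), (i + 1, j), (i, j + 1)))               # upward
            if i + j < n - 1:
                tris.append(((i + 1, j), (i, j + 1), (i + 1, j + 1)))   # downward
    return tris

def allowed_colors(i, j, n):
    """Sperner boundary conditions: corners fixed, each side two-colored."""
    if (i, j) == (0, 0): return [0]
    if (i, j) == (n, 0): return [1]
    if (i, j) == (0, n): return [2]
    if j == 0:       return [0, 1]
    if i == 0:       return [0, 2]
    if i + j == n:   return [1, 2]
    return [0, 1, 2]  # interior nodes are unconstrained

def all_parities_odd(n):
    """Check every admissible coloring has an odd trichromatic-triangle count."""
    nodes = [(i, j) for i in range(n + 1) for j in range(n + 1 - i)]
    tris = small_triangles(n)
    for combo in product(*(allowed_colors(i, j, n) for i, j in nodes)):
        color = dict(zip(nodes, combo))
        trichromatic = sum(len({color[v] for v in t}) == 3 for t in tris)
        if trichromatic % 2 == 0:
            return False
    return True

for n in (1, 2, 3):
    print(f"side {n}: every admissible coloring has an odd count: {all_parities_odd(n)}")
```

Exhaustive enumeration only scales to tiny n, but it is a cheap way to check that a candidate case analysis hasn't missed a configuration before grinding through the cases by hand.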

Comment by davidmanheim on Topological Fixed Point Exercises · 2018-11-18T07:35:50.271Z · score: 11 (3 votes) · LW · GW

I am having trouble figuring out why #2 needs / benefits from Sperner's Lemma.

But I keep going back to the proof that I'm comfortable with, which depends on connectedness, so I'm clearly missing an obvious alternative proof that doesn't need topology.

Collaboration-by-Design versus Emergent Collaboration

2018-11-18T07:22:16.340Z · score: 12 (3 votes)
Comment by davidmanheim on Embedded World-Models · 2018-11-15T06:55:20.024Z · score: 4 (2 votes) · LW · GW

Some of these issues (obviously) are not limited to AI. Specifically, the problem of how to deal with multi-level models and "composability" was the subject of an applied research project for military applications by my dissertation chair, Paul Davis, here:

"The appealing imagery of arbitrary plug-and-play is fatally flawed for complex models... The more-complex [lower level] model components have typically been developed for particular purposes and depend on context-sensitive assumptions, some of which are tacit."

This issue has formed the basis of a fair amount of his later work as well, but this work focuses on practical advice, rather than conceptual understanding of the limitations. Still, that type of work may be useful as inspiration.

Comment by davidmanheim on Multi-Agent Overoptimization, and Embedded Agent World Models · 2018-11-14T14:35:27.424Z · score: 5 (3 votes) · LW · GW

Yes, there is a ton of work on some of these in certain settings, and I'm familiar with some of it.

In fact, the connections are so manifold that I suspect it would be useful to lay out which of these connections seem useful, in another paper, if only to save other people time and energy trying to do the same and finding dead-ends. On reflection, however, I'm concerned about how big of a project this ends up becoming, and I am unsure how useful it would be to applied work in AI coordination.

Just as one rabbit hole to go down, there is a tremendous amount of work on cooperation, which spans several very different literatures. The most relevant work, to display my own obvious academic bias, seems to be from public policy and economics, and includes work on participatory decision making and cooperative models for managing resources. Next, you mentioned law - I know there is work on interest-based negotiation, where defining the goals clearly allows better solutions, as well as work on mediation. In business, there is work on team-building that touches on these points, as well as inter-group and inter-firm competition and cooperation, which touch on related work in economics. I know the work on principal-agent problems, as well as game theory applied to more realistic scenarios. (Game theorists I've spoken with have noted the fragility of solutions to very minor changes in the problem, which is why game theory is rarely applied.) There's work in evolutionary theory, as well as systems biology, that touches on some of these points. Social psychology, anthropology, and sociology all presumably have literatures on the topic as well, but I'm not at all familiar with them.

Comment by davidmanheim on What is ambitious value learning? · 2018-11-09T11:23:37.428Z · score: 3 (2 votes) · LW · GW

That's an important question, but it's also fundamentally hard, since it's almost certainly true that human values are inconsistent - if not individually, then at an aggregate level. (You can't reconcile opposite preferences, or maximize each person's share of a finite resource.)

The best answer I have seen is Eric Drexler's discussion of Pareto-topia, where he suggests that we can make huge progress and gain utility according to all value-systems held by humans, despite the fact that they are inconsistent.

Multi-Agent Overoptimization, and Embedded Agent World Models

2018-11-08T20:33:00.499Z · score: 9 (4 votes)
Comment by davidmanheim on What is ambitious value learning? · 2018-11-08T20:18:33.002Z · score: 1 (1 votes) · LW · GW

Sorry, I needed to clarify my thinking and my claim a lot further. This is in addition to the claim (which I assumed was obvious) that correct Bayesian thinkers should be able to converge on beliefs despite potentially having different values. I'm speculating that if terminal values are initially drawn from a known distribution, AND "if you think that a different set of life experiences means that you are a different person with different values," but values change based on experiences in ways that are understandable, then rational humans will act coherently enough that we should expect to be able to learn human values and their distribution, despite the existence of shifts.

Conditional on those speculative thoughts, I disagree with your conclusion that "that's a really good reason to assume that the whole framework of getting the true human utility function is doomed." Instead, I think we should be able to infer the distribution of values that humans actually have - even if they individually change over time from experiences.
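A toy sketch of that claim (my own construction, not anything from the comment): if each person's values start from a population distribution and then drift in a statistically regular way with experience, the population-level distribution remains identifiable from what we observe. The parameters `mu_pop`, `sd_pop`, and `sd_drift` here are all hypothetical:

```python
import random
import statistics

random.seed(2)

# Hypothetical population of value-holders: each person's "base" value is
# drawn from N(mu_pop, sd_pop), then shifted by experience-driven drift
# drawn from N(0, sd_drift). We only ever observe the drifted values.
mu_pop, sd_pop, sd_drift = 2.0, 1.0, 0.5

observed = []
for _ in range(1000):
    base = random.gauss(mu_pop, sd_pop)   # person's initial value
    drift = random.gauss(0, sd_drift)     # regular, learnable drift
    observed.append(base + drift)

# Observed values are N(mu_pop, sqrt(sd_pop^2 + sd_drift^2)): the
# population mean is still recoverable despite the individual shifts.
print(statistics.mean(observed))   # close to mu_pop = 2.0
print(statistics.stdev(observed))  # close to sqrt(1.25) ~ 1.12
```

The point is only that drift which follows a learnable process widens the observed spread but does not destroy identifiability of the underlying distribution.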

Comment by davidmanheim on No Really, Why Aren't Rationalists Winning? · 2018-11-08T09:52:45.103Z · score: 11 (3 votes) · LW · GW
I'm not sure I expect hiring people solely based on their educational expertise to work out well.

Yes, there needs to be some screening other than pedagogy, but money to find the best people can fix lots of problems. And yes, typical teaching at good universities sucks, but that's largely because it optimizes for research. (You'd likely have had better professors as an undergrad if you'd gone to a worse university - or at least that was my experience.)

...they can only (do something like) streamline the existing product.

My thought was that streamlining the existing product and turning it into useable and testably effective modules would be a really huge thing.

Also: I think you're implying that AI is a really huge deal problem and rationality is less.

If that was the implication, I apologize - I view safe AI as only near-impossible, while making actual humans rational is a problem that is fundamentally impossible. But raising the sanity waterline has some low-hanging fruit - not to get most people to CFAR-expert levels, but to get high schools to teach some of the basics in ways that potentially have significant leverage in improving social decision-making in general. (And if the top 1% of people in high schools also take those classes, there might be indirect benefits that increase the number of CFAR-expert-level people in a decade.)

Comment by davidmanheim on Subsystem Alignment · 2018-11-08T09:33:16.681Z · score: 2 (2 votes) · LW · GW

There is a literature on this, and it's not great for the purposes here - principal-agent setups assume we can formalize the goal as a good metric, the complexity of management is a fundamentally hard problem that we don't have good answers for (see my essay on scaling companies here: ), and Goodhart failures due to under-specified goals are fundamentally impossible to avoid (see my essay on this here: ).

There are a set of strategies for mitigating the problems, and I have a paper on this that is written but still needs to be submitted somewhere, tentatively titled "Building Less Flawed Metrics: Dodging Goodhart and Campbell’s Laws," if anyone wants to see it they can message/email/tweet at me.

Abstract: Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently by Campbell and Goodhart, and in some forms it is not only common, but unavoidable due to the nature of metrics. There are two distinct but interrelated problems that must be overcome in building better metrics; first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. Finally, it will discuss important desiderata and the trade-offs involved in each approach.

(I should edit this comment to be a link once I have submitted and have a pre-print or publication.)

Comment by davidmanheim on Subsystem Alignment · 2018-11-08T09:22:53.354Z · score: 2 (2 votes) · LW · GW
...we're talking about a class of problems that already comes up in all sorts of practical engineering, and which can be satisfactorily handled in many real cases without needing any philosophical advances.

The explicit assumption of the discussion here is that we can't pass the full objective function to the subsystem - so it cannot possibly have the goal fully defined. This doesn't depend on whether the subsystem is really smart or really dumb; it's a fundamental problem if you can't tell the subsystem enough to solve it.

But I don't think that's a fair characterization of most Goodhart-like problems, even in the limited practical case. Bad models and causal mistakes don't get mitigated unless we get the correct model. And adversarial Goodhart is much worse than that. I agree that it describes "tails diverge" / regressional Goodhart, and we have solutions for that case (compute the Bayes estimate, as the previous ), but only once the goal is well-defined. (We have mitigations for other cases, but they have their own drawbacks.)
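To make the regressional-Goodhart point concrete, here is a minimal simulation (my own illustration, assuming Gaussian true values and proxy noise): selecting the candidate with the highest proxy score systematically overestimates its true value, while shrinking the proxy toward the prior mean (the Bayes estimate) removes the bias:

```python
import random
import statistics

random.seed(0)

# Assumed setup: true value V ~ N(0, s_v); observed proxy X = V + noise,
# with noise ~ N(0, s_n). The posterior mean E[V | X] shrinks X toward
# the prior mean by the factor s_v^2 / (s_v^2 + s_n^2).
s_v, s_n = 1.0, 1.0
shrink = s_v**2 / (s_v**2 + s_n**2)

gaps_naive, gaps_bayes = [], []
for _ in range(2000):
    vs = [random.gauss(0, s_v) for _ in range(50)]
    xs = [v + random.gauss(0, s_n) for v in vs]
    best = max(range(50), key=lambda i: xs[i])      # select on the proxy
    gaps_naive.append(xs[best] - vs[best])          # proxy minus true value
    gaps_bayes.append(shrink * xs[best] - vs[best]) # Bayes estimate minus true value

print(statistics.mean(gaps_naive))  # clearly positive: selection overestimates
print(statistics.mean(gaps_bayes))  # near zero: the shrunk estimate is unbiased
```

Note that the Bayes correction fixes the *estimate* of the selected option's value; as the comment says, it only applies once the goal (here, V and the noise model) is well-defined.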

Comment by davidmanheim on No Really, Why Aren't Rationalists Winning? · 2018-11-07T12:43:00.882Z · score: 21 (9 votes) · LW · GW

Very much disagree - but this is as someone not in the middle of the Bay area, where the main part of this is happening. Still, I don't think rationality works without some community.

First, I don't think that the alternative communities that people engage with are epistemically healthy enough to allow people to do what they need to reinforce good norms for themselves.

Second, I don't think that epistemic rationality is something a non-community can do a good job with, because people get far too little of the personal reinforcement and positive vibes needed to stick with it if everyone is going it alone.

Comment by davidmanheim on What is ambitious value learning? · 2018-11-07T07:34:49.901Z · score: 1 (1 votes) · LW · GW

I don't think you are correct about the implication of "not up for grabs" - it doesn't mean it is not learnable, it means that we don't update or change it, and that it is not constrained by rationality. But even that isn't quite right - rational behavior certainly requires that we change preferences about intermediate outcomes when we find that our instrumental goals should change in response to new information.

And if the utility function changes as a result of life experiences, it should be in a way that reflects learnable expectations over how experiences change the utility function - so the argument about needing origin disputes still applies.

Comment by davidmanheim on Subsystem Alignment · 2018-11-07T07:21:23.176Z · score: 5 (4 votes) · LW · GW

It's easy to find ways of searching for truth in ways that harm the instrumental goal.

Example 1: I'm a self-driving car AI, and don't know whether hitting pedestrians at 35 MPH is somewhat bad, because it injures them, or very bad, because it kills them. I should not gather data to update my estimate.

Example 2: I'm a medical AI. Repeatedly trying a potential treatment that I am highly uncertain about the effects of to get high-confidence estimates isn't optimal. I should be trying to maximize something other than knowledge. Even though I need to know whether treatments work, I should balance the risks and benefits of trying them.
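Example 2 is essentially an explore/exploit trade-off. One standard way to balance learning against acting well is Thompson sampling; here is a minimal sketch (my own illustration, with invented success rates) of an agent choosing between two treatments with unknown effects:

```python
import random

random.seed(1)

# Two treatments with hidden Bernoulli success probabilities (assumed here).
# Pure knowledge-seeking would split trials equally forever; Thompson
# sampling balances learning the rates against treating patients well.
p_true = [0.4, 0.7]
wins = [1, 1]      # Beta(1, 1) prior per treatment: pseudo-successes
losses = [1, 1]    # and pseudo-failures

successes = 0
for _ in range(5000):
    # Sample a plausible success rate for each arm; act on the best sample.
    samples = [random.betavariate(wins[i], losses[i]) for i in range(2)]
    arm = samples.index(max(samples))
    outcome = random.random() < p_true[arm]
    successes += outcome
    if outcome:
        wins[arm] += 1
    else:
        losses[arm] += 1

print(successes / 5000)  # well above 0.55: it mostly picks the better treatment
```

The agent still gathers information about both treatments early on, but its trials are weighted by expected benefit rather than by knowledge gained alone - which is the point of the example.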

Comment by davidmanheim on The easy goal inference problem is still hard · 2018-11-06T09:03:59.600Z · score: 1 (1 votes) · LW · GW

Fair point, but I don't think that addresses the final claim, which is that even if you are correct, analyzing the black box isn't enough without actually playing out counterfactuals.

Comment by davidmanheim on Embedded World-Models · 2018-11-06T09:01:49.741Z · score: 6 (2 votes) · LW · GW
...the weirdness of the injunction to optimize over a space containing every procedure you could ever do, including all of the optimization procedures you could ever do.

My most recent preprint discusses multi-agent Goodhart ( ) and uses the example of poker, along with a different argument somewhat related to the embedded agent problem, to say why the optimization over strategies needs to include optimizing over the larger solution space.

To summarize and try to clarify how I think it relates, strategies for game-playing must at least implicitly include a model of the other player's actions, so that an agent can tell which strategies will work against them. We need uncertainty in that model, because if we do something silly like assume they are rational Bayesian agents, we are likely to act non-optimally against their actual strategy. But the model of the other agent itself needs to account for their model of our strategy, including uncertainty about our search procedure for strategies - otherwise the space is clearly much too large to optimize over.

Does this make sense? (I may need to expand on this and clarify my thinking...)
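One way to make "model the other player with uncertainty" concrete (a toy sketch of my own, with an invented biased rock-paper-scissors opponent, not anything from the preprint): keep a posterior over candidate opponent strategies, best-respond to the posterior mixture, and update on each observed action:

```python
import random

random.seed(4)

# The opponent actually plays Rock 60% of the time (an assumption for the
# demo). We don't assume they are a rational Bayesian agent; instead we
# maintain uncertainty over a small set of candidate strategies.
actions = ["R", "P", "S"]
beats = {"R": "P", "P": "S", "S": "R"}  # what beats each action
candidates = {
    "uniform":    {"R": 1/3, "P": 1/3, "S": 1/3},
    "rock-heavy": {"R": 0.6, "P": 0.2, "S": 0.2},
}
posterior = {name: 0.5 for name in candidates}

wins = 0
for _ in range(2000):
    opp = random.choices(actions, weights=[0.6, 0.2, 0.2])[0]
    # Best-respond to the posterior-weighted prediction of their action.
    pred = {a: sum(posterior[n] * candidates[n][a] for n in candidates)
            for a in actions}
    me = beats[max(pred, key=pred.get)]
    wins += (beats[opp] == me)
    # Bayesian update on the observed opponent action.
    for n in candidates:
        posterior[n] *= candidates[n][opp]
    z = sum(posterior.values())
    posterior = {n: posterior[n] / z for n in posterior}

print(wins / 2000)  # well above 1/3: the uncertainty-aware model pays off
```

The posterior concentrates on the correct opponent model within a few rounds; the larger point in the comment - that the opponent's model of *our* strategy also matters - would require nesting this construction, which is exactly where the space blows up.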

Policy Beats Morality

2018-10-17T06:39:40.398Z · score: 15 (15 votes)

(Some?) Possible Multi-Agent Goodhart Interactions

2018-09-22T17:48:22.356Z · score: 21 (5 votes)

Lotuses and Loot Boxes

2018-05-17T00:21:12.583Z · score: 27 (6 votes)

Non-Adversarial Goodhart and AI Risks

2018-03-27T01:39:30.539Z · score: 64 (14 votes)

Evidence as Rhetoric — Normative or Positive?

2017-12-06T17:38:05.033Z · score: 1 (1 votes)

A Short Explanation of Blame and Causation

2017-09-18T17:43:34.571Z · score: 1 (1 votes)

Prescientific Organizational Theory (Ribbonfarm)

2017-02-22T23:00:41.273Z · score: 3 (4 votes)

A Quick Confidence Heuristic; Implicitly Leveraging "The Wisdom of Crowds"

2017-02-10T00:54:41.394Z · score: 1 (2 votes)

Most empirical questions are unresolveable; The good, the bad, and the appropriately under-powered

2017-01-23T20:35:29.054Z · score: 3 (4 votes)

A Cruciverbalist’s Introduction to Bayesian reasoning

2017-01-12T20:43:48.928Z · score: 1 (2 votes)

Map:Territory::Uncertainty:Randomness – but that doesn’t matter, value of information does.

2016-01-22T19:12:17.946Z · score: 6 (11 votes)

Meetup : Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup

2016-01-14T05:31:20.472Z · score: 1 (2 votes)

Perceptual Entropy and Frozen Estimates

2015-06-03T19:27:31.074Z · score: 10 (11 votes)

Meetup : Complex problems, limited information, and rationality; How should we make decisions in real life?

2013-10-09T21:44:19.773Z · score: 3 (4 votes)

Meetup : Group Decision Making (the good, the bad, and the confusion of welfare economics)

2013-04-30T16:18:04.955Z · score: 4 (5 votes)