Comment by davidmanheim on How does Gradient Descent Interact with Goodhart? · 2019-06-18T09:40:04.503Z · score: 2 (2 votes) · LW · GW

Note: I briefly tried a similar approach, albeit with polynomial functions with random coefficients rather than ANNs, and in R rather than Python, but couldn't figure out how to say anything useful with it.

If this is of any interest, it is available here: https://gist.github.com/davidmanheim/5231e4a82d5ffc607e953cdfdd3e3939 (I also built simulations for bog-standard Goodhart)

I am unclear how much of my feeling that this approach is fairly useless reflects my failure to keep building such models and figuring out what can be said, or my diversion to other work that was more fruitful, rather than a fundamental difficulty in saying anything clear based on these types of simulations. I'd like to claim it's the latter, but I'll clearly note that this is heavily motivated reasoning.

Comment by davidmanheim on How does Gradient Descent Interact with Goodhart? · 2019-06-18T09:26:27.381Z · score: 1 (1 votes) · LW · GW

I really like the connection between optimal learning and Goodhart failures, and I'd love to think about / discuss this more. I've mostly thought about it in the online case, since we can sample from human preferences iteratively and build human-in-the-loop systems, as I suggested in "Oversight of Unsafe Systems via Dynamic Safety Envelopes" (https://arxiv.org/abs/1811.09246), which I think parallels, but is less developed than, one part of Paul Christiano's approach. I see why that's infeasible in many settings, though, and that is a critical issue the offline case addresses.

I also want to note that this addresses issues of extremal model insufficiency, and to an extent regressional Goodhart, but not regime change or causal Goodhart.

As an example of the former for human values, I'd suggest that "maximize food intake" is a critical goal in starving humans, but there is a point at which the goal becomes actively harmful, and if all you see are starving humans, you need a fairly complex model of human happiness to notice that. The same regime change applies to sex, and to most other specific desires.

As an example of the latter, causal Goodhart would be where an AI system optimizes for systems that are good at reporting successful space flights, rather than optimizing for actual success - any divergence leads to a system that will kill people and lie about it.

Comment by davidmanheim on Coercive Formats · 2019-06-12T06:46:28.287Z · score: 1 (1 votes) · LW · GW

Based on the discussions below, it seems clear to me that there are (at least) two continuous dimensions of legibility and coercion, which are often related but conceptually distinct. I think they are positively correlated in most good writing, so they are easily conflated, but clarifying them seems useful.

The first is Legible <--> Illegible, in Venkatesh Rao's terms, as others suggested. This is typically the same as serial-access vs. random-access, but has more to do with structure; trees are highly legible, but may not require a particular order. Rough notes from a lecture are illegible (even if they are typed rather than hand-written), but usually need to be read in order.

The second is Coercive <--> Non-coercive, mostly in the negative sense people disliked. Most of the time, the level of coercion is fairly low even in what we think of as coercive writing. For example, any writing that pushes a conclusion is attempting to change your mind, hence it is coercive. Structures that review or present evidence are non-coercive.

I think it takes effort to make something legible but non-coercive, and that something illegible and non-coercive is either very high effort OR badly structured. And since I've brought up Venkatesh Rao and mentioned two dimensions, I believe I'm morally required to construct a 2x2. I can't upload a drawing in a comment, but I will "take two spectra (or watersheds) relevant to a complex issue, simplify each down to a black/white dichotomy, and label the four quadrants you produce." Given his advice, I'll use a "glossary of example “types” to illustrate diversity and differentiation within the soup of ambiguity."

Paternalistic non-fiction writing is legible but coercive; it assumes it knows best, but allows navigation. The sequences are a good example; well-structured textbooks are often a better one. Note that being correct doesn't change the level of coercion! There are plenty of coercive anti-evolution/religious biology "textbooks," but the ones that teach actual science are no less coercive.

Unstructured wikis are illegible and non-coercive; the structure isn't intended to make a point or convince you, but there is also no effort to present things logically or clearly at a higher level. (Individual articles can be more or less structured or coercive, but the wiki format is not.)

Blueprints and diagrams are legible but non-coercive, since by their structure they only present information, rather than leading to a conclusion. Novels and other fiction are (usually) legible and often non-coercive. Sometimes there is an element of coercion, as in fables, Lord of the Flies, HP:MoR, and everything CS Lewis ever wrote - but the main goal is (or should be) to be immersive or entertaining rather than coercive or instructive.

Conversations, and almost any multi-person forum (including most LessWrong writing), are coercive and illegible. Tl;drs are usually somewhat illegible as well. The structure of a conversation is hard to follow: relevant posts and comments aren't clearly organized. At the same time, everyone is trying to push their own reasoning.

Comment by davidmanheim on Major Update on Cost Disease · 2019-06-06T09:25:59.016Z · score: 12 (5 votes) · LW · GW

It also fails to account for the fact that health care is, in a sense, an ultimate superior good - there is no level of income at which people don't want more health, and their demand scales with more income. This combines with the fact that we don't have good ways to exchange money for being healthier. (The same applies for intelligence / education.) I discussed this in an essay on Scott's original post:

https://medium.com/@davidmanheim/chasing-superior-good-syndrome-vs-baumols-or-scott-s-cost-disease-40327ae87b45

Comment by davidmanheim on Does Bayes Beat Goodhart? · 2019-06-06T09:16:47.236Z · score: 1 (1 votes) · LW · GW

That's all basically right, but if we're sticking to causal Goodhart, the "without further assumptions" may be where we differ. I think that if the uncertainty is over causal structures, the "correct" structure will be more likely to increase all metrics than most others.

(I'm uncertain how to do this, but) it would be interesting to explore this over causal graphs, where a system has control over a random subset of nodes, and a metric correlated to the unobservable goal is chosen. In most cases, I'd think that leads to causal Goodhart quickly, but if the set of nodes potentially used for the metric includes some that directly cause the goal, and others that can be intercepted, creating causal Goodhart, then uncertainty over the metric would lead to less causal-Goodharting, since targeting the actual cause should improve the correlated metrics, while the reverse is not true.
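Something like the following toy sketch is what I have in mind (the structure, node names, and numbers are all invented for illustration, not a worked-out result):

```python
# Toy causal-Goodhart setup: one hidden cause X drives the goal G and two
# candidate metrics M1, M2. The agent can either raise X (the true cause)
# or directly inflate M1. All structure and numbers are illustrative assumptions.
import random

random.seed(0)

def world(x, game_m1=0.0):
    g = x + random.gauss(0, 0.1)             # unobservable goal
    m1 = x + game_m1 + random.gauss(0, 0.1)  # metric that can be gamed directly
    m2 = x + random.gauss(0, 0.1)            # metric only reachable through X
    return g, m1, m2

def evaluate(action, metric, n=1000):
    """action: 'raise_cause' spends effort on X; 'game_metric' inflates M1 directly."""
    goal_total, metric_total = 0.0, 0.0
    for _ in range(n):
        if action == "raise_cause":
            g, m1, m2 = world(x=1.0)
        else:
            g, m1, m2 = world(x=0.0, game_m1=1.0)
        goal_total += g
        # the scored metric may be chosen at random, so the agent can't know which to game
        if metric == "m1" or (metric == "random" and random.random() < 0.5):
            metric_total += m1
        else:
            metric_total += m2
    return metric_total / n, goal_total / n

for action in ("raise_cause", "game_metric"):
    for metric in ("m1", "random"):
        score, goal = evaluate(action, metric)
        print(f"{action:12s} scored on {metric:6s}: metric={score:.2f}, true goal={goal:.2f}")
# With the metric fixed to M1, gaming it scores as well as targeting X while doing
# nothing for G (causal Goodhart); with uncertainty over the metric, only targeting
# the actual cause scores well in expectation.
```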

Comment by davidmanheim on Uncertainty versus fuzziness versus extrapolation desiderata · 2019-06-04T12:51:49.583Z · score: 10 (2 votes) · LW · GW

It's not exactly the same, but I would argue that the issues with "Dog" versus "Cat" for the picture are best captured with that formalism - the boundaries between categories are not strict.

To be more technical, there are a couple locations where fuzziness can exist. First, the mapping in reality is potentially fuzzy since someone could, in theory, bio-engineer a kuppy or cat-dog. These would be partly members of the cat set, and partly members of the dog set, perhaps in proportion to the genetic resemblance to each of the parent categories.

Second, the process that leads to the picture, involving a camera and a physical item in space, is a mapping from reality to an image. That is, reality may have a sharp boundary between dogs and cats, but the space of possible pictures of a given resolution is far smaller than the space of physical configurations that can be photographed, so the mapping from reality->pictures is many-to-one, creating a different irresolvable fuzziness - perhaps 70% of the plausible configurations that lead to this set of pixels are cats, and 30% are dogs, so the picture has a fuzzy set membership.

Lastly, there is mental fuzziness, which usually captures the other two implicitly, but has the additional fuzziness created because the categories were made for man, not man for the categories. That is, the categories themselves may not map to reality coherently. This is different from the first issue, where "sharp" genetic boundaries like that between dogs and cats do map to reality correctly, but items can be made to sit on the line. This third issue is that the category may not map coherently to any actual distinction, or may be fundamentally ambiguous, as Scott's post details for "Man vs. Woman" or "Planet vs. Planetoid" - items can partly match one or more than one category, and be fuzzy members of the set.

Each of these, it seems, can be captured fairly well as fuzzy sets, which is why I'm proposing that your usage has a high degree of membership in the fuzzy set of things that can be represented by fuzzy sets.
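For concreteness, here is the distinction in miniature (numbers invented for illustration):

```python
# Fuzzy membership vs. probability over crisp categories, as used in the cases above.

# Fuzzy reading: an engineered cat-dog is *partly* a member of each set.
mu_cat = 0.6   # degree of membership in the fuzzy set "cat"
mu_dog = 0.4   # degree of membership in the fuzzy set "dog"

# Probabilistic reading: the photographed animal is exactly one of the two;
# we just can't tell which from the pixels.
p_cat, p_dog = 0.7, 0.3

# Standard fuzzy-set operations use min/max on membership degrees,
# not the sum/product rules of probability:
mu_cat_and_dog = min(mu_cat, mu_dog)  # membership in "both cat-like and dog-like" = 0.4
mu_cat_or_dog = max(mu_cat, mu_dog)   # membership in "cat-like or dog-like" = 0.6
p_cat_or_dog = p_cat + p_dog          # = 1.0 for mutually exclusive crisp categories

print(mu_cat_and_dog, mu_cat_or_dog, p_cat_or_dog)
```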

Comment by davidmanheim on Does Bayes Beat Goodhart? · 2019-06-03T06:44:01.816Z · score: 16 (4 votes) · LW · GW

Also, I keep feeling bad that we're perpetuating giving Goodhart credit, rather than Campbell, since Campbell was clearly first - https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2018.01205.x - and Goodhart explicitly said he was joking in a recent interview.

Comment by davidmanheim on Does Bayes Beat Goodhart? · 2019-06-03T06:38:38.341Z · score: 9 (3 votes) · LW · GW

See my much shorter and less developed note to a similar effect: https://www.lesswrong.com/posts/QJwnPRBBvgaeFeiLR/uncertainty-versus-fuzziness-versus-extrapolation-desiderata#kZmpMGYGfwGKQwfZs - and I agree that regressional and extremal goodhart cannot be fixed purely with his solution.

I will, however, defend some of Stuart's suggestions as they relate to causal Goodhart in a non-adversarial setting (I'm also avoiding the can of worms of game theory). In that case, both randomization AND mixtures of multiple metrics can address Goodhart-like failures, albeit in different ways. I had been thinking about this in the context of policy - https://mpra.ub.uni-muenchen.de/90649/ - rather than AI alignment, but some of the arguments still apply. (One critical argument that doesn't fully apply is that "good enough" mitigation raises the cognitive costs of cheating to a point where aligning with the true goal is cheaper. I also noted in the paper that satisficing is useful for limiting the misalignment from metrics, and quantilization seems like one promising approach to satisficing for AGI.)

The argument for causal Goodhart is that randomization and mixed utilities are both effective in mitigating the causal structure errors that lead to causal Goodhart in the one-party case. That's because the failure occurs when uncertainty or mistakes about causal structure lead to the choice of metrics that are correlated with the goal, rather than causal of the goal. However, if even some significant fraction or probability of the metric is causally connected to the goal in ways that cannot be gamed, it can greatly mitigate this class of failure.

To more clearly apply this logic to human utility: if we accidentally think that endorphins in the brain are 100% of human goals, an AGI might want to tile the universe with rats on happy drugs, or the moral equivalent. If we assign this only 50% weight, or have a 50% probability that it will be the scored outcome, and we define something that requires a different way of creating what we actually think of as happiness / life satisfaction, the optimum does not simply shift to tiling 50% of the universe with rat brains. This is because the alternative class of hedonium will involve a non-trivial amount of endorphins as well; as long as other solutions have anywhere close to as much endorphins, they will be preferred. (In this case, admittedly, we got the endorphin goal so wrong that 50% of the universe tiled in rats on drugs is likely - bad enough utility functions can't be fixed with either randomization or weighting. But if a causal mistake can be fixed with either a probabilistic or a weighting solution, it seems likely it can be fixed with the other.)
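To make the arithmetic explicit (all numbers invented), a minimal sketch of why the mixed metric doesn't just split the difference:

```python
# Each unit of the universe can be tiled with "rats on drugs" or with something
# closer to genuine flourishing; the two proxies are scored per unit.
endorphin = {"rats_on_drugs": 1.0, "flourishing": 0.8}     # proxy metric 1
satisfaction = {"rats_on_drugs": 0.0, "flourishing": 1.0}  # proxy metric 2

def mixed_score(option, w=0.5):
    return w * endorphin[option] + (1 - w) * satisfaction[option]

for option in endorphin:
    print(option, mixed_score(option))
# rats_on_drugs -> 0.5, flourishing -> 0.9: under the 50/50 mixture the optimizer
# tiles everything with the flourishing option rather than splitting 50/50, because
# flourishing also scores reasonably on the endorphin proxy. If flourishing produced
# almost no endorphins, the mixture would no longer rescue us.
```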

Comment by davidmanheim on Conditions for Mesa-Optimization · 2019-06-03T06:07:02.080Z · score: 14 (4 votes) · LW · GW

I really like this formulation, and it greatly clarifies something I was trying to note in my recent paper on multiparty dynamics and failure modes - https://www.mdpi.com/2504-2289/3/2/21/htm. The discussion about the likelihood of mesa-optimization due to human modeling is close to the more general points I tried to make in the discussion section of that paper. As argued here about humans, other systems are optimizers (even if they are themselves only base optimizers), and therefore any successful machine-learning system in a multiparty environment is implicitly forced to model the other parties. I called this the "opponent model," and argued that such models are dangerous because they are always approximate, arguing directly from that point to the claim that there is great potential for misalignment. The implication from this work is that they are also dangerous because they encourage machine learning systems in multi-agent settings to become mesa-optimizers, and mesa-optimization is a critical enabler of misalignment even when the base optimizer is well aligned.

I would add to the discussion here that multiparty systems can display the same dynamics, and therefore have risks similar to those of systems which require human models. I also think, less closely connected to the current discussion but directly related to my paper, that mesa-optimizer misalignments pose new and harder-to-understand risks when they interact with one another.

I also strongly agree with the point that current examples are not really representative of the full risk. Unfortunately, peer reviewers strongly suggested that I include more concrete examples of failures. But as I said in the paper, "the failures seen so far are minimally disruptive. At the same time, many of the outlined failures are more problematic for agents with a higher degree of sophistication, so they should be expected not to lead to catastrophic failures given the types of fairly rudimentary agents currently being deployed. For this reason, specification gaming currently appears to be a mitigable problem, or as Stuart Russell claimed, be thought of as “errors in specifying the objective, period.”"

As a final aside, I think that the concept of mesa-optimizers is very helpful in laying out the argument against that last claim - misalignment is more than just misspecification. I think that this paper will be very helpful in showing why.

Comment by davidmanheim on Uncertainty versus fuzziness versus extrapolation desiderata · 2019-06-01T18:41:51.747Z · score: 3 (2 votes) · LW · GW

Actually, I assumed fuzzy was intended here to be a precise term, contrasted with probability and uncertainty, as it is used in describing fuzzy sets versus uncertainty about set membership. https://en.wikipedia.org/wiki/Fuzzy_set

Comment by davidmanheim on Uncertainty versus fuzziness versus extrapolation desiderata · 2019-05-31T08:13:19.569Z · score: 7 (2 votes) · LW · GW

I missed the proposal when it was first released, but I wanted to note that the original proposal addresses only one (critical) class of Goodhart error, and proposes a strategy based on addressing one problematic result of it, nearest-unblocked-neighbor. The strategy is more widely useful for misspecification than just nearest-unblocked-neighbor, but it still only addresses some Goodhart effects.

The misspecification discussed is more closely related to, but still distinct from, extremal and regressional Goodhart. (Causal and adversarial Goodhart are somewhat far removed, and don't seem as relevant to me here. Causal Goodhart is due to mistakes, albeit fundamentally hard to avoid mistakes, while adversarial Goodhart happens via exploiting other modes of failure.)

I notice I am confused about how different strategies being proposed to mitigate these related failures can coexist if each is implemented separately, and/or how they would be balanced if implemented together, as I briefly outline below. Reconciling or balancing these different strategies seems like an important question, but I want to wait to see the full research agenda before commenting or questioning further.

Explaining the conflict I see between the strategies:

Extremal Goodhart is somewhat addressed by another post you made, which proposes to avoid ambiguous distant situations - https://www.lesswrong.com/posts/PX8BB7Rqw7HedrSJd/by-default-avoid-ambiguous-distant-situations. It seems that the strategy proposed here is to attempt to resolve fuzziness, rather than avoid areas where it becomes critical. These seem to be at least somewhat at odds, though this is partly reconcilable by pursuing neither fully: neither resolving all ambiguity, nor entirely avoiding distant ambiguity.

And regressional Goodhart, as Scott G. originally pointed out, is unavoidable except by staying in-sample, interpolating rather than extrapolating. Fully pursuing that strategy is precluded by injecting uncertainty into the model of the human-provided modification to the utility function. Again, this is partly reconcilable, for example, by trying to bound how far we let the system stray from the initially provided blocked strategy, and how much fuzziness it is allowed to infer without an external check.

Comment by davidmanheim on Schelling Fences versus Marginal Thinking · 2019-05-23T11:06:38.609Z · score: 1 (1 votes) · LW · GW

Yes, that does seem to be a risk. I would think that applying Schelling fences to reinforce current values reduces the amount of expected drift in the future, and I'm unclear whether you are claiming that using Schelling fences will do the opposite, or claiming that they are imperfect.

I'd also like to better understand what specifically you think commits the error of making it difficult to re-align with current values (rather than reducing the degree of drift), and how it could be handled differently.

Comment by davidmanheim on No Really, Why Aren't Rationalists Winning? · 2019-05-23T06:33:50.135Z · score: 4 (2 votes) · LW · GW

That's a very good point, I was definitely unclear.

I think that the critical difference is that in epistemically healthy communities, when such a failure is pointed out, some effort is spent on identifying and fixing the problem, instead of pointedly ignoring it despite efforts to solve the problem, or spending time actively defending the inadequate status quo from even Pareto-improving changes.

Comment by davidmanheim on No Really, Why Aren't Rationalists Winning? · 2019-05-23T06:30:39.333Z · score: 12 (3 votes) · LW · GW

I don't think they get epistemic rationality anywhere near correct either. As a clear and simple example, there are academics currently vigorously defending their right not to pre-register empirical studies.

Comment by davidmanheim on By default, avoid ambiguous distant situations · 2019-05-23T06:27:33.503Z · score: 3 (2 votes) · LW · GW

Agreed. I'm just trying to think through why we should / should not privilege the status quo. I notice I'm confused about this, since the reversal heuristic implies we shouldn't. If we take this approach to an extreme, aren't we locking in the status-quo as a base for allowing only pareto improvements, rather than overall utilitarian gains?

(I'll note that Eric Drexler's Pareto-topia argument explicitly allows for this condition - I'm just wondering whether it is ideal, or a necessary compromise.)

Comment by davidmanheim on No Really, Why Aren't Rationalists Winning? · 2019-05-22T16:40:48.150Z · score: 1 (1 votes) · LW · GW

Mine, and my experience working in academia. But (with the very unusual exceptions of FHI, GMU's economics department, and possibly the new center at Georgetown) I don't think you'd find much disagreement among LWers who interact with academics that academia sometimes fails to do even the obvious, level-one intelligent character things to enable them to achieve their goals.

Comment by davidmanheim on By default, avoid ambiguous distant situations · 2019-05-22T11:51:26.182Z · score: 3 (2 votes) · LW · GW

I'm going to try thinking about this by applying the reversal heuristic.

If a smarter and/or less evil person had magicked house elves into existence, so that they were mentally incapable of understanding what freedom would entail, instead of just enjoying servitude, should we change them? Equivalently, if we have a world where everyone is happier than this one because their desires are eusocial and fully compatible with each other, but liberty and prestige are literally impossible to conceive of, should we change back? If that world existed, or we found those aliens, should they be "freed" to make them appreciate liberty, when the concept never occurred to them?

OK, now we can ask the question - should we change from our world to one where people are not culturally molded to appreciate any of our current values? Let's say cultural pressures didn't exist, and values emerged from allowing people, starting from when they are babies, to have whatever they want. This is accomplished by non-sentient robots that can read brainwaves and fulfill desires immediately. Is that better, or should we move towards a future where we continue to culturally engineer our children to have a specific set of desires, those we care about - for pathos, prestige, freedom, etc.?

Or should we change the world from our current one to one where people's values are eusocial by design? Where being sacrificed for the greater good was pleasant, and the idea of selfishness was impossible?

At the end of this, I'm left with a feeling that yes, I agree that these are actually ambiguous, and "an explicit level of status quo bias to our preferences" is in fact justified.


Comment by davidmanheim on No Really, Why Aren't Rationalists Winning? · 2019-05-22T11:20:43.998Z · score: 1 (1 votes) · LW · GW

Academia in general is certainly not an adequate community from an epistemic standards point of view, and while small pockets are relatively healthy, none are great. And yes, the various threads of epistemic rationality certainly predated LessWrong, and there were people and even small groups that noted the theoretical importance of pursuing it, but I don't think there were places that actively advocated that members follow those epistemic standards.

To get back to the main point: while I don't think it is necessary for the community to "fulfill each other's needs for companionship, friendship, etc," I don't think there is a good way to reinforce norms without something at least as strongly affiliated as a club. There is a fine line between club and community, and I understand why people feel there are dangers of going too far, but before LW, few groups seem to have gone nearly far enough in building even a project group with those norms.

Schelling Fences versus Marginal Thinking

2019-05-22T10:22:32.213Z · score: 23 (14 votes)
Comment by davidmanheim on Literature Review: Distributed Teams · 2019-05-22T09:20:13.212Z · score: -1 (2 votes) · LW · GW

Sure, but we can close the global prestige gap to some extent, and in the mean time, we can leverage in-group social prestige, as the current format implicitly does.

Comment by davidmanheim on How To Use Bureaucracies · 2019-05-22T09:18:36.852Z · score: 27 (10 votes) · LW · GW

I strongly disagree. There are many domains where we have knowledge with little or no ability to conduct RCTs - geology, evolutionary theory, astronomy, etc. The models work because we have strong Bayesian evidence for them - as I understood it, this was the point of a large section of the sequences, so I'm not going to try to re-litigate that debate here.

Comment by davidmanheim on Where are people thinking and talking about global coordination for AI safety? · 2019-05-22T09:14:52.364Z · score: 12 (7 votes) · LW · GW

I want to focus on your second question: "Human coordination ability seems within an order of magnitude of what's needed for AI safety. Why the coincidence? (Why isn’t it much higher or lower?)"

Bottom line up front: Humanity has faced a few potentially existential crises in the past; world wars, nuclear standoffs, and the threat of biological warfare. The fact that we survived those, plus selection bias, seems like a sufficient explanation of why we are near the threshold for our current crises.

I think this is a straightforward argument. At the same time, I'm not going to get deep into the anthropic reasoning, which is critical here but which I'm not clear enough on to discuss carefully. (Side note: Stuart Armstrong recently mentioned to me that there are reasons I'm not yet familiar with for why anthropic shadows aren't large, which is assumed in the model below.)

If we assume that large-scale risks are distributed in some manner, such as from Bostrom's urn of technologies (see: The Vulnerable World Hypothesis), we should expect that the attributes of the problems, including the coordination needed to withstand / avoid them, are distributed with some mean and variance. Whatever that mean and variance is, we expect that there should be more "easy" risks (near or below the mean) than "hard" ones. Unless the tail is very, very fat, this means that we are likely to see several moderate risks before we see more extreme ones. For a toy model, let's assume risks show up at random yearly, and follow a standard normal distribution in terms of the capability needed to handle them. If we had capability in the low single digits, we would already have been wiped out with high probability. Given that we've come worryingly close, however, it seems clear that we aren't in the high double digits either.
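To make that toy model concrete, here is a minimal numerical sketch (keeping the arbitrary standard-normal difficulty scale and picking a 100-year window):

```python
# P(survival) and P(near-miss | survival) as a function of coordination capability,
# with yearly risk difficulty ~ N(0, 1). All scales are illustrative assumptions.
from statistics import NormalDist

norm = NormalDist()

def p_survive(capability, years=100):
    """P(no yearly risk exceeds our coordination capability over the window)."""
    return norm.cdf(capability) ** years

def p_close_call_given_survival(capability, margin=0.5, years=100):
    """P(at least one risk within `margin` of our capability | we survived)."""
    p_all_below_cap = norm.cdf(capability) ** years
    p_all_comfortably_below = norm.cdf(capability - margin) ** years
    return (p_all_below_cap - p_all_comfortably_below) / p_all_below_cap

for c in (1.0, 2.0, 3.0, 5.0):
    print(f"capability {c}: P(survive)={p_survive(c):.3f}, "
          f"P(close call | survived)={p_close_call_given_survival(c):.3f}")
# Low capability rarely survives a century of draws; very high capability survives
# but almost never sees a close call. Observing both survival and near-misses is
# most consistent with capability only modestly above the typical risk.
```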

Given all of that, and the selection bias of asking the question when faced with larger risks, I think it's a posteriori likely that most salient risks we face are close to our level of ability to overcome.

Comment by davidmanheim on Comment section from 05/19/2019 · 2019-05-19T05:47:51.447Z · score: 7 (11 votes) · LW · GW

I have no idea what this is about, but it clearly doesn't belong here. Can you have this discussion elsewhere?

Comment by davidmanheim on Tales From the American Medical System · 2019-05-10T07:14:27.219Z · score: 8 (3 votes) · LW · GW

Because their desire to spend money is a constant multiple of the amount they have, and that constant multiple is usually slightly above one.

See: Hedonic Treadmill.

Comment by davidmanheim on How To Use Bureaucracies · 2019-05-10T07:00:37.495Z · score: 19 (10 votes) · LW · GW

Fair point about the bar for posting, but this doesn't read like "posting their [tentative] thoughts"; it reads like conclusions based on extensive review. As a matter of good epistemics, the difference should be made clearer. Similarly, if you dismiss large parts of the literature, it would be good to at least let people know what you think should be ignored, so they don't waste their time, and even better, why, so they can decide if they agree.

As a side point, I think that considering RCTs as a source of evidence in this domain is a strange bar. There's lots of case study and other quantitative observational evidence that supports these other approaches, and specifying what evidence counts is, as the phrase goes, logically rude - how would you even design an RCT to test these theories?

Comment by davidmanheim on Literature Review: Distributed Teams · 2019-05-10T06:55:50.147Z · score: 3 (2 votes) · LW · GW

Somewhere in the middle. Most conclusions should be hedged more than they are, but some specific conclusions here are based on strong assumptions that I don't think are fully justified, and the strength of the evidence and the generality of the conclusions aren't clear.

I think that recommending site visits and not splitting a team are good recommendations in general, but sometimes (rarely) could be unhelpful. Other ideas are contingently useful, but often other factors push the other way. "Make people very accessible" is a reasonable idea that in many contexts would work poorly, especially given Paul Graham's points on makers versus managers. Similarly, the emphasis on having many channels for communication seems to be better than the typical lack of communication, but can be a bad idea for people who need time for deep work, and could lead to furthering issues with information overload.

All of that said, again, this is really helpful research, and points to enough literature that others can dive in and assess these things for themselves.

Comment by davidmanheim on Literature Review: Distributed Teams · 2019-05-10T06:22:34.477Z · score: 1 (1 votes) · LW · GW

I think you're viewing intrinsic versus extrinsic reward as dichotomous rather than continuous. Knuth awards are on one end of the spectrum, salaries at large organizations are at the other. Prestige isn't binary, and there is a clear interaction between prestige and standards - raising standards can itself increase prestige, which will itself make the monetary rewards more prestigious.

Comment by davidmanheim on How To Use Bureaucracies · 2019-05-10T06:18:51.644Z · score: 10 (5 votes) · LW · GW

I think that most bureaucracies are the inevitable result of growth, and even when they were initially owned by the creator, they don't act that way once they require more than a few people. (See my Ribbonfarm Post, Go Corporate or Go Home)

Comparing the goals of a bureaucracy with the incentives and the organizational style, you should expect to find a large degree of overlap for small bureaucracies, trailing off, at best, around a dozen people, but almost none for larger ones. This isn't a function of time since formation, but rather a function of size - larger bureaucracies are fundamentally less responsive to owner's intent or control, and more about structure and organizational priorities. As an obvious case study, look at what happened at US DHS after 2002, which was created de novo with a clear goal, but it is clear in retrospect that the goal was immediately irrelevant to how the bureaucracy worked.

Comment by davidmanheim on Literature Review: Distributed Teams · 2019-05-09T09:15:48.791Z · score: 9 (5 votes) · LW · GW

This is a fantastic review of the literature, and a very valuable post - thank you!

My critical / constructive note is that I think many of the conclusions here are stated with too much certainty or are overstated. My primary reasons for thinking it should be more hedged are that the literature is so ambiguous, the fundamental underlying effects are unclear, the model(s) proposed in the post do not really account for reasonable uncertainties about what factors matter, and there is almost certainly heterogeneity based on factors that aren't discussed.

Comment by davidmanheim on Should Effective Altruism be at war with North Korea? · 2019-05-09T09:10:58.335Z · score: 5 (1 votes) · LW · GW

It seems ill-advised to discuss specific case studies in this context, especially given that EA as a movement has very good reasons not to take sides in international relations.

Comment by davidmanheim on How To Use Bureaucracies · 2019-05-09T06:32:59.881Z · score: 27 (17 votes) · LW · GW

There's a large literature on bureaucracies, and it has a lot to say that is useful on the topic. Unfortunately, this post manages to ignore most of it. Even more unfortunately, I don't have time to write a response in the near future.

For those looking for a more complete picture - one that at least acknowledges the fact that most bureaucracies are neither designed by individuals, nor controlled by them - I will strongly recommend James Q. Wilson's work on the topic, much of which is captured in his book, "Bureaucracy." I'll also note that Niskanen's work is an important alternative view, as is Simon's earlier (admittedly harder to read, but very useful) work on Administrative Behavior.

Perrow's work, "Organizational Analysis: A Sociological View" is more dated, and I wouldn't otherwise recommend it, but it probably does the best job directly refuting the claims made here. In his first chapter, titled "Perspectives on Organizations," he explains why it is unhelpful to view organizations just as a function of the people who make them up, or as a function of who leads them. When I have more time, I will hope to summarize those points as a response to this post.

Comment by davidmanheim on Should Effective Altruism be at war with North Korea? · 2019-05-06T07:36:25.317Z · score: 10 (2 votes) · LW · GW
North Korea doesn't have a lot of cash

Just posting to strongly disagree with this factual claim. They have tons of cash from illicit sources for the things the regime values, and it is certainly enough for much of the ruling class to do whatever they'd like.

Comment by davidmanheim on Dishonest Update Reporting · 2019-05-06T07:23:18.990Z · score: 5 (3 votes) · LW · GW

Note that most markets don't have any transparency about who buys or sells, and external factors are often more plausible reasons than a naive outsider expects. A drop in the share price of a retailer could be reflecting lower confidence in its future earnings, or result from a margin call on a firm that made a big bet on the retailer that it needed to unwind, or even be because a firm that was optimistic about the retailer decided to double down and move a large call options position out 6 months, so that their counterparty sold to hedge their delta - there is no way to tell the difference. (Which is why almost all market punditry is not only dishonest, but laughable once you've been on the inside.)

Comment by davidmanheim on Dishonest Update Reporting · 2019-05-06T07:17:53.912Z · score: 3 (2 votes) · LW · GW

Political contexts are poisonous, of course, in this and so many other ways, so politics should be kept as small as possible. In most contexts, however, including political ones, the solution is to give no credit to those who don't explain, or even to assign negative credit for punditry that isn't demonstrably more accurate than the crowd - which leads to a wonderful incentive to shut up unless you can say something more than "I think X will happen."

And in collaborative contexts, people are happy to give credit for mostly correct thinking that assists their own, rather than attack mistakes. We should stay in those contexts and build them out where possible - positive-sum thinking is good, and destroying, or at least ignoring, negative-sum contexts is often good as well.

Comment by davidmanheim on Coherence arguments do not imply goal-directed behavior · 2019-05-06T07:09:47.454Z · score: 3 (2 votes) · LW · GW

I love that framing - do you have a source you can link so I can cite it?

Comment by davidmanheim on Coherence arguments do not imply goal-directed behavior · 2019-05-06T07:09:04.997Z · score: 3 (2 votes) · LW · GW
Actually, no matter what the policy is, we can view the agent as an EU maximizer.

There is an even broader argument to be made. For an agent that is represented by a program, no matter what the preferences are, even if inconsistent, we can view it as an EU maximizer that always chooses the output it is programmed to take. (If it is randomized, its preferences are weighted between those options.)

I suspect there are other constructions that are at least slightly less trivial, because this trivial construction has utilities only over the "outcome" of which action it takes - a deontological goal - rather than over the external world, which would allow more typically consequentialist goals. Still, it is consistent with definitions of EU maximization.
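For concreteness, the trivial construction in a few lines (a sketch, not a claim about any particular agent):

```python
# Given any policy whatsoever, define a utility function over (state, action)
# pairs that the policy maximizes.
def make_trivial_utility(policy):
    """Return U such that argmax_a U(s, a) reproduces `policy` exactly."""
    return lambda state, action: 1.0 if action == policy(state) else 0.0

# Example: an "inconsistent" policy that prefers A in one state and B in another.
policy = lambda state: "A" if state == "s1" else "B"
U = make_trivial_utility(policy)

for state in ("s1", "s2"):
    best = max(["A", "B"], key=lambda a: U(state, a))
    assert best == policy(state)
# The catch, as noted above: U is defined over the agent's own action choices
# rather than over states of the external world, so nothing consequentialist
# is being claimed about the agent's behavior.
```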

Comment by davidmanheim on Understanding information cascades · 2019-05-05T05:39:17.587Z · score: 3 (2 votes) · LW · GW

Each season, there were too few questions for this to be obvious rather than a minor effect, and the "misses" were excused as getting an actually-unlikely event wrong. It's hard to say, post hoc, whether the ~1% consensus opinion about a "freak event" was accurate and we simply got a huge surprise (and yes, this happened at least twice), or whether the consensus was simply overconfident.

(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)

Comment by davidmanheim on Dishonest Update Reporting · 2019-05-05T05:33:24.391Z · score: 34 (8 votes) · LW · GW

There is a strategy that is almost mentioned here, but not pursued, that I think is near-optimal - explaining your reasoning as a norm. This is the norm I have experienced in the epistemic community around forecasting. (I am involved in both Good Judgment, where I was an original participant and have since resumed work, and Metaculus's AI instance; both are very similar in that regard.)

If such explanation is a norm, or even a possibility, the social credit for updated predictions will normally be apportioned based on the reasoning as much as the accuracy. And while individual Brier scores are useful, forecasters who provide mediocre calibration but excellent public reasoning and evidence which others use are more valuable for an aggregate forecast than excellent forecasters who explain little or nothing.

If Bob wants social credit for his estimate in this type of community, he needs to publicly explain his model - at least in general. (This includes using intuition as an input - there are superforecasters who I update towards based purely on claims that the probability seems too low / high.) Similarly, if Bob wants credit for updating, he needs to explain his updated reasoning - including why he isn't updating based on evidence that prompted Alice's estimate, which would usually have been specified, or updated based on Alice's stated model and her estimate itself. If Bob said 75% initially, but now internally updates to think 50%, it will often be easier to justify a sudden change based on an influential datapoint, rather than a smaller one using an excuse.

Comment by davidmanheim on Did the recent blackmail discussion change your beliefs? · 2019-03-25T13:30:13.348Z · score: 4 (3 votes) · LW · GW

It's not politics in disguise, but it's hard to discuss rationally for similar reasons. Politics is hard-mode for rationality because it is a subcategory of identity and morals. The moral rightness of a concrete action seems likely to trigger all of the same self-justification that any politics discussion will, albeit along different lines. Making this problem plausibly worse is that the discussion of morality here cannot be as easily tied to disagreements about predicted outcomes as those that occur in politics.

Comment by davidmanheim on Understanding information cascades · 2019-03-25T09:20:14.511Z · score: 3 (2 votes) · LW · GW

As I replied to Pablo below, "...it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing. "

Comment by davidmanheim on Understanding information cascades · 2019-03-25T09:19:47.975Z · score: 9 (3 votes) · LW · GW

You don't need the data - it's an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.
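A worked version of that claim, using a Brier score (the choice of scoring rule is mine, but the argument doesn't depend on it):

```python
# If the true probability is 0.9, extremizing 0.90 -> 0.95 scores better 9 times
# out of 10, even though it is slightly worse in expectation.
def brier(forecast, outcome):
    return (forecast - outcome) ** 2

p_true = 0.9
honest, extremized = 0.90, 0.95

# If the event happens (probability 0.9), the extremized forecast scores better:
print(brier(extremized, 1), "<", brier(honest, 1))   # 0.0025 < 0.01
# If it doesn't (probability 0.1), the extremized forecast scores much worse:
print(brier(extremized, 0), ">", brier(honest, 0))   # 0.9025 > 0.81

# Expected scores (lower is better): honest forecasting wins on average,
# but only slightly, and only via the 10% of cases where the event fails.
exp_honest = p_true * brier(honest, 1) + (1 - p_true) * brier(honest, 0)
exp_extreme = p_true * brier(extremized, 1) + (1 - p_true) * brier(extremized, 0)
print(exp_honest, exp_extreme)  # 0.09 vs 0.0925
```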

Comment by davidmanheim on How can we respond to info-cascades? [Info-cascade series] · 2019-03-17T10:43:09.219Z · score: 11 (3 votes) · LW · GW

The Systems Dynamics "Beer Game" seems like a useful example of how something like (but not the same as) an info-cascade happens.

https://en.wikipedia.org/wiki/Beer_distribution_game - "The beer distribution game (also known as the beer game) is an experiential learning business simulation game created by a group of professors at MIT Sloan School of Management in early 1960s to demonstrate a number of key principles of supply chain management. The game is played by teams of at least four players, often in heated competition, and takes at least one hour to complete... The purpose of the game is to understand the distribution side dynamics of a multi-echelon supply chain used to distribute a single item, in this case, cases of beer."

Basically, passing information through a system with delays means everyone screws up wildly as the system responds in a nonlinear fashion to a linear change. In that case, Forrester and others suggest that changing viewpoints and using systems thinking is critical in preventing the cascades, and this seems to have worked in some cases.
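A single-echelon caricature of that dynamic (parameters invented) already shows the amplification:

```python
# A one-time step in demand, an ordering delay, and a simple order-up-to rule
# produce order swings larger than the underlying change.
from collections import deque

LEAD_TIME = 2
on_hand = 4
pipeline = deque([4, 4])                   # orders already in transit
recent_demand = deque([4] * 4, maxlen=4)   # moving-average forecast window

orders = []
for t in range(20):
    demand = 4 if t < 10 else 8            # linear, one-time change in end demand
    on_hand += pipeline.popleft()          # receive the order placed LEAD_TIME periods ago
    on_hand -= demand                      # serve demand (negative on_hand = backlog)
    recent_demand.append(demand)
    forecast = sum(recent_demand) / len(recent_demand)
    target = forecast * (LEAD_TIME + 1)    # order-up-to level
    position = on_hand + sum(pipeline)     # inventory position
    order = max(0, target - position)
    pipeline.append(order)
    orders.append(order)

print(orders)  # demand steps 4 -> 8, but orders spike to 11 before settling at 8
# Each additional echelon in the real game reacts to these amplified orders rather
# than to end demand, so the swings compound up the chain.
```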

(Please respond if you'd like more discussion.)

Comment by davidmanheim on Understanding information cascades · 2019-03-17T10:36:30.211Z · score: 7 (2 votes) · LW · GW

That's a great point. I'm uncertain if the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt accuracy, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive compatible in tournaments, in a tweetstorm here: https://twitter.com/davidmanheim/status/1080460223284948994 .

Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can in expectation outscore everyone else slightly, and minimize my risk of doing very poorly, by putting my predictions a bit to the extreme of the current predictions. It's almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right - you don't need to be close, you just need to beat the other people's scores to win. But if I report my best-strategy answer instead of my true guess, it seems that it could cascade if others are unaware I am doing this.

Comment by davidmanheim on Understanding information cascades · 2019-03-14T10:57:02.174Z · score: 9 (5 votes) · LW · GW

There are better, simpler results, which I recall but cannot locate right now, on doing this kind of local updating algebraically rather than with deep learning. I did find this, which is related in that it models this type of information flow and shows it works even without fully Bayesian reasoning: Jadbabaie, A., Molavi, P., Sandroni, A., & Tahbaz-Salehi, A. (2012). Non-Bayesian social learning. Games and Economic Behavior, 76(1), 210–225. https://doi.org/10.1016/j.geb.2012.06.001

Given those types of results, the fact that RL agents can learn to do this should be obvious. (Though the social game dynamic result in the paper is cool, and relevant to other things I'm working on, so thanks!)

Comment by davidmanheim on Understanding information cascades · 2019-03-14T10:44:31.314Z · score: 23 (5 votes) · LW · GW

I'm unfortunately swamped right now, because I'd love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.

First, the need for extremizing forecasts (See: Jonathan Baron, Barbara A. Mellers, Philip E. Tetlock, Eric Stone, Lyle H. Ungar (2014) Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis 11(2):133-145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn't typically the dominant factor in forecasting. However, c.f. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for ( Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., ... & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological science, 25(5), 1106-1115. )

Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point - in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say "My model says 25%, but I'm giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%")
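A toy version of the double-counting failure that message-passing prevents (numbers invented):

```python
# Two forecasters each privately observe independent evidence with likelihood
# ratio 4 for hypothesis H, starting from even prior odds.
prior_odds = 1.0
lr_a, lr_b = 4.0, 4.0
correct_odds = prior_odds * lr_a * lr_b   # = 16: the right answer if evidence sources are tracked

# Naive rule: "update on the other person's posterior as if it were fresh evidence."
odds_a = prior_odds * lr_a                # A's posterior odds: 4
odds_b = prior_odds * lr_b                # B's posterior odds: 4
for _ in range(3):
    # each treats the other's current odds (relative to the prior) as an independent likelihood ratio
    odds_a, odds_b = odds_a * (odds_b / prior_odds), odds_b * (odds_a / prior_odds)
    print(odds_a, odds_b)
# 16, 16 -> 256, 256 -> 65536, 65536: the first exchange lands on the correct 16,
# but every further exchange re-counts evidence that has already been incorporated,
# driving confidence to absurd levels. Pearl-style message passing (or just reporting
# the underlying reasons, as above) tracks what has been counted and stops at 16.
```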

Third, I'd be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven't thought much about how to do it other than to note that it's not as easy as it sounded at first.

Comment by davidmanheim on The RAIN Framework for Informational Effectiveness · 2019-02-26T10:26:46.340Z · score: 1 (1 votes) · LW · GW

The works on decision theory tend to be general, but I need my textbooks to find better resources - I'll see if I have the right ones at home. Until then, Andrew Gelman's BDA3 explicitly formulates VoI as a multi-stage decision tree in section 9.3, thereby making it clear that the same procedure is generalizable. And Jaynes doesn't call it VoI in PT:LoS, but his discussion in the chapter on simple applications of decision theory leaves the number of decisions implicitly open.

Comment by davidmanheim on Probability space has 2 metrics · 2019-02-14T10:38:15.499Z · score: 5 (1 votes) · LW · GW

Yes - and this is equivalent to saying that evidence about probability provides Bayesian metric evidence - you need to transform it.

Comment by davidmanheim on The RAIN Framework for Informational Effectiveness · 2019-02-14T10:30:12.536Z · score: 3 (2 votes) · LW · GW

Minor comment/correction - VoI isn't necessarily linked to a single decision, but the way it is typically defined in introductory works, it is implicit that it is limited to one decision. This is mostly because (as I found out when trying to build more generalized VoI models for my dissertation) it quickly becomes intractable for multiple decisions.
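For reference, the standard single-decision form is simple enough to write out (numbers invented):

```python
# Value of learning a binary state S perfectly before choosing between two actions.
p_s = 0.3                                   # P(S = "bad")
utility = {                                 # U[action][state]
    "act":  {"good": 10.0, "bad": -20.0},
    "pass": {"good":  0.0, "bad":   0.0},
}

def expected_u(action, p_bad):
    return (1 - p_bad) * utility[action]["good"] + p_bad * utility[action]["bad"]

# Decide now, without the information:
value_without = max(expected_u(a, p_s) for a in utility)

# Learn S first, then pick the best action in each case:
value_with = (1 - p_s) * max(utility[a]["good"] for a in utility) \
             + p_s * max(utility[a]["bad"] for a in utility)

voi = value_with - value_without
print(value_without, value_with, voi)   # 1.0, 7.0, 6.0
# Chaining this over several sequential decisions means taking expectations over
# every future observation-action branch, which is why the multi-decision version
# blows up so quickly.
```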

Comment by davidmanheim on Why we need a *theory* of human values · 2019-02-14T10:25:00.352Z · score: 1 (1 votes) · LW · GW

I agree, and think work in the area is valuable, but would still argue that unless we expect a correct and coherent answer, any single approach is going to be less effective than an average of (contradictory, somewhat unclear) different models.

As an analogue, I think that effort into improving individual prediction accuracy and calibration is valuable, but for most estimation questions, I'd bet on an average of 50 untrained idiots over any single superforecaster.
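A quick sanity check of that bet, under the assumption that the idiots' errors are noisy but roughly unbiased (shared bias is exactly where it breaks down):

```python
# Average of 50 high-variance, unbiased estimates vs. one lower-variance expert.
import random

random.seed(1)
TRUTH = 100.0
trials = 10_000
crowd_wins = 0
for _ in range(trials):
    crowd = sum(random.gauss(TRUTH, 30) for _ in range(50)) / 50   # noisy idiots, sd 30
    expert = random.gauss(TRUTH, 10)                               # superforecaster, sd 10
    if abs(crowd - TRUTH) < abs(expert - TRUTH):
        crowd_wins += 1
print(crowd_wins / trials)  # roughly 0.75: the crowd average (sd 30/sqrt(50) ~ 4.2)
                            # beats the single sd-10 expert most of the time
```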

Comment by davidmanheim on Spaghetti Towers · 2019-02-14T10:20:18.739Z · score: 2 (2 votes) · LW · GW

Having looked into this, it's partly that, but mostly that tax codes are written in legalese. A simple call option contract can easily be described in 10 lines of code, or a one-line equation. But the legal terms are actually this 188-page pamphlet: https://www.theocc.com/components/docs/riskstoc.pdf which is (technically, though not enforced as) legally required reading for anyone who wants to purchase an exchange-traded option. And don't worry - it explicitly notes that it doesn't cover the actual laws governing options, for which you need to read the relevant US code, or the way in which the markets for trading them work, or any of the risks.
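For comparison, roughly the ten lines in question (simplified, and ignoring exercise style, settlement conventions, and everything else the pamphlet actually covers):

```python
from dataclasses import dataclass

@dataclass
class CallOption:
    underlying: str
    strike: float
    expiry: str        # e.g. "2019-12-20"
    multiplier: int = 100

    def payoff_at_expiry(self, spot_price: float) -> float:
        """Value to the holder at expiration: max(S - K, 0) per share."""
        return max(spot_price - self.strike, 0.0) * self.multiplier
```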

Comment by davidmanheim on How much can value learning be disentangled? · 2019-02-11T08:42:46.557Z · score: 1 (1 votes) · LW · GW

re: #2, VoI doesn't need to be constrained to be positive. If in expectation you think the information will have a net negative impact, you shouldn't get the information.

re: #3, of course VoI is subjective. It MUST be, because value is subjective. Spending 5 minutes to learn about the contents of a box you can buy is obviously more valuable to you than to me. Similarly, if I like chocolate more than you, finding out if a cake has chocolate is more valuable for me than for you. The information is the same, the value differs.

Values Weren't Complex, Once.

2018-11-25T09:17:02.207Z · score: 34 (15 votes)

Oversight of Unsafe Systems via Dynamic Safety Envelopes

2018-11-23T08:37:30.401Z · score: 11 (5 votes)

Collaboration-by-Design versus Emergent Collaboration

2018-11-18T07:22:16.340Z · score: 12 (3 votes)

Multi-Agent Overoptimization, and Embedded Agent World Models

2018-11-08T20:33:00.499Z · score: 9 (4 votes)

Policy Beats Morality

2018-10-17T06:39:40.398Z · score: 15 (15 votes)

(Some?) Possible Multi-Agent Goodhart Interactions

2018-09-22T17:48:22.356Z · score: 21 (5 votes)

Lotuses and Loot Boxes

2018-05-17T00:21:12.583Z · score: 27 (6 votes)

Non-Adversarial Goodhart and AI Risks

2018-03-27T01:39:30.539Z · score: 64 (14 votes)

Evidence as Rhetoric — Normative or Positive?

2017-12-06T17:38:05.033Z · score: 1 (1 votes)

A Short Explanation of Blame and Causation

2017-09-18T17:43:34.571Z · score: 1 (1 votes)

Prescientific Organizational Theory (Ribbonfarm)

2017-02-22T23:00:41.273Z · score: 3 (4 votes)

A Quick Confidence Heuristic; Implicitly Leveraging "The Wisdom of Crowds"

2017-02-10T00:54:41.394Z · score: 1 (2 votes)

Most empirical questions are unresolveable; The good, the bad, and the appropriately under-powered

2017-01-23T20:35:29.054Z · score: 3 (4 votes)

A Cruciverbalist’s Introduction to Bayesian reasoning

2017-01-12T20:43:48.928Z · score: 1 (2 votes)

Map:Territory::Uncertainty::Randomness – but that doesn’t matter, value of information does.

2016-01-22T19:12:17.946Z · score: 6 (11 votes)

Meetup : Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup

2016-01-14T05:31:20.472Z · score: 1 (2 votes)

Perceptual Entropy and Frozen Estimates

2015-06-03T19:27:31.074Z · score: 10 (11 votes)

Meetup : Complex problems, limited information, and rationality; How should we make decisions in real life?

2013-10-09T21:44:19.773Z · score: 3 (4 votes)

Meetup : Group Decision Making (the good, the bad, and the confusion of welfare economics)

2013-04-30T16:18:04.955Z · score: 4 (5 votes)