The Importance of Goodhart's Law

blogospheroid


post by blogospheroid · 2010-03-13T08:19:29.974Z · LW · GW · Legacy · 123 comments

  A speculative origin of Goodhart's law
  The mitigations to Goodhart's law
    Hansonian Cynicism
    Better measures
      Balanced scorecards
      Optimization around the constraint
      Extrapolated Volition
    Solutions centred around Human discretion
      Left Anarchist ideas
      Hierarchical rule

This article introduces Goodhart's law, provides a few examples, tries to explain an origin for the law and lists out a few general mitigations.

Goodhart's law states that once a social or economic measure is turned into a target for policy, it will lose any information content that had qualified it to play such a role in the first place (Wikipedia). The law was named for its developer, Charles Goodhart, a chief economic advisor to the Bank of England.

The much more famous Lucas critique is a relatively specific formulation of the same.

The most famous examples of Goodhart's law are probably the Soviet factories which, when given targets based on the number of nails, produced many tiny, useless nails and, when given targets based on weight, produced a few giant nails. Number and weight both correlated well with useful output in a pre-central-plan scenario. Once they were made targets (at different times), they lost that value.

We laugh at such ridiculous stories, because our societies are generally much better run than Soviet Russia. But the key point about Goodhart's law is that it applies at every level. The Japanese countryside is apparently full of construction projects that carry on because projects started in the recession era have become almost impossible to stop. Our society centres around money, which is supposed to be a relatively good measure of reified human effort. But many unscrupulous institutions have got rich by pursuing money in ways that people would find extremely difficult to describe as value-adding.

GDP Fetishism, a recent article by David Henderson, is another good account of how Goodhart's law is affecting societies.

The way I look at it, Goodhart's law is Guess the teacher's password writ large. People and institutions try to achieve their explicitly stated targets in the easiest way possible, often obeying the letter of the law rather than its spirit.

A speculative origin of Goodhart's law

The way I see Goodhart's law work, or a target's utility break down, is the following.

Superiors want an undefined goal G.
They formulate G* which is not G, but until now in usual practice, G and G* have correlated.
Subordinates are given the target G*.
The well-intentioned subordinate may recognise G and suggest G** as a substitute, but such people are relatively few and far between. Most people try to achieve G*.
As time goes on, every means of achieving G* is sought.
Remember that G* was formulated precisely because it is simple and more explicit than G. Hence, the persons, processes and organizations which aim at maximising G* achieve competitive advantage over those trying to juggle both G* and G.
P(G|G*) reduces with time and after a point, the correlation completely breaks down.
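
The steps above can be sketched as a toy simulation. This is only an illustration under assumptions that are not in the post: the `Agent` class, the numbers (`TARGET`, `GAMING_STEP`, the skill range) and the rule that anyone missing the target shifts a little more effort into gaming each round are all made up for the example. P(G|G*) is computed as the share of agents hitting the measured target who also deliver the real goal.

```python
import math
from dataclasses import dataclass

TARGET = 60        # hypothetical target, applied to the measured proxy G*
GAMING_STEP = 10   # percent of effort shifted into gaming per round (assumed)


@dataclass
class Agent:
    skill: int       # output per round if all effort goes into real work
    gaming: int = 0  # percent of effort spent gaming the measure

    def true_output(self) -> float:
        """G: value actually delivered. Gaming effort delivers none."""
        return self.skill * (100 - self.gaming) / 100

    def measured_output(self) -> float:
        """G*: real output plus one measured unit per percent of gaming effort."""
        return self.true_output() + self.gaming


def p_goal_given_target(agents) -> float:
    """Share of agents meeting the target G* who also meet the real goal G."""
    hitting_target = [a for a in agents if a.measured_output() >= TARGET]
    if not hitting_target:
        return math.nan
    hitting_goal = [a for a in hitting_target if a.true_output() >= TARGET]
    return len(hitting_goal) / len(hitting_target)


def simulate(rounds: int = 6):
    agents = [Agent(skill=s) for s in range(5, 101, 5)]  # 20 agents, skills 5..100
    history = []
    for _ in range(rounds + 1):
        history.append(p_goal_given_target(agents))
        # Anyone still missing the target seeks an easier way to hit it.
        for a in agents:
            if a.measured_output() < TARGET:
                a.gaming = min(100, a.gaming + GAMING_STEP)
    return history


history = simulate()
print([round(p, 2) for p in history])
# [1.0, 1.0, 0.82, 0.75, 0.64, 0.53, 0.45]
```

In this sketch the target ends up being met by everyone, while the number of agents actually delivering G never changes, so hitting G* stops telling you anything about G.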

The mitigations to Goodhart's law

If you consider the law to be true, solutions to Goodhart's law are an impossibility in a non-singleton scenario. So let's consider mitigations.

Hansonian Cynicism
Better Measures
Solutions centred around Human Discretion

Hansonian Cynicism

Hansonian cynicism means pointing out what most people would have in mind as G and showing that institutions all around are not following G, but their own convoluted G*s. It is definitely the second step to mitigation in many, many cases (knowing about Goodhart's law is the first). Most people expect universities to be about education and hospitals to be about health. Pointing out that they aren't doing what they are supposed to be doing creates a huge cognitive dissonance in the thinking person.

Better measures

Balanced scorecards

Taking multiple factors into consideration, trying to make G* as strong and spoof-proof as possible. Mathematically, the scorecard approach is the simplest solution that comes to mind when confronted with Goodhart's law.

Optimization around the constraint

There are no generic solutions to bridging the gap between G and G*, but the body of knowledge of theory of constraints is a very good starting point for formulating better measures for corporates.

Extrapolated Volition

CEV tries to mitigate Goodhart's law in a better way than mechanical measures by trying to create a complete map of human morality. If G is defined fully, there is no need for a G*. CEV tries to do it for all humanity, but as an example, individual extrapolated volition should be enough. The attempt is incomplete as of now, but it is promising.

Solutions centred around Human discretion

Human discretion is the one thing that can presently beat Goodhart's law, because of the constant checking and rechecking that G and G* match. Nobody will attempt to pull off anything as weird as the large nails in such a scenario. However, this is not scalable in a strict sense because of the added testing and quality control requirements.

Left Anarchist ideas

Left anarchist ideas about small firms and workgroups are based on the fact that hierarchy will inevitably introduce problems related to Goodhart's law, and thus the best groups are small ones doing simple things.

Hierarchical rule

On the other end of the political spectrum, Moldbuggian hierarchical rule completely eliminates the mechanical aspects of the law. There is no letter of the law; it's all spirit. I am supposed to take total care of my slaves and have total obedience to my master. The scalability is ensured through hierarchy.

Of all the proposed solutions to the Goodhart's law problem, I like CEV the most, but that is probably a reflection on me more than anything, wanting a relatively scalable and automated solution. I'm not sure whether the people who support human discretion are really correct in this matter.

Your comments are invited and other mitigations and solutions to Goodhart's law are also invited.

123 comments

Comments sorted by top scores.

comment by pjeby · 2010-03-14T19:32:02.789Z · LW(p) · GW(p)

There are no generic solutions to bridging the gap between G and G*, but the body of knowledge of theory of constraints is a very good starting point for formulating better measures for corporates.

A good example from my own history of doing this is when I worked for an ISP and persuaded them to eliminate "cases closed" as a performance measurement for customer service and tech support people, because it was causing email-based cases to be closed without any actual investigation. People would email back and create a new case, and then a rep would get credit for closing that one without investigation either.

The replacement metric was one I derived via the Theory of Constraints, inspired by Goldratt's "throughput-dollar-days" measurement. The replacement metric was "customer-satisfaction-waiting-hours" - a measurement of collective work-in-progress inventory at the team level, and a measurement of priority at the ticket level.

I also made it impossible to truly "close" a case - you could say, "I think this is done", but the customer could still email into it and it would jump right back to its old place in the queue, due to the accumulated "satisfaction waiting hours" on the ticket.
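
A minimal sketch of how such a metric might behave, assuming details the comment does not give: the `Ticket` class, its field names, and the rule that the clock pauses while a ticket is marked "I think this is done" are all hypothetical. Times are plain hour counts to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    banked_hours: float = 0.0              # waiting hours accumulated before the last pause
    waiting_since: Optional[float] = None  # hour at which the current wait began, if waiting

    def waiting_hours(self, now: float) -> float:
        """Total hours the customer has spent waiting on this ticket."""
        running = 0.0 if self.waiting_since is None else now - self.waiting_since
        return self.banked_hours + running

    def mark_probably_done(self, now: float) -> None:
        """A rep says "I think this is done": pause the clock, keep the hours."""
        self.banked_hours = self.waiting_hours(now)
        self.waiting_since = None

    def customer_replied(self, now: float) -> None:
        """The customer writes in again: the wait resumes from the banked hours."""
        if self.waiting_since is None:
            self.waiting_since = now


def work_queue(tickets, now):
    """Tickets still waiting, longest-waiting first (ticket-level priority)."""
    waiting = [t for t in tickets if t.waiting_since is not None]
    return sorted(waiting, key=lambda t: t.waiting_hours(now), reverse=True)


def team_waiting_hours(tickets, now):
    """Team-level work in progress: total waiting hours across waiting tickets."""
    return sum(t.waiting_hours(now) for t in work_queue(tickets, now))


a = Ticket("A", waiting_since=0.0)  # customer wrote in at hour 0
b = Ticket("B", waiting_since=5.0)  # customer wrote in at hour 5
a.mark_probably_done(now=10.0)      # "closed" without investigation at hour 10
a.customer_replied(now=14.0)        # customer emails back at hour 14

queue_ids = [t.ticket_id for t in work_queue([a, b], now=15.0)]
team_total = team_waiting_hours([a, b], now=15.0)
print(queue_ids)   # ['A', 'B']
print(team_total)  # 21.0
```

Under these assumptions, closing ticket A without investigation buys nothing: once the customer replies, A carries its banked hours and returns ahead of B in the queue.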

Of course, the toughest part in some ways was educating new service managers that, no, you can't have a measurement of cases closed on a per-rep basis. Instead, you're going to have to actually pay attention to a rep's work in order to know if they're doing the job. (Of course, the system I developed also had ways to make it easy to see what people are working on, not only at the managerial but the team level - peer pressure is a useful co-ordination tool, if done right.)

I have no idea how well the system fared since I left the company, since it's entirely possible they found programmers since then to give them new metrics that would f**k it up, although I did design the database in such a way as to make it as close to impossible as I could manage. ;-)

Anyway, the theory of constraints positively rocks for business performance optimization, and its Thinking Processes are generally useful tools for any rationalist. They were also a big inspiration for me developing other thinking processes and ultimately mindhacking techniques, in that they showed that it's possible to think systematically even about some of the vaguest and most ill-defined problems imaginable, rigorously home in on key leverage points, resolve conflicts between goals, and generally overcome our brains' processing limitations for analysis and planning.

[Edit to add: the Wikipedia page on thinking processes doesn't really show why a rationalist would be interested in the processes; it's useful to know that a key element of the processes is something called the "categories of legitimate reservation", which have to do with logical proof and well-formedness of argument. They are a key part of constructing and critiquing the semantic maps that are created by the thinking processes.

For example, ToC's conflict resolution method effectively maps out certain implicit assumptions in a conflict, and then invites you to logically disprove these assumptions in order to break the conflict. (That is, if you can find a circumstance where one of those assumptions is false, then the conflict will no longer exist under that circumstance - and you have a potential way out of your dilemma.)

So, in short, ToC thinking processes are mostly about constructing past, present, or future semantic maps of a situation, and applying systematic logic to validating (or invalidating) the maps' well-formedness, as a way of solving problems, creating plans, etc. Very core rationalist stuff, from an instrumental-rationality POV.]

Replies from: None

↑ comment by [deleted] · 2015-07-15T16:10:04.854Z · LW(p) · GW(p)

That's really impressive. I always wonder how customer service works in big business. Companies like Moo (their chat support on the web) and Optus (phone calls) are blissful, whereas most places are terrible. I'm curious about the determinants. I'm also curious about how companies that are receptive to insight from within their ranks, like yours, fare objectively and in terms of what subjective experience I'd get.

comment by [deleted] · 2010-03-14T06:40:12.537Z · LW(p) · GW(p)

I am reminded of one of Dijkstra's sayings:

To this very day we have organizations that measure "programmer productivity" by the "number of lines of code produced per month"; this number can, indeed, be counted, but they are booking it on the wrong side of the ledger, for we should talk about "the number of lines of code spent".

comment by djcb · 2010-03-14T10:15:50.518Z · LW(p) · GW(p)

So, in short: incentives can have unintended consequences, as the incentives influence whatever you want to influence with them.

There are a lot of examples of this in e.g. Dan Ariely's book and Freakonomics.

But the best example must be the bizarre 1994 football (soccer) match between Barbados and Grenada. Barbados needed to win with a two goal difference.

The special incentive here was that any goal scored in the extra time would count double. Now, shortly before the end of the regular time, it was 2-1 for Barbados. Imagine what happened...

(edit: added the note about the two-goal difference, thanks Hook)

Replies from: Hook

↑ comment by Hook · 2010-03-14T14:09:53.624Z · LW(p) · GW(p)

It's an important note for the soccer game that Barbados needed to win by two points in order to advance to the finals. Otherwise, Grenada would go to the finals. Now people have a chance of imagining what happened.

comment by Mass_Driver · 2010-04-11T07:00:39.223Z · LW(p) · GW(p)

Goodhart's Law starts some other way. It's not quite right to say:

Superiors want an undefined goal G.

Mathematically speaking, the problem can't be that G is undefined. If G were really undefined in any absolute sense, then superiors would be indifferent to all possible outcomes, or would choose their utility function literally at random. That rarely happens.

Instead, the problem could be that G is difficult to articulate. It is "undefined" only in the sense that people have had trouble coming up with an explicit verbal definition for it. I know what I want and how to get it, but I don't know how to communicate that want to you ex ante. For example, maybe I want you (the night shift manager) to page me (the owner) whenever there's a decision to make that could affect whether our business keeps a client, but I've never taken any business classes and don't quite have the vocab to say that, so instead I say to only page me if it's "important." "Important" is vague, but "important" is just a map, and the map is not the territory.

Alternatively, the problem could be that G is difficult to commit to. I can define my goal in words just fine today, but I know (or you suspect) that later I will be tempted to evaluate you by some other criterion. For example, I would like to give a raise to whichever police officer does the most to keep his beat safe, and, as a thoughtful and experienced police chief, I know exactly what the difference is between a safe neighborhood and an unsafe neighborhood, and I'm happy to explain it to anyone who's interested. As one of my employees, though, you can't verify that I'm actually rewarding people for making neighborhoods safe, and not, say, giving raises to people who bring in the most money for drug busts, or who artificially lower their crime statistics, or who give me a kickback. It might make more sense for me to just announce that I'll pay people based on hours worked and complaints lodged, because that announcement is more verifiable, and thus more credible, so at least I'll be viewed as evenhanded.

Finally, as you've already pointed out, the problem could be that G is difficult or expensive to measure. Alternative measures of GDP that take into account factors like health, leisure, and environmental quality have gotten pretty good about specifying what health is, and it's easy enough to pass laws that commit agencies to valuing health in a particular way, but it's expensive to measure health, especially in any broad sense. A physical is $60; an exercise fitness exam is another $45; an STD test runs about $20; a battery of prophylactic tests for cancer and heart disease and so on is another $100 or so; a mental health exam is another $80, and then you multiply all that by the size of a valid random sample and we're talking real money. In my opinion, it would be money very, very well spent, but one can understand why GDP - which can be measured just by asking the IRS for a copy of its tax receipts - is such a popular metric. It's cheap to use.
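
The arithmetic in the paragraph above can be checked in a few lines. The per-person prices are the ones quoted in the comment (the $100 figure is given there as "or so"); the sample size of 10,000 is a hypothetical number chosen only to show the multiplication.

```python
# Per-person exam prices quoted in the comment above, in US dollars.
exam_costs = {
    "physical": 60,
    "exercise fitness exam": 45,
    "STD test": 20,
    "cancer and heart disease tests": 100,  # "another $100 or so"
    "mental health exam": 80,
}

cost_per_person = sum(exam_costs.values())


def survey_cost(sample_size: int) -> int:
    """Total cost of examining every person in a sample of the given size."""
    return cost_per_person * sample_size


print(cost_per_person)      # 305
print(survey_cost(10_000))  # 3050000 (10,000 is a hypothetical sample size)
```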

Replies from: Davidmanheim, bigjeff5

↑ comment by Davidmanheim · 2017-03-03T15:54:13.042Z · LW(p) · GW(p)

I partly disagree. Simple metrics are used in place of complex goals, for good reason; https://www.ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/

Then the fact that the goal is too simply defined allows flexibility to be abused; https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/

↑ comment by bigjeff5 · 2011-01-27T05:53:06.393Z · LW(p) · GW(p)

G is a variable. It must be undefined by definition, or it is not a variable. A variable's definition changes by context, therefore outside of context it is always undefined.

That's why we use X instead of the number 2 in algebraic formulas. You wouldn't say 2 - 3 = 8, solve for 2, that's clearly stupid. You must use the undefined variable X (or any other mathematically irrelevant symbol), and then define it in context of the rest of the formula. Move X to a different formula, and it has a different definition. Isolate X without the context of a formula, and it is always undefined (X = ?).

In this instance, G is a variable without context. We aren't making nails of a certain size, we are just talking about G and the ways G can be used to create a metric once G is known.

comment by Stuart_Armstrong · 2010-03-14T12:41:55.141Z · LW(p) · GW(p)

CEV, until designed and defined properly, is just a black box that everyone universally agrees is 'good', but has little else in terms of defining features.

Replies from: Roko, jimmy

↑ comment by Roko · 2010-03-29T20:43:37.402Z · LW(p) · GW(p)

See This paper for a relatively decent account of what CEV is getting at.

Replies from: Mass_Driver

↑ comment by Mass_Driver · 2010-04-05T14:01:30.741Z · LW(p) · GW(p)

If the question is "what should we want?" then CEV is much better than a black box, because it fleshes out some of the intuitions behind the magical category "want."

If the question is "how should we measure what we want?" then CEV is just a black box, because it doesn't solve or suggest a method for solving any of our measurement problems. We know we want coherent, extrapolated, volitional thingies, but we have no idea, for example, how to rigorously define "volitional." We likewise have no idea how far into the future we should be extrapolating things, nor how many facets of a personality or society can reasonably be expected to converge/cohere.

↑ comment by jimmy · 2010-03-14T19:08:33.641Z · LW(p) · GW(p)

It's not so much a box, but a method of filling the box. We just haven't filled the box yet.

comment by vinayak · 2010-03-14T03:54:22.562Z · LW(p) · GW(p)

The fact that students who are motivated to get good scores in exams very often get better scores than students who are genuinely interested in the subject is probably also an application of Goodhart's Law?

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2010-03-14T12:39:04.251Z · LW(p) · GW(p)

Partially; but a lot of what is being tested is actually skills correlated with being good in exams - working hard, memorisation, bending yourself to the rules, ability to learn skill sets even if you don't love them, gaming the system - rather than interest in the subject.

Replies from: MichaelVassar

↑ comment by MichaelVassar · 2010-03-14T19:24:07.646Z · LW(p) · GW(p)

But those skills don't correlate with doing good science, or with good use of the subject of the exams in general, nearly so well, and they are easy to test in other ways.

comment by RobinHanson · 2010-03-14T16:23:03.294Z · LW(p) · GW(p)

Pretty much every trick in organization design or management can be thought of as a partial solution to this problem. Listing "anarchy" or "absolute authority" explicitly on a short list of solutions is therefore a bit misleading.

comment by DeevGrape · 2012-01-02T19:58:14.171Z · LW(p) · GW(p)

Goodhart's law seems very applicable to natural selection: the Blind Idiot God wants creatures to have higher fitness (G), and so creates targets that are correlated with fitness in the ancestral habitat (e.g., pleasure-seeking and pain-avoidance (G*)). Once you get creatures that are self-aware (us), they figure out G-star, and start optimizing for that instead of G.

Relevant.

comment by Morendil · 2010-03-13T14:41:49.368Z · LW(p) · GW(p)

In software development, this is (or ought to be) known as the Mini-Van Law.

Replies from: Rain, roystgnr

↑ comment by Rain · 2010-03-13T15:47:01.368Z · LW(p) · GW(p)

It made me think of the Tree Swing, insofar as it represents how difficult it can be to create and follow a good G* through the process.

↑ comment by roystgnr · 2011-07-30T03:58:13.977Z · LW(p) · GW(p)

The Minivan Law would have made a good name for Goodhart's Law for multiple reasons.

comment by HungryHobo · 2016-04-12T12:02:35.966Z · LW(p) · GW(p)

Example:

The Big Mac Index has been used to compare prices across countries, as we have noted before. Argentina currently has very high prices due to a combination of inflation and a strong economy, and this shows up glaringly in the Big Mac Index.

Tyler Cowen reports (translating a Spanish original) that the Argentinian government has persuaded McDonalds to lower the price of the Big Mac (relative to other McDonalds items, and relative to competing hamburgers), so that Brazil’s Big Mac Index becomes more competitive.

http://www.statschat.org.nz/2011/11/16/goodharts-law-and-brazilian-hamburgers/

In other words, the real price of the Big Mac rose nearly twice as much as the official statistics were willing to admit, in Argentina of course. That’s not right, so the government sprang into action. The minister of the commerce department “persuaded” McDonald’s to price the Big Mac at $16, while other sandwiches at the chain are in the $21 to $23 range.

The outlets now keep the Big Mac well-hidden

http://marginalrevolution.com/marginalrevolution/2011/11/sentences-to-ponder-32.html

comment by kodos96 · 2010-03-17T20:56:17.527Z · LW(p) · GW(p)

Getting back to trying to propose practical mitigation strategies for Goodhart's law, I propose a fairly simple solution: Choose a G*, evaluate performance based on it, but KEEP IT SECRET. This of course wouldn't really work for national scale, GDP-esque kind of situations, but for corporate management situations it seems like it could work well enough. If only upper management knows what G* is, it becomes impossible to optimize for it, and everyone has to just keep working under the assumption they're being evaluated on G.

Taking it a step further, to hedge against employees eventually figuring out G* and surreptitiously optimizing for it, you could have a bounty on guessing G* - the first employee who figures out what the mystery metric G* really is gets a prize, and as soon as it's claimed, you switch to using G**

Replies from: wnoise, froob, Sniffnoy, botogol

↑ comment by wnoise · 2010-03-17T21:51:05.585Z · LW(p) · GW(p)

The hedge is absolutely necessary, elsewise, a manager will just tell subordinates what G* is in order to look impressive for managing a high-performing group.

↑ comment by froob · 2010-08-17T13:46:21.036Z · LW(p) · GW(p)

Andrew Grove (of Intel fame) wrote a book, High Output Management, suggesting that management needs two opposing metrics to avoid this problem. For example, measure productivity and number of defects, and score people on the combined results.

BTW the large nail/little nail joke has a third part. Soviet management eventually got a clue and started measuring by the value of the nails produced... and the result was the world's first solid-gold-nail factory.

↑ comment by Sniffnoy · 2010-08-17T23:09:47.376Z · LW(p) · GW(p)

Presumably finding arbitrarily many basic G*'s will be hard. Two ideas for dealing with this: 1. Even if you only have finitely many and they're all known, you could select one at random each time there's a switch. 2. Each time there's a switch, select a somehow-random linear (or some other sort, if you like) combination of your basic G*'s. (That would make guessing it in the first place quite hard, actually...)
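
Idea 2 might look something like the sketch below. Everything specific here is an assumption for illustration: the metric names, the idea that each basic G* has already been scaled to the range 0 to 1 with higher being better, and the choice of positive weights that sum to 1.

```python
import math
import random


def draw_secret_weights(metric_names, rng):
    """Draw a random positive weight per basic G*, normalised to sum to 1."""
    raw = [rng.uniform(0.1, 1.0) for _ in metric_names]
    total = sum(raw)
    return {name: r / total for name, r in zip(metric_names, raw)}


def score(metrics, weights):
    """The hidden G*: a weighted linear combination of the basic measures."""
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical basic measures, each assumed to be pre-scaled to [0, 1].
names = ["productivity", "low_defects", "customer_satisfaction"]
rng = random.Random(0)  # seeded only so the example is repeatable

weights = draw_secret_weights(names, rng)
employee = {"productivity": 0.9, "low_defects": 0.2, "customer_satisfaction": 0.5}
employee_score = score(employee, weights)

# When somebody claims the bounty for guessing G*, draw a fresh combination.
new_weights = draw_secret_weights(names, rng)
```

Because the weights are positive and sum to 1, an employee's score always lies between their worst and best basic measure, so neglecting any one measure carries some risk whichever combination is currently in force.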

↑ comment by botogol · 2010-04-01T15:15:07.385Z · LW(p) · GW(p)

If management are doing that then they are neglecting a powerful tool in their tool-kit, because announcing a G* will surely cause G* to fall, and experience says that to begin with a well-chosen G* and G remain correlated (because many of the things to do to reduce G* also reduce G). It is only over time that G* and G detach.

comment by botogol · 2010-03-16T16:28:39.932Z · LW(p) · GW(p)

At work a large part of my job involves choosing G*, and I can report that Goodhart's Law is very powerful and readily observable.
Further: rational players in the workplace know full well that management desire G, and that G* is not well-correlated with G, but nonetheless if they are rewarded on G*, then that's what they will focus on.

The best solution - in my experience - is mentioned in the post: the balanced scorecard. Define several measures G1 G2 G3 and G4 that are normally correlated with G. The correlation is then more persistent : if all four measures improve it is likely that G will improve.

G1 G2 G3 G4 may be presented as simultaneous measures, or, if setting four measures in one go is too confusing for people trying to prioritise (the fewer the measures, the more powerful), they can be sequential. I.e., if you hope to improve G over 2 years, then measure G1 for two quarters, then switch the measurement to G2 for the next two and so on. (obviously you don't tell people in advance). NB this approach can be effective, but will make you very unpopular.
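
Both variants can be sketched in a few lines. The function names and the sample numbers are hypothetical, and the sketch assumes every measure is scored so that higher is better.

```python
def all_measures_improved(before, after):
    """Simultaneous scorecard: count progress only if every measure went up."""
    return all(after[name] > before[name] for name in before)


def measure_for_quarter(quarter, measures, quarters_per_measure=2):
    """Sequential scorecard: which single measure is in force in a given quarter."""
    return measures[(quarter // quarters_per_measure) % len(measures)]


before = {"G1": 10, "G2": 5, "G3": 7, "G4": 3}
gamed = {"G1": 30, "G2": 5, "G3": 7, "G4": 3}     # only G1 was pushed
genuine = {"G1": 12, "G2": 6, "G3": 8, "G4": 4}   # everything moved a little

print(all_measures_improved(before, gamed))    # False
print(all_measures_improved(before, genuine))  # True

schedule = [measure_for_quarter(q, ["G1", "G2", "G3", "G4"]) for q in range(8)]
print(schedule)  # ['G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G4', 'G4']
```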

Replies from: Christian_Szegedy

↑ comment by Christian_Szegedy · 2010-05-27T19:05:34.170Z · LW(p) · GW(p)

measure G1 for two quarters, then switch the measurement to G2 for the next two and so on. (obviously you don't tell people in advance).

Why obviously? Are you so afraid that people would do the right thing without immediate incentives?

I think I'd measure G1 first, but would announce in advance that next quarter we will measure whichever of G1, G2, G3, ... is most critical at the beginning of that quarter.

comment by NancyLebovitz · 2010-03-13T17:07:17.732Z · LW(p) · GW(p)

Goodhart's Law is a very nice corollary to the Snafu Principle: Communication is impossible in a hierarchy.

Temple Grandin has written about the importance of finding relevant, measurable standards-- the example she gives is the number of cattle falling down on the way to slaughter. Not falling down means the genes, food, lighting, walking surface etc. are all good enough.

Thing to check: Do measures used as targets for policy always become completely useless, or do they sometimes become increasingly less useful, but not totally useless? Does culture matter? I suspect that the amount of judgement which people are allowed to mix into the system varies a lot.

She seems to believe that thinking visually will be more likely to produce such standards than (as most people do) verbally.

I'm not convinced of that-- dog and cat show standards are an example of well-defined visual standards not producing reliably good results.

I have no idea if it's even possible to have that good a standard for financial markets.

I've heard "you can't manage what you can't measure", but I think "you can't manage what you can't perceive" is better. Is it possible to generalize the idea of the king traveling incognito to see how the kingdom is doing?

Replies from: JamesPfeiffer, blogospheroid

↑ comment by JamesPfeiffer · 2010-03-14T05:59:11.714Z · LW(p) · GW(p)

Once management recognizes that there is something to measure, I think they do an OK job measuring it - secret shoppers come to mind. But there's something more subtle about when you take for granted that G = G* and don't even think to verbalize your true values, so can't measure them.

Replies from: NancyLebovitz

↑ comment by NancyLebovitz · 2010-03-14T18:45:06.415Z · LW(p) · GW(p)

The secret shoppers are a variant of "the king going incognito"-- but not as good in some ways because they may be tasked with evaluating according to a checklist, and thus could still be trapped by G vs G*.

I believe that the problem isn't that true values aren't verbalized, it's that they can't be fully verbalized. Language is too low-bandwidth to capture all the aspects of a situation.

The point of a king going incognito isn't just to enforce existing, verbalized rules, it's to see how things are in the kingdom. It's a bit easier for a king than an AI because a king is more like a subject than an AI is like people.

↑ comment by blogospheroid · 2010-03-15T06:29:31.591Z · LW(p) · GW(p)

Does culture matter?

Yes. Culture is, partly, a process of making people behave in a predictable fashion. If you make your subordinate as similar to you as possible, then there is a good chance that he/she will perceive G instead of G*. But you have to be committed to juggling G and G* and take the risk that someone actively pursuing G* gets ahead of you.

comment by taw · 2010-03-13T12:04:06.742Z · LW(p) · GW(p)

We laugh at such ridiculous stories, because our societies are generally much better run than Soviet Russia.

That's not even true; when you measure results, the Soviet Union was run about as well as any other country. It was just run differently.

Anyway, it doesn't even seem mathematically obvious to me that optimizing for G* will reduce correlation between G and G*.

Let G=market value of nails, G*=number of nails. Once G* target is introduced everyone switches from making medium nails to making tiny nails, but correlation between market value of nails and number of nails among different factories is still very very high.
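
One way to make this concrete, with entirely made-up numbers: suppose four factories differ only in how much steel they get, and all of them switch from 10 g nails to 1 g nails once the count target arrives. The nail weights, prices and steel supplies below are assumptions chosen for the illustration, and `pearson` is a plain textbook correlation written out to keep the example self-contained.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


STEEL_KG = [100, 200, 300, 400]  # hypothetical steel supply per factory


def output(nail_grams, price_millidollars):
    """Nail count (G*) and market value (G, in thousandths of a dollar) per factory."""
    counts = [kg * 1000 // nail_grams for kg in STEEL_KG]
    values = [count * price_millidollars for count in counts]
    return counts, values


# Before the target: medium nails, 10 g each, worth 5 cents each (assumed).
counts_before, values_before = output(nail_grams=10, price_millidollars=50)
# After the count target: tiny nails, 1 g each, worth 0.1 cent each (assumed).
counts_after, values_after = output(nail_grams=1, price_millidollars=1)

corr_before = pearson(counts_before, values_before)
corr_after = pearson(counts_after, values_after)
print(corr_before, corr_after)                  # both close to 1
print(sum(counts_before), sum(counts_after))    # 100000 1000000
print(sum(values_before), sum(values_after))    # 5000000 1000000
```

In this setup the correlation across factories stays near 1 both before and after the target, while total nail count rises tenfold and total market value falls to a fifth. The cross-factory correlation can survive even though the measure no longer tracks value over time.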

There is some loss of value from optimizing for G* as opposed to optimizing for G, but no incentive system is perfect, and incentive systems have costs, so simple incentive scheme might still be better than having to pay the cost of more accurate but far more complicated scheme.

Replies from: SilasBarta, Davidmanheim

↑ comment by SilasBarta · 2010-03-13T14:45:16.887Z · LW(p) · GW(p)

blogospheroid: When you pick a metric of success, countries will game it by doing well on that metric, yet not achieving what is really meant by success. A good example would be the Soviet Union, whose leaders constantly made sure they did well by the metrics, yet were actually far from successful.

taw: Not true -- the Soviet Union did about average by those metrics, so they had about average success.

me: *falls out of chair*

Replies from: RobinZ, taw

↑ comment by RobinZ · 2010-03-13T15:04:41.031Z · LW(p) · GW(p)

Key question: are the metrics optimized by the Soviet Union identical to the metrics suggested to evaluate the success of the Soviet Union?

↑ comment by taw · 2010-03-13T16:58:47.342Z · LW(p) · GW(p)

You misread me. GDP is one of those really-hard-to-game high-correlation-with-everything-meaningful metrics, and Soviet Union did ok with other metrics like access to clean water, electricity etc.; life expectancy, child mortality, and pretty much everything else you can think of.

People claim that the Soviet Union was a disaster as if it were a well-established fact, while it was not. South America was a disaster. India was a disaster. Indonesia was a disaster. Africa was a disaster. Soviet Union and other Communist countries were fairly average.

What you're saying is basically "Soviet Union was unsuccessful and I base it on my feelings about it and no metrics of any kind".

Replies from: knb, James_K, gwern, PhilGoetz, knb

↑ comment by knb · 2010-03-13T20:22:06.379Z · LW(p) · GW(p)

The Soviet Union's GDP was approximately half military spending. In other words, at least half of Soviet GDP was an almost complete dead-weight loss to the citizens. GDP is just an aggregation of total spending; if money is spent on pure dross, it still shows up as an improvement in GDP.

↑ comment by James_K · 2010-03-13T22:28:33.924Z · LW(p) · GW(p)

The Soviet Union killed tens of millions of its people through work camps, starvation and purges. How is that not a disaster?

And GDP is not hard to game if you're centrally planning the economy. And even that excludes simply lying about your figures.

Replies from: knb, Douglas_Knight

↑ comment by knb · 2010-03-14T07:54:54.955Z · LW(p) · GW(p)

The Soviet Union killed tens of millions of its people through work camps, starvation and purges. How is that not a disaster?

This is a really good point. It kind of goes to the heart of the emptiness of GDP as a statistic. What was the death toll of a 5% increase in GDP? The Soviet economy also was infamous for overproducing big capital goods but failing to produce consumer goods people actually wanted to buy.

Replies from: ggg

↑ comment by ggg · 2010-03-14T09:30:27.400Z · LW(p) · GW(p)

Similarly, Ceauşescu supposedly managed to clear Romania's national debt, but, in doing so, impoverished the people and crushed their spirit. The socio-cultural damage he did to the country cannot be overstated. (btw I'm half-Romanian and lived there during his reign).

↑ comment by Douglas_Knight · 2010-03-14T02:27:54.444Z · LW(p) · GW(p)

And GDP is not hard to game if you're centrally planning the economy. And even that excludes simply lying about your figures.

But did they game GDP?

Your insinuation seems to predict that there was a dramatic drop after the fall of communism. It did fall by half, but that just brought it back to levels reported in 1985. Some communist countries, such as Poland and Hungary, barely had any dip.

Russia continued to tumble, which is consistent with what was going on. If you trust western GDP figures, the most skeptical position I can imagine is taking the 1997 figures as a proxy for the 1985 figures, and concluding a 1.5x fudging. Which is not much for taw's purposes.

Replies from: James_K

↑ comment by James_K · 2010-03-15T04:29:34.130Z · LW(p) · GW(p)

That accounts for deception, but not the difference between GDP and true economic value added, which is the whole reason GDP was raised as an example. Communist countries game GDP by getting people to produce large quantities of worthless goods (like gigantic nails). Those giant nails added to GDP, but did they make anybody better off?

GDP is an approximate measure of material well being. If your economic success metric classifies a society with mass starvation and routine shortages of basic goods as being as successful as a society that doesn't, then your metric is busted.

Replies from: wedrifid, Douglas_Knight

↑ comment by wedrifid · 2010-03-15T04:53:36.328Z · LW(p) · GW(p)

Communist countries game GDP by getting people to produce large quantities of worthless goods (like gigantic nails).

At least, that is the cover story that Naily uses to hide his tracks. Clippy, start taking notes!

↑ comment by Douglas_Knight · 2010-03-15T05:07:06.326Z · LW(p) · GW(p)

There are several points here. What I endorse is what I took to be TAW's original point: people laugh at these stories and reinforce basically false beliefs about Soviet efficiency. The stories about tiny nails are true, but they are not representative. For these purposes, it is irrelevant if the goal of the efficiency was military production. The work camps are relevant if that is how they achieved efficiency, but I don't think that's a popular belief.

Also, people compiling GDP, like the CIA, try not to count worthless goods. They also compiled civilian consumption, if you'd like to try to exclude military spending, but I don't know where the data is.

I'm not sure I endorse the use of GDP for general success of society. It is very convenient to talk about relative changes in GDP, though. No one is claiming that the USSR was a rich society, only that its GDP was multiplied by a reasonable number over the course of the century. But I am claiming that it didn't suffer mass starvation after Stalin.

Replies from: beoShaffer, James_K

↑ comment by beoShaffer · 2013-03-31T01:22:32.202Z · LW(p) · GW(p)

The stories about tiny nails are true

Do you have a source for this? Everything I can find seems to point towards it being a joke.

Replies from: Douglas_Knight, Douglas_Knight

↑ comment by Douglas_Knight · 2013-03-31T21:08:34.303Z · LW(p) · GW(p)

The story of the giant nail is a joke, appearing in Krokodil, c1960. I switched back to the tiny nails because it was pretty close to anecdotes I've heard that I'm pretty sure were not jokes. But those were oral, so I can't cite them. Do you accept anecdotes from Alec Nove? I see quoted from p94 of his 1977 Soviet Economic System "It is notorious that Soviet sheet steel has been heavy and thick, for this sort of reason. Sheet glass was too heavy when it was planned in tons, and paper too thick." On p355 of his 1969 Economic History of the USSR (or p365 of the 1993 edition):

A large number of semi-anecdotal examples can readily be assembled to illustrate the resultant irrationalities. Steel sheet was made too heavy because the plan was in tons, and acceptance of orders from customers for thin sheet threatened plan fulfilment. Road transport vehicles made useless journeys to fulfil plans in ton-kilometres. Khrushchev himself quoted the examples of heavy chandeliers (plans in tons), and over-large sofas made by the furniture industry (the easiest way of fulfilling plans in roubles). [Pravda, 2 July 1959.]

Replies from: beoShaffer

↑ comment by beoShaffer · 2013-04-01T01:46:12.655Z · LW(p) · GW(p)

Ok, that makes sense.

↑ comment by Douglas_Knight · 2013-05-02T04:10:26.662Z · LW(p) · GW(p)

Here's the original nail joke. bigger

"Who needs a nail like that?"
"That's a trifle! The main thing is that we fulfilled the nail plan in one go..."

↑ comment by James_K · 2010-03-16T04:39:06.652Z · LW(p) · GW(p)

Efficiency isn't just the stuff you produce; in economics it's allocative efficiency (roughly the value of the stuff you produce), not mere technical efficiency, that matters. GDP data is collected at a pretty high level, and I'd be surprised if the CIA could adjust effectively for low-value production. Even just looking at civilian production won't do because it doesn't account for mismatches of supply and demand e.g. twice as many shirts and half as many shoes as people demand.

It's true that the USSR grew a lot in the 1950s and 1960s, and it would be implausible to suggest it was all wastage. But that can be explained by convergence, specifically the increase in capital stock over that period. Lots of countries managed to industrialise without communism, so I can't really attribute this growth to communism per se. I'd be willing to accept this as evidence that communism wasn't a total failure (since it did produce positive side effects), but not that it was a success.

Whether mass starvation happened after Stalin is beside the point. Stalin was part of the system. There's no reason why the USSR should have had famine when western countries had no difficulty, so I think any starvation is attributable to communism.

↑ comment by gwern · 2010-03-14T02:11:39.185Z · LW(p) · GW(p)

Soviet Union did ok with other metrics like access to clean water, electricity etc.; life expectancy, child mortality, and pretty much everything else you can think of.

Is that really true?

http://en.wikipedia.org/wiki/Suppressed_research_in_the_Soviet_Union#Statistics

By the numbers, Russia fell off a cliff when the USSR dissolved; I've always wondered how much of that was genuinely due to transition troubles, kleptocracy etc. and how much was just poor USSR performance finally showing up in the statistics.

Replies from: taw

↑ comment by taw · 2010-03-14T08:44:34.170Z · LW(p) · GW(p)

It was genuinely due to transition troubles. Many former Communist countries did reasonably well in transition - usually those that were close enough to the west that they could switch their trade patterns effectively and not be caught up in the mess; and you really have no way to fake life expectancy and such - which suffered a lot in the most transition-affected countries like Russia.

Replies from: MichaelVassar, IlyaShpitser

↑ comment by MichaelVassar · 2010-03-14T19:34:06.748Z · LW(p) · GW(p)

citation needed

↑ comment by IlyaShpitser · 2013-08-04T08:21:42.360Z · LW(p) · GW(p)

The USSR was a dump. Saying things like

"That's not even true, when you measure results Soviet Union was ran about as well as any other country. They were just ran differently."

is revisionism of the worst kind. I think a simple but informative hypothesis is that Putin's Russia is mostly the same place with the same institutions in charge, but sans the explicit communist ideology.

↑ comment by PhilGoetz · 2010-03-13T23:34:19.309Z · LW(p) · GW(p)

South America was a disaster. India was a disaster. Indonesia was a disaster. Africa was a disaster. Soviet Union and other Communist countries were fairly average.

If you just consider the endpoint, maybe. But why would you do that? What would you be trying to show?

IMHO, if we consider the time period 1918-1990, South America, India, Indonesia, and Africa - not to mention China, Japan, Mexico, and, gee, that's pretty much the whole world, isn't it? - all made more progress than the Soviet Union did. East Germany and large parts of eastern Europe probably made negative economic progress. It doesn't impress me that they were still better-off than parts of Africa after 50 years of decline.

Replies from: simplyeric, FAWS

↑ comment by simplyeric · 2010-03-14T15:50:33.285Z · LW(p) · GW(p)

When discussing the Soviet Union, and more specifically Russia, you have to also consider the beginning point as well. It should be noted how far behind Russia was compared to the rest of Europe in 1918. Coming out of abject serfdom bordering on generalized slavery, they actually made tremendous progress in both abstract metrics and tangible result in quality of life up until the late 50's or early 60's. Over time that then declined.
In any case, to an extent their G* was "production", measurable production, in the sample case: nails. Their G was not the market value of nails, their G was "progress through central planning", but they didn't know how to measure "progress" except through the early-capitalist metrics of "production". Thus: produce more = progress, in their practice. Our G* is GDP. People seem so happy with our GDP, without reflecting on things like income disparity, striation of wealth, etc. If we allow it, we can G* ourselves into a mutant 3rd world nation, with great GDP performance but declining quality of life generally. G is quality of life. Economists, and lay people, generally equate the two, and they correlate generally, but they are not irrevocably entangled.

↑ comment by FAWS · 2010-03-14T00:15:31.208Z · LW(p) · GW(p)

I'm far from an expert on economic history, but I don't think you can reasonably say that South America in general did better economically. You say that the endpoint for the Soviet Union was better; let's say they were about equal. But South America at the beginning of the 20th century was reasonably well developed economically. Argentina in particular was pretty much on the same level as western Europe, far ahead of Russia, which was still rather backwards even though it was rapidly developing (largely based on mostly French loans/investments). I'm not so sure about South America as a whole, probably slightly ahead or about even. And then Russia got thoroughly wrecked by two world wars while South America was untouched. I think the Soviet Union wins, even though by how much isn't clear.

EDIT: Looked up some figures, seems like the Russian Empire had about 2.5 times the population and 2.5 times the GDP of Latin America, so about even was right.
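The "about even" step is a per-capita comparison, which can be spelled out in a couple of lines. This is only a sketch of the arithmetic: the two 2.5 ratios are the rough figures quoted above, and the function name is made up for illustration.

```python
def per_capita_ratio(gdp_ratio, population_ratio):
    """Ratio of GDP per head, given ratios of total GDP and of population."""
    return gdp_ratio / population_ratio

# Rough figures from the comment: Russian Empire vs. Latin America.
ratio = per_capita_ratio(gdp_ratio=2.5, population_ratio=2.5)
print(ratio)  # 1.0, i.e. about the same GDP per head
```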

Replies from: taw

↑ comment by taw · 2010-03-14T08:41:35.643Z · LW(p) · GW(p)

Most people on this supposedly rationalist site don't even bother looking at the data when it comes to Soviet Union - they get instant emotional reaction. In case you're one of those who actually care about the data, here I made it easier for you.

Replies from: wedrifid, RobinZ, Jack, FAWS

↑ comment by wedrifid · 2010-03-14T13:29:16.016Z · LW(p) · GW(p)

Most people on this supposedly rationalist site don't even bother looking at the data when it comes to Soviet Union - they get instant emotional reaction.

How exactly have you determined the instant emotional reaction of most of the people on this site in response to the Soviet Union? I haven't seen most people even comment on the subject, much less display obvious evidence of emotional involvement.

Did you actually think through your estimates of soviet-emotionalism in the population or is this a case of "the pot calling a non representative sample of kettles black"?

↑ comment by RobinZ · 2010-03-14T15:14:16.141Z · LW(p) · GW(p)

I'll agree that "most people [...] don't even bother looking at the data [...]" - I, in particular, am not sufficiently invested in this argument to go to the inconvenience of reading a PDF. The effect of modifiers "this site" and "Soviet Russia" I have no interesting opinion on.

(By the way: horrible format for Internet content. If you can read this, please don't upload your information to the Internet in PDF format. Make an HTML file.)

Replies from: ata, taw

↑ comment by ata · 2010-03-15T04:14:12.654Z · LW(p) · GW(p)

I, in particular, am not sufficiently invested in this argument to go to the inconvenience of reading a PDF.

Here's an HTMLized version, albeit one that still looks like a PDF (though one you don't have to download, doesn't use any browser plugins, and can't give you a virus).

↑ comment by taw · 2010-03-14T19:09:09.255Z · LW(p) · GW(p)

We have to learn to live with PDFs as virtually all research is formatted as PDFs. Sane (single column portrait-only) PDFs like the linked paper are not particularly worse than constant-width websites. You are exaggerating the inconvenience.

The problem is PDFs which do things that make sense only on paper - like double column / alternating portrait-landscape - these are really really bad for reading on screen. But - what stops PDF readers from having some hacks to make them bearable? I cannot think of any reason. And it would definitely be easier to hack PDF readers than to make all researchers and all research journals in the world switch to HTML.

The related problem of tables being in an appendix as opposed to floating seems harder to solve, but it's nowhere near as bad as double-column PDFs.

Replies from: RobinZ

↑ comment by RobinZ · 2010-03-14T19:19:26.958Z · LW(p) · GW(p)

The biggest three problems with PDFs as a format for Internet content are:

1. The text display does not adapt to your window.
2. Viewing the content requires running additional processes, adding CPU and memory usage.
3. PDF viruses.

You pointed out (1), but (2) is no less annoying to me personally. That said: yeah, I got no control over this.

Replies from: taw

↑ comment by taw · 2010-03-14T19:52:07.166Z · LW(p) · GW(p)

(1) is genuine (but then many websites assume constant width, so it's not a PDF-exclusive issue), but (2) and (3) sound totally made-up to me. Browsers have had far more security vulnerabilities, and use far more CPU/memory than PDF readers.

Replies from: Cyan, FAWS, RobinZ

↑ comment by Cyan · 2010-03-14T20:24:25.334Z · LW(p) · GW(p)

PDF viruses exist.

Replies from: pjeby

↑ comment by pjeby · 2010-03-14T20:53:54.331Z · LW(p) · GW(p)

Indeed, I've been infected by PDF-based viruses more than once. Updating Acrobat and turning off JavaScript in PDFs isn't enough to keep you safe, either; I finally added NoScript to Firefox in order to prevent any PDFs from being displayed without an extra enabling click, so that only PDFs I trust are ever downloaded.

Of course, this has little relevance to scientific papers: the PDFs that you need to worry about are the ones that you never intended to download in the first place, that are downloaded in the background via JavaScript or an iframe embedded in an ad on a random webpage. (I once caught one from Kaj Sotala's LiveJournal page, for example... just visiting the page was enough to infect my machine.)

↑ comment by FAWS · 2010-03-14T20:48:36.885Z · LW(p) · GW(p)

But a browser alone will have fewer vulnerabilities (and probably use less resources) than a browser + a PDF reader.

↑ comment by RobinZ · 2010-03-15T05:02:14.869Z · LW(p) · GW(p)

Nearly echoing FAWS, a browser alone will have less CPU/memory usage than a browser + a PDF reader. More importantly, there is no delay to load the PDF viewer when visiting an HTML page, whereas there is for PDFs.

↑ comment by Jack · 2010-03-14T20:10:33.603Z · LW(p) · GW(p)

I'm not particularly invested in the issue but it seems like you're underestimating the importance of the East and West Germany division. That is about as close as we're ever going to get to having actually experimental conditions to test this hypothesis. We took one nation, divided it in half, structured their economies in accordance with the leading theories of the time, let them develop, and found a clear winner. Of course there are possible sources of error and maybe there are reasons to think the lessons learned in Germany don't apply to the rest of the Soviet bloc, but this is about as compelling as evidence gets in economics.

Replies from: taw

↑ comment by taw · 2010-03-14T20:22:22.823Z · LW(p) · GW(p)

It's very compelling evidence that common knowledge is wrong, as GDP per capita ratios of West and East Germany were virtually identical in 1950 and 1990.

All the difference happened during the war (East Germany suffered from incomparably more fighting and destruction than West Germany) and the earliest years of occupation (Soviets plundered everything they could and destroyed the rest, while the Western Allies gave massive levels of economic aid in the form of the Marshall Plan).

German experience is a great proof that the difference in economic performance between Communism and Capitalism is minor.

Replies from: FAWS

↑ comment by FAWS · 2010-03-14T21:00:04.813Z · LW(p) · GW(p)

I was going to raise two of the same points (Marshall Plan and Soviet looting), but I would consider only managing to not fall even further behind a relative failure. And that's with both the FRG (e.g. giving the GDR access to the EC market) and the Soviets (can't find a source right now but IIRC they tried to prove that the socialist system could allow a high standard of living) trying to prop up their economy towards the end of that period.

Replies from: taw

↑ comment by taw · 2010-03-14T21:30:44.406Z · LW(p) · GW(p)

I would consider only managing to not fall even further behind a relative failure.

The paper I've linked to so many times deals with this convergence question. There seems to be no evidence for any kind of global economic convergence - or any worldwide correlation between economic levels and economic growth - you seem to only converge to levels of your geographically close trading partners. East Germany mostly traded with countries even poorer than itself. West Germany traded mostly with very rich countries.

Of course you could ask questions like "so why didn't they trade more with Western Europe and the USA etc.", but you could be asking the same question about Mexico, Argentina, Indonesia, New Zealand, and countless other countries which did worse than the Communist average.

Overall, evidence for Communism being an economic failure is shockingly underwhelming relative to how widely and strongly it is believed.

Replies from: FAWS

↑ comment by FAWS · 2010-03-14T22:09:22.066Z · LW(p) · GW(p)

I think recovery after war and plundering is a bit different than normal convergence. Wrecked developed nations don't behave like developing nations of the same GDP. Since a main difference was East Germany being even more wrecked more of their GDP growth should have been of the easier rebuilding/recovery sort.

↑ comment by FAWS · 2010-03-14T10:56:59.198Z · LW(p) · GW(p)

Heh, the third paragraph sounds rather familiar :).

↑ comment by knb · 2010-03-13T20:21:02.507Z · LW(p) · GW(p)

The Soviet Union's GDP was approximately half military spending. In other words, at least half of Soviet GDP was an almost complete dead-weight loss to the citizens.

↑ comment by Davidmanheim · 2018-11-06T08:52:45.945Z · LW(p) · GW(p)

Anyway, it doesn't even seem mathematically obvious to me that optimizing for G* will reduce correlation between G and G*.

See Greg Lewis's post here: https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart and Scott Alexander's discussion here: http://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/

Also see our paper formalizing the other Goodhart's Law failure modes: https://arxiv.org/abs/1803.04585
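For readers who want to see the "tails come apart" effect rather than take it on trust, here is a minimal simulation sketch. It rests on assumptions that are not in the linked posts: G is standard normal, G* is G plus independent normal noise, and "optimizing for G*" is modelled crudely as keeping only the top 5% of cases by G*. All names and parameters are hypothetical.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=20000, noise=1.0, top_fraction=0.05, seed=0):
    """Return (correlation over everyone, correlation within the top slice by G*)."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]            # the true goal G
    g_star = [x + rng.gauss(0, noise) for x in g]      # the noisy proxy G*
    overall = pearson(g, g_star)

    # Crude stand-in for optimization pressure: keep only the best scores on G*.
    ranked = sorted(zip(g_star, g), reverse=True)
    top = ranked[: int(n * top_fraction)]
    selected = pearson([s for s, _ in top], [t for _, t in top])
    return overall, selected

overall, selected = simulate()
print(f"correlation over everyone:      {overall:.2f}")
print(f"correlation within top 5% by G*: {selected:.2f}")
```

Under these assumptions the correlation inside the selected slice comes out well below the correlation over the whole population. This only covers the selection-style failure; the other failure modes in the paper are not modelled here.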

comment by cousin_it · 2010-03-13T11:41:08.819Z · LW(p) · GW(p)

1) Examples of G* should be given a cost-benefit analysis. Yeah, scammers and parasites exist, but societies that use money still seem to be better off than societies that try to get rid of it.

2) It's unclear to me why you list CEV as one of the solutions. We use money to allocate limited resources. If magic nano-AI appears and resources become unlimited, why keep score at all? If it doesn't and resources stay limited, how does CEV help you distribute bread, and would you really like it to replace money? (I wouldn't. No caring daddies for me, please.)

Replies from: FAWS

↑ comment by FAWS · 2010-03-13T11:57:59.108Z · LW(p) · GW(p)

In the case of a FAI, G would be friendliness and G* the friendliness definition. Avoiding a Goodhart's Law effect on G* is pretty much the core of the friendliness problem in a nutshell. An example of such a Goodhart's Law effect would be the molecular smiley faces scenario.

Replies from: cousin_it

↑ comment by cousin_it · 2010-03-13T12:03:26.420Z · LW(p) · GW(p)

Ah, sorry. I've read the post as saying something different from what it actually says.

Replies from: blogospheroid

↑ comment by blogospheroid · 2010-03-15T06:11:27.925Z · LW(p) · GW(p)

Good discussion.

The point I wanted to make was about Extrapolated volition as a strategy to avoid Goodhart's law issues. If you extrapolate the volition of a person towards the "person he/she wants to be" and put the resulting goal as G*, it will be about as close to G as can be. I presented CEV as an example, since the audience is more familiar with it.

And FAWS, your definition of G and G* in the friendliness scenario is perfect. I've nothing more to add there.

comment by JanetK · 2010-03-13T09:11:24.349Z · LW(p) · GW(p)

I noticed this tendency in British running of hospitals, schools and police forces. The gov got hooked on the idea of targets and not on medicine, education and public order.

Replies from: taw

↑ comment by taw · 2010-03-13T12:07:47.231Z · LW(p) · GW(p)

And yet, the correlation between government targets and results is probably a lot higher than the correlation between teachers'/doctors'/policemen's fuzzy ideas of how their job should be done and results.

Replies from: MichaelVassar

↑ comment by MichaelVassar · 2010-03-14T19:26:35.010Z · LW(p) · GW(p)

Maybe, but how probably and how much? Why do you think that? For which governments? By what measure?

comment by 110phil · 2010-03-13T15:07:39.157Z · LW(p) · GW(p)

I wish there were some examples (other than the Soviet nails) ... if I had some better idea of what G and G* might actually represent, I'd be able to more easily get my head around the rest of the post.

Replies from: Morendil, Unnamed, JenniferRM, CronoDAS, Sniffnoy, JamesAndrix, dlthomas, 110phil

↑ comment by Morendil · 2010-03-14T13:25:02.201Z · LW(p) · GW(p)

I'm surprised no one has yet brought up (G*) the LW karma system as a proxy for (G) contributing to "refining the art of human rationality".

Replies from: jimmy, CronoDAS

↑ comment by jimmy · 2010-03-14T19:05:19.750Z · LW(p) · GW(p)

LW karma is an interesting example because no one has direct access to the karma giving algorithm.

It's a bit like telling the nail factory that you're going to evaluate them on something, but not telling them whether it's nail mass or number or something else until the end of the evaluation period.

If the one being evaluated knows nothing about how he's going to be evaluated except that it's going to be a proxy for goodness, then he can't really cheat. However, they might know that it's going to be very simple criteria so they make a very massive nail and many miniature ones.

Replies from: Kaj_Sotala

↑ comment by Kaj_Sotala · 2010-03-14T21:05:13.822Z · LW(p) · GW(p)

This reminds me of the way I hear they do state censorship in China. The censoring agencies don't actually give out any specific guidelines on what is allowed and what isn't, instead just clamping down on cases they do consider to be over the line. As a consequence, everyone self-censors more than they might with specific guidelines: with the guidelines, you could always try to twist their letter to violate their spirit. Instead, people are constantly unsure of just exactly what will get you in trouble, so they err on the side of caution.

While I strongly oppose state censorship, I can't help but admire the genius in the system.

Replies from: Emile, TheAncientGeek, None

↑ comment by Emile · 2011-12-06T21:16:23.238Z · LW(p) · GW(p)

Also, unlike Saudi Arabia, they don't make many efforts to block pornography. As a result, the average Chinese teen is less likely to know how to access blocked sites than the average Saudi teen is (or so I read; I'm not aware of any study on that).

↑ comment by TheAncientGeek · 2015-04-16T09:20:26.224Z · LW(p) · GW(p)

Or Section 28, which didn't forbid the discussion of homosexuality in the classroom, only its promotion... but since promotion wasn't defined, schools erred on the side of not mentioning it.

↑ comment by [deleted] · 2011-12-06T19:07:24.412Z · LW(p) · GW(p)

Depressing. This would mean that most informal norms of censorship are much more resilient and effective than most formal laws censoring material.

Arguably this makes them much harder to dislodge than even the intentionally vague Chinese law. Since I guess you can't really be prosecuted under it by pointing out there is a censorship law right?

↑ comment by CronoDAS · 2010-03-14T17:40:38.665Z · LW(p) · GW(p)

I never thought of the LW karma system as a proxy for that.

Replies from: Morendil

↑ comment by Morendil · 2010-03-14T17:57:26.450Z · LW(p) · GW(p)

What is your interpretation of it? It seems a pretty plausible hypothesis to me that it's a proxy for something, and has come to be relied upon as such. If we think Goodhart's Law applies in the case of karma, the final prediction in the "speculative origin" section might be something to be concerned about.

Replies from: CronoDAS

↑ comment by CronoDAS · 2010-03-14T18:23:32.013Z · LW(p) · GW(p)

I think of it as a proxy for "valued member of the community" - if someone has karma, then people like their posts and comments. I'm mostly here to have fun and pass the time, and I happen to find discussing rationality to be fun. I don't really expect refining the art of human rationality to be well-correlated with a popularity contest.

Replies from: Morendil

↑ comment by Morendil · 2010-03-14T18:33:30.843Z · LW(p) · GW(p)

And do you think Goodhart's Law, as presented in the post, applies here? That is, we should expect that eventually people (through gaming the system) end up with high karma without that in fact reliably correlating with being valued members of the community?

Replies from: CuSithBell, Jack, CronoDAS, wedrifid

↑ comment by CuSithBell · 2011-05-12T20:38:14.623Z · LW(p) · GW(p)

As a data point, one thing I've noticed that seems to give a disproportionate amount of karma is arguing with someone who's wrong and unwilling to listen. It's easy to think they might come around eventually, and each point you make against them is worth a few points of karma from the amused onlookers or fellow arguers - which might tell you that you're making a valuable contribution, and so encourage you to keep arguing with trolls. This is my impression, at least.

Edit: (The problem being - determining the point of diminishing returns.)

↑ comment by Jack · 2010-03-14T19:31:32.559Z · LW(p) · GW(p)

Except we're like the self-employed in this regard. You can't do anything with karma. It won't impress your boss. It is just a way of quantifying how valued you are by the community. An employee doesn't really care about G at all. She cares about G* because that's what impresses the boss, which furthers her own goals. But if you are your own boss you do care about G; G* is just an easy way to measure it. For me at least, this is the case with karma. I can't do anything with the number but it suggests that people like me.

So perhaps revenue sharing is a way to help address the problem. Instead of trying to come up with ways to measure what you care about, make the people beneath you care about it too. Of course this is a lot easier with money than it is with values.

Replies from: Alicorn

↑ comment by Alicorn · 2010-03-14T21:30:17.272Z · LW(p) · GW(p)

My boss cares about karma.

↑ comment by CronoDAS · 2010-03-14T18:35:56.120Z · LW(p) · GW(p)

Only if people care about having high karma. It's probably fairly easy to game karma by making multiple accounts and voting yourself up, but why bother?

↑ comment by wedrifid · 2011-05-12T20:50:47.514Z · LW(p) · GW(p)

And do you think Goodhart's Law, as presented in the post, applies here? That is, we should expect that eventually people (through gaming the system) end up with high karma without that in fact reliably correlating with being valued members of the community?

What? You mean Karma doesn't reliably correlate with objective worth of the individual? Damn.

↑ comment by Unnamed · 2010-03-14T02:36:07.124Z · LW(p) · GW(p)

In education, this is one of the criticisms of high-stakes testing: you'll just get schools teaching to the test, in ways that aren't correlated to real learning (the test is G*, real knowledge/learning is G). People say the same thing about the SAT and test prep - kids get into better colleges because they paid to learn tricks for answering multiple choice questions. The Wire does a great job of showing the police force's efforts to "juke the stats" (e.g. counting robberies as larcenies) so that crime statistics (G*) look better even while crime (G) is getting worse. Athletes get criticized for playing for their stats (G*), or trying to pad their stats, instead of playing to win, when the stats are supposed to be a measure of how much a player has contributed to his team's chances of winning (G). I'm not sure if it's historically accurate, but I've heard that body count (G*) was used by the US as one of the main metrics of success (G) in the Vietnam war, and as a result we ended up with a bunch of dead bodies but a misguided war.

In general, any time you measure something you care about in order to incentivize people, or to hold people accountable, or to keep track of what's going on, and the thing you measure isn't exactly the same as the thing that you care about, there's a risk of figuring out ways to improve the measurement that don't translate into improvements on the thing that you care about.
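The general pattern can be made concrete with the nail factory from the post. The following is a toy sketch, not a model of any real factory: the labour budget, the per-nail times, the "useful" weight range and the list of candidate nail sizes are all invented for illustration.

```python
# Hypothetical toy factory: every number below is an assumption for illustration.
SHIFT_MINUTES = 480.0        # labour available per shift
SETUP_MINUTES = 0.5          # fixed handling time per nail
MINUTES_PER_GRAM = 0.1       # forging time per gram of nail
USEFUL_GRAMS = (2.0, 20.0)   # nails outside this weight range are assumed useless
CANDIDATE_GRAMS = [0.1, 1.0, 5.0, 10.0, 100.0, 1000.0]

def nails_made(grams):
    """How many nails of a given weight fit into one shift."""
    return SHIFT_MINUTES / (SETUP_MINUTES + MINUTES_PER_GRAM * grams)

def count_target(grams):
    """G* when the plan is set in number of nails."""
    return nails_made(grams)

def weight_target(grams):
    """G* when the plan is set in total weight."""
    return nails_made(grams) * grams

def useful_nails(grams):
    """G: nails somebody can actually use."""
    low, high = USEFUL_GRAMS
    return nails_made(grams) if low <= grams <= high else 0.0

best_for_count = max(CANDIDATE_GRAMS, key=count_target)
best_for_weight = max(CANDIDATE_GRAMS, key=weight_target)
best_for_use = max(CANDIDATE_GRAMS, key=useful_nails)

print("plan in numbers ->", best_for_count, "g nails, useful:", useful_nails(best_for_count))
print("plan in weight  ->", best_for_weight, "g nails, useful:", useful_nails(best_for_weight))
print("what we wanted  ->", best_for_use, "g nails, useful:", useful_nails(best_for_use))
```

With these made-up numbers, the factory that maximizes either measurement picks a nail size at the extreme of the candidate list and produces nothing useful, while the size that maximizes G scores worse on both measurements.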

↑ comment by JenniferRM · 2010-03-14T06:53:05.621Z · LW(p) · GW(p)

The health and/or beauty of a woman (G) and her scale reported weight (G*) which might be somewhat correlated under some circumstances, but are definitely not identical and can diverge rather sharply due to crazy diets.

↑ comment by CronoDAS · 2010-03-13T19:09:47.170Z · LW(p) · GW(p)

Here's a few.

↑ comment by Sniffnoy · 2010-03-13T19:06:12.426Z · LW(p) · GW(p)

Well there's a few described here, for instance: http://lesswrong.com/lw/le/lost_purposes/

↑ comment by JamesAndrix · 2010-03-14T07:17:20.606Z · LW(p) · GW(p)

Products that are good for humanity, and products that are profitable

↑ comment by dlthomas · 2011-12-06T19:30:20.160Z · LW(p) · GW(p)

Call time (G*) or calls taken (G*) in a call center, where what they care about is customer satisfaction (G) (at least inasmuch as it serves profitability).

↑ comment by 110phil · 2010-03-13T19:38:15.095Z · LW(p) · GW(p)

Thanks,

comment by Academian · 2010-03-13T11:10:10.909Z · LW(p) · GW(p)

I suggest editing a "summary break" into this post to create a "continue reading" link on the frontpage. It's the 6th button from the left atop the editing interface.

Replies from: blogospheroid

↑ comment by blogospheroid · 2010-03-14T07:30:21.399Z · LW(p) · GW(p)

Someone already did it for me. But I will note it from next time. Thanks.

comment by ChristianKl · 2019-07-15T16:12:59.687Z · LW(p) · GW(p)

Diversity of an ecosystem is a way to reduce the impact of Goodhart's law. If different universities would use very different G* for their hiring decisions it would be harder for young researchers to optimize for any particular G*.

comment by ChristianKl · 2019-07-15T16:01:37.333Z · LW(p) · GW(p)

I don't see how hierarchical rule is a solution. Hierarchy requires the people at the top of the hierarchy to give orders to people at the bottom to achieve certain outcomes and measure whether those outcomes are achieved.

comment by FiftyTwo · 2012-01-07T03:24:27.527Z · LW(p) · GW(p)

Often a goal is set not because of a single set of arguments justifying it, but because it is a good compromise point between multiple arguments, motivations or interest groups. For example human rights formulations don't perfectly fulfill any group's desires (utilitarians, egalitarians, deontological groups, religious motivations etc.) but are a point of overlap between their goal sets (utilitarians and deontologists both think torture and murder are generally bad). Similarly with GDP, economic growth is a shared interest of several groups in society.

So some instances of Goodhart's law may be an observation that particular sets of goals are not being perfectly fulfilled.

comment by xamdam · 2010-05-13T03:38:07.003Z · LW(p) · GW(p)

body of knowledge of theory of constraints is a very good starting point for formulating better measures for corporates

I've had some interest in TOC, could you please expand on how it works to get G* closer to G?

Generally I've found TOC to be some really interesting semi-scientific stuff mixed with a ton of self-promotion by Goldratt.

comment by Mike Bishop (MichaelBishop) · 2010-05-12T20:05:03.310Z · LW(p) · GW(p)
