Less Competition, More Meritocracy?
post by Zvi · 2019-01-20T02:00:00.974Z
Analysis of the paper: Less Competition, More Meritocracy (hat tip: Marginal Revolution: Can Less Competition Mean More Meritocracy?)
Epistemic Status: Consider the horse as if it was not a three meter sphere
Economic papers that use math to prove things can point to interesting potential results and give reasons to question one’s intuitions. What is frustrating is the failure to think outside of those models and proofs and analyze the practical implications.
In this particular paper, the central idea is that when risk is unlimited and free, ratcheting up competition dramatically increases risk taken. This introduces sufficient noise that adding more competitors can make the average winner less skilled. At the margin, adding additional similar competitors to a very large pool has zero impact. Adding competitors with less expected promise makes things worse.
This can apply in the real world. The paper is a good example of a very good insight that is then proven ‘too much,’ without questioning or varying its assumptions in the ways I would find most interesting.
I. The Basic Model and its Central Point
Presume some number of job openings. There are weak candidates and strong candidates. Each candidate knows if they are strong or weak, but not how many other candidates are strong, nor do those running the contest know how many are strong.
The goal of the competition is to select as many strong candidates as possible. Or formally, to maximize [number of strong selected – number of weak selected], which is the same thing if the number of selected candidates is fixed, but is importantly different later when that number can vary. Each candidate performs and is given a score, and for an N-slot competition, the highest N scores are picked.
By default, strong candidates score X and weak candidates score Y, X>Y, but each candidate can also take on as much risk as they wish, choosing any distribution of scores whose average equals their default score, so long as their score never goes below zero.
The paper then assumes reflexive equilibrium, does the math, and proves a bunch of things about what happens next. The math checks out; I reproduced the results intuitively.
There are two types of equilibrium.
In the first type, the concession equilibrium, strong candidates take no risk and are almost always chosen. Weak candidates take risk to try and beat other weak candidates, but attempting to beat strong candidates isn’t worthwhile. This allows strong candidates to take zero risk.
In the second type, the challenge equilibrium, weak candidates attempt to be chosen over strong candidates, forcing strong candidates to take risk.
If I am a weak candidate, I can be at least Y/X times as likely as a strong candidate to be selected by copying their strategy with probability Y/X and scoring 0 otherwise. This seems close to optimal in a challenge equilibrium.
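To make the mimicry arithmetic concrete, here is a minimal Python sketch. It assumes the rule as summarized above: any non-negative score distribution is allowed as long as its average equals the candidate’s default score. The function names are hypothetical and do not come from the paper.

```python
import random


def mimic_strong(x: float, y: float, rng: random.Random) -> float:
    """Weak candidate's gamble: score x (the strong default) with probability
    y/x, otherwise score 0. The average stays at y, so the gamble is allowed
    under the mean-preserving, non-negative rule assumed here."""
    return x if rng.random() < y / x else 0.0


def expected_score(x: float, y: float) -> float:
    # Exact mean of the gamble: (y/x) * x + (1 - y/x) * 0 = y
    return (y / x) * x


def hit_rate(x: float, y: float, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated trials in which the weak candidate matches x."""
    rng = random.Random(seed)
    return sum(mimic_strong(x, y, rng) == x for _ in range(trials)) / trials
```

With X = 10 and Y = 6, the weak candidate matches the strong candidate’s score about 60% of the time while still averaging 6.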
Adding more candidates, strong or weak, risks shifting from a concession to a challenge equilibrium. Each additional candidate, of any strength, makes challenge a better option relative to concession.
If competition is ‘insufficiently intense’ then we get a concession equilibrium. We successfully identify every strong candidate, at the cost of accepting some weak ones. If competition is ‘too intense’ we lose that. The extra candidate that tips us over the edge makes things much worse. After that, quantity does not matter, only the ratio of weak candidates to strong.
Even if search is free, and you continue to sample from the same pool, hitting the threshold hurts you, and further expansion does nothing. Interviewing one million people for ten jobs, when a tenth of applicants are strong, is not better than interviewing ten thousand, or even one hundred. Ninety might be better.
Since costs are never zero (and rarely negative), and the pool usually degrades as it expands, this argues strongly for limited competitions with weaker selection criteria, including via various hacks to the system.
II. What To Do, and What This Implies, If This Holds
The paper does a good job analyzing what happens if its conditions hold.
If one has a fixed set of positions to fill (winners to pick) and wants to pick the maximum number of strong candidates, with no cost to expanding the pool of candidates, the ideal is the pool containing the most strong candidates that still maintains a concession equilibrium. With no control (by assumption) over who applies or how they are selected, this is the same as picking the largest pool that maintains a concession equilibrium, no matter what decrease in quality you might get while expanding it.
The tipping point makes this a Price Is Right style situation. Get as close to the number as possible without going over. Going over is quite bad, worse than a substantial undershoot.
One can think of probably not interviewing enough strong candidates, and probably hiring some weak candidates, as the price you must pay to be allowed to sort strong candidates from weak candidates – you need to ‘pay off’ the weak ones to not try and fool the system. An extra benefit is that even as you fill all the slots, you know who is who, which can be valuable information in the future. Even if you’re stuck with them, better to know that.
A similar dynamic comes if choosing how many candidates to select from a fixed pool, or when choosing both candidate and pool sizes.
If one attempts to only have slots for strong candidates, under unlimited free risk taking, you guarantee a challenge equilibrium. Your best bet will therefore probably be to pick enough candidates from the pool to create a concession equilibrium, just like choosing a smaller candidate pool.
The paper considers hiring a weak candidate as a -1, and hiring a strong candidate as a +1. The conclusions don’t vary much if this changes, since there are lots of other numerical knobs left unspecified that can cancel this out. But it is worth noting that in most cases the ratio is far less favorable than that. The default is that one good hire is far less good than one bad hire is bad. True bad hires are rather terrible (as opposed to all right but less than the best).
Thus, when the paper points out that it is sometimes impossible to reliably break 50% strong candidates under realistic conditions, no matter how many people are interviewed and how many slots are given out, they underestimate the chance that the system breaks down entirely into no contest at all, and no production.
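A toy scoring rule shows how the asymmetry works. This sketch adds a hypothetical `bad_hire_penalty` multiplier. A value of 1 recovers the paper’s +1/-1 scoring, and larger values encode the idea that bad hires are rather terrible.

```python
def hiring_value(n_strong: int, n_weak: int, bad_hire_penalty: float = 1.0) -> float:
    """Net value of a set of hires: +1 per strong hire, -bad_hire_penalty per
    weak hire. The penalty multiplier is a hypothetical knob, not from the paper."""
    return n_strong - bad_hire_penalty * n_weak
```

Under the paper’s scoring, ten hires split evenly between strong and weak net out to zero. If a bad hire costs three times what a good hire gains, the same hires net out to -10, and the contest is worse than not hiring at all.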
What is the best we can do, if all assumptions hold?
The minimum portion of weak candidates accepted scales linearly with their presence in the pool, and with how strongly they perform relative to strong candidates. Thus we set the pool size such that this fills out the pool with some margin of error.
That is best if we set the pool size but nothing else. The paper considers college admissions. A college is advised to solve for which candidates are above a fixed threshold, then choose at random from those above the threshold (which is a suggestion one would only make in a paper with zero search costs, since once you have enough worthy candidates you can stop searching, but shrug.) Thus, we can always choose to arbitrarily limit the pool.
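The admissions rule summarized above is easy to state in code. This is a minimal sketch. `threshold` and `slots` are hypothetical parameters, scores are assumed to be already measured, and search is treated as free, as in the paper.

```python
import random
from typing import Optional


def threshold_lottery(scores: dict[str, float], threshold: float, slots: int,
                      seed: Optional[int] = None) -> list[str]:
    """Admit at random among everyone at or above a fixed bar."""
    # Sort so that a given seed always yields the same draw.
    eligible = sorted(name for name, s in scores.items() if s >= threshold)
    if len(eligible) <= slots:
        return eligible  # fewer qualified candidates than slots: take them all
    return random.Random(seed).sample(eligible, slots)
```

Above the bar, a higher score buys nothing, so there is no reason to gamble for one. That is the point of the design.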
In practice, attempting this would change the pool of applicants, in a way you won’t like. You are more attractive to weak candidates and less attractive to strong ones. Weak candidates flood in to ‘take their shot,’ causing a vicious cycle of reputation and pool decay. You’re not a good reach school or a safe school for a strong candidate, so why bother? If other colleges copy you, students respond by investing less in becoming strong and more in sending out all the applications, and the remaining strong candidates remain at risk.
True reflexive equilibria almost never exist, given the possible angles of response, and differences between people’s knowledge, preferences and cognition.
III. Relax Reflective Equilibrium
Even if it is common knowledge that only two candidate strengths exist, and all candidates of each type are identical (which they aren’t), they will get different information and react differently, destroying reflexive equilibrium.
Players will not expect all others to jump with certainty between equilibria at some size threshold. Because they won’t. Which creates different equilibria.
Some players don’t know game theory, or don’t pay attention to strategy. Those players, as a group, lose. Smart game theory always has the edge.
An intuition pump: Learning game theory is costly, so the equilibrium requires it to pay off. Compare to the efficient market hypothesis.
Some weak candidates will always attempt to pass as strong candidates. There is a gradual shift from most not doing so to almost everyone doing so. More weak candidates steadily take on more risk. Eventually most of them mostly take on large risk to do their impression of a strong candidate. Strong candidates slowly start taking more risk more often as they sense their position becoming unsafe.
Zero risk isn’t stable anyway without continuous skill levels. Strong candidates notice that exactly zero risk puts them behind candidates who take on extra tail risk to get epsilon above them. Zero risk is a default strategy, so beating that baseline is wise.
Now those doing this try to outbid each other, until strong candidates lose to weak candidates at least sometimes. This risk will cap out very low if strong candidates consider the risk of losing at around their average performance to also be minuscule, but it will have to exist. Otherwise, there’s an almost free action in making one’s poor performances worse, since they are already losing to almost all other strong candidates, and doing that allows one to make their stronger performances better and/or more likely.
The generalization of this rule is that whenever you introduce a possible outcome into the system, and provide any net benefit to anyone if they do things that make the outcome more likely, there is now a chance that the outcome happens. Even if the outcome is ‘divorce,’ ‘government default,’ ‘forced liquidation,’ ‘we both drive off the cliff’ or ‘nuclear war.’ It probably also isn’t epsilon. While risk is near epsilon, taking actions that increase risk will look essentially free, so until the risk is big enough to matter it will keep increasing. Therefore, every risk isn’t only possible. Every risk will matter. Given enough time, someone will miscalculate, and Murphy’s Law ensues.
Future post: Possible bad outcomes are really bad.
Stepping back, the right strategy for each competitor will be to guess the performance levels that efficiently translate into wins, making sure to maximally bypass levels others are likely to naively select (such as zero risk strategies), and generally play as if in a variation of Colonel Blotto.
A lot of these results are driven by discrete skill levels, so let’s get rid of those next.
IV. Allow Continuous Skill Levels
Suppose instead of two skill levels, each player has their own skill level, and a rough and noisy idea where they lie in the distribution.
Each player has resources to distribute across probability. Success is increasing as a function of performance. Thinking players aim for performance levels they believe are efficient, and do not waste resources on performance levels that matter less.
All competitors also know that the chance of winning with low performance is almost zero. The value of additional performance probably gradually increases (positive second derivative) until it peaks at an inflection point, and then starts to decline as success starts to approach probability one. There may be additional quirky places in the distribution where extra performance is especially valuable. This exact curve won’t be known to anyone, different players will have different guesses partly based on their own abilities, and ability levels are continuous.
A sufficiently strong candidate, who expects their average performance to be above the inflection point, should take no risk. A weaker candidate should aim at the inflection point, accepting a score of zero otherwise in order to reach that point as often as possible. Simple.
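The all-or-nothing advice follows from Markov’s inequality. If scores are non-negative and average m, the chance of reaching a target T is at most m/T, and a two-point gamble between 0 and T achieves that bound. The sketch below treats the inflection point as a hard target T, which is a simplification, and its function names are hypothetical.

```python
def best_hit_probability(mean: float, target: float) -> float:
    """Highest chance of scoring at least `target` with a non-negative score
    distribution averaging `mean`. Markov's inequality caps it at mean/target."""
    if mean >= target:
        return 1.0  # strong enough: take no risk and score `mean` every time
    return mean / target


def two_point_gamble(mean: float, target: float) -> tuple[float, float]:
    """(probability of scoring `target`, probability of scoring 0) for the
    gamble that attains the Markov bound while keeping the average at `mean`."""
    p = best_hit_probability(mean, target)
    return p, 1.0 - p
```

A candidate averaging 5 who needs a 10 can reach it half the time. A candidate averaging 12 needs no gamble at all.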
If the distribution of skill levels is bumpy, what happens then? We have strong candidates and weak candidates (e.g. let’s say college graduates and high school graduates, or some have worked in the field and some haven’t, or whatever) so there’s a two-peak distribution of skill levels. Unless people are badly misinformed, we’ll still get a normal-looking distribution of target thresholds. If the two groups calculate very different expected thresholds, we’ll see two peaks.
In general, but not always, enough players will miscalculate or compete for the ‘everyone failed’ condition that trying to do so is a losing play. Occasionally there will be good odds to hoping enough others aim too high and miss.
Rather than have a challenge and a concession equilibrium, we have a threshold equilibrium. Everyone has a noisy estimate of the threshold they need. Those capable of reliably hitting the threshold take no risk, and usually make it. Those not capable of reliably hitting the threshold risk everything to make the threshold as often as possible.
Note that this equilibrium holds, although it may contain no one above the final threshold. If everyone aims for what they think is good-enough performance, aiming for less is almost worthless, aiming for much more is mostly pointless, and the threshold adjusts so that the expected number of threshold performances is very close to the number of slots.
More competition raises the threshold, forcing competitors to take on more risk, until everyone is using the same threshold strategy and success is purely proportional to skill. Thus, in a large pool, we once again have expanding the pool as a bad idea if it weakens average skill, even if search and participation costs for all are free.
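The threshold adjustment described above can be sketched numerically. Assume every candidate with average m uses the all-or-nothing gamble, so each reaches a target T with probability min(1, m/T). A bisection search then finds the T at which the expected number of candidates reaching it equals the number of slots. This illustrates the verbal argument only. It is not the paper’s equilibrium derivation, and the function names are hypothetical.

```python
def equilibrium_threshold(means: list[float], slots: int, iters: int = 200) -> float:
    """Target T at which the expected number of candidates reaching T equals
    `slots`, assuming each reaches T with probability min(1, mean / T)."""
    if not 0 < slots < len(means) or min(means) <= 0:
        raise ValueError("need positive averages and more candidates than slots")

    def expected_hits(t: float) -> float:
        return sum(min(1.0, m / t) for m in means)

    # expected_hits falls as t rises: near zero it equals len(means) > slots,
    # and at sum(means) / slots it is at most slots, so the answer is between.
    lo, hi = 1e-12, sum(means) / slots
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_hits(mid) > slots:
            lo = mid
        else:
            hi = mid
    return hi


def success_probs(means: list[float], threshold: float) -> list[float]:
    """Each candidate's chance of reaching the threshold."""
    return [min(1.0, m / threshold) for m in means]
```

Suppose 90 candidates averaging 1 and 10 averaging 2 compete for 10 slots. The threshold lands at 11, and each strong candidate succeeds exactly twice as often as each weak one, so success is proportional to skill. With averages [10, 1, 1] and 2 slots, the threshold is 2. The strong candidate clears it without risk but spends most of their skill beyond what the threshold requires.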
In a small pool, the strongest candidates are ‘wasting’ some of their skill on less efficient outcomes beyond their best estimate of the threshold.
This ends up being similar to the challenge case, except that there is no inflection point where things suddenly get worse. You never expect to lose from expanding the pool while maintaining quality. Instead, things slowly get better as you waste less work at the top of the curve, so the value of adding more similar candidates quickly approaches zero.
The new intuition is, given low enough search costs, we should add equally strong potential candidates until we are confident everyone is taking risk, rather than stopping just short of causing stronger candidates to take risk. If participation is costly to you and/or the candidates, you should likely stop short of that point.
The key intuitive question to ask is, if a candidate was the type of person you want, would they be so far ahead of the game as to be obviously better than the current expected marginal winner? Would they be able to crush a much bigger pool, and thus be effectively wasting lots of effort? If and only if that’s true, there’s probably benefit to expanding your search, so you get more such people, and it’s a question of whether it is worth the cost.
The other strong intuition is that once your marginal applicant pool is lower in average quality than your average pool, that will always be a high cost, so focus on quality over quantity.
This suggests another course of action…
V. Multi-Stage Process
Our model tells us that average quality of winners is, given a large pool, a function of the average quality of our base pool.
But we have a huge advantage: This whole process is free.
Given that, it seems like we should be able to be a bit more clever and complex, and do better.
We can improve if we can get a pool of candidates that has a higher average quality than our original candidate pool, but which is large enough to get us into a similar equilibrium. Each candidate’s success is proportional to their skill level, so our average outcome improves.
We already have a selection process that does this. We know our winners will be on average better than our candidates. So why not use that to our advantage?
Suppose we did a multi-stage competition. Before, we would have had 10 applicants for 1 slot. Expanding that to 100 applicants won’t do us any good directly, because of risk taking. But running 10 competitions with 10 people each, then pitting those 10 winners against each other, will improve things for us.
By using this tactic multiple times, we can do quite a bit better. Weaker candidates will almost never survive multiple rounds.
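Here is a back-of-the-envelope comparison. It borrows two assumptions from earlier sections. First, in any single contest a candidate’s chance of winning is proportional to their average score, as in the saturated large-pool regime of section IV. Second, each round’s risk is independent, as in the multi-stage setup. The pool numbers are hypothetical: 10 strong candidates scoring 1.0 and 90 weak scoring 0.5, split evenly into 10 groups of one strong and nine weak.

```python
from math import comb


def win_prob_strong(n_strong: int, n_weak: int, x: float, y: float) -> float:
    """Chance a strong candidate wins a one-slot contest, assuming win odds
    are proportional to average score."""
    return n_strong * x / (n_strong * x + n_weak * y)


def two_stage(groups: int, strong_per_group: int, weak_per_group: int,
              x: float, y: float) -> float:
    """Chance the overall winner is strong when each group's winner advances
    to a final, with independent risk in every round."""
    # Round 1: each group independently sends a strong winner with probability p.
    p = win_prob_strong(strong_per_group, weak_per_group, x, y)
    total = 0.0
    for k in range(1, groups + 1):  # k = number of strong finalists
        prob_k = comb(groups, k) * p**k * (1 - p) ** (groups - k)
        total += prob_k * win_prob_strong(k, groups - k, x, y)
    return total
```

Under these assumptions, a single 100-person contest picks a strong winner 10/55 of the time, about 18%. The two-stage version, `two_stage(10, 1, 9, 1.0, 0.5)`, comes out close to 29%. Each extra round compounds the strong candidates’ edge.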
What happened here?
We cheated. We forced candidates to take observable, uncorrelated risks in each different round. We destroyed the rule that risk taking is free and easy, and assumed that a lucky result in round 1 won’t help you in round 2.
If a low-skill person can permanently mimic in all ways a high-skill person, and we observe that success, they are high skill now! A worthy winner. If they can’t, then they fall back down to Earth on further observation. This should make clear why the idea of unlimited cheap and exactly controlled risk is profoundly bizarre. A test that works that way is a rather strange test.
So is a test that costs nothing to administer. You get what you pay for.
The risk is that risk-taking takes the form of ‘guess the right approach to the testing process’ and thus test scores are correlated without having to link back to skill.
This is definitely a thing.
During one all-day job interview, I made several fundamental interview-skill mistakes that hurt me in multiple sessions. If I had fixed those mistakes, I would have done much better all day, but would not have been much more skilled at what they were testing for. A more rigorous or multi-step process could have only done so much. To get better information, they would have had to add a different kind of test. That would risk introducing bad noise.
This seems typical of similar contests and testing methods designed to find strong candidates.
A more realistic model would introduce costs to participation in the search process, for all parties. You’d have another trade-off between having noise be correlated versus minimizing its size, making more rounds of analysis progressively less useful.
Adding more candidates to the pool now clearly is good at first and then turns increasingly negative.
VI. Pricing People Out
There are two realistic complications that can help us a lot.
The first is pricing people out. Entering a contest is rarely free. I have been fortunate that my last two job interviews were at Valve Software and Jane Street Capital. Both were exceptional companies looking for exceptional people, and I came away from both interviews feeling like I’d had a very fun and very educational experience, in addition to leveling up my interview skills. So those particular interviews felt free or better. But most are not.
Most are more like when I applied to colleges. Each additional college meant a bunch of extra work plus an application fee. Harvard does not want to admit a weak candidate. If we ignore the motivation to show that you have lots of applications, Harvard would prefer that weak candidates not apply. It wastes time, and there’s a non-zero chance one will gain admission by accident. If Harvard taxes applications, by requiring additional effort or raising the fee, they will drive weak applicants away and strengthen their pool, improving the final selections.
Harvard also does this by making Harvard hard. A sufficiently weak candidate should not want to go to Harvard, because they will predictably flunk out. Making Harvard harder, the way MIT is hard, would make their pool higher quality once word got out.
We can think of some forms of hazing, or other bad experiences for winners of competitions, partly as a way to discourage weak candidates from applying, and also partly as an additional test to drive them out.
Ideally we also reduce risk taken.
A candidate has uncertainty in how strong they are, and how much they would benefit from the prize. If being a stronger candidate is correlated with benefiting from winning, a correct strategy becomes to take less or no risk. If taking a big risk causes me to win when I would otherwise lose, I won a prize I don’t want. If taking a big risk causes me to lose, I lost a prize I did want. That pushes me heavily towards lowering my willingness to take risk, which in turn lowers the competition level and encourages me to take less risk still. Excellent.
VII. Taking Extra Risk is Hard
Avoiding risk is also hard.
In the real world, there is a ‘natural’ amount of risk in any activity. One is continuously offered options with varying risk levels.
Some of these choices are big, some small. Sometimes the risky play is ‘better’ in an expected value sense, sometimes worse.
True max-min strategies that avoid even minimal risks decline even small risks that would cancel out over time. This is expensive.
If one wants to maximize risk at all costs, one ends up doing the more risky thing every time and takes bad gambles. This is also expensive.
It is a hard problem to get the best outcome given one’s desired level of risk, or to maximize the chance of exceeding some performance threshold, even with no opponent. In games with an opponent who wants to beat you and thus has the opposite incentives of yours (think football) it gets harder still. Real world performances are notoriously terrible.
There are two basic types of situations with respect to risk.
Type one is where adding risk is expensive. There is a natural best route to work or line of play. There are other strategies that overall are worse, but have bigger upside, such as taking on particular downside tail risks in exchange for tiny payoffs, or hoping for a lucky result. In the driving example, one might take an on average slower route that has variable amounts of traffic, or one might drive faster and risk an accident or speeding ticket.
Available risk is limited. If I am two hours away by car, I might be able to do something reckless and maybe get there in an hour and forty five minutes, but if I have to get there in an hour, it’s not going to happen.
I can only ever hope to overcome a limited skill barrier. If we are racing in the Indianapolis 500, I might try to win the race by skipping a pit stop, or passing more aggressively to make up ground, or choosing a car that is slightly faster but has more engine trouble. But if my car combined with my driving skill is substantially slower than yours (where substantially means a minute over several hours) and your car doesn’t crash or die, I will never beat you.
If I had taken the math Olympiad exam (the USAMO) another hundred times, I might have gotten a non-zero score sometimes, but I was never getting onto the team. Period.
In these situations, reducing risk beyond the ‘natural’ level may not even be possible. If it is, it will be increasingly expensive.
Type two is where giant risks are the default, then sacrifices are made to contain those risks. Gamblers who do not pay attention to risk will always go broke. To be a winning gambler, one can either be lucky and retain large risk, or one can be skilled and pay a lot of attention to containing risk. In the long term, containing risk, including containing risk by ceasing to play at all, is the only option.
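A stylized illustration shows why ignoring risk is fatal even with an edge. A gambler who stakes the whole bankroll on every bet survives only if they never lose. The 60% win probability below is a hypothetical example.

```python
def survival_all_in(win_prob: float, rounds: int) -> float:
    """Chance of never going broke when staking the entire bankroll each
    round: a single loss ends the run, so survival is win_prob ** rounds."""
    return win_prob ** rounds
```

Even with a 60% edge on every bet, the chance of surviving 20 all-in bets is under 0.01%, since 0.6^20 is about 3.7e-5.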
Competitors in type two situations must be evaluated explicitly on their risk management, or on very long term results, or any evaluation is worthless. If you are testing for good gamblers and only have one day, you pay some attention to results but more attention to the logic behind choices and sizing. Tests that do otherwise get essentially random results, and follow the pattern where reducing the applicant pool improves the quality of the winners.
Another note is that the risks competitors take can be correlated across competitors in many situations. If you need a sufficiently high rank rather than a high raw score, those who take risks should seek to take uncorrelated risks. Thus, in stock market or gambling competitions, the primary skill often is in doing something no one else would think to do, rather than in picking a high expected value choice. Sometimes that’s what real risk means.
VIII. Central Responses
There are also four additional responses by those running the competition, that are worth considering.
The first response is to observe a competitor’s level of risk taking and test optimization, and penalize too much (or too little). This is often quite easy. Everyone knows what a safe answer to ‘what is your greatest weakness’ looks like, bet size in simulations is transparent, and so on. If you respond to things going badly early on with taking a lot of risk, rather than being responsible, will you do that with the company’s money?
A good admissions officer at a college mostly knows instantly which essays had professional help and which resumes are based on statistical analysis, versus who lived their best life and then applied to college.
A good competition design gives you the opportunity to measure these considerations.
Such contests should be anti-inductive, if done right, with the really sneaky players playing on higher meta levels. Like everything else.
The second response is to vary the number of winners based on how well competitors do. This is the default.
If I interview three job applicants and all of them show up hung over, I need to be pretty desperate to take the one who was less hung over, rather than call in more candidates tomorrow. If I find three great candidates for one job, I’ll do my best to find ways to hire all three.
Another variation is that I have an insider I know well as the default winner, and the application process is to see if I can do better than that, and to keep the insider and the company honest, so again it’s mostly about crossing a bar.
The third response is that often there isn’t even a ‘batch’ of applications. There is only a series of permanent yes/no decisions until the position is filled. This is the classic problem of finding a spouse or a secretary, where you can’t easily go back once you reject someone. Once you have a sense of the distribution of options, you’re effectively looking for ‘good enough’ at every step, and that requirement doesn’t move much until time starts running out.
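The ‘good enough’ bar can be computed for one simple version of the problem. This is a variant of the classic secretary problem, which instead maximizes the chance of picking the single best candidate and leads to the well-known 1/e rule. This sketch assumes candidate values are independent draws from Uniform(0, 1), that the distribution is known, and that the goal is the expected value of the one hire. The function name is hypothetical.

```python
def acceptance_thresholds(n: int) -> list[float]:
    """thresholds[k] is the bar for accepting the current candidate when k
    candidates remain after them. It equals the expected value of continuing,
    with values i.i.d. Uniform(0, 1) and known."""
    thresholds = [0.0]  # last candidate: anyone beats an empty slot
    for _ in range(n - 1):
        v = thresholds[-1]
        # E[max(U, v)] for U ~ Uniform(0, 1) is (1 + v * v) / 2
        thresholds.append((1 + v * v) / 2)
    return thresholds
```

The bar is 0.5 with one candidate left, 0.625 with two and about 0.695 with three. Early in a long search it sits near the top and creeps by only about 0.01 per step, then drops sharply in the last few steps, which matches a requirement that doesn’t move much until time starts running out.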
Thus, most contests that care mostly about finding a worthy winner are closer to threshold requirements than they look. This makes it very difficult to create a concession equilibrium. If you show up and aren’t good enough to beat continuing to search, your chances are very, very bad. If you show up and are good enough to beat continuing to search, your chances are very good. The right strategy becomes either to aim at this threshold, or if the field is large you might need to aim higher. You can never keep the field small enough to keep the low-skill players honest.
The fourth response is to punish sufficiently poor performance. This can be as mild as in-the-moment social embarrassment – Simon mocking aspirants on American Idol. It can be as serious as ‘you’re fired,’ either from the same company (you revealed you’re not good enough for your current job, or your upside is limited), or from another company (how dare you try to jump ship!). In fiction a failed application can be lethal. Even mild retaliation is very effective in improving average quality (and limiting the size) of the talent pool.
IX. Practical Conclusions
We don’t purely want the best person for the job. We want a selection process that balances search costs, for all concerned, with finding the best person and perhaps getting your applicants to improve their skill.
A weaker version of the paper’s core take-away heuristic seems to hold up under more analysis: There is a limit to how far expanding a search helps you at all, even before costs.
Rule 1: Pool quality on the margin usually matters more than quantity.
Bad applicants that can make it through are more bad than they appear. Expanding the pool’s quantity at the expense of average quality, once your supply of candidates isn’t woefully inadequate, is usually a bad move.
Rule 2: Once your application pool probably includes enough identifiable top-quality candidates to fill all your slots, up to your ability to differentiate, stop looking.
A larger pool will make your search more expensive and difficult for both you and them, add more regret because choices are bad, and won’t make you more likely to choose wisely.
Note that this is a later stopping point than the paper recommends. The paper says you should stop before you fill all your slots, such that weak applicants are encouraged not to represent themselves as strong candidates.
Also note that this rule has two additional requirements. It requires the good candidates be identifiable, since if some of them will blow it or you’ll blow noticing them, that doesn’t help you. It also requires that there not be outliers waiting to be discovered, that you would recognize if you saw them.
Another, similar heuristic that is also good is, make the competition just intense enough that worthy candidates are worried they won’t get the job. Then stop.
Rule 3: Weak candidates must either be driven away, or rewarded for revealing themselves. If weak candidates can successfully fake being strong, it is worth a lot to ensure that this strategy is punished.
Good punishments include application fees, giving up other opportunities or jobs, long or stressful competitions, and punishments for failure ranging from mild in-the-room social disapproval or being made to feel dumb, up to major retaliation.
Another great punishment is to give smaller rewards to success if it is by a low skilled person. If their prize is something they can’t use – they’ll flunk out, or get fired quickly, or similar – then they will be less inclined to apply.
Reward for participation is probability of success times reward for success, while cost is mostly fixed. Tilt this enough and your bad-applicant problem clears up.
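The tilt can be written as the inequality p × R > C, where p is the probability of success, R is what the prize is worth to this candidate and C is the cost of entering. A minimal sketch, with hypothetical function names and numbers:

```python
def participation_value(win_prob: float, prize_value: float, entry_cost: float) -> float:
    """Expected net gain from entering: chance of winning times what the prize
    is worth to this candidate, minus the (mostly fixed) cost of applying."""
    return win_prob * prize_value - entry_cost


def enters(win_prob: float, prize_value: float, entry_cost: float) -> bool:
    return participation_value(win_prob, prize_value, entry_cost) > 0
```

Suppose a strong candidate wins half the time and a weak one a tenth of the time, for a prize worth 100. An entry cost of 20 keeps the strong candidate in and drives the weak one out. At an entry cost of 5 the weak candidate still enters, unless the prize is only worth 30 to them because they would flunk out. Raising costs and shrinking what weak winners gain both work the same way.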
Fail to tilt this enough, and you have a big lemon problem on multiple levels. Weak competitors will choose your competition over others, giving strong applicants less reason to bother both in terms of chance of winning, and desire to win. Who wants to win only to be among a bunch of fakers who got lucky? That’s no fun and it’s no good for your reputation either.
It will be difficult to punish weak candidates for faking being strong versus punishing them in general. But if you can do it, that’s great.
The flip side is that we can reward them for being honest. That will often be easier.
Preventing a rebellion of the less skilled is a constraint on mechanism design. We must either appease them, or wipe them out.
Rule 4: Sufficiently hard, high stakes competitions that are vulnerable to gaming and/or resource investment are highly toxic resource monsters.
This is getting away from the paper’s points, since the paper doesn’t deal with resource costs to participation or search, but it seems quite important.
In some cases, we want these highly toxic resource monsters. We like that every member of an area sports team puts the rest of their life mostly on hold and focuses on winning sporting events. The test is exactly what we want them to excel at. We also get to use the trick of testing them in discrete steps, via different games and portions of games, to prevent ‘risk’ from playing too much of a factor.
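The value of testing in discrete steps can be seen in a minimal sketch that makes a simplifying assumption not found in the post. Each game is an independent trial that the stronger team wins with a fixed probability `p_game`. Under that assumption, winning a majority of a longer series depends less on any single game, so the better team’s edge compounds. The function name and parameters are illustrative.

```python
from math import comb


def series_win_prob(p_game: float, n_games: int) -> float:
    """Chance of winning a majority of n_games independent games.

    Assumes a fixed per-game win probability p_game and an odd n_games.
    Stopping a best-of-n series early does not change who wins, so
    counting outcomes as if all n games were played gives the same answer.
    """
    if n_games < 1 or n_games % 2 == 0:
        raise ValueError("n_games must be a positive odd number")
    needed = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p_game**k * (1 - p_game) ** (n_games - k)
        for k in range(needed, n_games + 1)
    )


for n in (1, 3, 7, 21):
    print(n, round(series_win_prob(0.6, n), 3))
```

With `p_game = 0.6`, a single game gives the better team a 60% chance of winning. A best-of-7 series raises that to about 71%, and longer series push it higher still.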
In most cases, where the match between test preparation, successful test strategies and desired skills is not so good, this highly toxic resource monster is very, very bad.
Consider school, or more generally childhood. The more we reward good performance on a test, and punish failure, the more resources are eaten alive by the test. In the extreme, most of a child’s experiences and resources, and even those of their parents, get eaten. From discussions I’ve had, much of high school in China looks remarkably close to this, as everything is dropped for years to cram for a life-changing college entrance exam.
Rule 5: Rewards must be able to step outside of a strict scoring mechanism.
Any scoring mechanism is vulnerable to gaming, to risk taking, and to Goodhart’s Law. To keep everyone’s motivation, potentially their entire life and being, from being subverted, we need to reward and punish from the outside, looking in on what is happening. This has to carry enough weight to compete with the prizes themselves.
Consider this metaphor.
If the real value of many journeys is the friends you made along the way, that can be true in both directions. Often one’s friends, experiences and lessons end up dwarfing in importance the prize or motivation one started out with; frequently we need a McGuffin and restrictions that breed creativity and focus to allow coordination, more than any prize.
It also works the other way. The value of your friends can be that they motivate and help you to be worthy of friendship, to do and accomplish things. The reason we took the journey the right way was so that we would make friends along it. This prevents us from falling to Goodhart’s Law. We don’t narrow in on checking off a box. Even in a pure competition, like a Magic tournament, we know the style points matter, and we know that it matters whether we think the style points matter, and so on.
The existence of the social, of various levels and layers, the ability to step outside the game, and the worry about unknown unknowns, is what guards systems from breakdown under the pressure of metrics. Given any utility function we know about, however well designed, and sufficient optimization pressure, things end badly. You need to preserve the value of unknown unknowns.
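The claim about optimization pressure is general, but one narrow, well-understood piece of it can be simulated. The sketch below rests entirely on illustrative assumptions. Each candidate’s true quality is standard normal. The metric we select on adds independent normal noise, and we pick whoever scores highest. As the pool grows, optimization pressure on the metric rises, and the winner’s metric score overstates their true quality by more and more.

```python
import random
import statistics


def winner_overstatement(pool_size: int, trials: int = 200,
                         noise_sd: float = 1.0, seed: int = 0) -> float:
    """Average (metric - true quality) for the top scorer in a pool.

    Assumes true quality is normal with mean 0 and sd 1, and that the metric
    is true quality plus independent normal noise with sd noise_sd.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    gaps = []
    for _ in range(trials):
        best_metric, best_true = float("-inf"), 0.0
        for _ in range(pool_size):
            true_quality = rng.gauss(0.0, 1.0)
            metric = true_quality + rng.gauss(0.0, noise_sd)
            if metric > best_metric:
                best_metric, best_true = metric, true_quality
        gaps.append(best_metric - best_true)
    return statistics.mean(gaps)


for n in (1, 10, 100, 1000):
    print(n, round(winner_overstatement(n), 2))
```

With noise as large as the true spread (`noise_sd=1.0`), the expected overstatement is half of the winner’s metric score. It therefore keeps climbing with pool size, even though the winner’s true quality also rises. This is a mild case. Heavier-tailed noise can make the outcome much worse.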
This leads us to:
Rule 6: Too much knowledge by potential competitors can be very bad.
The more competitors do the ‘natural’ thing, that maximizes their expected output, the better off we usually are. The less they know about how they are being evaluated, on what levels, with what threshold of success, the less they can game the system, and the less success depends on gaming skill or luck.
All the truly perverse outcomes came from scenarios where competitors knew they were desperadoes, and taking huge risks was not actually risky for them.
Having a high threshold is only bad if competitors know about it. If they don’t know, it can’t hurt you. If they suspect a high threshold, but they don’t know, that mitigates a lot of the damage. In many cases, the competitor is better served by playing to succeed in the worlds where the threshold is low and accepting defeat when it is unexpectedly high, which means doing exactly what you want. More uncertainty also makes the choices of others less certain, which makes situations harder to game effectively.
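The desperado logic can be illustrated with a minimal sketch. It assumes, beyond what the paper states here, that a competitor’s performance is normal with a mean they cannot change and a spread `sigma` they choose freely, and that they win by clearing a threshold. The thresholds, belief weights, and sigma grid are made-up values. Under these assumptions, a known high threshold calls for maximal risk and a known low threshold calls for minimal risk. An uncertain threshold weighted toward the low case also leads them to play it safe.

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_clear(threshold: float, sigma: float, mean: float = 0.0) -> float:
    """Probability that a normal(mean, sigma) performance exceeds the threshold."""
    return 1.0 - normal_cdf((threshold - mean) / sigma)


SIGMAS = [0.1 * k for k in range(1, 51)]  # candidate risk levels, 0.1 .. 5.0


def best_sigma(thresholds, weights, sigmas=SIGMAS):
    """Risk level maximizing the chance of success, averaged over the
    competitor's belief about which threshold is in force."""
    def success(sigma):
        return sum(w * p_clear(t, sigma) for t, w in zip(thresholds, weights))
    return max(sigmas, key=success)


print(best_sigma([2.0], [1.0]))             # known high threshold
print(best_sigma([-1.0], [1.0]))            # known low threshold
print(best_sigma([-1.0, 2.0], [0.6, 0.4]))  # uncertain threshold
```

In this toy model, the uncertain case picks the same cautious strategy as the known-low case, which is the behavior the evaluator wants. A strong enough suspicion of a high threshold (say, 90% weight on it) flips the answer back to maximal risk.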
Power hides information. Power does not reveal its intentions. This is known, and the dynamics explored here are part of why. You want people optimizing for things you won’t even be aware of, or don’t care about, but which they think you might be aware of and care about. You want to avoid them trying too hard to game the things you do look at, which would also be bad. You make those in your power worry at every step that if they try anything, or fail in any way, it could be what costs them. You cause people to want to curry favor. You also allow yourself to alter the results, if they’re about to come out ‘wrong’. The more you reveal about how you work, the less power you have. In this case, the power to find worthy winners.
This is in addition to the fact that some considerations that matter are not legally allowed to be considered, and that lawsuits might fly, and other reasons why decision makers ensure that no one knows what they were thinking.
Thus we must work even harder to reward those who explain themselves and thereby help others, and who realize that the key hard thing is, as Hagbard Celine reminds us, to avoid power.
But still get things done.
19 comments
comment by Raemon · 2019-01-20T23:51:13.532Z
Note: I got about a third through this, and... had a strong sense that this was about something important that was worth my time understanding, but something about the description/examples made that hard to do.
(This does leave the article in a state where, I predict, to understand it, I'd have to invest effort, and the invested effort would in fact improve my understanding. But this feels more accidental than optimal.)
I think part of it is just that the names are fairly unintuitive. Concession Equilibria and Challenge Equilibria don't neatly map into whatever they're supposed to be about in my head. I have an easier time understanding jargon if I know why a name was chosen.
↑ comment by Zvi · 2019-01-23T14:17:12.589Z
In that particular case, I would have chosen different names that likely would have resonated better, but felt it was important not to change the paper's chosen labels, even though they seemed not great. That might have been an error.
Their explanation is that the question is, will the weaker candidates concede that they are weaker than strong ones and let the strong ones all win, or will they challenge the stronger candidates.
Suggestions for other ways to make this more clear are appreciated. I'd like to be able to write things like this in a way that people actually read and benefit from.
↑ comment by Raemon · 2019-01-23T18:40:21.685Z
I think simply explaining that in the OP would have helped.
↑ comment by Ben Pace (Benito) · 2019-02-27T11:28:33.236Z
Datapoint: I got the point about challenge equilibria being the place where everyone has to start fighting and taking risks. However I thought that 'concession' referred to the employers making concessions to weaker candidates, by hiring some. I suppose the paper's explanation makes more sense.
↑ comment by tenthkrige · 2019-01-22T13:37:07.370Z
I almost gave up halfway through, for much the same reasons, but this somehow felt important, the way some sequences/codex posts felt important at the time, so I powered through. I definitely will need a second pass on some of the large inferential steps, but overall this felt long-term valuable.
comment by habryka (habryka4) · 2019-02-22T20:37:06.386Z
Promoted to curated: I've applied the ideas in this post to a variety of domains since I first read it, and I think it was quite useful in a lot of them (examples of questions I was thinking about: "How much more progress should we expect in Science given that at least 10x more resources are available for recruiting scientists?" and "How much does the size of the EA and Rationality community determine the quality of people working at organizations in the community?").
I do think I had to read this post at least twice to really grasp any of the core points, and am still struggling with some of them. I think this was partially the result of trying to translate a mathematical econ paper into a post without any equations, which is always a really big challenge, but I also think I could have benefitted from a longer initial section that just summarized the econ paper, and then a separate section that commented on it. As it stood, I think I ended up somewhat confused about which points were covered in the econ paper, and which ones were your points.
But overall, I think this post changed my mind on some important ideas, which is one of the most valuable things a post can do. Thanks a lot for writing it.
comment by Zvi · 2020-12-17T21:10:37.636Z
So first off... I'd forgotten this existed. That's obviously a negative indication in terms of how much it guided my thinking over the past two years! It also meant I got to see it with fresh eyes two years later.
I think the central point the post thinks it is making is that, extending on the original econ paper, search effectiveness can rapidly become impossible to improve by expanding the size of one's search, if those you are searching understand they are in competition. To improve results further, one must instead improve average quality in the search pool and/or improve one's search method, which can be done in various ways, especially by making it unattractive to compete without being high quality, to get into a place where it is in everyone's interest to play things straight.
Also, one more example of having a fixed and known decision algorithm leading to others engaging in destructive behaviors. That's one of several points here that foreshadow the post on Blackmail, which can be considered almost the proposed future post ("Possible Bad Outcomes are Very Bad").
I get the feeling, now, that in addition to all that, this is 'burying the lede' or at least leaving out a large aspect of the interesting things this post is pointing towards. Instead of modeling how one might search, perhaps a more impactful question is how one behaves when one expects life to put you into a series of such competitions, and what it means then to "take risk / embrace variance" in that context, and what conditions make one willing to play the meritocratic game. And how these dynamics can make people increasingly move towards cheating slash investing in deception or rent seeking, rather than seeking to provide value and show they can provide value, because they lose the ability to win such search competitions, especially if the mechanisms to discover and enter become corrupted, or political insiders reliably get the inside track. A much broader critique is implied that was never explored.
I also noticed several people saying the post was hard to grok. As the author, I found it easy to reconnect with and re-grok, but that isn't much evidence. Long post is already long, and a lot of the statements seem like requests to go slower and make this longer, rather than to tighten and improve quality, although others seemed to be terminology issues one could fix.
comment by lionhearted (Sebastian Marshall) (lionhearted) · 2019-01-22T04:08:15.636Z
Much ado about nothing, I think this is the most quotable thing you've ever written.
Appease or wipe out, perverse desperadoes, etc etc.
Anyways — exceptional piece. Feels like classical Zvi deep analysis as applied to high-leverage non-constructed scenarios. Or rather, how to turn a draft into constructed, without participants knowing. One marvels over what type of win rate would be possible if this can be successfully executed....
comment by waveman · 2019-02-14T07:17:04.401Z
Finance is full of such hidden risks. Start a fund, take insane risks, maybe generate outsize returns => profit by growing funds under management and take a % of that.
If not, try, try again. Google "incubator funds". Taleb's Fooled by Randomness has many examples.
But if you are taking risks, won't people see it and shun you? Probably not. It is very hard to see risk after the event. It is not too hard to "stuff the risk into the tails". There are even consultants who will help you do this.
Even without cheating, when a test is very stringent, an alarming fraction of the apparent top performers may have just had a lucky day.
comment by BurntVictory · 2019-01-22T18:59:11.194Z
I'll echo the other commenters in saying this was interesting and valuable, but also (perhaps necessarily) left me to cross some significant inferential gaps. The biggest for me were in going from game-descriptions to equilibria. Maybe this is just a thing that can't be made intuitive to people who haven't solved it out? But I think that, e.g., graphs of the kinds of distributions you get in different cases would have helped me, at least.
I also had to think for a bit about what assumptions you were making here:
A more rigorous or multi-step process could have only done so much. To get better information, they would have had to add a different kind of test. That would risk introducing bad noise.
A very naive model says additional tests -> uncorrelated noise -> less noise in the average.
More realistically, we can assume that some dimensions of quality are easier to Goodhart than others, and you don't know which are which beforehand. But then, how do you know your initial choice of test isn't Goodhart-y? And even if the Goodhart noise is much larger than the true variation in skill, it seems like you can aggregate scores in a way that would allow you to make use of the information from the different tests without being bamboozled. (Depending on your use-case, you could take the average of a concave function of the scores, or use quantiles, or take the min score, etc.)
In reality, though, you usually have some idea what dimensions are important for the job. Maybe it's something like PCA, with the noise/signal ratio of dimensions decreasing as you go down the list of components. Then that decrease, plus marginal costs of more tests, means that there is some natural stopping point. I guess that makes sense, but it took a bit for me to get there. Is that what you were thinking?
comment by cousin_it · 2019-01-21T17:09:56.045Z
Yeah. To be our best selves, we need the right amount of challenge, not too little and not too much. I would also add that we need the right fluctuation of challenge: it shouldn't be too steady over time, or too rare and spiky. Otherwise you get distortions in behavior: cramming for an exam, or cheating, or breaking down from stress. Moreover, the optimal amount and fluctuation of challenge is different for every person and every task. I noticed that when trying to teach people stuff, and then started applying it to myself as well.
comment by jacobjacob · 2019-02-27T11:50:32.523Z
For section III. it would be really helpful to concretely work through what happens in the examples of divorce, nuclear war, government default, etc. What's a plausible thought process of the agents involved?
My current model is something like "my marriage is worse than I find tolerable, so I have nothing to lose. Now that divorce is legal, I might as well gamble my savings in the casino. If I win we could move to a better home and maybe save the relationship, if I lose we'll get divorced."
People who have nothing to lose start taking risks which fill up the merely possibly bad outcomes until they start mattering.
comment by jmh · 2019-02-25T17:54:35.048Z
First, I did not get through the entire post. That said, some thoughts that occurred to me.
- Under what conditions does this generalize? I was trying to apply it to my world. If I am hiring, part of the problem is not about hard skills but soft skills -- and those will differ a bit based on what team the new person will be part of.
- Good game theory always wins seems to have been directed at the candidates, but what happens when the game master (for lack of a better term) has poor game theory or doesn't think such behavior good? Again, from a corporate hiring standpoint that might be a real situation. In terms of consumers shopping around that might also apply (the average advertiser is a better game theorist than the average consumer). Do the conclusions still hold here?
- Closely related to the first bullet, just what market settings were considered for the analysis.
- I recall an old paper that asked if duopoly was more competitive than the standard atomistic competition in the Econ 101 pure competition model. That was largely driven by search costs and asymmetric information problems. Would theory here be complementary?
My gut reaction is there is some value and truth here but that it should not be taken too seriously. Consider it an area of consideration and element of a solution rather than a solution to any problem of getting the best out of the messy social institutions that mediate our activities and greatly influence the collective/aggregate results.
comment by Chris_Leong · 2021-01-18T05:33:13.685Z
I would love to see a more concise version of this.
comment by Ben Pace (Benito) · 2021-01-13T07:29:08.135Z
Zvi wrote two whole posts on perfect/imperfect competition and how more competition can be bad. However, this is the only post that has really stuck with me in teaching me how increased competition can be worse overall for the system, and helped me appreciate Moloch in more detail. I expect to vote for this post around +4 or +5.
As with one or two others by Zvi, I think it's a touch longer than it needs to be, and can be made more concise.
comment by habryka (habryka4) · 2020-12-15T05:51:09.523Z
Most of the things I said in my curation notice still hold. I think I ended up thinking about this post slightly less than I had expected at the time, but definitely still enough to be worth a thorough revisit and potential inclusion in the 2019 book.
comment by Ben Pace (Benito) · 2020-12-11T05:26:23.755Z
This was a really interesting analysis. I'd like to see it reviewed.