Being the (Pareto) Best in the World

post by johnswentworth · 2019-06-24T18:36:45.929Z · score: 192 (92 votes) · LW · GW · 36 comments


  Elbow Room
  Problem Density

The generalized efficient markets (GEM) principle says, roughly, that things which would give you a big windfall of money and/or status, will not be easy. If such an opportunity were available, someone else would have already taken it. You will never find a $100 bill on the floor of Grand Central Station at rush hour, because someone would have picked it up already.

One way to circumvent GEM is to be the best in the world at some relevant skill. A superhuman with hawk-like eyesight and the speed of the Flash might very well be able to snag $100 bills off the floor of Grand Central. More realistically, even though financial markets are the ur-example of efficiency, a handful of firms do make impressive amounts of money by being faster than anyone else in their market. I’m unlikely to ever find a proof of the Riemann Hypothesis, but Terry Tao might. Etc.

But being the best in the world, in a sense sufficient to circumvent GEM, is not as hard as it might seem at first glance (though that doesn’t exactly make it easy). The trick is to exploit dimensionality.

Consider: becoming one of the world’s top experts in proteomics is hard. Becoming one of the world’s top experts in macroeconomic modelling is hard. But how hard is it to become sufficiently expert in proteomics and macroeconomic modelling that nobody is better than you at both simultaneously? In other words, how hard is it to reach the Pareto frontier?

Having reached that Pareto frontier, you will have circumvented the GEM: you will be the single best-qualified person in the world for (some) problems which apply macroeconomic modelling to proteomic data. You will have a realistic shot at a big money/status windfall, with relatively little effort.
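
To make "nobody is better than you at both simultaneously" concrete, here is a minimal sketch of a Pareto-frontier check. The names and skill scores are hypothetical, and it assumes (as the post does for simplicity) that each skill fits on a single numeric axis.

```python
from typing import Dict, List, Tuple

Skills = Tuple[float, ...]  # one score per skill axis, higher is better


def dominates(a: Skills, b: Skills) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(people: Dict[str, Skills]) -> List[str]:
    """Names of everyone who is not dominated by anybody else."""
    return sorted(
        name
        for name, skills in people.items()
        if not any(dominates(other, skills) for other in people.values())
    )


# Hypothetical (proteomics, macroeconomic modelling) scores.
people = {
    "proteomics_star": (9.5, 1.0),
    "macro_star": (1.0, 9.5),
    "hybrid": (6.0, 6.0),
    "dabbler": (5.0, 4.0),  # dominated by "hybrid" on both axes
}

print(pareto_frontier(people))
```

In this toy population the hybrid is far from the best on either axis, yet sits on the frontier because neither specialist beats them on both axes at once.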

(Obviously we’re oversimplifying a lot by putting things like “macroeconomic modelling skill” on a single axis, and breaking it out onto multiple axes would strengthen the main point of this post. On the other hand, it would complicate the explanation; I’m keeping it simple for now.)

Let’s dig into a few details of this approach…

Elbow Room

There are many table tennis players, but only one best player in the world. This is a side effect of ranking people on one dimension: there’s only going to be one point furthest to the right (absent a tie).

Pareto optimality pushes us into more dimensions. There’s only one best table tennis player, and only one best 100-meter sprinter, but there can be an unlimited number of Pareto-optimal table tennis/sprinters.

Problem is, for GEM purposes, elbow room matters. Maybe I’m on the Pareto frontier of Bayesian statistics and gerontology, but if there’s one person just a little bit better at statistics and worse at gerontology than me, and another person just a little bit better at gerontology and worse at statistics, then GEM only gives me the advantage over a tiny little chunk of the skill-space.
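
One rough way to put a number on elbow room, under an assumption that is not in the post: treat a problem as a point (a, b) giving the minimum skill it needs on each axis, and say a person can handle it if they meet both minimums. My "territory" is then the set of problems I can handle and no rival can. The skill values below are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def exclusive_area(me: Point, rivals: Sequence[Point],
                   grid: int = 200, top: float = 10.0) -> float:
    """Approximate area of problems that `me` can handle but no rival can.

    Sweeps cell midpoints of a grid x grid lattice over [0, top] x [0, top].
    """
    cell = top / grid
    count = 0
    for i in range(grid):
        for j in range(grid):
            a = (i + 0.5) * cell  # skill needed on axis 1
            b = (j + 0.5) * cell  # skill needed on axis 2
            i_can = a <= me[0] and b <= me[1]
            rival_can = any(a <= r[0] and b <= r[1] for r in rivals)
            if i_can and not rival_can:
                count += 1
    return count * cell * cell


me = (6.0, 6.0)
distant_rivals = [(9.0, 2.0), (2.0, 9.0)]  # specialists far from me
close_rivals = [(6.5, 5.5), (5.5, 6.5)]    # near neighbours on the frontier

print(exclusive_area(me, distant_rivals))
print(exclusive_area(me, close_rivals))
```

In both cases I am on the Pareto frontier, but with near neighbours the region that is mine alone shrinks from roughly 16 square units to roughly 0.25.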

This brings up another aspect…

Problem Density

Claiming a spot on a Pareto frontier gives you some chunk of the skill-space to call your own. But that’s only useful to the extent that your territory contains useful problems.

Two pieces factor in here. First, how large a territory can you claim? This is about elbow room, as in the diagram above. Second, what’s the density of useful problems within this region of skill-space? The table tennis/sprinting space doesn’t have a whole lot going on. Statistics and gerontology sounds more promising. Cryptography and monetary economics is probably a particularly rich Pareto frontier these days. (And of course, we don’t need to stop at two dimensions - but we’re going to stop there in this post in order to keep things simple.)


One problem with this whole GEM-vs-Pareto concept: if chasing a Pareto frontier makes it easier to circumvent GEM and gain a big windfall, then why doesn’t everyone chase a Pareto frontier? Apply GEM to the entire system: why haven’t people already picked up the opportunities lying on all these Pareto frontiers?

Answer: dimensionality. If there’s 100 different specialties, then there’s only 100 people who are the best within their specialty. But there’s 10k pairs of specialties (e.g. statistics/gerontology), 1M triples (e.g. statistics/gerontology/macroeconomics), and something like 10^30 combinations of specialties. And each of those Pareto frontiers has room for more than one person, even allowing for elbow room. Even if only a small fraction of those combinations are useful, there’s still a lot of space to stake out a territory.
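
The counts in the paragraph above read as round order-of-magnitude figures (100^2 and 100^3). A quick sketch of the exact arithmetic, assuming specialties are combined without regard to order and that "combinations" means arbitrary subsets of the 100 specialties (both are assumptions, not stated in the post):

```python
from math import comb  # available in Python 3.8+

n_specialties = 100

pairs = comb(n_specialties, 2)    # unordered pairs of distinct specialties
triples = comb(n_specialties, 3)  # unordered triples of distinct specialties
subsets = 2 ** n_specialties      # every possible subset of specialties

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
print(f"subsets: {subsets:.2e}")
```

The exact unordered counts (4,950 pairs and 161,700 triples) are smaller than the round figures, but the point survives: the number of combinations grows far faster than the number of specialties, and the subset count is about 1.27 × 10^30.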

And to a large extent, people do pursue those frontiers. It’s no secret that an academic can easily find fertile fields by working with someone in a different department. “Interdisciplinary” work has a reputation for being unusually high-yield. Similarly, carrying scientific work from lab to market has a reputation for high yields. Thanks to the “curse” of dimensionality, these goldmines are not in any danger of being exhausted.


Comments sorted by top scores.

comment by robertskmiles · 2019-07-02T13:20:45.292Z · score: 47 (29 votes) · LW(p) · GW(p)

I thought about this a lot when considering my work. I'm very far from the best Youtuber, and very far from the most knowledgeable person on AI Safety, but nobody else is trying to combine those things, so I'm probably the best AI Safety Youtuber.

The interaction with comparative advantage is interesting though. I can think of several people off the top of my head who are strictly better than me at both AI Safety and public speaking/communication, who I'm confident could, if they wanted to, do my job better than I can. But they don't want to, because they're busy doing other (probably more important) things. It's not the case that a person on the pareto frontier eats up everything in their chunk of skill space - in practice people can only do a few things at a time. So even if you aren't on the frontier, you're ok as long as the ratio of problem density to 'elbow room' is good enough. You can be the best person in the world to tackle a particular problem, not because nobody else could do it better, but because everyone better is busy right now.

comment by Wei_Dai · 2019-06-26T00:01:46.525Z · score: 25 (13 votes) · LW(p) · GW(p)

It seems like a natural next step here is to talk about comparative advantage (whereas "being the best in the world" seems more analogous to absolute advantage), but I'm not sure how to think about comparative advantage in the "dimensionality" setting. (To be fair, comparative advantage seems hard to think about in general.) So I'll just throw this out and see if anyone else has any ideas.

comment by Dr_Manhattan · 2019-06-25T12:32:24.217Z · score: 18 (8 votes) · LW(p) · GW(p)

Is this much different from Scott Adams' advice?


if you want something extraordinary, you have two paths:
1. Become the best at one specific thing.
2. Become very good (top 25%) at two or more things.


comment by johnswentworth · 2019-06-25T17:59:06.589Z · score: 7 (5 votes) · LW(p) · GW(p)

I doubt that top 25% is usually sufficient to be best-in-the-world, which is what you need to circumvent GEM.

comment by dxu · 2019-06-25T18:31:16.098Z · score: 14 (9 votes) · LW(p) · GW(p)

Being at or above the 75th-percentile mark corresponds to 2 bits of information. About 32.7 bits of information are required to specify a single person out of a population of 7 billion; even if we truncate that to 32 bits, you'd need to be in the top 25% at 16 different things to be considered "best in the world" in that one particular chunk of skill-space (assuming that the skills you choose aren't correlated). And then you have to consider the problem density in that chunk--how likely is it, realistically speaking, that there are major problems that (a) require the intersection of 16 different domains, but (b) require only a mediocre grasp of all 16 of those domains?
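
The arithmetic in this comment can be checked with a minimal sketch. It follows the comment's own assumptions (a population of 7 billion, independent skills, and truncating the bit count to 32); the function names are illustrative.

```python
from math import log2


def bits_for_top_fraction(fraction: float) -> float:
    """Bits of selection gained by being in the top `fraction` of a population."""
    return -log2(fraction)


def bits_to_single_out(population: float) -> float:
    """Bits needed to specify one individual out of `population`."""
    return log2(population)


population = 7_000_000_000
bits_needed = bits_to_single_out(population)   # about 32.7
bits_per_skill = bits_for_top_fraction(0.25)   # exactly 2.0

# Truncate to whole bits, as the comment does, then divide by bits per skill.
skills_needed = int(bits_needed) / bits_per_skill

print(f"bits needed:    {bits_needed:.1f}")
print(f"bits per skill: {bits_per_skill:.1f}")
print(f"skills needed:  {skills_needed:.0f}")
```

Without the truncation the ratio is about 16.35, so 17 independent top-25% skills would be needed to cover the full 32.7 bits.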

comment by HomarusSimpson · 2019-07-27T16:32:10.282Z · score: 11 (4 votes) · LW(p) · GW(p)

how likely is it, realistically speaking, that there are major problems that (a) require the intersection of 16 different domains, but (b) require only a mediocre grasp of all 16 of those domains?

Politics. Corporate CEO. Talk show host. All potentially interpretable as an ability to BS successfully to people who don't know sh*t from shinola

comment by johnswentworth · 2019-06-25T19:16:51.152Z · score: 4 (3 votes) · LW(p) · GW(p)

Two gotchas to bear in mind there:

  • High g-factor can get you to the top 25% or even top 10% of the population in an awful lot of areas all by itself. If only 5% of the population has ever formally studied and practiced chess strategy, then 95th percentile g-factor may be enough to hit 90th percentile of chess skill without any formal study at all (though the exact numbers depend on correlation of g-factor with formal study). Problem is, g-factor only counts once; we don't want to double-count it by saying e.g. "assume top 10% in physics and philosophy are independent".
  • Specialist expertise is mostly strongly anticorrelated. Most people pick one specialized career path, and even the people who "generalize" don't usually tackle more than 2 or 3 areas at a deep level - our lives are not that long.

Put those two together, and it means that above-average-but-below-expert skill levels mostly won't compound, but expert skill levels in multiple fields can yield a lot more bits than the independence calculation suggests - e.g. if almost nobody studies both topology and anthropology.
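
A minimal sketch of the "more bits than the independence calculation suggests" point. All the fractions below are hypothetical, chosen only to illustrate the direction of the effect; nothing here is measured data.

```python
from math import log2


def bits(fraction: float) -> float:
    """Bits of selection implied by belonging to a group of this relative size."""
    return -log2(fraction)


# Hypothetical fractions of the population with expert-level skill.
p_topology = 1e-4
p_anthropology = 1e-4

# If expertise in the two fields were independent, the overlap would be the product.
independent_joint = p_topology * p_anthropology

# Hypothetical: specialisation makes the real overlap 100x rarer than that.
anticorrelated_joint = independent_joint / 100

print(f"independent:    {bits(independent_joint):.1f} bits")
print(f"anticorrelated: {bits(anticorrelated_joint):.1f} bits")
```

The anticorrelated case yields more bits simply because the joint group is smaller than the product of the marginals; how much smaller is an empirical question this sketch does not answer.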

I do think the "how many bits does this get me?" approach is a useful way to think about it, but I'm not yet sure what set of assumptions is reasonable for quantification.

comment by Gunnar_Zarncke · 2019-08-01T10:07:36.423Z · score: 2 (1 votes) · LW(p) · GW(p)

Don't trust any numbers Scott Adams gives. They are just directional. And they include self-perception. So someone who is actually 95th percentile may *feel* like he is just 75th.

Also he talks a lot about creating a stack of multiple skills. And stack doesn't mean just having the skills but combining them in a productive way. Like robertskmiles: Being a YouTuber and being interested in AI Safety doesn't automatically make you an AI Safety YouTuber. You have to do some actual work for that. And it doesn't hurt to e.g. know enough economics to do A/B tests.

comment by bloop · 2019-08-20T21:32:43.369Z · score: 17 (8 votes) · LW(p) · GW(p)

I agree with many of the premises here, and I like this as a way of conceptualising skillsets, but I'm not sure I find it all that useful.

The main omission in this essay, to my mind, is any mention of skill interdependence. If you're one of the first people to discover the fertile gerontology/statistics niche, then you might get your 15 minutes of fame, and your early adopter status might give you a comparative advantage. But as soon as it becomes commonly recognised how fertile the ground is in this niche, there'll be tons of people right behind you chasing the low-hanging fruit.

Because of this, training programmes in gerontology start making statistics courses more robust and mandatory; research journals start publishing more statistics-heavy papers; labs start doing more statistics-heavy work; it becomes harder to get promoted, or get a foot in the door, without some familiarity with statistics. And so on. Everyone wants to be modern and interdisciplinary. But the inevitable result of this is that statistics eventually becomes a basic, fundamental prerequisite for calling yourself a gerontologist. The skills "gerontology" and "statistics" become strongly correlated. And now, suddenly, your two-dimensional picture has collapsed to a one-dimensional picture.

This may be a gross oversimplification, but I think the point approximately stands. As soon as a niche with high problem density is discovered, the density of skilled problem-solvers in that area quickly rises to meet it. You can only stand out if (a) you're an early adopter, (b) you have skills that others in your area would find it prohibitively difficult to replicate, or (c) your niche isn't too fruitful - i.e. nobody is interested in trying to rake in a share of the profit.

comment by Raemon · 2019-08-20T21:27:16.941Z · score: 3 (1 votes) · LW(p) · GW(p)

This may be a gross oversimplification, but I think the point approximately stands. As soon as a niche with high problem density is discovered, the density of skilled problem-solvers in that area quickly rises to meet it. You can only stand out if (a) you're an early adopter, (b) you have skills that others in your area would find it prohibitively difficult to replicate, or (c) your niche isn't too fruitful - i.e. nobody is interested in trying to rake in a share of the profit.

This was a good crystallization of this point, thanks.

comment by Samuel Hapák (hleumas) · 2019-06-24T20:59:50.397Z · score: 12 (9 votes) · LW(p) · GW(p)

Something is missing in this explanation. Why isn’t everyone super rich?

comment by johnswentworth · 2019-06-24T21:57:50.811Z · score: 17 (11 votes) · LW(p) · GW(p)

Multiple reasons:

  • First, reaching a useful Pareto frontier still isn't easy. For the sort of examples in the post, we're talking about effort equivalent to two or three separate PhDs, plus enough work in the relevant fields to master them. You'd have to clear a certain bar for intelligence and diligence and financial slack just for that to be an option.
  • Second, "super rich" isn't quite the right metric. Academics usually aren't measuring their success in dollars, for instance, and status is unfortunately more zero-sum than wealth. Same for lots of people in government or nonprofit roles. That said, there are clearly an awful lot of people not picking this fruit.
  • Third, this whole strategy inherently involves trailblazing. You have to do something which nobody has done before - that's kinda the point. That means there's not going to be an established career path, an established ladder to climb. It also means there's an inherent element of risk: it's entirely possible that your particular frontier won't have anything useful on it.

I think the first and third are the main factors. There's a high barrier just to try this sort of strategy, and even once you pay that toll, there's a high risk of finding nothing. On top of that, you inherently need to beat your own path; there's no set formula for which combinations of things will work (otherwise those combinations would already be exhausted). Put those two together, and it's not something that most people will pursue.

To make it a bit more concrete: suppose you've just wrapped up a PhD in monetary economics from a decent school. You're neither rich nor famous, but you've got a pretty comfortable life ahead of you - you can easily get a good-paying job in industry, or you can write a few grant applications and stay in academia. Are you really going to walk away from a comfortable life and restart from square zero in proteomics, just to have a chance at a major breakthrough? For Pareto frontiers involving even more specialties, will you walk away from a comfortable life and restart from square zero multiple times?

comment by jmh · 2019-06-25T13:15:37.774Z · score: 2 (2 votes) · LW(p) · GW(p)

" Third, this whole strategy inherently involves trailblazing. You have to do something which nobody has done before - that's kinda the point. "

But more than that, you need to blaze a new trail that still fits in with the needs of the world around you. A new trail to the cliff no one wants to be on doesn't get you much but a rather long fall I think ;-)

But in general I like the observation you've made.

comment by HomarusSimpson · 2019-07-27T12:48:20.301Z · score: 1 (1 votes) · LW(p) · GW(p)

status is unfortunately more zero-sum than wealth

Status is 100% zero sum I'd say.

comment by Pattern · 2019-07-27T16:05:40.530Z · score: 9 (3 votes) · LW(p) · GW(p)

How do we know it isn't negative sum?

comment by mr-hire · 2019-07-27T19:48:38.281Z · score: 2 (1 votes) · LW(p) · GW(p)

What makes you say that? I'd say it's about 50/50

comment by Raemon · 2019-07-27T20:11:51.745Z · score: 14 (5 votes) · LW(p) · GW(p)

I think different people are just using status to mean different things. Some people think it obviously means "the literal pecking order", and others mean something more like "how much respect people have." Something something prestige vs dominance hierarchies?

Dominance is more zero sum, and plausibly has to be 100% although I haven't thought it through in full.

comment by mr-hire · 2019-07-27T20:20:56.416Z · score: 4 (2 votes) · LW(p) · GW(p)

I think any take on what people mean by "status" has to include both prestige and dominance, respect and pecking order. Even in a dominance hierarchy, communities are founded on the middle being illegible, which allows for non-zero-sum dynamics.

This is why I think it's about 50/50, but of course if you think of only the literal pecking order and only the bits that are legible, I could see that bit being zero-sum, i.e. there's usually only one leader.

comment by romeostevensit · 2019-06-25T05:16:50.918Z · score: 4 (2 votes) · LW(p) · GW(p)

Yeah, if measuring along multiple dimensions works well, why isn't everyone in the right tail of this single dimension?

comment by Dagon · 2019-06-24T21:13:35.446Z · score: 3 (4 votes) · LW(p) · GW(p)

One answer is that everyone _IS_ super-rich, compared to any median or average in history.

But also missing is scalability of opportunity and size of market for a given point on a price/performance curve. It's worth noting that the best table-tennis player in the world makes NOTHING if the second- through billionth-best don't play against them. Also a whole lot of frontiers in that multi-dimensional space have so much demand that the best, second-best, and billionth-best are all at full capacity, and there remains money for the billion and one-th best to make some money at it.

comment by Zack_M_Davis · 2019-06-25T05:51:21.936Z · score: 10 (7 votes) · LW(p) · GW(p)

Even if only a small fraction of those combinations are useful, there's still a lot of space to stake out a territory. [...] Thanks to the "curse" of dimensionality, these goldmines are not in any danger of exhausting.

A blessing on the supply side is still a curse on the demand side. A lot of empty hyperspace for you to be the closest expert in, just means that when there's a problem at a precise intersection of statistical-gerontological-macroeconomic-chemistry, the nearest expert might be far away.

Maybe think about this in the context of seeking a romantic partner: as you add more independent traits to your wishlist, your pool of potential matches goes down exponentially. (And God help you if some of your desired traits are anticorrelated.) Suddenly being alone in a high-dimensional space feels less comforting!

comment by steve2152 · 2020-04-15T20:36:10.751Z · score: 8 (4 votes) · LW(p) · GW(p)

9 months later, and I still think about this post regularly. (Mostly as a weapon in my never-ending battle against underconfidence.) Thanks!

comment by bloop · 2019-08-20T15:17:05.590Z · score: 6 (5 votes) · LW(p) · GW(p)

I hope you don't mind me leaving a second comment, because it's kind of orthogonal to the first.

It’s no secret that an academic can easily find fertile fields by working with someone in a different department.

(Emphasis mine.) Speaking as an academic, I think this is far from true. Interdisciplinarity is a very popular buzzword, and we're all told to strive for it, but the vast majority of us don't find fertile grounds for cross-curricular collaboration when we try. It's possible on the overlap of mathematics and physics, or computer science and statistics, or literature and history, or geography and sociology, granted. But that doesn't mean that any mathematician can work with any physicist; most of the time, it means simply that applied mathematicians who work on physical questions can work with theoretical physicists with a strong mathematical background. I'm inclined to say that, just because work like this transcends the boundaries of your department (or your funding body, or...), doesn't necessarily mean it's interdisciplinary in any true sense. Your collaborator's skillset is probably very close to yours in skill-space.

The flipside of being very highly specialised - like Alice and Carol's skillsets in your diagram, but longer and thinner - is that, the more truly 'interdisciplinary' we're aiming to be, the smaller the intersection of our skillsets is, and so the less our ability to communicate our research to each other becomes. There might be very fertile ground for collaboration between a researcher in chemistry and a researcher in sociology, but before they can even begin to find out whether that ground exists, the chemist is going to need an intensive crash course in sociology and vice-versa, so that each knows what the other is even on about, what the basic methods and approaches are, etc.

(This meshes with my limited personal experience too. I am an academic who specialised in field X, but also happens to be reasonably highly trained in very different field Y. I am probably Pareto best, or at least one of a very small number of Pareto best, in a large area of skill-space in that intersection. But, at least from my vantage point, there aren't really any problems in the intersection of the two. Maybe my role should be to mediate between X researchers and Y researchers to search the problem-space more. But also maybe I'd be wasting their time. It's hard to know.)

comment by conjectures · 2019-08-02T13:05:35.758Z · score: 4 (3 votes) · LW(p) · GW(p)

Nice article. I perhaps take issue with this:

And to a large extent, people do pursue those frontiers. It’s no secret that an academic can easily find fertile fields by working with someone in a different department. “Interdisciplinary” work has a reputation for being unusually high-yield.

Since (top tier, excluding Nature, Science) journals typically care most about single axis achievements.

comment by dreeves · 2019-06-25T07:28:17.376Z · score: 4 (2 votes) · LW(p) · GW(p)

Nice! I've heard a similar idea called a "talent stack" or "skill stack" but explaining it in terms of staking out a chunk of the Pareto frontier is much better.

Coincidentally, I just wrote a post explaining the idea of Pareto dominance, in case that's useful to anyone.

comment by romeostevensit · 2019-06-25T05:18:16.099Z · score: 3 (6 votes) · LW(p) · GW(p)

Sometimes a post is so good I get a little mad I didn't think of it first. I think it has to do with feeling like I don't have enough elbow roo-Dammit!

comment by Jan Rzymkowski (jan-rzymkowski) · 2019-08-19T14:13:57.016Z · score: 2 (2 votes) · LW(p) · GW(p)

This analysis seems to quietly assume that various important skills are independent variables and therefore many people at the top of their field will necessarily be average in various other skills (actually, the chart goes even further and assumes that there's universal negative correlation between skills -- I'm not even sure if that's mathematically possible for more than 2 variables).

The world's greatest gerontologist will probably be very good at statistics, and even Ed Jaynes would probably be an above-average gerontologist just because he can effectively interpret gerontology data.
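
On the question of whether every pair of skills can be negatively correlated at once, a short worked bound may help. It assumes, purely for simplicity, n skill variables X_1, ..., X_n standardized to unit variance and sharing one common pairwise correlation ρ. Since a variance can never be negative:

```latex
\operatorname{Var}\!\Big(\sum_{i=1}^{n} X_i\Big)
  = \sum_{i=1}^{n} \operatorname{Var}(X_i)
    + \sum_{i \neq j} \operatorname{Corr}(X_i, X_j)
  = n + n(n-1)\,\rho \;\ge\; 0
\quad\Longrightarrow\quad
\rho \;\ge\; -\frac{1}{n-1}
```

So universal negative correlation is possible for more than two variables, but only weakly: at most -1/2 for three skills, and about -0.01 for a hundred.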

comment by habryka (habryka4) · 2019-07-27T00:25:39.083Z · score: 2 (1 votes) · LW(p) · GW(p)

Promoted to curated: I think this post makes an important point, and does so in a great educational format. I am generally pretty excited about this kind of economic thinking as it applies to prioritization and helping individuals make better decisions.

comment by Eu-Ming Lee (eu-ming-lee) · 2019-09-13T21:25:32.419Z · score: 1 (1 votes) · LW(p) · GW(p)

This is why educating the world's population and increasing wealth across the world is beneficial for all of mankind. There are numerous combinations of multi-dimensional fields that all have Pareto frontiers that have untold riches for both that intrepid explorer and also for humanity as a whole. As more people are educated, we have more people willing to fill out those Pareto frontiers and claim their treasures within. And their discoveries ultimately benefit humanity as a whole, in general.

However, in the short term, a massive gold rush to Pareto frontiers threatens those who already have advantages in singular dimensions. Thus, the wealthy status quo is naturally aligned against education of the masses because it threatens their single dimensional advantages by introducing Pareto frontiers that were formerly too numerous and/or expensive to exploit effectively.

comment by ioannes_shade · 2019-07-28T17:19:58.746Z · score: 1 (1 votes) · LW(p) · GW(p)

cf. Talent Stacks

comment by Siebe · 2019-06-25T09:56:05.603Z · score: 1 (1 votes) · LW(p) · GW(p)

This was very informative!

How would you translate this into a heuristic? And how much do I need to have a secondary skill, rather than finding a partner that has a great complementary skill?

comment by johnswentworth · 2019-06-25T18:19:37.606Z · score: 2 (1 votes) · LW(p) · GW(p)

This ties into Pattern's comment too. Spreading out the skills across people introduces a bunch of problems:

  • For the sort of problems which lend themselves to breakthroughs in the first place, the key is often one discrete insight. There's no good way to modularize the problem; breaking it up won't help find the key piece. (This is a GEM consequence: if it's modularizable, it's probably already been modularized.)
  • Group dynamics: Isaac Asimov wrote a great piece about this. Creative problem-solving requires an exploratory mindset, and you need the right sort of group setup to support that. Also it doesn't scale well with group size.
  • Translation: different specialties use different jargon, and somebody needs to do the work of translating. Translation can be spread across two people, but that means spending a lot of time on "hey what's the word for a crunchy sweet red fruit that's sort of spherical?" It's much faster if one person knows both languages.
  • Unknown unknowns: if each person only knows one field well, then there may be a solution in one field for a problem in the other, and neither person even thinks to bring it up. It's tough to know what kinds of things are available in a field you don't know.

All that said, obviously working in groups can theoretically leverage scale with less personal cost.

Heuristics left as an exercise to the reader.

comment by Pattern · 2019-07-27T17:09:32.656Z · score: 1 (1 votes) · LW(p) · GW(p)

What is GEM?

comment by CronoDAS · 2019-07-29T18:02:49.663Z · score: 2 (1 votes) · LW(p) · GW(p)

Generalized Efficient Markets - if something is easy and obvious, someone else is already doing it.

comment by elityre · 2019-06-25T03:34:12.808Z · score: 1 (5 votes) · LW(p) · GW(p)

This was great. Thank you!

comment by Pattern · 2019-06-24T22:39:51.964Z · score: 1 (1 votes) · LW(p) · GW(p)

It's interesting to consider how much effectiveness is had from the skillset being in one person (the table tennis/sprinter) versus a group - the term "interdisciplinary" seems like it could apply in either case. (Though it has a way of becoming its own thing sometimes, and then there's a new field, which will probably be focused on getting the set into individuals as opposed to groups).