Selection Has A Quality Ceiling

post by johnswentworth · 2021-06-02T18:25:54.432Z · LW · GW · 18 comments

Contents

  Bits Of Search
  Asymptotics of Selection vs Training
  Problem: We Don’t Know How To Train
  Summary

Suppose we’re working on some delightfully Hard problem - genetically engineering a manticore, or terraforming Mars, or aligning random ML models. We need very top tier collaborators - people who are very good at a whole bunch of different things. The more they’re good at, and the better they are, the better the chances of success for the whole project.

There are two main ways to end up with collaborators with outstanding skill/knowledge/talent in many things: selection or training. Selection is how most job recruitment works: test people to see if they already have (some of) the skills we’re looking for. Training instead starts with people who don’t have (all of) the skills, and installs them de novo.

Key point of this post: selection does not scale well with the level of people we’re looking for. As we increase the number of skills-we-want in our collaborators, the fraction-of-people with all those skills shrinks exponentially, so the number-we-need-to-test grows exponentially. Training has much better asymptotic behavior: as the number of skills-we-want grows, the amount of training needed to install them grows only linearly - assuming we’re able to train them at all.

Bits Of Search

Suppose I have some test or criterion, and only half the population passes it - for instance, maybe I want someone with above-median math skills. That’s one bit of search [LW · GW]: it eliminates half the possibilities.

If I want above-median math skills and above-median writing skills, that’s (approximately) two bits, and I expect (approximately) one-in-four people to pass both tests. (Really, math and writing skills are correlated, so it will be somewhat more than one-in-four and thus somewhat less than two bits of search.) As more skills are added to the list of requirements, adding more “bits of search”, the number of people who pass all requirements will fall exponentially. With k bits of search, only 1-in-2^k people will pass, so I’ll need to search over ~2^k people just to find one potential collaborator.
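
To make the arithmetic concrete, here's a minimal sketch of the scaling (pure arithmetic, nothing domain-specific assumed):

```python
def candidates_needed(bits: float) -> float:
    """Expected pool size needed to find one person who passes
    `bits` independent bits of search (each bit halves the pool)."""
    return 2 ** bits

# 1 bit: above-median math. 2 bits: above-median math AND writing.
for k in [1, 2, 5, 10, 20]:
    print(f"{k:2d} bits -> roughly 1 in {candidates_needed(k):,.0f} people pass")
```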

In practice, skills are not independent, but the correlation is weak enough that exponentials still kick in. (Indeed, the only way exponentials won’t kick in is if correlation increases rapidly as we add more skills.)
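
As a rough illustration that weak correlation doesn't rescue us, here's a hedged Monte Carlo sketch; the pairwise correlation of 0.3 is an arbitrary assumption, not an empirical estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, corr = 500_000, 0.3  # assumed population size and skill correlation

for k in [2, 4, 6, 8]:
    # Equicorrelated skills: one shared factor plus independent noise.
    shared = rng.standard_normal(n_people)
    noise = rng.standard_normal((n_people, k))
    skills = np.sqrt(corr) * shared[:, None] + np.sqrt(1 - corr) * noise
    frac = ((skills > 0).all(axis=1)).mean()  # above-median on all k skills
    print(f"{k} skills: {frac:.4f} pass (vs {0.5**k:.4f} if fully independent)")
```

The passing fraction falls somewhat slower than 2^-k, but it still falls roughly exponentially in k.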

I also sometimes want more-than-one bit of search in just one skill. For instance, if I want someone in the top 1/32 of writing skill, then that’s 5 bits of search. In practice, we usually want quite a few bits in relevant skills - for instance, if I’m looking for help genetically engineering a manticore, then I’ll want people with deep expertise in developmental biology and morphogenesis. I’d probably want something like 20 bits (i.e. a one-in-a-million person) in those skills alone, plus whatever other skills I might want (e.g. good communication, quantitative thinking, etc).
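
Converting a selectivity threshold to bits is just a log; here's a quick sanity check on the numbers above (nothing assumed beyond the definitions):

```python
import math

def bits(pass_fraction: float) -> float:
    """Bits of search implied by a given passing fraction."""
    return -math.log2(pass_fraction)

print(bits(1 / 2))   # 1.0   - above-median on one skill
print(bits(1 / 32))  # 5.0   - top 1/32 of writing skill
print(bits(1e-6))    # ~19.9 - a one-in-a-million person, i.e. ~20 bits
```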

Asymptotics of Selection vs Training

So, as I crank up the number of bits-of-search, the search becomes exponentially more difficult. It won’t take long before nobody in the world passes my tests - there are only ~10B people, so ~34 bits is all I get, and that’s if I test literally everyone in the world. That puts a pretty low cap on the skill of potential collaborators I can find! And even before I hit the everyone-in-the-world cap, exponential growth severely limits how much I can select.

There are ways around that: skills are not independent, and sometimes I can make do with someone who has most of the skills. But the basic picture still holds: as I raise my bar, selection becomes exponentially more difficult.

Training, in principle, does not have this problem. If I want to train two independent skills, then the time required to train both of them is the sum of time required to train each, rather than a product. So, training resource requirements should generally grow linearly, rather than exponentially. Again, skills aren’t really independent, but the basic picture should still hold even when we make the model more complicated.
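
Here's a toy cost comparison of the two asymptotics; the unit costs are made-up illustrative numbers, not estimates:

```python
TEST_COST = 1      # assumed cost to screen one candidate
TRAIN_COST = 1000  # assumed cost to train one skill in one person

for k in [5, 10, 15, 20, 30]:
    selection = TEST_COST * 2 ** k  # expected candidates screened per hire
    training = TRAIN_COST * k       # k skills trained one after another
    print(f"k={k:2d}: selection ~{selection:,} vs training ~{training:,}")
```

With these (arbitrary) numbers selection is cheaper for small k, but 2^k overtakes the linear cost around k = 14, and it eventually overtakes any per-skill training cost.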

Problem: We Don’t Know How To Train

When we look at schools or companies, they seem to mostly select. To the extent that training does take place, it’s largely accidental: people are expected to magically pick up some skills in their first weeks or months at a new job, but there isn’t much systematic effort to make that happen efficiently/reliably.

… and for most institutions, that’s good enough. The asymptotic arguments apply to finding “very high quality” people, by whatever criteria are relevant. Most institutions neither need nor find the very best (though of course lots of them claim to do so). Most people, most of the time, work on problems-we-basically-understand [LW · GW]. They just need to be able to use known tools in known ways, in similar ways to everyone else in their field, and about-as-well as others in their field. As long as the field is large, there are plenty of typical candidates, and selection works fine.

Selection breaks down when we need people with rare skills, and especially when we need people with many independent skills - exactly the sort of people we’re likely to need for problems-we-basically-don’t-understand [LW · GW].

But it still seems like training ought to be great - it should be profitable for schools or companies to install new skills in people. In some specific areas, it is profitable. So why don’t we see more of this? Here’s one theory: in order to train systematically, we need some kind of feedback loop - some way to tell whether the training is working. In other words, we need a test. Similarly, we need a test to prove to others that the training worked. And if we have a test, then we could just forget about training and instead use the test to select. As long as we’re not asking for too many bits, that’s probably cheaper than figuring out a whole training program.

So, we end up with a society that’s generally not very good at training.

Summary

Most of the world mostly “gets good people” by selection: we start with a big pool of candidates and then filter for those who best fit our criteria. But this technique puts a cap on “how good” we can select for - we can’t ask for someone better than the best in the world. Even if the number of people is effectively infinite, we still need to search over exponentially many candidates as the list of selection criteria grows.

For most institutions, this isn’t much of a problem, because they’re not “in the asymptote” - they don’t really need people with that many bits of perfection. But the Harder our problems, the more we need people with many bits - potentially people better than the current best in the world, or potentially people who are just too rare to cheaply search for in a giant pool of candidates. At that point, we have no choice but to train, rather than select.

Training is hard; it’s not a thing which most institutions know how to do well today. But if we want top-level collaborators in many skills, then we just have to figure out how to do it. Selection does not scale that way.

18 comments

Comments sorted by top scores.

comment by jimrandomh · 2021-06-02T18:36:55.771Z · LW(p) · GW(p)

There's a wrinkle here that I think changes the model pretty drastically: people vary widely in how readily they pick up skills. The immediate implication is that selecting on skills is selecting on a mix of age, teachability, and alignment between their past studies and the skillset you're testing. Counterintuitively, this means that a test which is narrowly focused on the exact skillset you need will do worse at testing for teachability, so if most of what you need is ultimately going to come from future training and study, then the broader the skillset tested, the better.

Replies from: johnswentworth, An1lam
comment by johnswentworth · 2021-06-02T18:51:47.881Z · LW(p) · GW(p)

Strongly agree with this. Good explanation, too.

comment by NaiveTortoise (An1lam) · 2021-06-03T02:21:24.880Z · LW(p) · GW(p)

Nice point. I wanted to note that the converse is also true and seems like an example of Berkson's Paradox. If you only see individuals who passed the test, it will look like teachability is anti-correlated with the other two factors even though this may purely be a result of the selection process.

This may seem pedantic but the point I'm making is that it's equally important not to update in the other direction and assume less alignment between past experience and current skillset is better, since it may not be once you correct for this effect.

comment by Donald Hobson (donald-hobson) · 2021-06-03T09:23:04.330Z · LW(p) · GW(p)

There are several extra features to consider. Firstly, even if you only test, that doesn't mean the skills weren't trained. Suppose there are lots of smart kids who really want to be astronauts, and that NASA puts its selection criteria somewhere easily available. The kids then study the skills they think they need to pass the selection. Any time there is any reason to think that skills X, Y and Z are a good combination, there will be more people with those skills than chance predicts.

There is also the dark side: Goodhart's curse. It is hard to select over a large number of people without selecting for lying sociopaths who are gaming your selection criteria.

Replies from: johnswentworth
comment by johnswentworth · 2021-06-03T15:11:49.233Z · LW(p) · GW(p)

Great comment - these were both things I thought about putting in the post, but didn't quite fit.

Goodhart, in particular, is a huge reason to avoid relying on many bits of selection, even aside from the exponential problem. Of course we also have to be careful of Goodhart when designing training programs, but at least there we have more elbow room to iterate and examine the results, and less incentive for the trainees to hack the process.

comment by Dagon · 2021-06-02T22:45:04.940Z · LW(p) · GW(p)

I disagree with the premise.  The vast majority of selection is extremely parallelizable.  In terms of humans, self-selection does most of the work - we don't even have to consider 99.999% of people for most of our collaboration.  Or if we want (and can afford/attract) the best in the world, considering everyone, we set it up so they select among themselves for the first dozen levels of filter.   

Training is almost always individual, and non-scalable by its nature.

In truth, the mechanisms work together - a few layers of selection to get the most promising interested in training, then additional mixes of training and selection until the greatest at something are pretty damn great.

Edit: on further thought, I think I'm being too narrow.  Both search and training are parallelizable, to the extent that the activity is cooperative and driven by the targets rather than strictly on the part of the interested agent.  It's still the case that search is almost always significantly cheaper than training/improvement, but perhaps not in a structural way.  

comment by Raemon · 2023-01-14T23:08:00.444Z · LW(p) · GW(p)

This post has tentatively entered my professional worldview. "Big if true."

I'm looking at this through the lens of "how do we find/create the right people to help solve x-risk and other key urgent problems." The track record of AI/rationalist training programs doesn't seem that great. (i.e. they seem to typically work mostly via selection[1]). 

In the past year, I've seen John attempt to make an actual training regimen for solving problems we don't understand [LW · GW]. I feel at least somewhat optimistic about his current training attempts, partly because his models make sense to me and partly based on his writeup of the results here [LW · GW]. But I think we're another couple years out before I really know how well it panned out. 

I almost reviewed this post without re-reading it, but am glad I stopped to fully re-read. The mechanics/math of how the bits-of-selection work were particularly helpful and I'd forgotten them. One thing they highlight: you might need a lot of different skills. And maybe some of those skills are ineffable and hard to teach. But others might be much more teachable. So maybe you need to select on one hard-to-find property, but can train a lot of other skills.

Some musings on training

  • I'm maybe more optimistic than John about what percentage of "school" is "training". I think maybe 10-15% of what I learned in middle/high-school was at least somewhat relevant to my longterm career, and later when I went to a trade school, I'd say closer to 50% of it was actual training, which I'd have had a harder time doing on my own. (And, my trade school created half of its classes out of an attempt to be an accredited university, i.e. half the classes were definitively bullshit, and the other half were basically all useful if you were going into the domain of computer-animation).
     
  • When I say "rationality training turned out to mostly be selection", I think probably what I mean was "it didn't create superheroes, the way HPMOR might have vaguely led you to believe." And perhaps, "it mostly didn't produce great researchers." I do think the CFAR-and-Leverage-ecosystem produced a bunch of relevant skills for navigating life, which raise the sanity-and-coordination-waterline. I think it had the positive impact of "producing pretty good citizens." I've heard CFAR instructors complain that mostly they don't seem to imbue the spark of rationality into people, they only find people who already had the spark. But, it clearly IMO created an environment where people-with-that-spark cultivated it and leveled up at it.
     
  • I've heard grad school successfully trains people in the ineffable domain of research (or, the "hard-to-eff" domain of research). The thing that seems off/unsatisfactory about it, from the perspective of the x-risk landscape, is that it doesn't really train goal-directed research, where you're actually trying to accomplish a particular task, and notice when you might be confused about how to approach it.
comment by Karthik Boyareddygari · 2021-06-03T21:49:46.185Z · LW(p) · GW(p)

This is pretty interesting and probably is getting at something true. We can all likely agree that there are thresholds of desired skills and proficiency above which no one is likely to exist who matches the profile. However, like Dagon, I am somewhat skeptical of the overall argument because it seems to disregard self-selection as a means of shrinking the pool of applicants and improving the match between the skills being selected for and those the average applicant brings.

Furthermore, it seems we are treating skills as though they are coming from a single pot and being chosen at random to create the profile of our desired candidate. But often skills come in clusters which are probabilistically tied to each other — not hard to believe given standardization of curriculum in formal education. 

From an economic standpoint, a lucrative position with certain prerequisite qualifications will create a demand to be filled. Even if no one suitable presently exists, crafting the position with sufficient appeal to overcome the cost of self-training ensures that a supply will emerge eventually. As jimrandomh and Gunnar_Zarncke touch on, there is also a factor of how easily people pick up certain skills which, if possible to select for, would make training much more competitive with selection.

comment by Gunnar_Zarncke · 2021-06-03T08:52:11.833Z · LW(p) · GW(p)

I like that you are using math to model the problem but I think you have to argue a bit stronger on this assumption:

In practice, skills are not independent, but the correlation is weak enough that exponentials still kick in.

IQ is known to correlate significantly with all skills for example. And with N skills you have 2^N sets of skills that could be correlated with each other. I think you have to consider this to argue your point. Note that I think your point likely still holds.

Replies from: johnswentworth
comment by johnswentworth · 2021-06-03T15:06:22.936Z · LW(p) · GW(p)

So, one simple model which I expect to be a pretty good approximation: IQ/g-factor is a thing and is mostly not trainable, and then skills are roughly-independently-distributed after controlling for IQ.

For selection in this model, we can select for a high-g-factor group as the first step, but then we still run into the exponential problem as we try to select further within that group (since skills are conditionally independent given g-factor).

This won't be a perfect approximation, of course, but we can improve the approximation as much as desired by adding more factors to the model. The argument for the exponential problem goes through: select first for the factors, and then the skills will be approximately-independent within that group. (And if the factors themselves are independent - as they are in many factor models - then we get the exponential problem in the first step too.)
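
A quick Monte Carlo sketch of this two-stage picture (one g-like factor plus independent residuals; the factor loading of 0.6 and the top-1% cut are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1_000_000, 6
g = rng.standard_normal(n)  # general factor, not trainable in this model
skills = 0.6 * g[:, None] + 0.8 * rng.standard_normal((n, k))

# Step 1: one cheap test - keep the top 1% on g.
group = skills[g > np.quantile(g, 0.99)]

# Step 2: within that group, demand above-group-median on every skill.
frac = ((group > np.median(group, axis=0)).all(axis=1)).mean()
print(f"pass all {k} within-group cuts: {frac:.4f} "
      f"(exact independence would give {0.5**k:.4f})")
```

Within the pre-selected group the skills are close to independent, so the passing fraction still shrinks roughly as 2^-k: the exponential problem survives the first selection step.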

comment by Jay · 2021-06-03T21:58:33.741Z · LW(p) · GW(p)

Until roughly 1980, US corporations did lots of (paid) training.  Some still do; McDonald's operates Hamburger University.  They found that a lot of new hires left the company soon after training - the companies couldn't capture the value of the training very well.  Because of that they shifted toward hiring college graduates (pre-trained for general skills, if not for company specifics (which don't travel well anyway)) and, later, unpaid internships.

comment by StefanHex (Stefan42) · 2021-06-03T10:39:27.083Z · LW(p) · GW(p)

Nice argument! My main caveats are

* Does training scale linearly? Does it take just twice as much time to get someone to 4 bits (top 1/16 of the world, roughly one in every school class) and from 4 to 8 bits (one in 256)?

* Can we train everything? How much of, e.g., math skill is genetic? I think there is research on this

* Skills are probably quite highly correlated, especially when it comes to skills you want in the same job. What about computer skills / programming and maths skills / science -- are they inherently correlated or is it just because the same people need both? [Edit: See point made by Gunnar_Zarncke above, better argument on this]

Replies from: johnswentworth
comment by johnswentworth · 2021-06-03T14:57:24.086Z · LW(p) · GW(p)

Does training scale linearly? Does it take just twice as much time to get someone to 4 bits (top 1/16 of the world, roughly one in every school class) and from 4 to 8 bits (one in 256)?

This is a good point. The exponential -> linear argument is mainly for independent skills: if they're uncorrelated in the population then they should multiply for selection; if they're independently trained then they should add for training. (And note that these are not quite the same notion of "independent", although they're probably related.) It's potentially different if we're thinking about going from 90th to 95th percentile vs 50th to 75th percentile on one axis.

(I'll talk about the other two points in response to Gunnar's comment.)

comment by romeostevensit · 2021-06-02T20:05:47.543Z · LW(p) · GW(p)

I have the sense that training happens out in the tails via the mechanism of lineage. Lineage holders get some selection power and might be doing something inscrutable with it, but it's not like they can cast a net for PhD candidates arbitrarily wide so they must be doing some training or we wouldn't see the concentration of results we do. The main issue with this seems to be that it is very expensive. If I have only 10 people I think can do top tier work it is very costly to test hypotheses that involve them spending time doing things other than top tier work. Suggestion: find ways for candidates to work closely with top tier people such that it doesn't distract those people too much. Look at how intellectual lineages do this and assume that some of it looks dumb on the surface.

Replies from: johnswentworth
comment by johnswentworth · 2021-06-02T21:17:17.159Z · LW(p) · GW(p)

Suggestion: find ways for candidates to work closely with top tier people such that it doesn't distract those people too much.

In particular, I currently think an apprenticeship-like model is the best starting point for experiments along these lines. Eli [LW · GW] also recently pointed out to me that this lines up well with Bloom's two-sigma problem: one-on-one tutoring works ~two standard deviations better than basically anything else in education.

comment by Noosphere89 (sharmake-farah) · 2023-09-17T17:31:13.928Z · LW(p) · GW(p)

A lot of the reason why we usually do selection has to do with the fact that for most purposes, once you have a person ready to do economically valuable things, their traits and attributes are basically fixed by genetics, and improvement is mostly not possible.

This is an important thing to remember about humans in general, but especially for this.

comment by ChristianKl · 2021-06-08T11:04:40.611Z · LW(p) · GW(p)

When it comes to training, the Navy SEALs spend a lot of capital on training high levels of skill in addition to doing strong selection. A big part of the reason why the Navy SEALs can do that and the average company can't is that a person can quit their job at a normal company, while a Navy SEAL can't simply quit.

If a company wants to employ someone with one-in-a-million skills, it costs the same whether they found that person through selection or trained them.

A Navy SEAL is essentially, for a certain amount of time, a slave of the military. We don't like slavery as an institution, and thus our laws don't allow for that setup. Some bootcamps manage legal setups where they get part of the graduate's wages for some time and thus have an actual incentive to teach skills.

Here’s one theory: in order to train systematically, we need some kind of feedback loop - some way to tell whether the training is working. In other words, we need a test. Similarly, we need a test to prove to others that the training worked. And if we have a test, then we could just forget about training and instead use the test to select. As long as we’re not asking for too many bits, that’s probably cheaper than figuring out a whole training program.

So, we end up with a society that’s generally not very good at training.

While this explains why companies aren't in the business of training skills, it doesn't explain why we don't have schools that are good at teaching skills.

A school that can say "People who entered scored X on the test. People who left scored X+Y. The course is 3 weeks and costs $5000" will get some customers if the skill is valuable enough and the skill improvement is big enough.

If we had prediction-based medicine [LW · GW], we would have a test for providers of medical treatment, and thus we would likely have higher-skilled treatment providers.

comment by Pattern · 2021-06-03T14:37:08.146Z · LW(p) · GW(p)

TL;DR:

Combine searching and training to make the task not impossible. Use/make groups that have more skills than exist in an individual (yet). Do we 'basically understand paradigm changes/interdisciplinary efforts?' If you need a test you don't have, maybe you should make that test. Pay attention to growth - if you want someone (or a group) better than the best in the world, you need someone who is/can grow, past that point. Maybe you'll have to create a team that's better than the best (that currently exist) in the world - possibly people who are currently working in different fields.


1. Hybrid: searching and training

I also sometimes want more-than-one bit of search in just one skill. For instance, if I want someone in the top 1/32 of writing skill, then that’s 5 bits of search.

You could also search for a few bits, and try training the rest.


2. Change the constraints to make the problem solvable (use groups instead of individuals)

There are ways around that: skills are not independent, and sometimes I can make do with someone who has most of the skills. But the basic picture still holds: as I raise my bar, selection becomes exponentially more difficult.

Sounds like figuring out teams might be the way to go here.


3. Are interdisciplinary or paradigm-changing projects 'problems-we-basically-understand'?

Selection breaks down when we need people with rare skills, and especially when we need people with many independent skills - exactly the sort of people we’re likely to need for problems-we-basically-don’t-understand.

This might also be an issue if you combine a bunch of 'things we understand' into one project, or want to make major change, like (maybe) semiconductor lithography.


4. Can you build what you don't have?

And if we have a test, then we could just forget about training and instead use the test to select.

Maybe you have to develop one, and afterwards you could use it, but now you have people who are trained.


5. Asymptotic growth

But this technique puts a cap on “how good” we can select for - we can’t ask for someone better than the best in the world.

Unless you get someone who will get better over time AND they're (among) the best in the world.


6. Select for/Build a team.

But if we want top-level collaborators in many skills, then we just have to figure out how to do it. Selection does not scale that way.

Mentioned this in 2, though it seems like a different thing than the rest of the post - which is about getting one person with a lot of strong/rare traits, instead of people (from different fields?) who can work together to the same or better effect. (If you want a lot of stuff done, arguably that is a fundamental cap, and larger groups will be needed once you select too hard for that - though how this plays into automation/tools might matter a lot, depending on the area.)
