Can we hold intellectuals to similar public standards as athletes?

post by ozziegooen · 2020-10-07T04:22:20.450Z · LW · GW · 48 comments

Professional athletes are arguably the most publicly understood meritocracy around. There are public records of thousands of different attributes for each player. When athletes stop performing well, this is discussed at length by enthusiasts, and it's understood when they are kicked off their respective teams. The important stuff is out in the open. There's a culture of honest, open, and candid communication around meritocratic competence and value.

This isn't only valuable to help team decisions. It also helps data scientists learn which sorts of characteristics and records correlate best with long term success. As sufficient data is collected, whole new schools of thought emerge, and these coincide with innovative and effective strategies for future talent selection. See Moneyball or the entire field of sabermetrics.

In comparison, our standards for intellectuals are quite prosaic. If I want to get a sense of just how good LeBron James is, I can look through tables and tables of organized data and metrics. If I don't trust one metric, I have dozens of others to choose from.

However, if I want to know how much to trust and value Jonathan Haidt I'm honestly not sure what to do. Some ideas:

Read most of his work, then do a large set of Epistemic Spot Checks [LW · GW] and more to get a sense of how correct and novel it is.
Teach myself a fair amount of Psychology, get a set of Academic Journal subscriptions, then read critiques and counter critiques of his work.
Investigate his citation stats [LW(p) · GW(p)].
Read the "Reception" part of his Wikipedia page and hope that my attempt to infer his qualities from that is successful.
Use some fairly quick "gut level" heuristics to guess.
Ask my friends and hope that they did a thorough job of the above, or have discussed the issue with other friends who did.

Of course, even if I do this for Jonathan Haidt broadly, I'd really want narrow breakdowns. Maybe his old work is really great, but in the last 5 years his motives have changed. Perhaps his knowledge and discussion of some parts of Psychology is quite on point, but his meanderings into Philosophy are simplistic and overstated.[1]

This is important because evaluating intellectuals is dramatically more impactful than evaluating athletes. When I choose incorrectly, I could easily waste lots of time and money, or be dramatically misled and develop a systematically biased worldview. It also leads to incentive problems. If intellectuals recognize the public's lack of discernment, then they have less incentive to do an actual good job, and more of an incentive to signal harder in uncorrelated ways.

I hear one argument: "It's obvious which intellectuals are good and bad. Just read them a little bit." I don't agree with this argument. For one, Expert Political Judgment provided a fair amount of evidence for just how poorly calibrated all famous and well-esteemed intellectuals seem to be.

One could imagine an organization saying "enough is enough" and setting up a list of comprehensive grades for public intellectuals on an extensive series of metrics. I imagine this would be treated with a fantastical amount of vitriol.

"What about my privacy? This is an affront to Academics you should be trying to help."

"What if people misunderstand the metrics? They'll incorrectly assume that some intellectuals are doing a poor job, and that could be terrible for their careers."

"We can't trust people to see some attempts at quantifying the impossible. Let them read the sources and make up their own minds."

I'm sure professional athletes said the same thing when public metrics began to appear. Generally, new signals get pushback. There will be winners and losers, and the losers fight back much harder than the winners encourage. In this case the losers would likely be the least epistemically modest of the intellectuals, a particularly nasty bunch. But if signals can persist, they get accepted as part of the way things are and life moves on.

Long and comprehensive grading systems similar to those used for athletes would probably be overkill, especially to start with. Any work here would be very expensive to carry out and it's not obvious who would pay for it. I would expect that "intellectual reviews" would get fewer hits than "tv reviews", but that those hits would be much more impactful. I'd be excited to hear simple proposals. Perhaps it would be possible to get many of the possible benefits while not having to face some of the many possible costs.

What should count as an intellectual? It's a fuzzy line, but I would favor an expansive definition. If someone is making public claims about important and uncertain topics and has a substantial following, these readers should have effective methods of evaluating them.

Meritocracy matters. Having good intellectuals and making it obvious how good these intellectuals are matters. Developing thought out standards, rubrics, and metrics for quality is really the way to ensure good signals. There are definitely ways of doing this poorly, but the status quo is really really bad.

[1] I'm using Jonathan Haidt because I think of him as a generally well respected academic who has somewhat controversial views. I personally find him to be very interesting.

48 comments

Comments sorted by top scores.

comment by abramdemski · 2020-10-09T18:00:55.562Z · LW(p) · GW(p)

Prediction-tracking systems, particularly prediction markets with dedicated virtual currencies, seem like an obvious contender here. One could imagine the scope of those being expanded to more fields. So I'm going to focus on what's not captured or inadequately captured by them:

The book Superforecasting notes that generating good questions is as important as predicting answers to questions effectively, but prediction-tracking is completely unable to assign credit to the thinkers who generate good questions.
This is similar to the idea that we should judge people separately on their ability to generate hypotheses and their ability to accurately score hypotheses. Hypothesis generation is incredibly valuable, and we would not want to optimize too much for correctness at the expense of that. This is an idea I attribute to Scott Garrabrant.
Another important idea from Scott is epistemic tenure [LW · GW]. This idea argues against scoring an intellectual in too fine-grained a way. If a person's score can go down too easily (for example, if recent work is heavily weighted), this could create a fear of putting bad ideas out there, which could severely dampen creativity.
Ability to produce proofs and strong evidence seems potentially under-valued. For example, if a conjecture is believed to be highly likely, producing a proof would enable you to get a few points on a prediction-tracker (since you could move your probability to 100% before anyone else), but this would severely underrate your contribution.
Ability to contribute to the thought process seems under-valued. Imagine a prediction market with an attached forum. Someone might contribute heavily to the discussions, in a way which greatly increases the predictive accuracy of top users, without being able to score any points for themselves.
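To make the proof example above concrete, here is a minimal sketch, assuming the prediction-tracker uses a logarithmic scoring rule (an assumption on my part; real trackers use a variety of rules). The function names are hypothetical. It compares the score gained by moving from the consensus probability to 100% when the consensus was already high versus when the question was a coin flip.

```python
import math

def log_score(p_assigned_to_truth: float) -> float:
    """Logarithmic score (in nats) for the probability assigned to what actually happened."""
    return math.log(p_assigned_to_truth)

def gain_from_proof(consensus: float) -> float:
    """Extra score earned by moving from the consensus probability to 100%
    on a conjecture that turns out to be true (hypothetical helper)."""
    return log_score(1.0) - log_score(consensus)

# A conjecture already believed at 95% versus one that was a coin flip.
print(gain_from_proof(0.95))
print(gain_from_proof(0.50))
```

Under this assumed rule the prover of a widely believed conjecture earns only a small fraction of what someone earns for resolving a genuinely open question, even though the proof may be the larger intellectual contribution.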

Replies from: AndHisHorse, ryan_b

↑ comment by AndHisHorse · 2020-10-10T18:20:36.855Z · LW(p) · GW(p)

I do think "[a]bility to contribute to the thought process seems under-valued" is very relevant here. A prediction-tracking system captures one...layer[^1], I suppose, of intellectuals; the layer that is concerned with making frequent, specific, testable predictions about imminent events. Those who make theories that are more vague, or with more complex outcomes, or even less frequent[^2][^3], while perhaps instrumental to the frequent, specific, testable predictors, would not be recognized, unless there were some sort of complex system compelling the assignment of credit to the vague contributors (and presumably to their vague contributors, et cetera, across the entire intellectual lineage or at least some maximum feasible depth).

This would be useful to help the lay public understand outcomes of events, but not necessarily useful in helping them learn about the actual models behind them; it leaves them with models like "trust Alice, Bob, and Carol, but not Dan, Eve, or Frank" rather than "Alice, Bob, and Carol all subscribe to George's microeconomic theory which says that wages are determined by the House of Mars, and Dan, Eve, and Frank's failure to predict changes in household income using Helena's theory that wage increases are caused by three-ghost visitations to CEOs' dreams substantially discredits it". Intellectuals could declare that their successes or failures, or those of their peers, were due to adherence to a specific theory, or the lay people could try to infer as such, but this is another layer of intellectual analysis that is nontrivial unless everyone wears jerseys declaring what theoretical school of thought they follow (useful if there are a few major schools of thought in a field and the main conflict is between them, in which case we really ought to be ranking those instead of individuals; not terribly useful otherwise).

[^1]: I do not mean to imply here that such intellectuals are above or below other sorts. I use layer here in the same way that it is used in neural networks, denoting that its elements are posterior to other layers and closer to a human-readable/human-valued result.

[^2]: For example, someone who predicts the weather will have much more opportunity to be trusted than someone who predicts elections. Perhaps this is how it should be; while the latter are less frequent, they will likely have a wider spread, and if our overall confidence in election-predicting intellectuals is lower than our confidence in weather-predicting intellectuals, that might just be the right response to a field with relatively fewer data points: less confidence in any specific prediction or source of knowledge.

[^3]: On the other hand, these intellectuals may be less applied not because of the nature of their field, but the nature of their specialization; a grand and abstract genius could produce incredibly detailed models of the world, and the several people who run the numbers on those models would be the ones rewarded with a track record of successful predictions.

↑ comment by ryan_b · 2020-10-19T19:03:24.675Z · LW(p) · GW(p)

The point about proof generation is interesting. A general proof is equivalent to collapsing the scope of predictions covered by the proof; a method of generating strong evidence effectively sets a floor for future predictions.

A simple way to score this might be to keep adding to their prediction score every time a question is found to succumb to the proof. That being said, we could also consider the specific prediction separately from the transmissibility of the prediction method.

This might be worthwhile even with no change in the overall score; it feels obvious that we would like to be able to sort predictions by [people who have used proofs] or [people who generate evidence directly].

comment by ryan_b · 2020-10-12T02:05:40.462Z · LW(p) · GW(p)

I asked a tangentially related question [LW · GW] two years ago (to the day, as it happens). I made the comparison with an eye to performance maintenance/enhancement, rather than accountability, but it feels like they are two sides of the same coin.

I disagree with most of the other commenters about the problems of generating metrics. The simplest answer is that a lot of metric-style things are already known and used when comparing two different academics aside from citations or IQ, such as:

Age of first degree
Age of first publication
Scores on standardized tests, like SAT/ACT/GRE
Placement in national competitions/exams
Patents with their name
Patents based on their work

These are the kinds of things people reach for when comparing historical academics, for example. Then you could highlight other performance metrics that aren't necessarily their core area of expertise, but indicate a more general ability:

Publications outside their core specialization
Credentials outside their core specialization
Correctness of grammar/spelling of the languages in which they publish
Number of languages spoken
Number of languages in which they published
Success of their students

Then you can consider subjective and informal things:

Evidence of reputation among their peers
Popularity of talks or lectures not about one of their own publications

For example, consider that Von Neumann is considered a strong candidate for the smartest human ever to live, and it seems to me when this is being discussed the centerpiece of the argument boils down to all the other legendary minds of the day saying something to the effect of "this guy makes me feel like an idiot" in private correspondence.

But this information is all buried in biographies and curricula vitae. It isn't gathered and tracked systematically, largely because it seems like there is no body incentivized to do so. This is what I see as the crux of it; there is no intellectual institution with similar incentives to the sports leagues or ESPN.

comment by SamuelKnoche · 2020-10-07T07:22:55.175Z · LW(p) · GW(p)

I like the thought. Though unlike sports, intellectual work seems fundamentally open-ended, and therefore doesn't seem to allow for easy metrics. Intellectuals aren't the ones playing the game, they're the ones figuring out the rules of the game. I think that's why it often is better to focus on the ideas rather than the people.

A similar question also applies within academia. There, citation counts already serve as a metric to measure intellectual accomplishment. The goodharting of that metric probably can tell you a lot about the challenges such a system would face. What metrics do you have in mind?

In a way, this problem is just scaling up the old reputation/prestige system. You find people you respect and trust, and then you see who they respect and trust, and so on. While regularly checking for local validity [LW · GW] of course. Maybe some kind of social app inspired by liquid democracy/quadratic voting might work? You enter how much you trust and respect a few public intellectuals (who themselves entered how much they trust other intellectuals) and then it computes how much you can trust everyone else.

Replies from: newcom, ozziegooen, vlad.proex

↑ comment by newcom · 2020-10-07T11:07:14.921Z · LW(p) · GW(p)

I find such a social app idea really interesting. A map that tracks which public intellectuals value each other's contributions (possibly even divided by subject) would be a valuable tool. I guess some initial work on this could even be done without participation of said persons, as most already identify their primary influences in their work.

↑ comment by ozziegooen · 2020-10-07T12:39:29.187Z · LW(p) · GW(p)

Thanks! Some very quick thoughts:

Intellectuals aren't the ones playing the game, they're the ones figuring out the rules of the game.

This doesn't seem true to me. There's relatively little systematic literature from intellectuals trying to understand what structural things make for quality intellectual standards. The majority of it seems to be arguing and discussing specific orthogonal opinions. It's true that they "are the ones" to figure out the rules of the game, but this is a small minority of them, and for these people, it's often a side endeavor.

In a way, this problem is just scaling up the old reputation/prestige system.

Definitely. I think the process of "evaluation standardization and openness" is a repeated one across industries and sectors. There's a lot of value to be had in understanding the wisdom of existing informal evaluation systems and scaling them into formal ones.

Maybe some kind of social app inspired by liquid democracy/quadratic voting might work?

I imagine the space of options here is quite vast. This option seems like a neat choice. Perhaps several distinct efforts could be tried.

What metrics do you have in mind?

I have some rough ideas, want to brainstorm on this a bit more before writing more.

Replies from: SamuelKnoche, SamuelKnoche

↑ comment by SamuelKnoche · 2020-10-07T14:36:07.108Z · LW(p) · GW(p)

I maybe wasn't clear about what I meant by 'the game.' I didn't mean how to be a good public intellectual but rather the broader 'game' of coming up with new ideas and figuring things out.

One important metric I use to judge public intellectuals is whether they share my views, and start from similar assumptions. It's obviously important to not filter too strongly on this or you're never going to hear anything that challenges your beliefs, but it still makes sense to discount the views of people who hold beliefs you think are false. But you obviously can't build an objective metric based on how much someone agrees with you.

The issue is that one of the most important metrics I use to quickly measure the merits of an intellectual is inherently subjective. You can't have your system based on adjudicating the truth of disputed claims.

↑ comment by SamuelKnoche · 2020-10-07T14:56:19.350Z · LW(p) · GW(p)

There's a lot of value to be had in understanding the wisdom of existing informal evaluation systems and scaling them into formal ones.

One consideration to keep in mind though is that there might also be a social function in the informality and vagueness of many evaluation systems.

From Social Capital in Silicon Valley:

The illegibility and opacity of intra-group status was doing something really important – it created space where everyone could belong. The light of day poisons the magic. It’s a delightful paradox: a group that exists to confer social status will fall apart the minute that relative status within the group is made explicit. There’s real social value in the ambiguity: the more there is, the more people can plausibly join, before it fractures into subgroups.

There is probably a lot to be improved with current evaluation systems, but one always has to be careful with those fences.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-08T01:20:59.521Z · LW(p) · GW(p)

Good points, thanks.

I think ranking systems can be very powerful (as would make sense for something I'm claiming to be important), and can be quite bad if done poorly (arguably, current uses of citations are quite poor). Being careful matters a lot.

↑ comment by vlad.proex · 2020-10-08T16:14:29.719Z · LW(p) · GW(p)

Maybe some kind of social app inspired by liquid democracy/quadratic voting might work?

Do you think it's wise to entrust the collective with judging the worth of intellectuals? I can think of a lot of reasons this could go wrong: cognitive biases, emotional reasoning, ignorance, Dunning–Kruger effect, politically-driven decisions... Just look at what's happening now with cancel culture.

In general this connects to the problem of expertise. If even intellectuals have trouble understanding who among them is worthy of trust and respect, how could individuals alien to their field fare better?

If the rating was done between intellectuals, don't you think the whole thing would be prone to conflicts of interest, with individuals tending to support their tribe / those who can benefit them / those whose power tempts them or scares them?

I am not against the idea of rating intellectual work. I'm just mistrustful of having the rating done by other humans, with biases and agendas of their own. I would be more inclined to support objective forms of rating. Forecasts are a good example.

Replies from: abramdemski

↑ comment by abramdemski · 2020-10-09T18:43:54.340Z · LW(p) · GW(p)

Do you think it's wise to entrust the collective with judging the worth of intellectuals?

The idea as described doesn't necessitate that.

Everyone rates everyone else. This creates a web of trust [LW · GW].
An individual user then designates a few sources they trust. The system uses those seeds to propagate trust through the network, by a transitivity assumption.
So every individual gets custom trust ratings of everyone else, based on who they personally trust to evaluate trustworthiness.

This doesn't directly solve the base-level problem of evaluating intellectuals, but it solves the problem of aggregating everyone's opinions about intellectual trustworthiness, while taking into account their trustworthiness in said aggregation.

Because the aggregation doesn't automatically include everyone's opinion, we are not "entrusting the collective" with anything. You start the trust aggregation from trusted sources.
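The three steps above can be sketched roughly as follows. This is only one possible reading: the function name, the `decay` and `max_hops` parameters, and the rule "trust in a person is the best product of ratings along any path from a seed" are all assumptions for illustration, not a description of any existing system.

```python
def personalized_trust(ratings, seeds, decay=0.5, max_hops=3):
    """Propagate trust outward from a user's chosen seed sources.

    ratings: dict mapping rater -> {ratee: score in [0, 1]}
    seeds:   set of people the user trusts directly
    Returns a dict of person -> trust in [0, 1], personalized to this user.
    """
    trust = {seed: 1.0 for seed in seeds}
    frontier = dict(trust)
    for _ in range(max_hops):
        next_frontier = {}
        for rater, weight in frontier.items():
            for ratee, score in ratings.get(rater, {}).items():
                # Transitivity assumption: a rating counts in proportion to
                # how much the rater is trusted, discounted per hop.
                candidate = weight * score * decay
                best_so_far = max(trust.get(ratee, 0.0), next_frontier.get(ratee, 0.0))
                if candidate > best_so_far:
                    next_frontier[ratee] = candidate
        if not next_frontier:
            break
        trust.update(next_frontier)
        frontier = next_frontier
    return trust

ratings = {
    "alice": {"bob": 0.8, "dan": 0.1},
    "bob": {"carol": 0.5},
    "eve": {"frank": 1.0},  # never reached: nobody the user trusts rates eve
}
print(personalized_trust(ratings, seeds={"alice"}))
```

Two users with different seeds get different results from the same ratings, and anyone unreachable from a user's seeds simply has no rating for that user, which is the sense in which nothing is entrusted to the collective.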

Unfortunately, the trust evaluations do remain entirely subjective (IE unlike probabilities in a prediction market, there is no objective truth which eventually comes in to decide who was right.)

comment by clone of saturn · 2020-10-07T05:36:06.172Z · LW(p) · GW(p)

I think this is fundamentally not possible, because the world does not come pre-labeled with rules and win/lose conditions the way a sport or game does. Any attempt to do this would require you to take certain concepts as presumptively valid and unquestionable, but the main point of being an intellectual is to question accepted concepts and develop new ones.

Replies from: ozziegooen, sawyer

↑ comment by ozziegooen · 2020-10-07T12:23:47.821Z · LW(p) · GW(p)

"Fundamentally not possible" Thanks for providing such a clear and intense position.

I think this is either misunderstanding what I'm saying, or is giving up on the issue incredibly quickly.

Sports are a particularly good example of pre-labeled rules, but I don't think that means that more tricky things are impossible to measure. (See How to Measure Anything). Even in sports, the typical metrics don't correlate perfectly with player value; it's taken a fair bit of investigation to attempt to piece this together. It would have been really easy early on to dismiss the initial recording of metrics; "These metrics are flawed, it's useless to try." It took sabermetrics several decades to get to where it is today.

There are many, many more fuzzy endeavors that are challenging to evaluate, but where we have developed substantial systems to do a better job than "just let people intuit things."

The Chinese Imperial Examinations were considered a substantial success for meritocracy and quality in government.
Colleges have extensive processes of SAT/ACT scores, high school transcripts, and essays. This seems much better than "a few interviews"
When I played an instrument in school, I went to an evaluator each year who ranked me on a set of measures. The ranking determined which regional bands I would get to be in. (see NYSSMA)
Modern Western law is mostly statutory. One could have easily said, "Things are complicated! If we write up formal procedures, that would get in the way of the unique circumstances."
Most professional associations have actual tests to complete. If you want to be a lawyer you need to pass the Bar. If you want to be a doctor, get ready to face the US Licensing Examinations.

the main point of being an intellectual is to question accepted concepts and develop new ones.

So you're saying that one main thing intellectuals do is question category systems and suggest new ones? This is almost never what I see intellectuals do. I often see them fighting for some side or another, presenting some arguments, finding lots of data and anecdotes. Intellectualism is a massive field.

If it is the case that intellectuals are so good at doing this, then I suggest they start with figuring out concepts on which to evaluate themselves, and continue to improve those.

Replies from: AllAmericanBreakfast

↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2020-10-08T16:48:03.599Z · LW(p) · GW(p)

I think this is either misunderstanding what I'm saying, or is giving up on the issue incredibly quickly.

You could have titled your post "Can we try harder to evaluate the quality of intellectuals?"

Instead, your phrase was "to similar public standards."

The consequence is that you're going to experience some "talking past each other." Some, like me, will say that it's transparently impossible to evaluate intellectuals with the same or similar statistical rigor as an athlete. As others pointed out, this is because their work is not usually amenable to rigidly defined statistics, and when it is, the statistics are too easily goodharted.

The debate you seem to desire is whether we could be trying harder to statistically evaluate intellectuals. The answer there is probably yes?

But these are two different debates, and I think the wording of your original post is going to lead to two separate conversations here. You may want to clarify which one you're trying to have.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-09T03:26:54.104Z · LW(p) · GW(p)

That's a good point, I think it's fair here.

I was using "athletes" as a thought experiment. I do think it's worth considering and having a bunch of clear objective metrics could be interesting and useful, especially if done gradually and with the right summary stats. However, the first steps for metrics of intellectuals would be subjective reviews and evaluations and similar.

Things will also get more interesting as we get better AI and similar to provide interesting stats that aren't exactly "boring objective stats" but also not quite "well thought out reviews" either.

Replies from: AllAmericanBreakfast

↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2020-10-09T06:11:43.351Z · LW(p) · GW(p)

I think you might enjoy getting into things like Replication Watch and similar efforts to discover scientific fraud and push for better standards for scientific publishing. There is an effort in the scientific world to bring statistical and other tools to bear on policing papers and entire fields for p-hacking, publication bias and the file drawer problem, and outright fraud. This seems to me the mainline effort to do what you're talking about.

Here on LW, Elizabeth has been doing posts on what she calls "Epistemic Spot Checks," to try and figure out how a non-expert could quickly vet the quality of a book they're reading without having to be an expert in the field itself. I'd recommend reading her posts in general, she's got something going on.

While I don't think these sorts of efforts are going to ever result in the kind of crisp, objective, powerfully useful statistics that characterize sabermetrics, I suspect that just about every area of life could benefit from just a little bit more statistical rigor. And certainly, holding intellectuals to a higher public standard is a worthy goal.

↑ comment by sawyer · 2020-10-07T23:33:53.376Z · LW(p) · GW(p)

I think there's probably a fundamental limit to how good the ranking could be. For one thing, the people coming up with the rating system would probably be considered "intellectuals". So who rates the raters?

But it seems very possible to get better than we are now. Currently the ranking system is mostly gatekeeping and social signaling.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-07T23:52:59.380Z · LW(p) · GW(p)

Agreed there's a limit. It's hard. But, to be fair, so are challenges like qualifying students, government officials, engineers, doctors, lawyers, smart phones, movies, books.

Around "who rates the raters", the thought is that:

First, the raters should rate themselves.
There should be a decentralized pool of raters, each of which rates each other.

There are also methods that raters could use to provide additional verification, but that's for another post.

Replies from: abramdemski

↑ comment by abramdemski · 2020-10-09T19:27:01.596Z · LW(p) · GW(p)

I like the overlapping webs of trust [LW · GW] idea that there's no central authority, so each user just has to trust someone in order to get ratings from the system. If you can trust at least one other person, then you can get their rankings on who else has good thinking, and then integrate those people's rankings, etc.

Of course, it all remains unfortunately very subjective. No ground truth comes in to help decide who was actually right, unlike in a betting market.

Ratings will change over time, and a formula could reward those who spot good intellectuals early (the analogy being that your ratings are like an investment portfolio).

comment by Dirichlet-to-Neumann · 2020-10-09T14:47:55.843Z · LW(p) · GW(p)

Baseball and American sports in general are a bit of an exception though. Most sports are much more difficult to evaluate statistically. For example, in chess the Elo rating offers a good prediction of average performance but totally fails to evaluate 1) head-to-head win rate, 2) single-game or even tournament results, and 3) relative strength in different parts of the game (for example, "who is good at opening preparation?" is typically answered by the same sort of heuristics as "who is good at maths?" or "who is good at psychology?").
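For readers unfamiliar with what the chess rating does and does not predict, here is a small sketch of the standard Elo expected-score formula. The function name is my own; the formula itself is the usual logistic curve with a 400-point scale.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B under the standard Elo formula.

    The score counts a win as 1, a draw as 0.5 and a loss as 0, so the result is
    a long-run average, not the probability of any particular outcome.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(elo_expected_score(2800, 2700))
```

The output is a single number per pairing of ratings. It says nothing about how that average splits into wins and draws, about a specific opponent's style, or about which phase of the game a player is strong in, which is the limitation described above.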

comment by hg00 · 2020-10-07T06:59:54.602Z · LW(p) · GW(p)

Somewhere I read that a big reason IQ tests aren't all that popular is because when they were first introduced, lots of intellectuals took them and didn't score all that high. I'm hoping prediction markets don't meet a similar fate.

Replies from: ozziegooen, steven0461

↑ comment by ozziegooen · 2020-10-07T12:31:45.224Z · LW(p) · GW(p)

There's a funny thing about new signaling mechanisms.

If they disagree with old ones, then at least some people who did well in the old ones will complain (loudly).

If they perfectly agree with old ones, then they provide no evaluative value.

In general, introducing new signaling mechanisms is challenging, very much for this reason.

If they can last though, then eventually those in power will be ones who did well on them, so these people will champion them vs. future endeavors. So they can have lasting lock-in. It's more of a reason to work hard to get them right.

↑ comment by steven0461 · 2020-10-08T00:28:59.533Z · LW(p) · GW(p)

Relatedly, the term "superforecasting" is already politicized to death in the UK.

comment by SebastianG (JohnBuridan) · 2020-10-09T02:51:10.237Z · LW(p) · GW(p)

Here is a quick list of things that spring to mind when I evaluate intellectuals. Any score does not necessarily need to cash out in a ranking. There are different types of intellectuals that play different purposes in the tapestry of the life of the mind.

How specialized is this person's knowledge?
What are the areas outside of specialization that this person has above average knowledge about?
How good is this person at writing/arguing/debating in favor of their own case?
How good is this person at characterizing the case of other people?
What are this person's biggest weaknesses both personally and intellectually?

comment by ozziegooen · 2021-12-19T23:07:56.895Z · LW(p) · GW(p)

I enjoyed writing this post, but think it was one of my lesser posts. It's pretty ranty and doesn't bring much real factual evidence. I think people liked it because it was very straightforward, but I personally think it was a bit over-rated (compared to other posts of mine, and many posts of others).

I think it fills a niche (quick takes have their place), and some of the discussion was good.

comment by romeostevensit · 2020-10-10T00:07:56.616Z · LW(p) · GW(p)

An objection might be that athletes have good metrics. But this ignores selection effects. Athletes have good metrics in part because there is money to be made in finding good metrics and then arbitraging against teams who use worse metrics. See Moneyball. Universities do draft elite intellectuals, and that is partially based on how much money that intellectual will be able to bring to that university. University donors want to affiliate with elite intellectuals.

comment by steven0461 · 2020-10-08T00:38:29.540Z · LW(p) · GW(p)

Even someone who scores terribly on most objective metrics because of e.g. miscalibration can still be insightful if you know how and when to take their claims with a grain of salt. I think making calls on who is a good thinker is always going to require some good judgment, though not as much good judgment as it would take to form an opinion on the issues directly. My sense is there are more returns to be had from aggregating and doing AI/statistics on such judgment calls (and visualizing the results) than from trying to replace the judgment calls with objective metrics.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-08T01:19:05.895Z · LW(p) · GW(p)

I guess this wasn't very obvious, but my recommendation is to use a collection of objective and subjective measurements, taking inspiration from how we evaluate college applicants, civil servants, computers, as well as athletes.

I'm all for judgement calls if the process could be systematized and done well.

comment by Dagon · 2020-10-07T14:12:07.071Z · LW(p) · GW(p)

You'd have to narrow and categorize different domains of intellectualism - the VAST majority of athletes don't get any attention at all, and those that do comprise a very small volume of athleticism-space. And you'd have to turn it into a contest that median humans could identify with.

I don't think either of these changes would make things better. The top few percent of intellectuals could make a bunch of money (mostly through sponsorships, which may be unfortunate if we prefer them to be independent). A bunch of pretty-good intellectuals would learn that they'll never make it big and be less effective than they now are.

Summary: Academia and intellectualism might be sub-optimal, but sports is horrible and should not be copied.

Other difficulties include type and durability of output. Paid/popular sports are performances - the value is in the action and instantaneous demonstration of talent and physique. Intellectual value (to observers) is in the output - papers, ideas, explanations, validation of models, etc. This is both far more durable and less repeatable than sporting events. A true meritocracy could very well lead to such dominance that we're forced to sabotage the smartest/best.

comment by ChristianKl · 2020-10-10T13:37:07.308Z · LW(p) · GW(p)

Sabermetrics basically came out of the observation that a team that used the metrics to make hiring decisions outcompetes teams that don't. It's the same as universities using citation count metrics for their tenure decisions. In both cases metrics get used to make decisions about which individual is supposed to be hired.

The problem is that the metrics for athletics are better suited to their usage than the metrics the universities have available for their tenure decisions.

Part of the standards we have for professional athletes is that we have a lot of rules that cap their performance. Rules that stop people from boosting performance through non-standard behavior are useful if you want to rank people by a fair metric, but not useful if you care about the quality of the output.

comment by Richard_Ngo (ricraz) · 2020-10-07T21:55:34.424Z · LW(p) · GW(p)

if I want to know how much to trust and value Jonathan Haidt I'm honestly not sure what to do

The most basic approach which most academics would use: look at the citations on his papers. Then you could see a lot of the same metrics as in baseball: how his performance has been changing over time, how good he is compared with others in the field, and so on. You could look at his h-index, or at the impact factors of the journals he's been published in. It seems odd that you didn't mention any of these, since they're such a standard part of academic evaluation.
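For readers unfamiliar with the h-index mentioned above: it is the largest number h such that an author has h papers with at least h citations each. A minimal sketch of that definition follows; the function name and the citation counts are made up for illustration and are not anyone's real record.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only get smaller from here on
    return h


# Hypothetical citation counts for five papers.
example_counts = [25, 8, 5, 3, 3]
print(h_index(example_counts))
```

Note that one heavily cited paper barely moves the number, which is part of why the index is usually read alongside other measures rather than on its own.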

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-07T22:41:01.884Z · LW(p) · GW(p)

Fair point, thanks.

I considered it briefly but have been reading/chatting recently about problems with citation indices so left it out. That said, it does serve a function.

comment by Filipe Marchesini (filipe-marchesini) · 2020-10-07T18:38:41.435Z · LW(p) · GW(p)

Yes, we can hold intellectuals to similar public standards as athletes. We could use GPT-4/5 to create a set of questions that check whether the intellectual can answer correctly while avoiding every kind of bias already explained here on LW. For each such bias, we can create new questions that show when a human falls for it, and assign a new score to that human. I would like each human to write down all his knowledge with the help of an automatic writing system; we could then create a visual tree of all the knowledge the system detected the human acquired in the past, and evaluate how well he performs in answering questions about the fields he visited or claims to know about. What's the point of asking for your credentials when I can evaluate your knowledge in real time with GPT-n systems?

On the tree of knowledge we could see which humans score higher in which domains and why, and which questions they can answer that others can't. Don't ask me for my credentials; ask me a hard question or give me a hard problem to solve, and let's see who solves it first or better. GPT-n could babble about the solutions presented by different humans, and other groups of humans that score high in these domains could also rate the solutions of others, choosing the score they assign to each solution.

comment by Dagon · 2020-10-07T15:57:56.021Z · LW(p) · GW(p)

An interesting follow-up (but I have no path to implementation, and don't even know the desirability) is "can we hold amateur opinion-holders to the same standards we hold casual sports-players?"

A whole lot of people play pickup basketball or soccer among friends, and a subset (but still large amount) play in organized non-professional leagues. Nobody takes them very seriously. Why do we let idiots vote? Why do we care what they say on Facebook?

comment by AndHisHorse · 2020-10-10T18:26:02.144Z · LW(p) · GW(p)

While, if successful, such an epistemic technology would be incredibly valuable, I think that the possibility of failure should give us pause. In the worst case, this effectively has the same properties as arbitrary censorship: one side "wins" and gets to decide what is legitimate, and what counts towards changing the consensus, afterwards, perhaps by manipulating the definitions of success or testability. Unlike in sports, where the thing being evaluated and the thing doing the evaluating are generally separate (the success or failure of athletes doesn't impede the abilities of statisticians, and vice versa), there is a risk that the system is both its subject and its controller.

comment by alfalfajor · 2020-10-08T16:59:35.524Z · LW(p) · GW(p)

My impression of the proposed idea is to create a "hard" intellectual accountability system for intellectuals by sampling some "falsifiable" subspace of idea space, similar to what exists for superpredictors and athletes. This certainly seems helpful in some areas, and I think it is similar to the purpose of PolitiFact.

But then there's the risk of falling into the Marxist mistake: that if something isn't quantifiable or "hard" it is not useful. The idea that "hard" production matters (farmers, factory workers, etc.) while "soft" production (merchants, market researchers, entertainers) does not, which is at the base of Marxism, has been disgraced by modern economics. But this is kind of hard to explain, and Marxism seemed "obviously correct" to the proto-rationalists of the early 20th century.

The sphere of intellectual expertise, especially the even "softer" side of people who've taken it on themselves to digest ideas for the public, is much harder still to get right than economics. No matter how many parameters an analysis like this tries to take into account, it is likely to miss something important. While I like the idea of using something like this on the margin, to boost lower-status people who get an unusually high number of things right or flag people who are obviously full of it, I think it would be a bad idea to use something like this to replace existing systems.

comment by sawyer · 2020-10-07T23:36:50.315Z · LW(p) · GW(p)

This reminds me of attempts to rate the accuracy of political pundits. Maybe this was in Superforecasting? Pundits are a sort of public intellectual. I wonder if one place to start with this intellectual-sabermetrics project would be looking for predictions in the writings of other intellectuals, and evaluating them for accuracy.
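One common way to evaluate predictions for accuracy is the Brier score: the mean squared difference between the stated probability and what actually happened (1 if the event occurred, 0 if not). Lower is better, and always answering 50% earns 0.25. The sketch below assumes each prediction has already been turned into a probability and resolved as true or false, which is the hard part in practice; the sample forecasts are invented.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and binary outcomes.

    `forecasts` is a list of (probability, outcome) pairs, where probability
    is in [0, 1] and outcome is True if the predicted event happened.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    total = 0.0
    for probability, outcome in forecasts:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        actual = 1.0 if outcome else 0.0
        total += (probability - actual) ** 2
    return total / len(forecasts)


# Hypothetical pundit: three resolved predictions with stated confidence.
pundit = [(0.9, True), (0.2, False), (0.7, False)]
print(round(brier_score(pundit), 3))
```

A score like this only rewards claims that are stated clearly enough to resolve, so it says nothing about intellectuals who avoid making checkable predictions at all.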

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-07T23:50:51.722Z · LW(p) · GW(p)

Expert Political Judgement discussed this. From what I remember they used a variety of "Experts", many I believe were Academics. I believe these crossed over significantly with what we would think of as intellectuals.

I like the name "intellectual-sabermetrics".

Luke Muehlhauser has been calling out people online for making bad predictions.

One common issue though is that many intellectuals are trained specifically not to communicate falsifiable predictions. They often try to word things in ways that seem confident, but are easy to argue against after the fact.

Replies from: crl826, crl826

↑ comment by crl826 · 2020-10-08T22:11:36.776Z · LW(p) · GW(p)

Thinking about this some more, I wonder if you could ever get to a place where pundits were looked down on for not making falsifiable predictions.

People with bad records and people who won't go on record are both treated as second class.

Probably still too much to hope for.

↑ comment by crl826 · 2020-10-08T00:47:46.525Z · LW(p) · GW(p)

One common issue though is that many intellectuals are trained specifically not to communicate falsifiable predictions. They often try to word things in ways that seem confident, but are easy to argue against after the fact.

Yep.

And most pundits' skill is not in accuracy, but in building an audience. Media want pundits to make outrageous statements for clickbait or to stop channel changing. The general public wants pundits to validate their opinions.

If accuracy was actually a concern, they would already be held accountable. There are several fairly easy ways to do it, society just has chosen not to.

comment by kithpendragon · 2020-10-07T09:17:15.104Z · LW(p) · GW(p)

Find a way to bet on the outcomes of intellectual performance, make that work public and entertaining in some way, and the bookies will figure out the rest. #slightlyGlib

comment by Pattern · 2020-10-09T16:42:44.906Z · LW(p) · GW(p)

I'm sure professional athletes said the same thing when public metrics began to appear. Generally new signals get push back. There will be winners and losers, and the losers fight back much harder than the winners encourage. In this case the losers would likely be the least epistemically modest of the intellectuals, a particularly nasty bunch. But if signals can persist, they get accepted as part of the way things are and life moves on.

This seems to ignore the possibility of flaws in the metrics. For example, if such metrics had been created some time ago, and had included something involving p-values...

There are definitely ways of doing this poorly, but the status quo is really really bad.

In order to create a metric, you should know what the metric is for.

Tenure? I think those metrics already exist. (For better or for worse.)

Citation stats - not a good measure by itself. (It's independent of both replication and non-replication.)

For one, Expert Political Judgement provided a fair amount of evidence for just how poorly calibrated all famous and well esteemed intellectuals seem to be.

It determined that by asking them to make predictions? If you want good predictors, then ditch the intellectuals - and get good forecasters/forecasts.

comment by eyesack · 2020-10-07T17:35:53.196Z · LW(p) · GW(p)

A lot of people's beliefs are influenced by their political affiliation. I'd be wary of a rating system. If one political party got ahold of it, they might use it to completely discredit their opponents, when the issue is just not very well understood in the first place and having multiple perspectives is very useful.

Climate change is the issue that comes to mind first.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-08T01:47:14.520Z · LW(p) · GW(p)

I'd suggest decentralized rating systems for this reason. Think movie reviewers and Consumer Reports.

Replies from: ChristianKl

↑ comment by ChristianKl · 2020-10-10T13:38:52.751Z · LW(p) · GW(p)

How is Consumer Reports a decentralized system? It's an NGO that funds itself through subscription.

Replies from: ozziegooen

↑ comment by ozziegooen · 2020-10-10T14:36:46.581Z · LW(p) · GW(p)

Consumer Reports itself is centralized, but it's one of a cluster of various reviewing groups, none of which gets official state status.

comment by Products Insights (products-insights) · 2021-12-16T11:51:17.436Z · LW(p) · GW(p)

Can we hold intellectuals to similar public standards as athletes?
