Evaluating expertise: a clear box model

post by JustinShovelain · 2020-10-15T14:18:23.599Z · score: 26 (14 votes) · LW · GW · 3 comments

Contents

  Context
Purpose of expertise modelling
Types of expertise modelling
How expertise modelling fits within a truth finding process
Clear box expertise modelling
Main suggested heuristics for clear box expertise modelling
Use of clear box expertise modelling
Expertise Calculator


Context

Purpose of expertise modelling

To get what we value we must make good decisions. To make these decisions we must know what relevant facts are true. But the world is so complex that we cannot check everything directly ourselves and so must defer to topic “experts” for some things. How should we choose these experts and how much should we believe what they tell us? In this document, I’ll describe a way to evaluate experts.

Many of the problems in the world, be they political, economic, scientific, or personal, are caused by or exacerbated by making epistemic mistakes. We trust in the wrong advice and don’t seek out the right advice. We vote for the wrong politicians, believe the marketers, promote bad bosses, are mesmerized by conspiracy theories, are distracted by the irrelevant, fight with our neighbors, lack important information, suffer accidents, and don’t know the best of what has been discovered. If we accurately know what to do, how to do it, and why to do it, then we become more effective and motivated.

Types of expertise modelling

To evaluate these experts individually, we can use three methods: black box models, clear box models, or deferring further to other, “meta”, experts about these topic experts (see also this and this [LW · GW]).

• Black box/outside view of the expert: This type of modelling looks only at the expert’s past prediction accuracy, without asking how they arrive at their predictions. Prediction accuracy is ultimately what we want to get at, but track records are sometimes incomplete or don’t exist yet.
• Clear box/white box/inside view of the expert/interpretability: This type of modelling looks inside and asks about the specific properties of the experts that make them accurate. This lets us gauge their opinions when we don’t have a predictive track record for them. It also lets us better estimate to what extent their expertise generalizes, points out possible ways they may err and how to fix these errors, and points out how to gain expertise ourselves and improve upon the state-of-the-art.
• Social reputation/deference pointer: This strategy passes the buck of deciding whether to believe an expert to other, meta-experts. But it still requires an ability to evaluate a meta-expert's ability to evaluate other experts, and so reduces to using black box or clear box models of the meta-expert. This has the advantage of letting us assess something quickly, but has the downsides of social biases and of playing a game of telephone.

How expertise modelling fits within a truth finding process

To move towards knowing the truth about a topic, a good process is to work through the following steps:

1. Figure out one’s own impression: Form our independent, without-the-experts (though possibly with-their-models) impression of the topic, for instance by using Fermi [LW · GW] modelling and Bayesian analysis, taking into account the limitations [EA · GW] in our data, and dealing with the upsides and downsides of using explicit probabilities [EA · GW].
2. Evaluate experts individually: Combine the three types of expertise modelling above to evaluate experts. Some additional tools and perspectives for this: double cruxing [LW · GW], frame conflicts [LW · GW], types of disagreement [LW · GW], causes of disagreement analysis [LW · GW], Bayesian truth serum, mechanism design, the meanings [LW · GW] of words [LW · GW], bets, analysis of the dynamics between experts [LW · GW], and Aumann’s agreement theorem.
3. Aggregate expert opinions: Aggregate across a sample of the topic experts to figure out what we believe given all of them. For example, using tools like prediction markets (1 [LW · GW], 2 [? · GW], 3 [LW · GW]), the techniques used in superforecasting, the Delphi method, and mimesis [LW · GW].
4. Combine one’s impression and the aggregation of expert opinions into an overall “all things considered” assessment [EA · GW].

(Further gains can be had by elaborating these steps and iterating back and forth over them, rather than just proceeding down the list. Gains may also be had as a community, using a process like the ‘Evidential Reasoning’ referred to here and perhaps mechanisms like the one described here.)
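As a concrete (and heavily simplified) sketch of steps 3 and 4, assuming each opinion is expressed as a probability: one common aggregation rule is to average opinions in log-odds space, with weights standing in for however much credence the expertise evaluation assigns each expert. The function names and weighting scheme here are illustrative, not a prescribed method:

```python
import math

def to_log_odds(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def from_log_odds(l):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-l))

def aggregate(probabilities, weights=None):
    """Step 3: pool expert probabilities by a weighted average in
    log-odds space (one of several possible pooling rules)."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    pooled = sum(w * to_log_odds(p)
                 for w, p in zip(weights, probabilities)) / total
    return from_log_odds(pooled)

def all_things_considered(own_p, expert_ps, own_weight=1.0,
                          expert_weights=None):
    """Step 4: combine one's own impression with the expert aggregate,
    again in log-odds space; own_weight sets how much to trust oneself
    relative to the pooled experts."""
    expert_p = aggregate(expert_ps, expert_weights)
    pooled = (own_weight * to_log_odds(own_p) + to_log_odds(expert_p))
    return from_log_odds(pooled / (own_weight + 1))
```

For example, an independent impression of 0.9 combined with two experts at 0.6 and 0.7 lands somewhere in between, pulled toward whichever side carries more weight.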

Clear box expertise modelling

Main suggested heuristics for clear box expertise modelling

Let’s zoom in now on clear box modelling, the primary purpose of this post. How do we evaluate when others know more about a topic than ourselves? How do we compare experts? How can we know how much someone knows about a complex topic and how clear their thinking is about it?

Loosely inspired by AI theory, I believe that some good heuristic features to focus on are the following (see also this post [EA · GW] that makes some similar points):

• Data/unsupervised learning data quality (1, 2): How much data and feedback have these experts been exposed to? If they have short feedback loops over a long time period that are accurate (not noisy or irrelevant or statistically biased) and if the problem is simple then they probably have enough data for developing good explicit and intuitive models. Have they been exposed to alternative models and had their own thoughts subject to feedback and criticism? Also, if they have knowledge and skills in a closely related domain these may transfer to this topic. The AI analogue is the amount and quality of training data they have and roughly corresponds to the units of accuracy/virtual cycle.
• Incentives/motivation/supervised learning: How motivated are they towards understanding the topic? How motivated are they to share that knowledge without distortion? If their incentives are aligned with yours and they are paid to be accurate about the topic then there is a good chance they are motivated to give you the correct answer. This factor can be broken up into incentives to come to know the truth personally and incentives to convey what they believe accurately. The AI analogue is getting the right reinforcement, correctly labeled data, or having the right utility function and roughly corresponds to the units of cycles/cycles.
• Compute: How neurologically intelligent (the neurological contribution to IQ) are they, how creatively are they taking into account diverse perspectives, and how attentively focused on the task are they? This factor is a bit messy because of the neuroscience, but roughly corresponds to raw neurological speed, neurological parallelism, low level neurological wiring efficiency, neuroplasticity, and working memory size acting as a memory cache to speed things up. The AI analogue is having a lot of compute per unit time and corresponds to the units of cycles/second.
• Effective thinking: How good are they at thinking, in terms of rationality and meta-cognition, about the problems? If they don’t have good general methods to learn, think, and find mental errors then they are likely to be inefficient at figuring out the truth. Effective thinking methods are partially dependent on the topic. The AI analogue is having efficient good algorithms that approximate solutions well and don’t have systematic biases and roughly corresponds to the units virtual cycles/cycle.
• Time: How long have they been thinking about the topic at hand? Even given all the above, if they just haven’t had time to think about the question they may very well not give a good answer. The AI analogue is how deep a search process progresses and simply how much compute time has occurred, and corresponds to the units of seconds.

(Note that the necessity of these heuristics will depend on the specific topic and its type of difficulty.)

A Fermi pseudo equation (the mathematical version of pseudocode) to summarize this:

Expertise ≈ Data × Incentives × Compute × Effective thinking × Time

The units cancel as intended: (accuracy/virtual cycle) × (cycles/cycles) × (cycles/second) × (virtual cycles/cycle) × (seconds) = accuracy.

The importance of each factor would vary by topic. As a heuristic composed of heuristics, I think this is a good start.
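As a toy sketch of how such a pseudo equation might be computed, assuming each factor has been rated on some arbitrary scale (say 0–10) and the topic-specific weights are guesses:

```python
def expertise_score(data, incentives, compute, effective_thinking, time,
                    weights=(1, 1, 1, 1, 1)):
    """Multiply the five factor ratings (each a judgment call on an
    arbitrary scale), optionally raised to topic-specific weights.
    Factors multiply rather than add, so a near-zero rating on any
    one (no data, no motivation, no time) drags the whole estimate
    down, which matches the intuition that each factor is necessary."""
    factors = (data, incentives, compute, effective_thinking, time)
    score = 1.0
    for f, w in zip(factors, weights):
        score *= f ** w
    return score
```

Two experts can then be compared by the ratio of their scores, keeping in mind that the inputs and weights are rough guesses, not measurements.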

Use of clear box expertise modelling

These factors can be used either in Fermi pseudo equation form, or as a checklist to compare experts and help ensure you consider all relevant factors. (See here for the usefulness of checklists.)

These heuristics can also be used constructively when trying to become an expert in a topic or when teaching others, as these are factors to optimize for in order to understand a topic.  They also give a sense of how much you know yourself in comparison (to others and in an absolute sense) so you can know how humble you should be and how much you have yet to learn.

Finally, once you have evaluated someone’s expertise, you can use that information in your truth-finding processes, which you in turn use to make decisions and achieve your goals and values.

In the spirit of providing models that people can interact with, I have provided a simple online calculator for the expertise equation heuristic:

Expertise Calculator

(This is very much a rough-draft calculator and, with its guessed weights, tries to cover the vast range of expertise, from your dog Spot considering the topic for a moment to Einstein devoting his life to it.)

My thanks to Ozzie Gooen [LW · GW], David Kristoffersson [LW · GW], Denis Drescher [EA · GW], Michael Aird [LW · GW], Marcello Herreshoff, Siebe Rozendal [EA · GW], Elizabeth [LW · GW], Dan Burfoot [LW · GW], Gregory Lewis [EA · GW], Spencer Greenberg, Shri Samson, Andres Gomez Emilsson [EA · GW], Alexey Turchin [LW · GW], and Remmelt Ellen [EA · GW] for reviewing and providing helpful feedback on the article.

comment by waveman · 2020-10-16T01:02:03.878Z · score: 6 (3 votes) · LW(p) · GW(p)
• Black box/outside view of the expert: This type of modelling looks only at the expert’s past prediction accuracy, without asking how they arrive at their predictions. Prediction accuracy is ultimately what we want to get at, but track records are sometimes incomplete or don’t exist yet.

[Worked out how to exit quote mode Pressing alt-enter 3 times works, today at least.]

You can do a lot better than this. Some signs of an expert from an outside perspective

1. Can predict the future better than simple extrapolations.
2. Can fix broken things better than everyman.
3. Can design and make things better than everyman.
4. Can explain things in a parsimonious way better than everyman.

All of the above need to take into account the possibility that luck played a part. For example, if millions of people play the stock market and 29 get rich, then you need to take into account the large number of "attempts" when deciding whether the 29 have skill.
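This luck check can be made concrete with a quick null-model calculation. The numbers below are hypothetical (the comment doesn't specify them): suppose a million traders each have an independent 50% chance of beating the market in any given year.

```python
def expected_lucky(n_players, p_success_per_trial, n_trials):
    """Expected number of players who succeed on every trial by pure
    chance, under an independent coin-flip null model."""
    return n_players * p_success_per_trial ** n_trials

# Hypothetical numbers: a million traders, each a 50% chance of
# beating the market in a given year, over 15 years.
survivors = expected_lucky(1_000_000, 0.5, 15)
```

Under these assumptions about 30 traders would beat the market 15 years running by luck alone, so finding a few dozen long-streak "winners" in a pool of a million attempts is roughly what chance predicts, not evidence of skill.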

When you take this seriously, it is astonishing how many 'experts' appear to have no skill at all.

comment by Mike G · 2020-10-17T20:51:44.739Z · score: 1 (1 votes) · LW(p) · GW(p)

Good blog.

How many fields offer "short feedback loops over a long time period that are accurate"?

In my field, K-12 teacher coaching, there is rarely data on teacher performance that isn't "noisy or irrelevant or statistically biased." The Gates Foundation spent $100 million trying to figure out this problem, but it proved thorny, both the stats and the politics.

Even in a data-loving field, like golf, with many competing experts (instructors) for hire, it's nearly impossible to know which ones actually generate the largest gains in their students. To use Waveman's point, it's hard to assess their relative skill at "fixing broken skills" because the data isn't available...even to the instructors.

What are the fields that best lend themselves to this sort of calculator?

comment by NunoSempere (Radamantis) · 2020-10-17T18:24:38.431Z · score: 1 (1 votes) · LW(p) · GW(p)

Very interesting! Your categorization into black box / clear box / social reputation seems like it's missing a level, and hence to me your names feel slightly off. I might instead think in terms of:

1. Clear box: I fact check some of the expert's claims, and estimate the accuracy of the claims I can't estimate based on the ones which I can. For example, [Ibn Tufail's](https://en.wikipedia.org/wiki/Ibn_Tufail) metaphysical claims might be difficult to refute, but his books also reference biological mechanisms which are easier to evaluate (e.g., men can be born from mud.) Similarly, if an expert claims some broad historical thesis, I can compare it to, e.g., Spain in the last centuries to see if it checks out.
2. Black box: I know the person's track record, or that the track record is good, but not which claims/accomplishments it's based on. For example, I know that Renaissance Technologies has a good track record in making money from the stock market and cultivating startups, even if I don't know how exactly they did it. Or I might know that someone is a super-forecaster without knowing what questions they have predicted to get there.
3. Proxies box (your clear box): I look at proxies for accuracy/track record. Some can be mechanistic: like skin in the game, alignment, computational power, time. But you can't look at, say, computational power or alignment directly (yet), so might have to look at correlational proxies for that, like prestigious university affiliations, big car, nice suit, English accent, brings up cogent and interesting points in a conversation, presentation skills, etc..
4. Deference pointer: I trust other people's assessment & status signals.

On 1., see Epistemic Spot Checks [LW · GW], and in particular this comment thread [LW(p) · GW(p)]. On 3., see Hanson's How to pick an X.