Second-Order Rationality, System Rationality, and a feature suggestion for LessWrong

post by Mati_Roy (MathieuRoy) · 2024-06-05T07:20:10.178Z · LW · GW · 2 comments

Contents

  Second-order rationality
    Definition
    How to use this second-order knowledge?
    Factors that correlate with beliefs
      Other beliefs
      Values
      Pleasure
      Emotional state
      Intelligence / track-record
      Meta-beliefs
      Personality
      Environment
        Inside the body, but external to the mind
      Genes
  System rationality
    Definition
    Examples
    Information consumption
      Affecting beliefs
      Affecting values
    Simple experiment idea
    Feature suggestion for LessWrong
    Further ideas
  Annexes
    Background
    Another example
    Related article

Second-order rationality

Definition

By “second-order rationality” (and second-order intelligence) I mean rationally reasoning about other people’s rationality (and intelligence) in order to inform our beliefs about the world.

This is opposed to evaluating a proposition at face value, using the first-order evidence supporting it.

Second-order rationality is about updating your beliefs just from understanding the distribution of different beliefs or belief histories, possibly by grouping them across populations with different characteristics, without referring to any first-order evidence related to the nature of the belief.
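To make this concrete, here is a minimal sketch of what such an update could look like, under the strong, purely illustrative assumptions that each person’s stated belief is an independent signal and that everyone has the same assumed accuracy:

```python
import math

def posterior_from_belief_counts(prior, n_believers, n_skeptics, assumed_accuracy=0.7):
    """Toy second-order update: treat each person's stated belief as an independent,
    equally reliable noisy signal about a binary hypothesis (correct with probability
    `assumed_accuracy`) and update in log-odds space, without consulting any
    first-order evidence about the hypothesis itself."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += (n_believers - n_skeptics) * math.log(assumed_accuracy / (1 - assumed_accuracy))
    return 1 / (1 + math.exp(-log_odds))

# Example: 7 of 10 surveyed people believe the hypothesis, starting from a 50% prior.
print(posterior_from_belief_counts(0.5, 7, 3))  # ~0.97 under these toy assumptions
```

The independence assumption is the weakest link here: correlated believers carry much less information than this sketch implies.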

I think it’s an area worth exploring more.

What’s your probability that this is a useful area to study? You can use your own operationalization. For this exercise to work, you should record your prediction before continuing to read this article. This will get used as an example of system rationality at the end of this post.

Edit to add: Use this link

Note: I think I'm following the instructions from Embedded Interactive Predictions on LessWrong [LW · GW], but I don't know why the above forecasting widget doesn't seem to work; do you know? (Edit to add: See comment [LW(p) · GW(p)])

How to use this second-order knowledge?

I’m guessing many of you are already intuitively anxious about the Dangers of deference and trying to psychoanalyze people into knowing reality. Me too! That’s okay–we can use our first-order rationality to review the value of second-order rationality, theoretically and experimentally. 

Ultimately, I think second-order rationality can help us find the hypotheses that are most valuable to verify at the object level, but it can't let us defer all the way.

And we already defer to a very large extent, at least in terms of deciding what's worth learning (has anyone here avoided all human interaction since birth in order to rederive everything by themselves?[1])–this is just meant to study that practice.

Factors that correlate with beliefs

Other beliefs

Obviously, beliefs will correlate with each other, often rationally so. For example, if you believe the pavement is wet, you’re more likely to believe it recently rained.

Values

For example, studying how values and beliefs correlate might help us correct for the just-world fallacy. Presumably, for rational agents, beliefs and values shouldn’t be correlated[2]; if they are, that’s an indication of irrationality.

As an example of such an indication, it seems like very few people[3] believe that (biological) immortality is feasible / likely / unavoidable, whether through a religious afterlife or a techno-optimist future, while also thinking it would be undesirable (and, conversely, few seem to think it’s desirable yet impossible)[4]. Note this isn’t a perfect example, as the desirability of immortality is still entangled with empirical questions about the world, which could rationally be correlated with one’s judgement of its feasibility.
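Footnote 2 spells out the check implied here. A minimal sketch, with made-up survey counts purely for illustration, of what that check could look like:

```python
def feasibility_ratio(feasible, infeasible):
    """Ratio of 'feasible' to 'infeasible' answers within one value group."""
    return feasible / infeasible

# Hypothetical survey counts, purely for illustration.
thinks_desirable   = {"feasible": 40, "infeasible": 60}
thinks_undesirable = {"feasible": 10, "infeasible": 90}

r_desirable   = feasibility_ratio(**thinks_desirable)    # ~0.67
r_undesirable = feasibility_ratio(**thinks_undesirable)  # ~0.11

# For rational agents (footnote 2), the two ratios should be about equal;
# a large gap is the belief-value correlation treated here as a sign of bias
# (wishful thinking in one direction, sour grapes in the other).
print(r_desirable, r_undesirable)
```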

Pleasure

For example, it seems like if we start with two rational humans who have the same values and beliefs, except that one finds meat less tasty, then after reading an argument about the moral value of the beings meat is made from, they should both update in the same way. If they don’t, that’s indicative of a bias.

Emotional state

Belief updates obviously affect emotional states, but it seems likely that the opposite also happens.

Intelligence / track-record

That one is kind of obvious–answering some questions correctly is predictive of answering other questions correctly. But there are still interesting questions, such as how one’s accuracy in one domain correlates with one’s accuracy in another domain. This can also be applied to forecasting specifically: how can we predict a forecaster’s future performance from their past performance?
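As a minimal sketch of the forecasting case (the scoring and weighting choices below are assumptions, not a standard method): score each forecaster’s track record with the Brier score and weight their current forecasts accordingly.

```python
def brier_score(history):
    """Mean Brier score over (probability, outcome) pairs; lower is better."""
    return sum((p - outcome) ** 2 for p, outcome in history) / len(history)

def weighted_aggregate(current_forecasts, track_records):
    """Aggregate current probabilities, weighting each forecaster by the inverse
    of their historical Brier score (a simple, assumed weighting scheme)."""
    weights = {name: 1.0 / (brier_score(hist) + 1e-6) for name, hist in track_records.items()}
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in current_forecasts.items()) / total

# Hypothetical forecasters: lists of (stated probability, actual outcome) pairs.
track_records = {
    "alice": [(0.9, 1), (0.2, 0), (0.7, 1)],  # well calibrated so far
    "bob":   [(0.9, 0), (0.3, 1), (0.8, 0)],  # poorly calibrated so far
}
print(weighted_aggregate({"alice": 0.8, "bob": 0.3}, track_records))  # pulled toward alice
```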

Meta-beliefs

Here are 2 examples.

In Against a General Factor of Doom from AI Impacts, Jeffrey Heninger starts with (emphasis added):

I was recently reading the results of a survey asking climate experts about their opinions on geoengineering. The results surprised me: “We find that respondents who expect severe global climate change damages and who have little confidence in current mitigation efforts are more opposed to geoengineering than respondents who are less pessimistic about global damages and mitigation efforts.” This seems backwards. Shouldn’t people who think that climate change will be bad and that our current efforts are insufficient be more willing to discuss and research other strategies, including intentionally cooling the planet?

In On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs, it was found that (emphasis added):

[Having an actively open-minded thinking style about evidence] correlates negatively with beliefs about topics ranging from extrasensory perception, to respect for tradition, to abortion, to God; and positively with topics ranging from anthropogenic global warming to support for free speech on college campuses. More broadly, the belief that beliefs should change according to evidence was robustly associated with political liberalism, the rejection of traditional moral values, the acceptance of science, and skepticism about religious, paranormal, and conspiratorial claims.

Side note: Of course, that doesn’t say anything about the causality. One hypothesis would be that if you have lower raw cognitive capabilities, then it’s rational to rely more on traditions and instincts as you’re less likely to outperform those.

Personality

For example, although related to beliefs only indirectly, through intelligence, the abstract of Low Correlations between Intelligence and Big Five Personality Traits: Need to Broaden the Domain of Personality says:

The correlations between the measures of cognitive abilities and personality traits are known to be low. Our data based on the popular Big Five model of intelligence show that the highest correlations (up to r = 0.30) tend to occur with the Openness to Experience. Some recent developments in the studies of intelligence (e.g., emotional intelligence, complex problem solving and economic games) indicate that this link may become stronger in future. Furthermore, our studies of the processes in the “no-man’s-land” between intelligence and personality suggest that the non-cognitive constructs are correlated with both. These include the measures of social conservatism and self-beliefs. Importantly, the Big Five measures do not tap into either the dark traits associated with social conservatism or self-beliefs that are known to be good predictors of academic achievement. This paper argues that the personality domain should be broadened to include new constructs that have not been captured by the lexical approach employed in the development of the Big Five model. Furthermore, since the measures of confidence have the highest correlation with cognitive performance, we suggest that the trait of confidence may be a driver that leads to the separation of fluid and crystallized intelligence during development.

Environment

All the factors above were internal: they look at how some part of one’s mind influences one’s beliefs.

Another lens is to look at how environmental factors influence beliefs. Of course, any such influence ultimately runs through one of the internal paths above, but it still seems helpful to study this indirect influence.

For example, Robin Hanson says in a series of tweets [5] (emphasis added):

Fact that interest in religion declines as societies get rich suggests the main gain from religion is to help deal with stressful situations that happen a lot more to the poor.

"No atheists in foxholes" seems to confirm that theory.

Some say that rich folks know more and thus know that religion can't be right. But rich folks seem to similarly believe in the supernatural, and very few know enough to understand well why religion can't be right.

Some say that religion produces more trust, which old societies needed more. But in fact people today are MORE trustworthy, even with weak religion and even when our institutions don't punish defections.

Some say that woke, green, etc. movements are just as religious, but note that they are less supernatural and offer less comfort or protection to the poor in difficult situations.

Inside the body, but external to the mind

We can also consider our blood composition as part of the environment (from the perspective of the mind). For example, blood sugar, hormones, and drugs seem to have an impact on our cognition, and possibly also on our fundamental values.

Genes

This is also an indirect cause rather than an internal mental process, but it can similarly correlate with rationality.

System rationality

Definition

By assumption, incentives for being rational will naturally correlate with actually being rational.

This is qualitatively different from the factors mentioned above in that it’s something done intentionally to elicit and evaluate beliefs, and to aggregate them into a “superorganism”. System rationality is a form of “intentional second-order rationality”.

Examples

For example, in general, we should presumably update toward someone’s claim being true the more costly it would be for them to say something false. Here are some articles about eliciting true beliefs:

At the aggregate level, the prototypical example of this category is a prediction market. The idea is that you should update your beliefs based simply on people’s willingness to bet on their beliefs–everything else stays a black box.
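A minimal sketch of that kind of black-box update, assuming (purely for illustration) that you mix your prior with the market price in log-odds space, with a weight encoding how much you trust the market relative to yourself:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def update_from_market(my_prior, market_price, market_weight=2.0):
    """Blend my prior with a prediction-market price in log-odds space.
    `market_weight` encodes how much more I trust the market than myself;
    the market itself stays a black box."""
    blended = (logit(my_prior) + market_weight * logit(market_price)) / (1 + market_weight)
    return 1 / (1 + math.exp(-blended))

# I think 30%, the market says 60%, and I trust the market twice as much as myself.
print(update_from_market(0.30, 0.60))  # ~0.50
```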

Information consumption

Affecting beliefs

Epistemic status: This is an embryonic idea that I find very appealing and that got me thinking about this post in the first place.

I would like to know how people who were/are “like” me (operationalization TBD) ended up changing their minds. Imagine we had some systematic way to keep track of our beliefs and of the information we consume, and from that we could determine which information is most likely to update a given person’s beliefs. Something more systematic and personalized than “check the Facebook posts of people you remember having historically done that”. This might make our information consumption much more efficient. It might give us a deliberation elevator to rapidly climb up a deliberation ladder. I would particularly like having that when consuming information about AI alignment.
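Here is a minimal sketch of the bookkeeping this would require (the data, the similarity notion, and the “readers like me” set are all placeholder assumptions): log who read what and how their stated belief moved, then rank articles by how much they moved readers whose updating history resembles mine.

```python
from collections import defaultdict

# Each log entry: (reader, article, belief_before, belief_after) on some fixed question.
# Hypothetical data, purely for illustration.
reading_log = [
    ("alice", "argument_A", 0.30, 0.55),
    ("alice", "argument_B", 0.55, 0.56),
    ("bob",   "argument_A", 0.40, 0.42),
    ("bob",   "argument_B", 0.42, 0.70),
]

def expected_update(log, similar_readers):
    """Average absolute belief shift per article, restricted to readers deemed
    'like me' (the similarity operationalization is left TBD, as in the post)."""
    shifts = defaultdict(list)
    for reader, article, before, after in log:
        if reader in similar_readers:
            shifts[article].append(abs(after - before))
    return {article: sum(moves) / len(moves) for article, moves in shifts.items()}

# If Alice's updating history resembles mine, argument_A looks like the better read for me.
print(expected_update(reading_log, similar_readers={"alice"}))
```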

Affecting values

In a Facebook post, Kat Woods suggests a hypothesis for how the order in which we consume information might influence our values (initial emphasis added):

I find it very disconcerting that I, and maybe you, might have had a completely different moral stance if somebody had switched the order of the original trolley problem.

When I was first asked if I'd pull the lever to kill the one and save the many, I said yes. Then I was asked whether I'd push the fat man off the bridge, and I realized that to be consistent, I'd have to say yes, so I said yes. And then, more or less, I became a consequentialist / utilitarian.

But what if I'd been first asked if I'd push the fat man? 80% chance I'd have said no. And then I would be asked about the lever, realize that to be consistent, I'd have to say no as well, and I put an 80% chance I would have stayed consistent and said no.

Does anybody know of any studies about this? Do you think you'd have answered differently depending on the order? What other philosophical thought experiments really depend on the ordering? And if so, what do you do with that?

Simple experiment idea

Here’s an example of an experiment that could be run to study “Update flow”.

We have a group of people attend a day-long study of AI x-risks.

To track beliefs, we ask the participants to report their beliefs on N short-term forecasting questions related to AI x-risks which will be scored using a proper scoring rule [? · GW]. The scoring rule also incentivizes participants to update their predictions as soon as they update their beliefs.
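I won't pin down the exact rule here; one option that fits the description is a time-averaged Brier score, sketched below with illustrative details, which rewards reaching an accurate probability early rather than only being right at the end of the day.

```python
def time_averaged_brier(prediction_history, outcome, t_end):
    """prediction_history: list of (time, probability) pairs, sorted by time.
    Each probability is held until the next report, so the score is a
    time-weighted mean Brier score and earlier accurate updates score better (lower)."""
    total = 0.0
    segments = zip(prediction_history, prediction_history[1:] + [(t_end, None)])
    for (t, p), (t_next, _) in segments:
        total += (p - outcome) ** 2 * (t_next - t)
    return total / (t_end - prediction_history[0][0])

# Hypothetical day-long study running from hour 0 to hour 8; the question resolves "yes".
early_updater = [(0, 0.5), (1, 0.8)]  # moved to 0.8 after one hour
late_updater  = [(0, 0.5), (7, 0.8)]  # same final belief, reported much later
print(time_averaged_brier(early_updater, outcome=1, t_end=8))  # ~0.07 (better)
print(time_averaged_brier(late_updater, outcome=1, t_end=8))   # ~0.22
```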

Participants are free to read any article from a large library of articles. Once they pick an article, they must read it, report the link, and update their predictions (if applicable).

We have 2 groups:

Then we check how helpful that mechanism was.

That’s just a simple idea for an experiment with a lot of potential for variants.

(It's also the personal pain point I'm trying to solve: there's so much to read on that topic, and I wish I had a better idea of what to prioritize.)

Feature suggestion for LessWrong

This should also work for predictions that might never get a ground truth, or that aren’t well operationalized. You’re still incentivized to report your true predictions, because that’s how the system knows which articles to recommend to you and which other users to match you with.

Hence, we can revisit our initial question:

What’s your probability that this is a useful area to study? (You can use your own operationalization)

This could become a common practice on LessWrong, and it could be used to predict how much a given article will make a given user update, based on how much it made other users update (e.g., weighted by how similar their historical updating behavior is to the given user's). This could help inform users on what to read, as an additional metric alongside karma.

If some questions are frequently used across different articles, we could also create models to predict what you would believe "if you read all of LessWrong" or "all articles with that question". I'm aware this idea needs fleshing out, but I figured I'd share a first version and, if the reception seems good, possibly work on it more or let someone else do that.
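A minimal sketch of the recommendation part (all data, and the similarity measure, are placeholder assumptions): predict how much an article would move a given user by averaging how much it moved other users, weighted by how similarly those users have updated on the articles they share with that user.

```python
# observed_updates[user][article] = signed belief change that article caused for that user.
# Hypothetical data, purely for illustration.
observed_updates = {
    "me":     {"post_1": 0.10, "post_2": -0.05},
    "user_a": {"post_1": 0.12, "post_2": -0.04, "post_3": 0.30},
    "user_b": {"post_1": -0.20, "post_2": 0.15, "post_3": 0.02},
}

def similarity(me, other):
    """Crude similarity: 1 / (1 + mean absolute difference on shared articles)."""
    shared = set(me) & set(other)
    if not shared:
        return 0.0
    diff = sum(abs(me[a] - other[a]) for a in shared) / len(shared)
    return 1.0 / (1.0 + diff)

def predicted_update(target_user, article, updates):
    """Similarity-weighted average of other users' observed updates on `article`."""
    me = updates[target_user]
    num = den = 0.0
    for user, history in updates.items():
        if user == target_user or article not in history:
            continue
        w = similarity(me, history)
        num += w * history[article]
        den += w
    return num / den if den else None

# How much would "post_3" (unread by me) be predicted to move my belief?
print(predicted_update("me", "post_3", observed_updates))
```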

Further ideas

Maybe there’s the possibility for something more ambitious, like a browser extension that tracks what you read, LLMs used to aggregate the core ideas that make people update their beliefs, allowing for freeform reporting of belief updates, etc.

Annexes

Background

The goal of this post was to share the broad idea along with some examples I’ve collected since I first thought of this concept on 2020-09-20.

Another example

From Superforecasting: Summary and Review:

Superforecasting: The Art and Science of Prediction is dedicated to understanding these superforecasters and exploring how an average person might become one of them. In Superforecasting, Philip Tetlock and Dan Gardner tease out a number of important qualities of superforecasters:

  • Philosophic Outlook = Cautious, Humble, Nondeterministic
  • Thinking Style = Open-Minded, Intelligent and Curious, Reflective, Numerate
  • Forecasting Style = Pragmatic, Analytical, Dragonfly-Eyed, Probabilistic, Thoughtful Updaters, Intuitive Psychologist
  • Work Ethic = Growth Mindset, Grit

Related article

I had a section listing hypotheses on the advantages of not being more intelligent and rational from an evolutionary perspective, but that didn't seem central to this post, so I'll keep that for a potential future post.

  1. ^

     I sometimes imagine that’s what I’d want in Utopia–but not fully what I want here 😅

  2. ^

The ratio of those who believe it’s feasible to those who believe it’s infeasible should be the same among those who believe it’s desirable and those who believe it’s undesirable, and vice versa.

  3. ^

    As an exception, I know someone who was relieved when they stopped believing in religion because of zir fear of Hell.

  4. ^

    That’s partly why in the book “Ending Aging”, Aubrey de Grey focuses on the feasibility of anti-aging and doesn’t discuss its desirability.

  5. ^

    I think there are other plausible explanations. For example, maybe religion did play the role of trust, and modern society has now replaced this with something better. But I’m sharing that for the general line of thinking about how the environment correlates with beliefs.

2 comments

Comments sorted by top scores.

comment by jimrandomh · 2024-06-05T14:39:13.245Z · LW(p) · GW(p)

The Elicit integrations aren't working. I'm looking into it; it looks like we attempted to migrate away from the Elicit API 7 months ago and make the polls be self-hosted on LW, but left the UI for creating Elicit polls in place in a way where it would produce broken polls. Argh.

I can find the polls this article uses, but unfortunately I can't link to them; Elicit's question-permalink route is broken? Here's what should have been a permalink to the first question: link.