Interest in Leetcode, but for Rationality?

post by Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · LW · GW · 3 comments

This is a question post.


The following is motivated by:

I've been a long-time lurker on Less Wrong, and I've noticed the recurring criticism that despite its focus on rationality, the community lacks structured training to develop practical rationality skills. Eliezer Yudkowsky talks about rationality as a martial art, because it's something that can be trained and refined through deliberate practice. But where is our dojo?

A model that comes to mind is a website like LeetCode, where programmers can solve coding challenges, share solutions, and see how others approach the same problems. LeetCode can sometimes encourage overfitting to specific problem types, so it's not a perfect analogy, but the community-driven aspect is interesting to me: you can see how other people approach the same problem. Could something similar be adapted for rationality?

Imagine a platform where, instead of solving coding puzzles, users engage with problems designed to train rational thinking. Here are a few types of problems that might fit:

  1. Cognitive Bias Detection: Users could review novel, real-world scenarios and try to identify what cognitive bias or logical fallacy is present. The goal would be to train pattern recognition for biases without simply memorizing common examples. For instance, a scenario might subtly include a case of confirmation bias or anchoring, and users would need to spot it.
  2. Calibration Training: One of the most important skills in rationality is aligning your confidence with reality. For each problem or scenario, users could submit a confidence interval along with their answer. This serves as a double-training: users practice assessing their certainty, and over time they get feedback on how well-calibrated they are (a rough scoring sketch follows this list).
  3. Bite-Sized, Practical Challenges: The focus should be on small, actionable exercises rather than lengthy theoretical discussions. For example, a problem might ask users to predict an outcome based on limited data, forcing them to confront the common pitfalls of overconfidence or representativeness heuristics.
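
To make the calibration idea (item 2 above) concrete, here is a minimal sketch of how such scoring might work, assuming users give 90% confidence intervals for numeric questions. The class, questions, and numbers below are illustrations only, not part of any existing tool.

```python
# Minimal sketch of interval-calibration feedback (illustrative only).
# A user answers numeric questions with a 90% confidence interval;
# good calibration means roughly 90% of true values land inside the intervals.

from dataclasses import dataclass

@dataclass
class IntervalAnswer:
    question: str
    low: float          # lower bound of the user's 90% interval
    high: float         # upper bound of the user's 90% interval
    true_value: float   # resolved answer, revealed after submission

def calibration_report(answers: list[IntervalAnswer], target: float = 0.9) -> str:
    hits = sum(a.low <= a.true_value <= a.high for a in answers)
    rate = hits / len(answers)
    verdict = "overconfident" if rate < target else "underconfident or well-calibrated"
    return f"{hits}/{len(answers)} intervals contained the truth ({rate:.0%}); target {target:.0%} -> {verdict}"

# Example with fabricated user intervals:
answers = [
    IntervalAnswer("World population, billions", 7.5, 8.5, 8.1),
    IntervalAnswer("Boiling point of ethanol, °C", 60, 70, 78.4),
    IntervalAnswer("Year the transistor was invented", 1940, 1955, 1947),
]
print(calibration_report(answers))
```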

This kind of platform could be a place where people practice and refine their skills, not just absorb entertaining ideas in a way that some say is only weakly applicable.

"identify the bias" type problem for a prototype i'm working on

I have a few years of experience in Software Engineering (backend and ML) and have been thinking about building a tool like this for my own use. However, if others would find it valuable, I'd be open to expanding it into something that the wider community could use as well. It could even present an opportunity to create a sustainable project with some potential financial benefits along the way. I'd love to hear if there’s interest in such a platform and what features might be most helpful to include.

Answers

answer by abstractapplic · 2024-10-16T21:56:44.607Z · LW(p) · GW(p)

I am extremely interested in this, and all similar efforts in this space. I agree our community should be doing much more along these lines.

Regarding your specific ideas:

Cognitive Bias Detection

Something about training people to categorize errors - instead of just making good decisions - rubs me the wrong way. Also, there's a lot of pre-existing work (I found out about this earlier today).

Calibration Training

The Credence Calibration Game exists. So does my variation on the same idea (see also the associated lesson plan [LW · GW]). So do play-money and real-money prediction markets. That said, I do think there's a valuable and unfilled niche for something which doesn't require a download and has a nice user interface and has a four-digit number of questions and lets you check your answers immediately (. . . though I don't know how many people other than me would consider it valuable).

Bite-Sized, Practical Challenges

I am very much in favor of this, to the point where I'm already (tentatively) planning to (eventually) build some games with a similar motivation. Relatedly, the "ask users to predict an outcome based on limited data" example sounds like a description of that genre I invented [? · GW] (though "Bite-Sized" suggests you're thinking in terms of something much more polished/generally-accessible).

(Side note: A subtle benefit of the "Practical Challenges" approach is that it can correct for biases you weren't aiming for. A large part of my motivation for making D&D.Sci was "forcing them to confront the common pitfalls of overconfidence or representativeness heuristics"; I found that a Lesswronger working in a Data Science context will more often be insufficiently confident, and place too little weight on surface appearances; my endeavor 'failed' gracefully and people got a chance to notice those errors instead (plus various other problems I didn't even consider).)

-

I look forward to seeing what comes of this. If you want anything playtested, please let me know.

comment by Gregory (gregory-eales) · 2024-10-17T08:09:26.108Z · LW(p) · GW(p)

I appreciate the reply! 

Something about training people to categorize errors - instead of just making good decisions - rubs me the wrong way

Are you able to pinpoint exactly what gives you this feeling? The goal of this problem type would be to train the ability to recognize bias to the point where it becomes second nature, with the hope that this same developed skill would also trigger in your own thought processes. I believe it’s generally easier to evaluate the truthfulness of a statement than to come up with one initially, so this training would help make the "biased thought detector" more accurate. 

Relatedly, the "ask users to predict an outcome based on limited data" example sounds like a description of that genre I invented (though "Bite-Sized" suggests you're thinking in terms of something much more polished/generally-accessible).

That’s really cool! I definitely see the value in multi-step case study problems, as they would require more complex reasoning than smaller bite-sized problems might. Themed problems could make the process much more engaging as I think this kind of training can get a bit dull with overly generic examples. Combining the depth of case studies with the accessibility of simpler exercises might strike a nice balance. 

I look forward to seeing what comes of this. If you want anything playtested, please let me know.

Definitely will take you up on this! I'm working on the prototype and should have something simple in the next few weeks. I'm considering starting a sequence to document the progress to get more visibility, interest, and immediate feedback.

Replies from: abstractapplic, ChristianKl
comment by abstractapplic · 2024-10-17T11:02:13.156Z · LW(p) · GW(p)

Are you able to pinpoint exactly what gives you this feeling?


Less a single sharp pinpoint, more a death of a thousand (well, six) cuts:

  • The emphasis on learning the names of biases is kinda guessing-the-teacher's-password-y.
  • You'd need to put forth an unusual effort to make sure you're communicating the subset of psychological research which actually replicates reliably.
  • Any given bias might not be present in the student or their social/business circle.
  • The suggested approach implies that the set of joints psychologists currently carve at is the 'best' one; what if I happen to see Bias A and Bias B as manifestations of Bias C?
  • I worry some students would round this off to "here's how to pathologize people who disagree with me!" training.
  • Like I said, this is the kind of fruit that's low-hanging enough that it's mostly already picked.

All that said, I still think this is potentially worthwhile and would still playtest it if you wanted. But I'm much more excited about literally every other idea you mentioned.

comment by ChristianKl · 2024-10-19T12:24:43.029Z · LW(p) · GW(p)

The goal of this problem type would be to train the ability to recognize bias to the point where it becomes second nature, with the hope that this same developed skill would also trigger in your own thought processes. 

Part of what rationality is about is that you don't just hope for beneficial things to happen. 

Cognitive bias is a term that comes out of the psychology literature, and there have been plenty of studies in that domain. It's my understanding that nobody in academia has found that you get very far by teaching people to recognize biases.

Outside of academia, we have CFAR, which did think about whether you can get people to be more rational by giving them exercises, and came to the conclusion that those exercises should be different.

In a case like this, asking yourself "What evidence do I have that what I hope for will actually happen?" and "What sources, be it academic people or experts I might interview, could give me more evidence?" would be much more productive questions than "What things in my thought process might be labeled as biases?"

answer by Olli Järviniemi · 2024-10-20T21:36:18.203Z · LW(p) · GW(p)

This is a long answer, in which I list around ten concrete problem types that such a site could have.


Before I go into my concrete proposals, here are some general points:

  • I think the rationality community has focused too much on quantifying subjective uncertainty / probabilistic calibration, and too little on quantitative thinking and numeric literacy in general.
    • The set of possible exercises for the latter is way larger and pretty unexplored.
    • There are lots of existing calibration tools, so I'd caution against the failure mode of making Yet Another Calibration Tool.
      • (Though I agree with abstractapplic that a calibration tool that's Actually Really Good still doesn't exist.)
  • More generally, I feel like at least I (and possibly the rationality community at large) have gotten too fixated on a few particular forms of rationality training: cognitive bias training, calibration training, spotting logical fallacies.
    • The low-hanging fruit here might be mostly plucked / pushing the frontier requires some thought (c.f. abstractapplic's comment).
  • Project Euler is worth looking at as an example of a well-executed problem database. A few things I like about it:
    • A comment thread for those who have solved the problem.
    • A wide distribution of problem difficulty (with those difficulties displayed alongside the problems).
    • Numbers Going Up when you solve problems is pretty motivating (as are public leaderboards).
    • The obvious thing: there is a large diverse set of original, high-quality problems.
    • (Project Euler has the big benefit that there is always an objective numerical answer that can be used for verifying user solutions; rationality has a harder task here.)
  • Two key features a good site would (IMO) have:
    • Support a wide variety of problem types. You say that LeetCode has the issue of overfitting; I think the same holds for rationality training. The skillset we are trying to develop is large, too.
    • Allow anyone to submit problems with a low barrier. This seems really important if you want to have a large, high-quality problem set.
  • I feel like the following two are separate entities worth distinguishing:
    • High-quantity examples "covering the basics". Calibration training is a central example here. Completing a single instance of the exercise would take some seconds, or minutes at most, and the idea is that you do lots of repetitions.
    • High-effort "advanced examples". The "Dungeons and Data Science" exercises strike me as a central example here, where completion presumably takes at least minutes and maybe hours.
    • (At the very least, the UI / site design should think about "an average user completes 0-10 tasks of this form" and "an average user completes 300 tasks of this form" separately.)

And overall I think that having an Actually Really Good website for rationality training would be extremely valuable, so I'm supportive of efforts in this direction.


I brainstormed some problem types that I think such a site could include.

1: Probabilistic calibration training for quantifying uncertainty

This is the obvious one. I already commented on this, in particular that I don't think this should be the main focus. (But if one were to execute this: I think that the lack of quantity and/or diversity of questions in existing tools is a core reason I don't do this more.)

2: "Statistical calibration"

I feel like there are lots of quantitative statistics one could ask questions about. Here are some basic ones:

  • What is the GDP of [country]?
  • What share of [country]'s national budget goes to [domain]?
  • How many people work in [sector/company]?
  • How many people die of [cause] yearly?
  • Various economic trends, e.g. productivity gains / price drops in various sectors over time.
  • How much time do people spend doing [activity] daily/yearly?

(For more ideas, you can e.g. look at Statistics Finland's list here. And there just are various quantitative statistics floating around: e.g. today I learned that salt intake in 1800s Europe was ~18g/day [LW(p) · GW(p)], which sure is more than I'd have guessed.)

3: Quantitative modeling

(The line between this and the previous one is blurry.)

Fermi estimates are the classic one here; see Quantified Intuitions' The Estimation Game. See also this recent post [LW · GW] that's thematically related.

There's room for more sophisticated quantitative modeling, too. Here are two examples to illustrate what I have in mind:

Example 1. How much value would it create to increase the speed of all passenger airplanes by 5%?

Example 2. Consider a company that has two options: either have its employees visit nearby restaurants for lunch, or hire food personnel and start serving lunch at its own spaces. How large does the company need to be for the second one to become profitable?

It's not obvious how to model these phenomena, and the questions are (intentionally) underspecified; I think the interesting part would be comparing modeling choices and estimates of parameters with different users rather than simply comparing outputs.
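
As one illustration of how Example 2 could be modeled, here is a deliberately crude break-even sketch; every constant in it is an assumption a user would have to estimate and defend, which is exactly where comparing answers between users gets interesting.

```python
# One possible (deliberately crude) model for Example 2: at what headcount does
# an in-house lunch service beat sending employees out to restaurants?
# Every constant below is an assumption to be argued about, not a fact.

def annual_cost_restaurants(n_employees: int,
                            extra_minutes_per_day: float = 30,
                            hourly_value_of_time: float = 40,
                            workdays: int = 230) -> float:
    # Cost = productive time lost walking to / queueing at nearby restaurants.
    return n_employees * (extra_minutes_per_day / 60) * hourly_value_of_time * workdays

def annual_cost_inhouse(n_employees: int,
                        fixed_staff_and_kitchen: float = 250_000,
                        marginal_cost_per_meal: float = 6,
                        workdays: int = 230) -> float:
    # Cost = fixed kitchen/staff overhead plus ingredients per meal served.
    return fixed_staff_and_kitchen + n_employees * marginal_cost_per_meal * workdays

def break_even_headcount() -> int:
    # Smallest headcount at which the in-house option is no longer more expensive.
    n = 1
    while annual_cost_inhouse(n) > annual_cost_restaurants(n):
        n += 1
    return n

print("In-house lunch becomes cheaper at roughly", break_even_headcount(), "employees")
```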

4: The Wikipedia false-modifications game

See this post [LW · GW] for discussion.

5: Discourse-gone-astray in the wild

(Less confident on this one.)

I suspect there are a lot of pedagogically useful examples of poor discourse happening in the wild (e.g. tilted or poorly researched newspaper articles, heated discussions on Twitter or elsewhere). This feels like a better way to execute what the "spot cognitive biases / logical fallacies" exercises aim to do. Answering questions like "How is this text misleading?", "How did this conversation go off the rails?" or "What would have been a better response instead of what was said here?" and then comparing one's notes to others seems like it could make a useful exercise.

6: Re-deriving established concepts

Recently it occurred to me that I didn't know how inflation works and what its upsides are. Working this through (with some vague memories and hints from my friend) felt like a pretty good exercise to me.

Another example: I don't know how people make vacuums in practice, but when I sat and thought it through, it wasn't too hard to think of a way to create a space with far fewer air molecules than the atmosphere using pretty simple tools.

Third example: I've had partial success prompting people to re-derive the notion of Shapley value.

I like this sort of problem: they are a bit confusing, in that part of the problem is asking the right questions, but there are established, correct (or at least extremely good) solutions.

(Of course someone might already know the canonical answer to any given question, but that's fine. I think there are lots of good examples in economics - e.g. Vickrey auction, prediction markets, why price controls are bad / price gouging is pretty good,  "fair" betting odds [LW · GW] - for this, but maybe this is just because I don't know much economics.)
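
As a concrete anchor for the Shapley value example, here is a brute-force computation for a toy three-player game; the coalition values are invented purely for illustration.

```python
# Brute-force Shapley values for a toy 3-player game.
# The characteristic function v (value created by each coalition) is made up.

from itertools import permutations

players = ("A", "B", "C")
v = {frozenset(): 0, frozenset("A"): 10, frozenset("B"): 10, frozenset("C"): 0,
     frozenset("AB"): 30, frozenset("AC"): 20, frozenset("BC"): 15,
     frozenset("ABC"): 40}

def shapley(player: str) -> float:
    # Average the player's marginal contribution over all join orders.
    total = 0.0
    for order in permutations(players):
        before = frozenset(order[:order.index(player)])
        total += v[before | {player}] - v[before]
    return total / len(list(permutations(players)))

for p in players:
    print(p, round(shapley(p), 2))
# The three values sum to v(ABC) = 40 -- the efficiency property one would
# hope a student re-derives along the way.
```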

7: Generating multiple ideas/interventions/solutions/hypotheses

An exercise I did at some point is "Generate 25 ideas for interventions that might improve learning and other outcomes in public education". I feel like the ability to come up with multiple ideas for a given problem is pretty useful (e.g. this is something I face in my work all the time, and this list itself is an example of "think of many things"). This is similar to the babble exercises [? · GW], though I'm picturing more "serious" prompts than the ones there.

Another way to train this skill would be to have interactive exercises that are about doing science (cf. the 2-4-6 problem), aiming to complete them as efficiently as possible. (This article is thematically relevant.)
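
A minimal sketch of what such an exercise could look like, using the classic 2-4-6 setup (hidden rule: any strictly increasing triple); this is just one possible way to wire it up.

```python
# Sketch of a Wason 2-4-6 style exercise: guess the hidden rule governing
# triples of numbers by proposing test triples and seeing yes/no feedback.
# The classic hidden rule ("any strictly increasing sequence") is used here.

def hidden_rule(a: int, b: int, c: int) -> bool:
    return a < b < c

def play() -> None:
    probes = 0
    print("A hidden rule classifies triples of integers. '2 4 6' satisfies it.")
    print("Enter triples like '1 2 3' to test them, or 'guess' when ready.")
    while True:
        line = input("> ").strip()
        if line.lower() == "guess":
            print(f"You used {probes} probes. The rule was: strictly increasing.")
            break
        try:
            a, b, c = (int(x) for x in line.split())
        except ValueError:
            print("Please enter three integers or 'guess'.")
            continue
        probes += 1
        print("fits the rule" if hidden_rule(a, b, c) else "does not fit the rule")

if __name__ == "__main__":
    play()
```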


(Discussion of half-developed ideas that I don't yet quite see how to turn into exercises.)

8: Getting better results with more effort

Two personal anecdotes:

  • I used to play chess as a child, but stopped at some point. When I played again years later, I noticed something: my quick intuitions felt just as weak as before, but I felt like I was better at thinking about what to think, and at using more time to make better decisions. Whereas when I was younger, I remember often making decisions pretty quickly and not seeing what else I could do.
  • I did math olympiads in high school. Especially early on, some problems just felt fundamentally unapproachable to me - I just couldn't make any progress on them. Whereas nowadays when I encounter problems, in math or otherwise, I'm rarely stuck in this sense: "Oh, obviously if I just spent more time on this, I could figure this stuff out eventually."

A type of exercise where you are supposed to first give an initial answer after X time, and then are allowed to revise your answer for Y time, seems like it could train this and other skills. (Maybe brainstorming exercises of the form "if you had a week/month/year of time, how would you solve [problem]?" could help, too.)


9: I think there's something in the genre of "be specific", and more specifically in "operationalize vague claims into something that has a truth value", that'd be nice to have in large-quantity exercise form. See this post [LW · GW] for related discussion. I'm also reminded of this comment [LW(p) · GW(p)].


There are definitely things not covered by this list; in particular, I have included little that directly trains applying all this in real life (c.f. TAPs [? · GW], which is definitely a very real-life-y technique). So while I did keep practicality in mind, I'd be happy to see exercises that bridge the theory-practice gap even more.

Also, the Dungeons and Data Science [? · GW] and the stuff Raymond is doing [? · GW] are something to keep in mind.

answer by Raemon · 2024-10-17T02:28:21.713Z · LW(p) · GW(p)

FYI I'm working on an angle on this. One of my dreams is to make a proper website, but for now it's been more efficient to assemble a collection of puzzles and exercises that various other people have built, and layer rationality training exercises on top of them.

My agenda is written up in the Feedbackloop-first Rationality sequence [? · GW]. The basic idea is that rationality is bottlenecked on inventing better feedbackloops that train the actually important skills. (You can look over the "exercises" section)

My general strategy has been to take existing puzzles/exercises that have a fair amount of depth, such that in order to solve it you're going to need to:

  • make a plan for gaining more information about the puzzle
  • make a plan for acting on that information

Which naturally lends itself well to practicing the skills of:

comment by Gregory (gregory-eales) · 2024-10-17T08:43:10.022Z · LW(p) · GW(p)

Thanks for sharing this! I've read Feedbackloop-first Rationality, and it's definitely contributed to why I want to build something like this. I've even been looking for Thinking Physics style problems that might be free to use online. I think getting a diverse, high-quality set of interesting problems will be difficult, whether it's aggregated, crowdsourced, or possibly AI-generated.

My agenda is written up in the Feedbackloop-first Rationality sequence. The basic idea is that rationality is bottlenecked on inventing better feedbackloops that train the actually important skills. (You can look over the "exercises" section)

This will be very important, because a tool like this could be used to practice something hundreds of times; it should of course be something you actually want to reinforce. If designed poorly, it could become a waste of time or, worse, actually harm one's thinking. I'll definitely take a look at more of these exercises.

My general strategy has been to take existing puzzles/exercises that have a fair amount of depth, such that in order to solve it you're going to need to:

  • make a plan for gaining more information about the puzzle
  • make a plan for acting on that information

It might be good to have an answer template that guides users through these kinds of thinking steps. This could help build the habit of thinking systematically. With so many problem types, though, it’s challenging to settle on effective schemas that could apply to multiple problems.
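
One hypothetical shape for such an answer template, sketched as a data structure; the field names are mine and illustrative, not an established schema.

```python
# Hypothetical answer-template schema to nudge users through the steps
# described above (information-gathering plan, action plan, prediction).
# Field names and the example values are illustrative only.

from dataclasses import dataclass

@dataclass
class AnswerTemplate:
    problem_id: str
    info_gathering_plan: str      # how will I learn more about the puzzle?
    action_plan: str              # what will I do with that information?
    predicted_outcome: str        # a concrete, checkable prediction
    confidence: float             # probability assigned to the prediction
    postmortem: str = ""          # filled in after the answer resolves

example = AnswerTemplate(
    problem_id="thinking-physics-style-problem",   # hypothetical identifier
    info_gathering_plan="List the forces involved; check which ones scale with size.",
    action_plan="Work the limiting cases (very large / very small object) first.",
    predicted_outcome="The smaller sphere reaches the bottom first.",
    confidence=0.7,
)
```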

Replies from: Raemon
comment by Raemon · 2024-10-17T16:20:11.526Z · LW(p) · GW(p)

Yeah, I'm basically using the lens of my cognitive bootcamp [LW · GW] series to iron out the pedagogy here. I try to write up LW posts for all the key takeaways and exercises, although it takes a while.

answer by RomanHauksson · 2024-10-16T21:18:00.590Z · LW(p) · GW(p)

I would be interested in this!

Related: an organization called Sage maintains a variety of calibration training tools.

answer by Julius · 2024-10-17T05:25:04.949Z · LW(p) · GW(p)

Another place that's doing something similar is clearerthinking.org

answer by ideasthete · 2024-10-17T04:36:38.467Z · LW(p) · GW(p)

I like the LeetCode-style aspect of the idea. Maybe if you identify the "Blind 75" of cognitive biases, that might be a good start. Or take practice problems from this course: https://callingbullshit.org/. Maybe if you identify which problems you want to train on, you can use an LLM to continuously rewrite them, to prevent users from simply memorizing an answer and to force them to think critically. There are several ways to implement this sort of thing, and I can easily imagine the high-school version of myself falling down this rabbit hole.
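
A rough sketch of the LLM-rewriting idea; llm_complete is a placeholder for whatever model API one actually uses, and the prompt wording is only a first guess.

```python
# Sketch of using an LLM to re-skin a bias-detection problem so users can't
# memorize answers. `llm_complete` is a placeholder for a real model call
# (OpenAI, Anthropic, a local model, etc.); it is not a real library function.

REWRITE_PROMPT = """Rewrite the following scenario with new names, setting, and
surface details, but keep the same underlying reasoning error ({bias}).
Do not name the error in the rewritten text.

Scenario: {scenario}"""

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM client here")

def reskin_problem(scenario: str, bias: str) -> str:
    return llm_complete(REWRITE_PROMPT.format(scenario=scenario, bias=bias))

# Usage (fabricated example problem):
# fresh = reskin_problem(
#     scenario="After reading one glowing review, Dana became certain the clinic was the best in town.",
#     bias="over-updating on a single data point / anchoring",
# )
```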

Not to self-promote too much, but I created a demo of a similarly inspired idea that I call newsbetting at https://www.rashomonnews.com. The idea was to use betting mechanisms to create skin in the game and help news consumers identify their own biases when reading the news. Maybe you could include this betting mechanism as a proxy for your confidence interval.

Regardless, I would very much like to see the outcome of this sort of project and I wish you the best!

answer by Cole Wyeth · 2024-10-17T02:04:36.182Z · LW(p) · GW(p)

I would be very interested to see what you come up with!

answer by Julius · 2024-10-16T23:07:05.950Z · LW(p) · GW(p)

I like this idea and have wanted to do something similar, especially something that we could do at a meetup. For what it's worth, I made a calibration trivia site to help with calibration. The San Diego group has played it a couple times during meetups. Feel free to copy anything from it. https://calibrationtrivia.com/

comment by Gregory (gregory-eales) · 2024-10-17T07:35:01.930Z · LW(p) · GW(p)

Thanks! I have seen a similar tool like this before and enjoyed it quite a bit. I’d love to know where you source the trivia data, especially if it is available for open use. Also could be interesting to tailor to some functionality for meetups as well.

Replies from: julius-1
comment by Julius (julius-1) · 2024-10-22T00:00:02.323Z · LW(p) · GW(p)

I originally had an LLM generate them for me, and then I checked those with other LLMs to make sure the answers were right and that the questions weren't ambiguous. All of the questions are here: https://github.com/jss367/calibration_trivia/tree/main/public/questions

answer by scarcegreengrass · 2024-10-16T21:24:58.548Z · LW(p) · GW(p)

I think this sounds fun! The versions of this I'd be most likely to use would be:

  • Puzzling over scenarios of satisfying complexity. There could be numerical details, selection bias, unreliable narrator obstacles, cases where users with different values might disagree, etc. Even if the scenario-poster is arguably wrong about the right answer, that could still be interesting.
  • Scenarios that you puzzle over & then read a comment section about. Scenarios that you remember & talk about with friends later.
  • User-submitted anecdotes from their real lives. This is oddly similar to Reddit's 'Am I the Asshole' threads, but with a focus on becoming more clearheaded & unbiased. Users could sometimes ask for testable predictions about what will happen next, then report back later. So if the pictured scenario came from real life, Maria might ask users how many times Jake will be late in the next 6 months.
  • Philosophy-esque thought experiments.
  • Scenarios that do indeed benefit my thinking or expand my perspective. Perhaps by improving my mental statistics skills, or exposing me to perspectives of people with very different lives, or demonstrating little-known math subtleties like Simpson's paradox. One failure mode for this would be scenarios like the more boring HR-training courses, where the story doesn't contain any knowledge you don't already know.

answer by keltan · 2024-10-21T09:57:22.084Z · LW(p) · GW(p)

I imagine a character (Alice) is constantly used as the rational actor in scenarios. We make Alice a likeable character, give her a personality, a series of events and decisions that lead her to the present.

Then, when the user has been around for a sufficient amount of time, Alice starts to slip. She makes mistakes that harm others; perhaps she has disputes with ‘Stupidus’, or maybe she just begins to say untrue things.

How long will it take a user to pry themselves out of the rose-tinted glasses and update on Alice?

3 comments


comment by romeostevensit · 2024-10-17T23:21:15.511Z · LW(p) · GW(p)

I've thought about this for a long time and I think one of the big issues is lack of labelled training data in many domains. E.g. people made calibration toys and that helped a lot for that particular dimension. Ditto the tests on which studies replicated. In many cases we'd want more complex blinded data for people to practice on, and that requires, like in games, someone to set up all the non-fun backend for them.

Replies from: Raemon
comment by Raemon · 2024-10-18T00:15:22.932Z · LW(p) · GW(p)

What is an example of a type of complex blinded data that you'd be imagining here?

Replies from: romeostevensit
comment by romeostevensit · 2024-10-18T00:20:25.010Z · LW(p) · GW(p)

Like the calibration game, but for a variety of decision problems, where the person has to assign probabilities to things at different stages based on what information is available. Afterwards they get an example Brier score based on the average of what people with good prediction track records set at each phase.
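
For reference, a minimal sketch of the phase-by-phase Brier-score comparison described here; the reference forecasts are invented for illustration.

```python
# Minimal Brier-score sketch for a staged decision problem: the user assigns a
# probability at each phase as information arrives, and is compared against a
# (here invented) reference average from forecasters with good track records.

def brier(forecast: float, outcome: int) -> float:
    # Squared error between the stated probability and the 0/1 outcome; lower is better.
    return (forecast - outcome) ** 2

user_forecasts      = [0.50, 0.70, 0.95]   # user's probability at phases 1-3
reference_forecasts = [0.55, 0.65, 0.85]   # fabricated "good forecaster" averages
outcome = 1                                # the event happened

for phase, (u, r) in enumerate(zip(user_forecasts, reference_forecasts), start=1):
    print(f"phase {phase}: you {brier(u, outcome):.3f} vs reference {brier(r, outcome):.3f}")
```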