LessWrong 2.0 Reader

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

Arguments for moral indefinability
Richard_Ngo (ricraz) · 2023-09-30T22:40:04.325Z · comments (16)

AI Pause Will Likely Backfire (Guest Post)
jsteinhardt · 2023-10-24T04:30:02.113Z · comments (6)

Three ways interpretability could be impactful
Arthur Conmy (arthur-conmy) · 2023-09-18T01:02:30.529Z · comments (8)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (58)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

Environmental allergies are curable? (Sublingual immunotherapy)
Chipmonk · 2023-12-26T19:05:08.880Z · comments (10)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Mission Impossible: Dead Reckoning Part 1 AI Takeaways
Zvi · 2023-11-01T12:52:29.341Z · comments (13)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

[link] Five projects from AI Safety Hub Labs 2023
charlie_griffin (cjgriffin) · 2023-11-08T19:19:37.759Z · comments (1)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

[link] A Good Explanation of Differential Gears
Johannes C. Mayer (johannes-c-mayer) · 2023-10-19T23:07:46.354Z · comments (4)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Chess as a case study in hidden capabilities in ChatGPT
AdamYedidia (babybeluga) · 2023-08-19T06:35:03.459Z · comments (32)

shortest goddamn bayes guide ever
lukehmiles (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)

Fund Transit With Development
jefftk (jkaufman) · 2023-09-22T11:10:05.645Z · comments (22)

[link] Immortality or death by AGI
ImmortalityOrDeathByAGI · 2023-09-21T23:59:59.545Z · comments (30)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Assessment of intelligence agency functionality is difficult yet important
trevor (TrevorWiesinger) · 2023-08-24T01:42:20.931Z · comments (5)

LW UI features you might not have tried
Elizabeth (pktechgirl) · 2023-10-13T03:04:57.542Z · comments (6)

[question] Rationalist horror movies
Elizabeth (pktechgirl) · 2023-10-15T07:42:14.509Z · answers+comments (35)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

AI #33: Cool New Interpretability Paper
Zvi · 2023-10-12T16:20:01.481Z · comments (18)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (6)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

[link] Bayesians Commit the Gambler's Fallacy
Kevin Dorst · 2024-01-07T12:54:59.939Z · comments (28)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

Recent comments

olli-jaerviniemi on Interest in Leetcode, but for Rationality?

This is a long answer, in which I list around ten concrete problem types that such a site could have.

Before I go into my concrete proposals, here are some general points:

I think the rationality community has focused too much on quantifying subjective uncertainty / probabilistic calibration, and too little on quantitative thinking and numeric literacy in general.
- The set of possible exercises for the latter is way larger and pretty unexplored.
- There are lots of existing calibration tools, so I'd caution against the failure mode of making Yet Another Calibration Tool.
  - (Though I agree with abstractapplic that a calibration tool that's Actually Really Good still doesn't exist.)
More generally, I feel like at least I (and possibly the rationality community at large) have gotten too fixated on a few particular forms of rationality training: cognitive bias training, calibration training, spotting logical fallacies.
- The low-hanging fruit here might be mostly plucked / pushing the frontier requires some thought (cf. abstractapplic's comment).
Project Euler is worth looking at as an example of a well-executed problem database. A few things I like about it:
- A comment thread for those who have solved the problem.
- A wide distribution of problem difficulty (with the difficulty shown next to each problem).
- Numbers Going Up when you solve problems is pretty motivating (as are public leaderboards).
- The obvious thing: there is a large diverse set of original, high-quality problems.
- (Project Euler has the big benefit that there is always an objective numerical answer that can be used for verifying user solutions; rationality has a harder task here.)
Two key features a good site would (IMO) have:
- Support a wide variety of problem types. You say that LeetCode has the issue of overfitting; I think the same holds for rationality training. The skillset we are trying to develop is large, too.
- Allow anyone to submit problems with a low barrier. This seems really important if you want to have a large, high-quality problem set.
I feel like the following two are separate entities worth distinguishing:
- High-quantity examples "covering the basics". Calibration training is a central example here. Completing a single instance of the exercise would take some seconds or minutes at most, and the idea is that you do lots of repetitions.
- High-effort "advanced examples". The "Dungeons and Data Science" exercises strike me as a central example here, where completion presumably takes at least minutes and maybe at least hours.
- (At the very least, the UI / site design should think about "an average user completes 0-10 tasks of this form" and "an average user completes 300 tasks of this form" separately.)

And overall I think that having an Actually Really Good website for rationality training would be extremely valuable, so I'm supportive of efforts in this direction.

I brainstormed some problem types that I think such a site could include.

1: Probabilistic calibration training for quantifying uncertainty

This is the obvious one. I already commented on this, in particular that I don't think this should be the main focus. (But if one were to execute this: I think that the lack of quantity and/or diversity of questions in existing tools is a core reason I don't do this more.)
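To make concrete what such an exercise might compute, here is a minimal sketch of scoring a batch of calibration answers. The function names, the bin count, and the sample answers are all assumptions made for illustration, not taken from any existing tool.

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)

def calibration_table(forecasts, n_bins=5):
    """Group forecasts into equal-width probability bins and compare the
    average stated probability with the observed hit rate in each bin."""
    bins = defaultdict(list)
    for p, outcome in forecasts:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, outcome))
    table = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        hit_rate = sum(1 for _, o in items if o) / len(items)
        table.append((idx, len(items), mean_p, hit_rate))
    return table

# Hypothetical answers: (stated probability, whether the claim turned out true)
answers = [(0.9, True), (0.9, True), (0.9, False),
           (0.7, True), (0.7, False), (0.1, False)]

print(brier_score(answers))
for row in calibration_table(answers):
    print(row)
```

With only a handful of answers per bin the hit rates are very noisy, which is one way of seeing why question quantity matters so much here.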

2: "Statistical calibration"

I feel like there are lots of quantitative statistics one could ask questions about. Here are some basic ones:

What is the GDP of [country]?
What share of [country]'s national budget goes to [domain]?
How many people work in [sector/company]?
How many people die of [cause] yearly?
Various economic trends, e.g. productivity gains / price drops in various sectors over time.
How much time do people spend doing [activity] daily/yearly?

(For more ideas, you can e.g. look at Statistics Finland's list here. And there just are various quantitative statistics floating around: e.g. today I learned that salt intake in 1800s Europe was ~18g/day [LW(p) · GW(p)], which sure is more than I'd have guessed.)

3: Quantitative modeling

(The line between this and the previous one is blurry.)

Fermi estimates are the classic one here; see Quantified Intuitions' The Estimation Game. See also this recent post [LW · GW] that's thematically related.

There's room for more sophisticated quantitative modeling, too. Here are two examples to illustrate what I have in mind:

Example 1. How much value would it create to increase the speed of all passenger airplanes by 5%?

Example 2. Consider a company that has two options: either have its employees visit nearby restaurants for lunch, or hire food personnel and start serving lunch at its own spaces. How large does the company need to be for the second one to become profitable?

It's not obvious how to model these phenomena, and the questions are (intentionally) underspecified; I think the interesting part would be comparing modeling choices and estimates of parameters with different users rather than simply comparing outputs.
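As one illustration of a modeling choice for Example 2, here is a deliberately crude break-even sketch. It assumes the only benefit of an in-house canteen is employee time saved and the only costs are a fixed daily cost plus an optional per-meal subsidy. Every parameter name and number is hypothetical; a different user might reasonably model the problem quite differently.

```python
import math

def break_even_headcount(fixed_cost_per_day, minutes_saved_per_employee,
                         value_per_hour, extra_cost_per_meal=0.0):
    """Smallest headcount at which the daily value of saved time covers the
    canteen's fixed daily cost, or None if each meal is a net loss."""
    # Value of time saved per employee per day, minus any per-meal subsidy.
    saving = minutes_saved_per_employee * value_per_hour / 60 - extra_cost_per_meal
    if saving <= 0:
        return None
    return math.ceil(fixed_cost_per_day / saving)

# Hypothetical inputs: 600/day for kitchen staff and space,
# 30 minutes saved per lunch, an hour of employee time valued at 40.
print(break_even_headcount(600, 30, 40))
print(break_even_headcount(600, 30, 40, extra_cost_per_meal=5))
```

The interesting comparison is between which terms other people include or leave out (food quality, social effects, restaurant prices), not between the printed numbers.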

4: The Wikipedia false-modifications game

See this post [LW · GW] for discussion.

5: Discourse-gone-astray in the wild

(Less confident on this one.)

I suspect there are a lot of pedagogically useful examples of poor discourse happening in the wild (e.g. tilted or poorly researched newspaper articles, heated discussions on Twitter or elsewhere). This feels like a better way to execute what the "spot cognitive biases / logical fallacies" exercises aim to do. Answering questions like "How is this text misleading?", "How did this conversation go off the rails?" or "What would have been a better response instead of what was said here?" and then comparing one's notes to others' seems like it could make a useful exercise.

6: Re-deriving established concepts

Recently it occurred to me that I didn't know how inflation works and what its upsides are. Working this through (with some vague memories and hints from my friend) felt like a pretty good exercise to me.

Another example: I don't know how people make vacuums in practice, but when I sat and thought it through, it wasn't too hard to think of a way to create a space with far fewer air molecules than the atmosphere using pretty simple tools.

Third example: I've had partial success prompting people to re-derive the notion of Shapley value.

I like this sort of problem: they are a bit confusing, in that part of the problem is asking the right questions, but there are established, correct (or at least extremely good) solutions.

(Of course someone might already know the canonical answer to any given question, but that's fine. I think there are lots of good examples in economics - e.g. Vickrey auction, prediction markets, why price controls are bad / price gouging is pretty good, "fair" betting odds [LW · GW] - for this, but maybe this is just because I don't know much economics.)

7: Generating multiple ideas/interventions/solutions/hypotheses

An exercise I did at some point is "Generate 25 ideas for interventions that might improve learning and other outcomes in public education". I feel like the ability to come up with multiple ideas to a given problem is pretty useful (e.g. this is something I face in my work all the time, and this list itself is an example of "think of many things"). This is similar to the babble exercises [? · GW], though I'm picturing more "serious" prompts than the ones there.

Another way to train this skill would be to have interactive exercises that are about doing science (cf. the 2-4-6 problem) and aiming to complete them as efficiently as possible. (This article is thematically relevant.)

(Discussion of half-developed ideas that I don't yet quite see how to turn into exercises.)

8: Getting better results with more effort

Two personal anecdotes:

I used to play chess as a child, but stopped at some point. When I played again years later, I noticed something: my quick intuitions felt just as weak as before, but I felt like I was better at thinking about what to think, and at using more time to make better decisions by thinking more. Whereas when I was younger, I remember often making decisions pretty quickly and not seeing what else I could do.
I did math olympiads in high school. Especially early on, some problems just felt fundamentally unapproachable to me - I just couldn't make any progress on them. Whereas nowadays when I encounter problems, in math or otherwise, I'm rarely stuck in this sense. "Oh, obviously if I just spent more time on this, I could figure this stuff out eventually."

A type of exercise where you are supposed to first give an initial answer after X time, and then are allowed to revise your answer for Y time, seems like it could train this and other skills. (Maybe brainstorming exercises of the form "if you had a week/month/year of time, how would you solve [problem]?" could help, too.)

9: I think there's something in the genre of "be specific", and more specifically in "operationalize vague claims into something that has a truth value", that'd be nice to have in large-quantity exercise form. See this post [LW · GW] for related discussion. I'm also reminded of this comment [LW(p) · GW(p)].

There are definitely things not covered by this list; in particular, I have little on directly training to apply all this in real life (cf. TAPs [? · GW], which is definitely a very real-life-y technique). So while I did keep practicality in mind, I'd be happy to see exercises that bridge the theory-practice gap even more.

Also, the Dungeons and Data Science [? · GW] and the stuff Raymond is doing [? · GW] are something to keep in mind.

bhauth on If far-UV is so great, why isn't it everywhere?

Apart from potential harms of far-UVC, it's good to remove particulate pollution anyway. Is it possible that "quiet air filters" is an easier problem to solve?

bogdan-ionut-cirstea on The case for unlearning that removes information from LLM weights

Here's a recent paper which might provide [inspiration for] another approach: Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts (though it seems at least somewhat related to the tamper-resistant paper mentioned in another comment).

cubefox on Taking nonlogical concepts seriously

But if $p$ is "It's a cat" and $q$ is "It has four legs", and $P$ describes our beliefs (or more precisely, say, my beliefs at 5 pm UTC October 20, 2024), then $P(q \mid p) > P(q)$. Which surely means $p$ is a materially good reason for $q$. But $p ⊬ q$, so the inference from $p$ to $q$ is still logically bad. So we don't have logicism about reasons in probability theory. Moreover, probability expressions are not invariant under substituting non-logical vocabulary. For example, if $r$ is "It has two legs", and we substitute $q$ with $r$, then $P(r \mid p) < P(r)$. Which can only mean the inference from $p$ to $r$ is materially bad.
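A toy check of the two inequalities, as a sketch only: the credence table below is entirely made up (the kinds, leg counts, and weights are assumptions), and exact fractions are used so the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical credences over (kind, number of legs); the weights sum to 1.
credence = {
    ("cat", 4): Fraction(19, 100),
    ("cat", 3): Fraction(1, 100),
    ("dog", 4): Fraction(30, 100),
    ("bird", 2): Fraction(25, 100),
    ("human", 2): Fraction(25, 100),
}

def prob(event):
    """Total credence of the worlds in which the event holds."""
    return sum(w for world, w in credence.items() if event(world))

def cond(event, given):
    """Conditional credence P(event | given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

p = lambda w: w[0] == "cat"   # "It's a cat"
q = lambda w: w[1] == 4       # "It has four legs"
r = lambda w: w[1] == 2       # "It has two legs"

print(cond(q, p), prob(q))    # 19/20 49/100
print(cond(r, p), prob(r))    # 0 1/2
```

Swapping $q$ for $r$ flips the direction of the inequality even though the form of the expression is unchanged, which is the non-invariance under substitution described above.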

Laws of probability theory still impose a structure on relations between material concepts (there are still forms of monotonicity and transitivity), whereas the logical-expressivist order of explanation argues that the theoretician isn't entitled to a priori impose such a structure on all material concepts: rather, their job is to describe them.

I think the axioms of probability can be thought of as being relative to material conceptual relations. Specifically, the additivity axiom says that the probabilities of "mutually exclusive" statements can be added together to yield the probability of their disjunction. What does "mutually exclusive" mean? Logically inconsistent? Not necessarily. It could simply mean materially inconsistent. For example, "Bob is married" and "Bob is a bachelor" are (materially, though not logically) mutually exclusive. So their probabilities can be added to form the disjunction. (This arguably also solves the problem of logical omniscience, see here [LW(p) · GW(p)]).
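Written out for the example above, and assuming the two statements are treated as materially exclusive, additivity would read:

$$
P(\text{Bob is married} \lor \text{Bob is a bachelor}) = P(\text{Bob is married}) + P(\text{Bob is a bachelor})
$$

On a purely logical reading of "mutually exclusive", this equation would need the extra premise that bachelors are unmarried.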

rogerdearnaley on Interpreting the Learning of Deceit

A great paper highly relevant to this. That suggests that lying is localized just under a third of the way into the layer stack, significantly earlier than I had proposed. My only question is whether the lie is created before (at an earlier layer than) the decision whether to say it, or after, and whether their approach located one or both of those steps. They're probing yes-no questions of fact, where assembling the lie seems trivial (it's just a NOT gate), but lying is generally a good deal more complex than that.

momom2 on Against empathy-by-default

I was under the impression that empathy is explained by evolutionary psychology as a result of the need to cooperate combined with the fact that we already had all the apparatus to simulate other people (like Jan Kulveit's first proposition).
(This does not translate to machine empathy as far as I can tell.)

I notice that this impression is justified by basically nothing besides "everything is evolutionary psychology". Seeing that other people's intuitions about the topic are completely different is humbling; I guess emotions are not obvious.

So, I would appreciate if you could point out where the literature stands on the position you argue against, Jan Kulveit's or mine (or possibly something else).
Are all these takes just, like, our opinion, man, or is there strong supportive evidence for a comprehensive theory of empathy (or is there evidence for multiple competing theories)?

rogerdearnaley on Interpreting the Learning of Deceit

That's a great paper on this question. I would note that by the midpoint of the model, it has clearly analyzed both the objective viewpoint and also that of the story protagonist. So presumably it would next decide which of these was more relevant to the token it's about to produce — which would fit with my proposed pattern of layer usage.

arjun-panickssery on Overcoming Bias Anthology

Yeah it's for the bounty. Hanson suggested that a list of links might be preferred to a printed book, at least for now, since he might want to edit the posts.

richard_kennaway on Open Thread Fall 2024

No, I just threw that in. But there is the VHEM, and apparently serious people who argue for anti-natalism.

Short of those, there are also advocates for "degrowth".

I suspect the reason that Zvi declined to engage with such arguments is that he thinks they're too batshit insane to be worth giving house room, but these are a few terms to search for.

towards_keeperhood on Overview of strong human intelligence amplification methods

I mostly expect you start getting more and more into sub-critical intelligence explosion dynamics the further you go beyond +6std. (E.g. see the second half of this other comment I wrote [LW(p) · GW(p)].) I also expect very smart people will be better able to set up computer-augmented note-organizing systems, or maybe code narrow aligned AIs that might help them with their tasks (in a way that's a lot more useful than current LLMs but hard for other people to use). But idk.

I'm not sure how big the difference between +6 and +6.3std actually is. I also might've confused the actual-competence vs genetic-potential scale. On the scale I used, the drive/"how hard one is trying" also plays a big role.

I actually mostly expect this from seeing that intelligence is pretty heavy-tailed. E.g. alignment research capability seems incredibly heavy-tailed to me, though it might be hard to judge the differences in capability there if you're not already one of the relatively few people who are good at alignment research. Another example is how Einstein managed to find general relativity where the combined rest of the world wouldn't have been able to do it like that without more experimental evidence.
I do not know why this is the case. It is (very?) surprising to me. Einstein didn't even work on understanding and optimizing his mind. But yeah that's how I guess.