What alignment-relevant abilities might Terence Tao lack?
post by Towards_Keeperhood (Simon Skade) · 2025-04-07T19:44:18.620Z · LW · GW
40. "Geniuses" with nice legible accomplishments in fields with tight feedback loops where it's easy to determine which results are good or bad right away, and so validate that this person is a genius, are (a) people who might not be able to do equally great work away from tight feedback loops, (b) people who chose a field where their genius would be nicely legible even if that maybe wasn't the place where humanity most needed a genius, and (c) probably don't have the mysterious gears simply because they're rare. You cannot just pay $5 million apiece to a bunch of legible geniuses from other fields and expect to get great alignment work out of them. They probably do not know where the real difficulties are, they probably do not understand what needs to be done, they cannot tell the difference between good and bad work, [...]
-Eliezer in AGI Ruin [LW · GW]
This question is about the capabilities needed for alignment research in the worlds where alignment is hard, so that we need to solve alignment very robustly; in those worlds, the easiest path to success likely involves creating a new AGI paradigm in which alignment is more feasible.
My guess is that Eliezer is likely right that we cannot just pay a young supergenius to work on alignment and expect useful (hard-world) alignment progress to come out, but I'm wondering whether we might be able to train them to become capable in the relevant ways.
I'm not asking because of Terence Tao specifically - I think he's too old. I'm thinking about 2 other young supergeniuses, though I don't want to write their names here, mainly because there's opportunity cost to reaching out prematurely.[1]
Background
Let's divide human intelligence into 2 (clusters of) subdimensions, and call them INT and WIS[2]:
- INT: working memory size, accuracy and speed of performing complex operations on working memory content, pattern recognition ability on working memory content, precise long-term memory. (Mostly the subdimensions that are measured through IQ tests.)
- WIS: forming very deep models over long timescales where even tiny inconsistencies/confusions get noticed, ability to form good ontologies and find core cruxes in problems, precise intuitive Bayesian updating.
John von Neumann and Terence Tao can be seen as examples that sorta max out INT within the observed human variation, and Einstein (and IMO perhaps Eliezer) can be seen as examples that max out WIS.
The problem isn't that the suggestive power isn't big enough. The problem is that the verifier is broken.
-Eliezer in some podcast (but I forgot which)
Very roughly, I think INT maps to 'having high suggestive power' and WIS maps to 'having a good verifier'.
Also, while I agree that 'being able to distinguish what is progress from what is not' is the current bottleneck, I think we might also need higher suggestive power. (It would be awesome to have another Einstein, but in the hard worlds I'd guess he would be way too slow to solve it in 20 years.)
I think there likely exist trainable thinking techniques which strongly augment someone's effective WIS[3], especially for people with very high INT, though I don't know how far out of reach such techniques are.[4] We already have some[5] such techniques, though often they are not that explicit, and even if they are, we often still lack good training exercises.
The Questions
The questions are mainly directed to competent agent-foundations-like[6] researchers.
Let's assume an unrealistic best-case scenario: Say we have a 20-year-old, motivated, and trustworthy[7] Terence Tao, who carefully studies stuff like the Sequences, gets mentored by (among others) Eliezer, and tries to work on the most important problems and improve his most important skills.
I basically want to get a better probability estimate for:
Would this Terence Tao become a super-Einstein for alignment research and make a lot more useful progress than has been made so far?
I think a key crux for this is:
How much does good hard-world alignment research depend on learnable skills vs innate WIS?
I think a useful question to ask here is:
What are the core abilities you have that allow you to make useful progress?[8] (Please include whatever comes to mind, whether it's a clearly learnable skill (like "whenever I have formed a hypothesis, I look for a counterexample") or an opaque dimension of your intelligence (like "important ideas/shower-thoughts often just seemingly randomly pop into my mind").)
I'm interested in thoughts on any of those questions. If you have thoughts on multiple questions, perhaps answer them in the reverse order of how I wrote them here.
(You can DM me your thoughts if you prefer to not post an answer publicly[9].)
- ^
Yes, I think we're in the peculiar situation where there exist 2 young people who are likely roughly Terence-Tao-level, even though that's very rare. Neither has so far been sane enough to start working on alignment, though both are <=22 years old. Also feel free to DM me in case you'd be willing to help with trying to effectively reach out to them.
- ^
Which roughly but probably not exactly correspond to INT and WIS from Project Lawful [LW · GW].
- ^
E.g. if both Einstein and John von Neumann went through dath ilani Keeper training, I would guess John von Neumann would come out as far more competent, even though historically I am more impressed with Einstein as a scientist.
- ^
The techniques for augmenting WIS may work by using INT to a significant extent, so it's perhaps more like separately having a thinking-skill::WIS and a native::WIS, where your effective WIS is more like the maximum of those, rather than techniques adding WIS on top of whatever your native WIS is. INT might be a lot harder to train. So if we hypothetically had sufficiently good thinking techniques, native high-INT people would end up more competent. (INT might also be augmentable through gene therapy, though that obviously seems very hard.)
- ^
E.g. in Eliezer's sequences (noticing confusion, noticing mysterious answers, holding off on proposing solutions, crisis of faith, the virtues of rationality, defending against biases, ...), some further ones from CFAR and Reamon, or Fermi-estimate skills like Ryan Greenblatt demonstrates well (e.g. [LW(p) · GW(p)]).
- ^
and also people like Steven Byrnes and Paul Christiano
- ^
trustworthy = sane enough to keep dangerous AI capability insights secret. (And for further specification: let's NOT assume that this Terence Tao was sane enough to just decide by himself to work on alignment; rather, assume that we needed to (first pay him and) carefully convince him, but that this was successful.)
- ^
And maybe also: What are the relevant abilities that most people lack?
- ^
E.g. in case you fear that saying something like 'Terence Tao couldn't do that research I did' might be perceived as status hacking.