Learning Math in Time for Alignment


post by Nicholas / Heather Kross (NicholasKross) · 2024-01-09T01:02:37.446Z · LW · GW · 5 comments


Epistemic status: Strong hunches, weakly held. At least some of this could be found false in experiments.

If you want to do technical AI alignment research, you'll need some amount of non-trivial math knowledge. It may be more theoretical [LW · GW], or with more ML/biology grounding [? · GW], but it'll definitely be math.

How do you learn all this math?

"Self-teaching" is almost a misnomer, compared to just "learning". I don't need to distill something for others, I only need myself to grok it. I may use distillation or adjacent techniques to help myself grok it, but like any N=1 self-experiment [LW · GW], it only needs to work for me. ^[1]

So then... what helps me understand things?

Formal rules that are written precisely
Wordy concepts that one could use in an essay

Math is technically the former, but real mathematicians (even the great ones!) actually use it more like the latter. That is, they use a lot of "intuition" built up over time.

You can't survive on intuition alone (unless you have the genetic improbability of Ramanujan's brain). And you can't survive on rigor alone (according to all bounded human minds doing math research). Heck, even learning the rigorous/boring way is nontrivial (since e.g. small errors are harder to correct when you're learning an alien system).

The Mathopedia [LW · GW] concept is, in many ways, the "wordy" version [LW · GW]. Viliam notes that math's "hardness" (i.e. objectivity) means you can't just teach it in the wordy version [LW · GW]. After all, there is generally one real canonical definition [LW(p) · GW(p)] for a mathematical object.

And yet... both Viliam and Yudkowsky say that math is fun when [? · GW] you know what you're doing. I kind of agree! I've had fun doing (what seemed like) math, at least twice in my life!

OK, so it's simple! Just make sure to understand everything thoroughly before moving to the next thing, and "play with the ideas" to understand them better.

Except... there's a problem.

AI timelines.

Giving children quality tutoring and new K-12 curricula [LW · GW] won't work even if we have 20 years before existentially-risky AI is used. 5 years is almost enough time to learn deeply about a subfield or two, enough to make original contributions.

AI alignment, if it involves enough math to justify this post, requires deeper-than-average understanding, and possibly an ability to create entirely new mathematics.

And timelines might be as short as a year or two. ^[2]

Tangent (for large grantmakers and orgs only)

Why didn't MIRI or other groups prepare for this moment earlier? Why didn't MIRI say "OK, we have $X to fund researchers, and $Y left over, so let's put $Z towards hedging our short-timelines bets. We can do that using human enhancement and/or in-depth teaching of the relevant hard (math) parts. Let's do that now!"?

I think it's something like... MIRI had pre-ML-calibrated short timelines. Now they have post-ML short timelines. In both cases, they wouldn't think "sharpening the saw"-type strategies [LW · GW] worthwhile. And if short timelines are true now, then it's too late to use them.

Luckily, insofar as AI governance does anything, we can get longer timelines. And insofar as you (a large grantmaker or org with funds/resources to spare on hedging your timeline scenarios) have enough money to hedge your timeline bets, you should fund and/or set up such longer-term programs. If you put 80% credence in 5-year timelines, but you also control $100 million in funding (e.g. you're OpenPhil), then you should be doing math-learning [LW · GW] and intelligence enhancement programs!

The Challenge

So clearly, a person needs to be able to get deep understanding of lots of math (in backchaining [LW · GW]-resistant worlds, that means lots of math). Within a year or two. In time to, and with the depth needed to, come up with new good ideas.

This is the challenge.

This post is the first in (hopefully) a series of posts, as I learn and learn-how-to-learn some alignment-relevant math from a promising reading list [LW · GW].

If you want to join the challenge, I must note two things.

First, the field of pedagogy is filled with bad ideas, and even the best existing ones are rarely up to the challenge posed here ^[3]. I will probably write a post (or a few) speedrunning these [? · GW].

Second, like human intelligence enhancement, any sufficiently effective math-learning technique can count as a dangerous capability. So if you find something cool, be careful with it [LW · GW], and only share with those looking to use human enhancement for alignment research [LW(p) · GW(p)].

As I explore the challenge of learning math deep enough and quick enough, I'll hopefully make progress on it. It's on my mind a lot.

If you are not, personally, a world-class technical alignment researcher... it should be on your mind, too.

^[1] Also, any tools (or notes) I generate in the course of learning can later be filtered for exfo [LW · GW] and then shared with others on alignment-specific or math-specific [LW · GW] websites.
^[2] Or shorter, but at that point you may be bringing in more assumptions, above the existing assumptions needed for 1- or 2-year timelines.
^[3] Rule of thumb: If you want it to be true, treat it with deep skepticism.

5 comments

Comments sorted by top scores.

comment by the gears to ascension (lahwran) · 2024-01-09T04:14:11.550Z · LW(p) · GW(p)

lots of this is written like assertions. how much of it do you know vs suspect? what parts are hunches to be tested?

Replies from: NicholasKross

↑ comment by Nicholas / Heather Kross (NicholasKross) · 2024-01-09T05:46:11.125Z · LW(p) · GW(p)

Good catch! Most of it is currently hunches to be tested (and/or theorized on, but really tested). Fixed.

comment by riceissa · 2024-01-10T03:07:59.737Z · LW(p) · GW(p)

I self-studied a bunch of math in 2017-2019 in order to do AI alignment research (specifically, agent foundations type stuff), and have a lot of thoughts about how to do it. Feel free to message me if you want to discuss.

Replies from: theanos@tutanota.com

↑ comment by Elias711116 (theanos@tutanota.com) · 2025-01-19T12:53:40.105Z · LW(p) · GW(p)

Hi Issa, have you written anything on this elsewhere? I'm interested in reading learning-related content.

Replies from: riceissa

↑ comment by riceissa · 2025-01-19T19:46:28.378Z · LW(p) · GW(p)

I have, but my writings are pretty disorganized at the moment and probably hard for people to interpret without some sort of dialogue with me, which is probably why I invited Nicholas/Heather to message me (I no longer remember my exact thought process from a year ago when I wrote the grandparent comment). But regardless, here are some links that you can check out of learning-related content I have written (feel free to message me or reply to this comment if you want to talk more about this stuff):
