I tried to learn as much Deep Learning math as I could in 24 hours

post by Phosphorous (everett-smith) · 2023-01-08T21:07:34.190Z · LW · GW · 2 comments


TL;DR: I designed an experiment in which I committed to spending two 12-hour days trying to learn as much deep learning math as possible, basically from scratch.

Table of Contents

  1. Origins and Motivations
  2. Results
  3. Takeaways
  4. The Experiment Set-Up
  5. The Curriculum
  6. Documentation on Hours

Origins and Motivations

For a long time, I’ve felt intimidated by the technical aspects of alignment research. I had never taken classes on linear algebra or multivariable calculus or deep learning, and when I cracked open many AI papers, I was terrified by symbols and words I didn’t understand. 

Seven months ago I wrote up a short doc about how I was going to remedy my lack of technical knowledge: I collected some textbooks and online courses, and I decided to hire a tutor to meet with for a few hours a week. The first two weeks of meetings were awesome; then regular meetings got disrupted by travel, and I never came back to it.

When I thought about my accumulating debt of technical knowledge, my cached answer was “Oh, that might take six months to get up to speed. I don’t have the time.”

Then, watching my productivity on other projects over the intervening months, I noticed two things: 

  1. There appeared to be massive variance in my productivity. Sometimes, in a single day, I would get more done than I had accomplished in the previous weeks.
  2. I seemed to both enjoy projects more and get more done by “sprinting” through them, e.g. by spending 10 hours on one in a single day rather than spreading that same work out over 2 hours a week for 5 weeks. Sprinting was, for some reason, far more motivating and seemingly more efficient.

Also, when I asked myself what I thought the main bottlenecks were for addressing my technical debt problem, I identified two categories: 

  1. Time (I felt busy all the time, and was afraid of committing too much to one project)
  2. A combination of lacking Motivation, Accountability and Fun

Then, as my mind wandered, I started to put two and two together: perhaps these new things I had noticed about my productivity could be used to address the bottlenecks in paying down my technical debt. I decided to embark on an experiment: how much technical background on deep learning could I learn in a single weekend? My understanding of the benefits of this experiment was as follows:

  1. Committing “a weekend” felt like a much smaller time cost than committing “a few months”, even if they amounted to the same number of hours.
  2. No distraction: I could design my environment to minimize distractions for two days, something that would be intractable to do to the same degree for several months.
  3. “Trying to learn as much as possible” felt like a challenge. I was, to be honest, pretty scared. I didn’t know what I was doing, and it felt extreme, but that also made it exciting and fun.
  4. I had some historical data suggesting I might be good at this kind of sprinting, and framing this as an experiment to see what I could learn about my productivity added another layer of discovery-driven motivation and fun. What if I learned more about how to be productive and get hard things done via this experiment?
  5. As far as I knew, nobody else among my peers had done this, but I suspected that more people than me had the same problems, and that if I conducted this experiment, I might learn things that would be helpful to others, which added yet another layer of discovery-driven motivation and fun.
  6. Accountability: once I told somebody about this, it was hard to back out. It’s also way easier for someone to monitor me for a weekend than for a few months.

Results

Takeaways

The Experiment Set-Up

The Curriculum that I used

Intro to deep learning (I kept returning to these videos throughout the experiment, rewatching and understanding slightly more each time)

Linear algebra (this took me 2hr 25 mins, and ~36 mins of breaks)

Calc 3 (this took me 2hr 57 mins and ~50 mins of breaks)

- ResNets: https://www.youtube.com/watch?v=ZILIbUvp5lk (took me 18 mins)

- RNNs (optional): https://www.youtube.com/watch?v=_aCuOwF1ZjU (took me 13 mins)

- Transformers: https://www.youtube.com/watch?v=4Bdc55j80l8&t=609s

(I spent around two hours on the above video, which in hindsight was not a great use of time. I would recommend others choose a different explainer on Transformers.)

- RL basics: https://www.youtube.com/watch?v=JgvyzIkgxF0 (took me 25 mins)

- Policy gradients / PPO: https://www.youtube.com/watch?v=5P7I-xPq8u8&t=318s

I could not understand the above video even after rewatching it several times (I think the curriculum skipped some prerequisites for it), so I had Thomas Larsen walk me through it for around an hour. Thanks Thomas!
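For anyone who gets similarly stuck: the core idea that video builds on is the vanilla policy gradient (REINFORCE), which you can see working in a few lines. Here is a minimal sketch, my own toy example on a 2-armed bandit (not from the video or the curriculum):

```python
import numpy as np

# REINFORCE on a 2-armed bandit: arm 1 always pays reward 1.0, arm 0 pays 0.0.
# The policy is a softmax over two logits; each step we sample an action and
# nudge the logits in the direction  grad log pi(a) * reward.

rng = np.random.default_rng(0)
logits = np.zeros(2)   # policy parameters
lr = 0.1               # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)        # sample an action from the policy
    reward = 1.0 if a == 1 else 0.0
    # Gradient of log pi(a) w.r.t. the logits for a softmax policy:
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    logits += lr * reward * grad_log_pi   # REINFORCE update

print(softmax(logits))   # probability mass should concentrate on arm 1
```

The update only reinforces actions that earned reward, so the policy drifts toward arm 1. PPO layers a clipped surrogate objective and a learned value baseline on top of this same basic gradient, which is roughly what the video assumes you already know.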

- RLHF: Rob Miles video: https://www.youtube.com/watch?v=PYylPRX6z4Q (took me 23 mins)

Documentation on Hours

(I used Toggl Track to record my time and was fairly happy with the software. However, I made many errors, didn’t record breaks correctly, etc., so take these numbers with a grain of salt.)

Saturday

| Activity | Duration | Notes |
|---|---|---|
| 3b1b Video 1 | 28 min | Started around 10:00am |
| Video 1 summarizing | 15 min | |
| ??? | 7 min | |
| 3b1b Video 2 | 29 min | |
| Break | 4 min | |
| Video 2 summarizing | 20 min | |
| 3b1b Video 3 | 13 min | |
| Break | 9 min | |
| White-boarding | 4 min | |
| Linear algebra, first four videos | 58 min | |
| Cleaning up notes | 7 min | |
| Chapter 4, linear algebra | 16 min | |
| White-boarding | 14 min | |
| Three-dimensional linear transformations | 14 min | |
| ??? | 12 min | |
| Chapter 9 | 10 min | |
| Break | 27 min | |
| Chapter 13 | 14 min | |
| Calculus? | 3 min | |
| Backpropagation, chapter 4 | 10 min | |
| Break | 9 min | |
| Multivariable calculus | 1 hr 6 min | |
| Meditation | 9 min | |
| Multivariable calculus | 10 min | |
| Meditation | 7 min | |
| Multivariable calculus | 26 min | |
| Break | 25 min | |
| Multivariable calculus | 47 min | |
| Multivariable calculus | 28 min | |
| Watching Neural Nets chapter 3 again | 30 min | |
| ???? | 45 min | |
| Trying to explain and failing | 30 min | Ended around 8:45pm |

Sunday

| Activity | Duration | Notes |
|---|---|---|
| Rewatching backpropagation | 22 min | Started around 11:00am |
| ResNets video | 18 min | |
| RNNs video | 13 min | |
| Transformers | 21 min | |
| Break | 5 min | |
| Transformers | 36 min | |
| RL basics | 25 min | |
| Break | 23 min | |
| More RL | 23 min | |
| Talking to Thomas about Transformers, reinforcement learning, and PPO | 120 min | Ended around 6:00pm |

2 comments


comment by Curt Tigges (curt-tigges) · 2023-01-08T23:31:10.258Z · LW(p) · GW(p)

This is a cool idea, and I have no doubt it helped somewhat, but IMO it falls prey to the same mistake I see made by the makers of almost every video series/online course/list of resources for ML math: assuming that math is mostly about concepts and facts.

It's only about 5% that. Maybe less. I and many others in ML have seen the same videos and remembered the concepts for a while too. And forgotten them, in time. More than once! On the other hand, I've seen how persistent and operationally fluent people (especially in ML and interpretability) become when they actually learn math the way it must be learned: via hundreds of hours of laborious exercises, proofs, derivations, etc. Videos and lectures are a small fraction of what's ultimately needed.

For most of ML, it's probably fine; you'll never need to do a proof or do more than simple linear algebra operations by hand. But if you want to do the really hard stuff, especially in interpretability, I don't think there's any substitute for cranking through those hours.

To be clear, I think this weekend was a great start on that, provided you continue immediately to taking full courses and doing the exercises. I'm a top-down learner, so it would certainly help me. But unless this material is practiced in very short order, it will be forgotten, and just become a collection of terms you recognize when others talk about them.

comment by Artie · 2023-02-20T19:04:53.925Z · LW(p) · GW(p)

Thank you! I really liked the idea of framing it as an experiment! I will try to apply the same approach to building a Unity idle game in 24h :)