Learning Deep Learning: Joining data science research as a mathematician

post by magfrump · 2017-10-19T19:14:01.823Z

Contents

  Near term activities:

  Medium term goals:

  Long term goals:


About two years ago I finished my PhD in mathematics on an obscure technical topic in number theory. I left academic math because I wanted to do something that had a bigger (i.e. any) impact on the world around me. I also wanted to get out of the extremely perverse academic job market.

Since then, I’ve designed and taught courses on machine learning and am now working as a data scientist for a large company you’ve heard of (but not that one). In some respects I feel like my background in math prepared me better for this job than I can imagine a data science program doing: I think my insistence on a higher burden of proof than p = .05 is one of the most important things I’ve brought to the table in all the projects I’ve touched. I’ve also gotten a lot out of my background on LessWrong, mostly because it’s the only place I’ve ever really studied statistics. You’d think you couldn’t get a math PhD without doing at least a little stats, but you’d be wrong.

Anyway, there is one aspect of data science where I’m definitely behind the curve, and that’s the software engineering side. In particular, as a mathematician I’m very ready to grab hold of a lot of abstraction, tuck it into a black box, and only reopen the box when it doesn’t work exactly how I expect. But in the modern data science community, there are a ton of abstractions, and they’ve only been boxed up inconsistently.

I read a lot about deep learning research, both on LessWrong and for work, and there are a ton of interesting experiments that I would like to replicate and a handful of original research ideas I’d like to try out, just in brief prototypes to see if they are even worth exploring or thinking about. To do this, what I want is something like the ability to write the following code:

model = pre_trained_alexnet()
GAN = generative_adversarial_network()
model.transfer(new_task())
GAN.aggrieve(model, new_task())

While it may be possible to do things like this, especially with libraries like Keras, this isn’t where most introductions to deep learning start.
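For concreteness, here is a hedged sketch of what the “pre-trained model plus transfer” half might look like in Keras today. This is my approximation, not a canonical recipe: Keras doesn’t ship AlexNet, so VGG16 stands in, and num_classes and the training data are placeholders for whatever new_task() would supply. The GAN.aggrieve line has no similar one-liner that I know of.

# A hedged sketch of transfer learning in Keras. VGG16 stands in for
# AlexNet (which Keras doesn't ship); num_classes and the data are
# placeholders for the new task.
from keras.applications import VGG16
from keras.models import Model
from keras.layers import Dense, Flatten

num_classes = 10  # hypothetical: set by the new task

# Load the convolutional base with ImageNet weights, dropping the old classifier.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained features

# Attach a new classification head for the new task.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, ...)  # x_train, y_train from the new task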

Frustratingly for me, they almost all start with a huge series of videos on how to implement backpropagation, insisting that it’s not really that bad if you don’t understand all the calculus. To me this feels like doing absolutely nothing: these parts of the process have already been well established and optimized beyond my ability to contribute meaningfully. What I want to work with is higher levels of abstraction: which architectures perform well on which problems, what level of data augmentation improves accuracy and what level causes overfitting, and so on.
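As one concrete example of the kind of knob I want to turn, here is a hedged sketch of the data augmentation side using Keras’s ImageDataGenerator; the specific parameter values are arbitrary, and x_train/y_train are placeholders.

from keras.preprocessing.image import ImageDataGenerator

# Mild augmentation; how far these ranges can be pushed before the model
# starts fitting noise is exactly the kind of question I want to study.
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
#                     steps_per_epoch=len(x_train) // 32, epochs=10)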

Anyway, I don’t get anywhere by sitting around complaining that teaching is hard and not everyone is perfect at adapting to my unique situation as a student, so I’m focusing on self-improvement, and I’m hoping to hold myself accountable for at least a couple of weeks by writing about my progress. With any luck this will end up being a useful resource for others in my circumstances, but no promises.

So for now, I’ll outline a few near-term (the next week), medium-term (by the end of the year), and long-term (within a couple of years) goals that I have for myself.

Near term activities:

Right now I am taking the Coursera deep learning specialization. I’m not very synced up with the official course schedule, mostly because in these initial “compute derivatives and parrot definitions of bias and variance” stages I can get through 2-3 weeks of material in a day. I don’t feel like I’ve had a strong learning experience yet, though I’m optimistic that the later courses will be better. In the meantime, it keeps me focused, reinforces the basics, and will give me a certificate I can put on a resume, which should help in future interviews.

Medium term goals:

There are a ton of research papers whose results I’d like to replicate, but there are two in particular whose connection I’d like to understand: ‘Deep Reinforcement Learning from Human Preferences’ by Christiano et al., and ‘"Why Should I Trust You?": Explaining the Predictions of Any Classifier’ by Ribeiro et al.

The first uses reinforcement learning augmented by occasional sparse human feedback (preferences between short clips of agent behavior) to learn complicated reward functions. The second constructs model-agnostic explanations for single classifications of supervised learners, and describes a mechanism by which non-experts can select high-quality learners, or potentially improve them, by rating the explanations given for different predictions.
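The second paper’s authors released their method as the lime package (pip install lime), so a minimal, self-contained sketch of “explain one prediction of an arbitrary classifier” looks something like the following, with sklearn’s iris data standing in for a real problem:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Any classifier exposing predict_proba will do; the explainer is model agnostic.
iris = load_iris()
clf = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)

explainer = LimeTabularExplainer(iris.data,
                                 feature_names=iris.feature_names,
                                 class_names=iris.target_names)
exp = explainer.explain_instance(iris.data[0], clf.predict_proba, num_features=4)
print(exp.as_list())  # (feature, weight) pairs behind this single prediction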

I’d like to apply the process of the first paper to the second: to see whether I can train a very poorly calibrated network on image classification, or whether I can use explanations to make such a network more robust to adversarial examples. A schematic of the loop I have in mind follows below. Hopefully in the next 10 weeks I can learn enough basic deep learning tooling that setting up this kind of experiment is not so daunting.
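Very loosely, and purely as a schematic (nothing below is from either paper’s code, and every function is a hypothetical stub), the experiment looks like: use human ratings of explanations as the sparse feedback signal in a preference-learning loop.

import random

def lime_explanation(model, x):
    return 'explanation'              # stub: would call the lime package here

def human_rating(explanation):
    return random.random()            # stub: the sparse human feedback signal

def update_from_rating(model, x, rating):
    return model                      # stub: a Christiano-style preference update

def run_experiment(model, data, n_rounds=10):
    for x in data[:n_rounds]:
        rating = human_rating(lime_explanation(model, x))
        model = update_from_rating(model, x, rating)
    return model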

Long term goals:

What I would really like to have is a nice setup of things like Atari environments for deep reinforcement learning, the ability to easily do apprenticeship and inverse reinforcement learning in those environments, and a familiar code base for building evolutionary algorithms and adversarial examples. My guess is that 80-90% of the codebase I want already exists, but that something like 30-40% of it is maintained on personal GitHub pages by grad students, and the libraries take significant work to compile.
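The Atari piece, at least, already exists in OpenAI Gym (pip install gym[atari]). A minimal sketch with a random policy standing in for a real agent, assuming Gym’s current step/reset API:

import gym

# Run one episode of Pong with a random agent.
env = gym.make('Pong-v0')
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('episode reward:', total_reward)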

I’ll try to continue this with 10-15 minute TIL updates until November, when my writing time will be dedicated to NaNoWriMo.

If anyone wants to follow along, or talk on a Discord sometime about studying, or if you have suggestions for good/better places to learn these sorts of things, please let me know!

3 comments


comment by α · 2017-10-20T05:16:12.296Z

For what it's worth, I found the "re-implement backprop" exercises to be extremely useful in developing a gears-level model of what was going on under the hood.

Andrej Karpathy's "Hacker's Guide to Neural Networks" is really good, and I think it focuses on getting a good intuitive understanding of what's going on: https://karpathy.github.io/neuralnets/

I've also found Coursera and other MOOCs somewhat watered down in the past, but YMMV.

comment by sarahconstantin · 2017-10-20T20:17:23.419Z

Hi! Math PhD-turned-data-scientist here.

On resources: I agree that Karpathy's writeups are very helpful. Other useful resources:

Chris Olah's blog and his new interactive journal Distill are good explanations of why deep-learning tricks work.

Michael Nielsen's book on neural networks and deep learning is also good for conceptual understanding.

The Keras documentation is good. Armed with somebody else's GitHub code as a template, the Keras documentation, and a paper suggesting a variation on that project, you can try reimplementing your own version of the code (and training it on your own data), and that's probably the learning experience you want.

You're going to read a lot of papers, and then look up terms you don't understand and find papers or blog posts that explain them. It helps to have a system for saving and taking notes on useful papers and links. The field just isn't that consolidated yet, so a lot of good info is spread out this way and not captured in courses or textbooks.

I also recommend becoming familiar enough with Python and the basic libraries (pandas, numpy, sklearn, matplotlib) that data cleaning and basic exploratory data analysis are easy to do. Find a development environment that works for you. (I am happiest with Anaconda and doing everything in Jupyter notebooks.) If you're just now transitioning to coding regularly, you should be aware that initial setup/installation/environment woes happen to everyone, they are not a sign that you are unusually "bad with computers", and they will go away once you have found an effective work setup. If you haven't already done so, it's good to get an up-to-date laptop, and maybe set up access to higher-performance computers, like an AWS account. (I spent a year blaming myself for terrible code performance before a friend diagnosed the problem: my ancient laptop.)
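(As a rough illustration of the "easy EDA" bar, assuming a CSV file with some numeric columns:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')   # hypothetical dataset
print(df.describe())           # quick sanity check on distributions
print(df.isnull().sum())       # where the cleaning work will be
df.hist(figsize=(10, 8))       # one-line look at every numeric column
plt.show()

If that kind of thing feels effortless, you're in good shape.)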

comment by magfrump · 2017-10-21T01:16:07.417Z

I've been doing machine learning for about 2.5 years now and using Python for longer than that, and I'm also a big Jupyter notebook fan. I still almost always have a bit of trouble reading other people's code; what I'm really hoping is that I'll be able to dive deeper into the Keras documentation as this undertaking moves along.

I'll check out the blogs also, thanks for the references!