Rage Against The MOOChine

post by Borasko · 2021-08-07T17:57:43.858Z · LW · GW · 12 comments

[This was a review of Andrew Ng's Machine Learning course for a math and computer science guide I was making. It ended up spiraling into a rant against MOOCs (massive open online courses). I left it unposted for a while, but I think what it says about the problems with the current MOOC paradigm remains mostly true.]

Course: Machine Learning, Andrew Ng

I started out excited for this course and left disappointed with a changed outlook on online learning. 

I’ll start off saying that I liked the theory section in this course. I think Andrew did a good job of explaining the theory of why he was doing the things he was doing. I liked the explanations of supervised and unsupervised learning. The later chapters on recommender systems and principal component analysis were also very interesting. 

But the problem is that, other than a brief overview of the skills, there was not much depth to the course. The course glossed over math topics like how to find the partial derivative of the cost function for gradient descent. As somebody without a rigorous math background, I was fine with that because I don’t know how to take partial derivatives. Then we got to programming in Octave. During the lecture he said it was better this way because it’s faster and it doesn’t really pick sides on which data science programming language you should use. The programming exercises being in Octave could be overlooked if the exercises were good, but sadly they aren't.
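For reference, the step being glossed over is short in the linear-regression case. The following is a sketch that assumes the squared-error cost with the 1/(2m) convention and a linear hypothesis; the notation is an assumption for illustration, not a quotation from the lectures.

```latex
% Assumed setup: m training examples, hypothesis h_\theta(x) = \theta^\top x
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Chain rule: the 2 from the square cancels the 1/2, and
% \partial h_\theta(x^{(i)}) / \partial \theta_j = x_j^{(i)}
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Gradient descent update with learning rate \alpha
\theta_j \leftarrow \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}
```

Other models need a different derivative for the hypothesis term, but the chain-rule pattern is the same.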

The programming assignments don’t teach you anything, at least not anything I could at this point say is useful. Usually they are some form of transcribing the loss function of the model you are learning from its mathematical representation to its coding representation, with everything usually given to you as a hint in comments at the top of the problem, or with its vectorized form mentioned in the document you are given to work through the problem.
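To make "transcribing the loss function" concrete, here is roughly what such an exercise amounts to when redone in Python with NumPy. This is a minimal sketch, assuming a linear-regression model with the squared-error cost J = (1/2m) * sum((X @ theta - y)^2); the function names and the synthetic data are made up for illustration and are not taken from the course materials.

```python
import numpy as np

def cost_and_gradient(theta, X, y):
    """Vectorized squared-error cost and its gradient for linear regression."""
    m = y.size
    residual = X @ theta - y              # h_theta(x) - y for every example
    cost = residual @ residual / (2 * m)  # (1/2m) * sum of squared residuals
    grad = X.T @ residual / m             # (1/m) * X^T (X theta - y)
    return cost, grad

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central-difference estimate of the gradient, to check the analytic one."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        cost_plus = cost_and_gradient(theta + step, X, y)[0]
        cost_minus = cost_and_gradient(theta - step, X, y)[0]
        grad[j] = (cost_plus - cost_minus) / (2 * eps)
    return grad

# Synthetic data (hypothetical): intercept column plus two features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=50)

# Plain batch gradient descent with a fixed learning rate.
theta = np.zeros(3)
for _ in range(500):
    cost, grad = cost_and_gradient(theta, X, y)
    theta -= 0.1 * grad
```

The part the assignment asks for is the three lines inside `cost_and_gradient`. Generating or loading the data, checking the gradient, and choosing the learning rate are the parts that were handed to you.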

Which made me feel like I was mostly doing this for each coding assignment:       

It was trivial, and it felt trivial. All of the actually hard parts of making the model were usually done for you: the data was collected, cleaned, visualized, loaded, and split, then sent to a function where you just type the loss function pretty much as described. The only time I really ran into problems with getting code to run was when I didn’t understand Octave syntax. Around the midpoint of the course I started wondering how much it matters whether I know Octave syntax if I am never planning on using it again. Why bash my head against esoteric Octave features if I will just relearn all this in Python? Why not just do it in Python to begin with?

I think the main reason it’s not in Python is that the course is older. Andrew Ng’s most recent specializations are done in Python, and I think that's because it's the industry leader now. I understand it would be too much to rebuild the course from the ground up, but I do believe that people learning Octave for practical machine learning from the course are mostly wasting their time. It would be better to try to implement the coding assignments in Python using NumPy and PyTorch, to test your own learning and get used to manipulating vectors and matrices in Python. I was scrolling through the reviews of this course and one of the more glowing reviews said they fondly remember looking at Octave documentation for hours to try and get their code to work. I couldn't help but think, “what a waste of time”.

So the coding assignments were kinda sucky and required outside work to be practical in the modern-day machine learning environment. That's rough, but at least the theory is good, right? Yeah, it is good. Andrew Ng is a good teacher, with nice explanations of one topic flowing into the next. But as with the programming assignments, the further I got into the course, the more I started to ask myself, “what am I really doing here?”.

The problem is math. Going over principal component analysis and anomaly detection, Andrew Ng did a great job of brushing over the math components to get to a broader overview of how these things work. That is great for getting the broad overview, but it isn't practically useful either. For stuff like PCA or calculating loss functions, there are already modules in PyTorch that will just do that for you. I learned in one of my other MOOCs that modules that do calculations for you are usually better than writing your own (other than for figuring out implementations), purely because they are usually made by practitioners to be as quick and as computationally efficient as possible. So the reason for learning the theory of anomaly detection, PCA, gradient descent and all other machine learning algorithms would then be to have a baseline of that knowledge and build off it. But...
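As a point of comparison for what a library call hides, PCA can be written in a few lines of NumPy. This is a sketch under stated assumptions: it uses the SVD of the mean-centered data, which is one common formulation and not necessarily the one presented in the course, and the function name `pca` and the toy data are hypothetical.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    # Variance captured along each kept direction (sample variance, n - 1).
    explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance

# Toy data (hypothetical): 2-D points lying close to the direction (3, 1).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[3.0, 1.0]]) + 0.1 * rng.normal(size=(200, 2))

Z, components, variance = pca(X, k=1)
```

Writing this out is a reasonable way to check understanding, but it does not explain why the singular vectors are the directions of maximum variance. That is the linear algebra the overview skips.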

( I typed “Andrew Ng math memes” into google for this image, there are many like it. )

Math is hard. AI and machine learning have been getting a lot of hype in popular culture, with recent interesting releases making people like me want to join the field and contribute. The problem is, I’m assuming, that a large chunk of us have not developed a good enough math skill set but still want to get involved where all the cool stuff is happening. This means teachers are incentivized to make courses that are inclusive of the vast number of people who want to join ML but don’t know much (or almost any) math, because if they don’t, the courses that are inclusive of those lacking higher-level math skills will be making the money instead. This is also why, I’m assuming, many machine learning courses include an “intro to Python” section as well. And I’m going to be pessimistic and assume the teaching of Python isn’t there to get those with many years in the software development field up to date with Python. It's there to seemingly lower the bar for entry, giving the prospective learner the idea that math, ML, and Python can all be learned in one big bundle, which usually isn’t the case. (Theoretically they could be, but it would need to be a really really really big book or course.)

Math is vitally important in this field. Since all the lectures in this course contained math, and these models only function and make sense because of math, it seems like the course is doing students a disservice by not being more rigorous with it, and in my opinion brushing over it is worse for future wannabe practitioners. But again, I understand why it is this way: many new students don’t want to deal with the long road to being math proficient, and the teachers of courses of this type don’t want to limit their applicant pool.

This is all to say I don’t think learning PCA or anomaly detection was good for me with my limited math skills. I enjoy having a broad idea of what they do, but with no background on how those methods came to be and how they could be built on in the future, I realized I was Goodharting myself by learning the theory provided in this course. I wasn't building strong foundations of knowledge that would help me in future rigorous applications; I was building little islands of knowledge with no coherent whole. I was optimizing for learning the course content because that's what I thought would help me become a machine learning engineer. This applied to the coding exercises as well.

But the more I thought about it, the more I started to see this trend in almost all of the MOOCs I’ve taken. I was Goodharting myself almost the whole time. I had the autodidact equivalent of tires stuck in mud: I was moving through the courses fast, but also getting almost nowhere. It was a disappointing realization, and if I sound bitter it’s because I am. I don’t know if other online learners feel the same, but it is disheartening. I still like MOOCs as a concept, and I think they could really help with practical understanding, but the current way isn’t doing it. The current ML MOOC landscape is filled with courses that are too coddling and shallow for anybody who wants real working knowledge. These bad courses are made, intentionally or unintentionally, because of what I believe are perverse incentives for educators, who are operating on the misguided wants of the educatee. There are exceptions, but they are few and far between, and this course isn’t one of them.

I don’t want to optimize for the wrong thing; I want to gain skills and work on hard problems. I think the modern learning landscape sucks because of these misaligned incentives. I can complete the course and get a little certificate which looks cool to my layman peer group, but if I were put in front of a raw dataset right now and told to build a machine learning model I wouldn't do a very good job (if I even could make one), and I don’t like that.

So I’m gonna accept that this experiment didn’t work out. I tried to learn computer science through MOOCs, but I don’t feel comfortable with what I’ve learned even though I’ve been through 7 different courses from a handful of different websites by now. I’m going to reevaluate my courses and build a better guide to learn math and computer science that will actually build long-lasting and cumulative skills.


Reflection: After leaving this post alone for a month and reading it again I still think it's correct. I think most academic institutions online or not have people optimize for the degree / certificate / accreditation rather than practical knowledge.  Partly because people don't know when starting a program what would be practical knowledge and things in the field change all the time so accreditation it seems is usually just a rough proxy for, "this person probably knows enough in this field to be useful". 

I am sympathetic to how hard it is to build useful courses, and how hard it is to tell whether a student actually knows something or can just regurgitate it and will forget it immediately after. It's uncharitable to MOOCs to say only they have the problem of lacking usefulness when they were modeled directly on institutional classes. But the lack of student interaction and of a tangential environment (clubs, professor interaction, etc.) does leave MOOCs more limited as a useful, comprehensive learning environment.

I think the biggest problem MOOCs seem to have is their lack of advancing structure. A degree has set classes increasing in difficulty, so the higher-level classes know that the student is prepared (or at least passed the lower-level classes) to be there. If Andrew Ng's course was the 5th course behind a calc, linear algebra, statistics and probability theory course, it would feel like a nice payoff. But the course assumes nothing of its learner and suffers for it. The universities that offer degree-like multi-class MOOCs usually have an application process and cost thousands of dollars, which is prohibitive but understandable, as they usually give a degree that doesn't say "online" on it, and for societal reasons "real" accreditation can't come cheap.

I think a solid education curriculum for math and computer science could be made less expensive and much more coherent, and since neither the learners nor the educators have any incentive to change how MOOCs fit together or function for the better, I decided to make a better guide for learning. You can find it here as a GitHub read-me. I spent a lot of time on it but it's still in its infancy; it uses mostly textbooks, sprinkled with the MOOC classes I think are / will be actually useful. I would greatly appreciate any comments to make it better, especially on what resources would be good for self-learning Machine Learning / Deep Learning with rigorous math. Either way I will keep it updated for the autodidact who wants to learn Math and CS a more rigorous way, no matter where they start. I obviously can't offer accreditation, so hopefully it will be helpful for people who want to optimize for pure competency in the field. I'm staking my own future on it, so I have every incentive for this guide to be as good and useful as possible.

12 comments

Comments sorted by top scores.

comment by johnswentworth · 2021-08-07T20:09:25.757Z · LW(p) · GW(p)

I recommend looking for "open course ware" (e.g. here or here) rather than MOOCs. These are usually course materials from the classes taught in-person at a university, which means they have actual prerequisites, and usually aren't subject to the MOOC-esque pressure to dumb everything down to the most accessible possible level. (The latter link above includes the version of Ng's ML course taught at Stanford, back in the day.)

comment by Alex Mikhalev (alex-mikhalev) · 2021-08-08T11:57:04.405Z · LW(p) · GW(p)

I think the most important point of teaching maths for data science is to build mental models in the data scientist's head. That takes time: it is part of the process of learning maths and usually takes two years (or a two-year course in university). Bypassing that process backfires - startups raising money for AI/ML normally take 2 years before shipping the product.

I think the mental model part is the most difficult to teach, but obviously, we are paid for specialised skills - like coding in Python - hence everyone wants to jump into coding Python without putting effort into learning maths and building proper mental models, which I think is wrong. The coding part is easy - I am comfortable with Octave/Matlab and Python or Rust. I would back Andrew Ng for choosing Octave, for example - it's one of the most concise ways to get ML concepts working. I disliked it when I was a student, but then I tried to translate the following code into Python:

%% Main loop

while(gen<maxgen)
  gen
  % perform uniform selection, then intermediate crossover
  chx=zeros(lamda,nvar);   % create storage space for new chrom
  sigx=zeros(lamda,nvar);  % create storage space for new sigmas
  for n=1:lamda            % loop over all population
    if(rand<xovr)      % roll the dice and see if we need to cross
      alpha=rand(1,nvar)*1.5-0.25; % create vector of random
                                   % then crossover
      chx(n,:)=alpha.*chrom(mod(n-1,p)+1,:)+(1-alpha).*chrom(ceil(rand*p),:);
      sigx(n,:)=abs(alpha.*sig(mod(n-1,p)+1,:)+(1-alpha).*sig(ceil(rand*p),:));
    else
      chx(n,:)=chrom(mod(n-1,p)+1,:);   % just copy if not crossing
      sigx(n,:)=sig(mod(n-1,p)+1,:);
    end
  end
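
For comparison, the crossover step above might look roughly like this in Python with NumPy. This is a sketch of only the `for` loop shown (the excerpt stops before the `while` loop closes, so the rest of the generation loop is not reconstructed). The function name `crossover_generation` and the `rng` argument are assumptions for illustration, and `p` is assumed to be the number of rows of `chrom`.

```python
import numpy as np

def crossover_generation(chrom, sig, lamda, xovr, rng):
    """Uniform selection followed by intermediate crossover for one generation."""
    p, nvar = chrom.shape                # assumed: one parent per row
    chx = np.zeros((lamda, nvar))        # storage for new chromosomes
    sigx = np.zeros((lamda, nvar))       # storage for new sigmas
    for n in range(lamda):
        parent = n % p                   # Octave's mod(n-1, p) + 1, 0-based
        if rng.random() < xovr:          # roll the dice: cross or copy
            alpha = rng.random(nvar) * 1.5 - 0.25  # weights in [-0.25, 1.25)
            # As in the Octave code, the mate is drawn separately for
            # the chromosome and for the sigma.
            mate_c = rng.integers(p)
            mate_s = rng.integers(p)
            chx[n] = alpha * chrom[parent] + (1 - alpha) * chrom[mate_c]
            sigx[n] = np.abs(alpha * sig[parent] + (1 - alpha) * sig[mate_s])
        else:
            chx[n] = chrom[parent]       # just copy if not crossing
            sigx[n] = sig[parent]
    return chx, sigx

# Hypothetical usage: 4 parents with 3 variables each, 10 offspring.
rng = np.random.default_rng(0)
chrom = rng.normal(size=(4, 3))
sig = np.abs(rng.normal(size=(4, 3)))
chx, sigx = crossover_generation(chrom, sig, lamda=10, xovr=0.7, rng=rng)
```

The line count comes out about the same as the Octave version; the main differences are 0-based indexing and an explicit random generator.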
comment by Teja Prabhu (teja-prabhu-1) · 2021-08-07T18:44:03.962Z · LW(p) · GW(p)

I myself did some of Andrew Ng's courses, and I understand where you're coming from. Although this was several years ago, I do remember Octave!

I saw your guide: https://github.com/Simon-Holloway/Full_Math_CS_Guide I just want to say: Real Analysis is overkill in my opinion if your goal is to simply become an AI researcher. Also, I personally like Karpathy's advice (which seems like it should radically alter your guide):

How to become expert at thing: 
1 iteratively take on concrete projects and accomplish them depth wise, learning “on demand” (ie don’t learn bottom up breadth wise) 
2 teach/summarize everything you learn in your own words 
3 only compare yourself to younger you, never to others

https://twitter.com/karpathy/status/1325154823856033793?s=20

Also, just as a side note, even mathematicians do this in research if not for basic things like real analysis:

Most big number theory results are apparently 50-100 page papers where deeply understanding them is ~as hard as a semester-long course. Because of this, ~nobody has time to understand all the results they use—instead they "black-box" many of them without deeply understanding.
https://twitter.com/benskuhn/status/1419281155074019330

 

Replies from: alex-mikhalev
comment by Alex Mikhalev (alex-mikhalev) · 2021-08-08T12:09:31.187Z · LW(p) · GW(p)

I agree that nothing beats practical projects, but in modern life, you need to learn a lot of background information before jumping into the real world. There are plenty of ML projects and examples that are equivalent to the ToDo (12-factor app) in complexity - single component, boundaries clearly defined. The next steps in the real world would be - here is a payment platform with 270+ services and components, how does your AI/ML component fit into it? Who do you talk to to figure out the business value of the AI/ML component (business analysis/domain driven design)? How do you talk to your creative colleagues who are responsible for user experience in a productive manner (i.e. jobs to be done)?

I see this gap quite consistently and I am trying to address it on the technical side by building a medium-size AI/ML project with 3 pipelines http://thepattern.digital/ and I think modern AI/ML professionals need to know the things above before jumping into any real-world project.

comment by philip_b (crabman) · 2021-08-08T11:46:32.826Z · LW(p) · GW(p)

There are a lot of very good resources to learn ML that are accessible for free. Here I list some of them.

  • Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares - Boyd 2018. It's a not-very-deep (but much deeper than your average MOOC) linear algebra textbook with the focus on ML and ML-adjacent applications. For instance, it covers k-means clustering, least squares data fitting, least squares classification. Boyd is known for being the author of the best introductory textbook on convex optimization, hence this textbook is probably good as well.
  • Linear algebra and learning from data - Strang 2019. This one assumes some knowledge of linear algebra and teaches how to use all that on contemporary computers efficiently and also it teaches many ML and ML-adjacent methods using that knowledge, including (stochastic) gradient descent, neural networks, backpropagation, some statistics. Strang is known for being the author of one of the best linear algebra textbooks "Introduction to linear algebra", hence this textbook is probably good as well. There is also a free MIT opencourseware course https://ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/ that follows this textbook.
  • The elements of statistical learning. Data mining, inference, and prediction 2nd edition - Hastie, Tibshirani, and Friedman 2008. This one covers A LOT OF ML, although all of it is pre-neural-network-revolution. Most exercises are of the form "prove this theorem" rather than "implement this in code".
  • Pattern recognition and machine learning - Bishop 2006
  • Probabilistic Machine Learning: An Introduction - Murphy 2021
  • Deep learning - Goodfellow, Bengio, and Courville 2016
  • AI: a modern approach 4th edition - Russell, Norvig 2020

Some of these are free but most are not. However, you can pirate them off libgen.

comment by habryka (habryka4) · 2021-08-07T20:35:19.796Z · LW(p) · GW(p)

Mod note: I cleaned up the formatting a bit. 

comment by CharlesRW · 2021-08-08T15:18:06.643Z · LW(p) · GW(p)

Fwiw, my experience with MOOCs/OCW  has been extremely positive (mainly in math and physics). Regarding the issue of insufficient depth, for a given 'popular topic' like ML, there are indeed strong incentives for lots of folks to put out courses on these, so there'll be lots to choose from, albeit with a wide variance in quality - c.f. all the calculus courses on edX.

That said, I find that as you move away from 'intro level', courses are generally of a more consistent quality, where they tend to rely a lot less on a gimmicky MOOC structure and follow much more closely the traditional style of lecture --> reading --> exercises/psets.  I find this to be true of things that are about equivalent to courses you'd take in ~3rd year uni - e.g.  "Introduction to Operating System Scheduling" as opposed to the "Introduction to Coding" courses or whatever the equivalent is for your area of study.  If you start to look at more advanced/specific courses, you may find the coverage quality improves.

I definitely can't promise this would work for ML, but I've found it useful to think about what I want to learn from a course (concepts?  technique?  'Deep understanding'?) before actually searching for one.  This gives you a gauge, and you can usually figure out within a few minutes whether a course meets it. I think this may be especially relevant to ML, where courses are fairly strongly divided along a tradeoff of 'practice-heavy' vs 'theory-heavy'.

It's worth noting, though, that my physics/math perspective might not be as valid in learning ML, as in the former the most effective way to learn is more-or-less by following some traditional course of studies, whereas the latter has a lot of different websites which teach both technique and theory at a basic-intermediate level; I'd be surprised if they, combined with projects, were less effective than MOOCs at the level for which they're written.  As others have noted, it may be worth looking at OpenCourseWare - MIT's being the most extensive that I'm aware of.  They also offer a 'curriculum map' so you can get a feel for which courses have prerequisites and what the general skillset of someone taking a given class should be.

comment by Olomana · 2021-08-08T07:18:08.189Z · LW(p) · GW(p)

I have taken a few MOOCs and I agree with your assessment.

MOOCs are what they are.  I see them as starting points, as building blocks.  In the end, I'd rather take a free, dumbed-down intro MOOC from Andrew Ng at Stanford, than pay for an in-person, dumbed-down intro class from some clown at my local community college.  At least there's no sunk cost, so it's easy to walk away if I lose interest.

comment by johfst · 2021-08-08T03:18:11.784Z · LW(p) · GW(p)

I just finished Andrew Ng's course as well, and had a similar experience to you. I do have a math background, so in retrospect it was probably a mistake to take it, but I saw it recommended so highly by people. I think the main value I got from it was the heuristics for debugging models and such, but I'm left wondering how many of those are even still relevant.

I'm still trying to learn ML though, so I'll take a look at your CS+ML guide. I remember trying fastai a few months ago and I felt like I wasn't learning much there either, again other than debugging heuristics. I also don't like their special library, because I can't remember which things are part of the library and which are just PyTorch (they're essentially teaching you two libraries at once, plus all the ML concepts--it's kind of a lot to keep in your head). Maybe I'll take another crack at it.

If you want another guide to pull from, I was following this one a few months ago. It stood out to me from the millions of other "86 bajillion books to learn computer science NOW" lists online because they intentionally limited it to a few subjects, and give their reasoning for each choice (and the reason some other popular books may be bad choices). It's much more CS focused, rather than programming focused, which is why I'm not following it now, but I plan to return to it when I actually have a job :)

comment by jacopo · 2021-08-07T21:39:02.609Z · LW(p) · GW(p)

For info, you can find most of the exercises in Python (done by someone other than Ng) here. They are still not that useful: I watched the course videos a couple of years ago and I stopped doing the exercises very quickly.

I agree with you on both the praise and the complaints about the course. Besides it being very dated, I think that the main problem was that Ng was neither clear nor consistent about the goal. The videos are mostly an informal introduction to a range of machine learning techniques plus some in-depth discussion of broadly useful concepts and of common pitfalls for self-trained ML users. I found it delivered very well on that. But the exercises are mostly very simple implementations, which would maybe fit a more formal course. Using an already implemented package to understand hands-on overfitting, regularization etc. would be much more fitting to the course (no pun intended). At the same time, Ng kept repeating stuff like "at the end of the course you will know more than most ML engineers" which was a very transparent lie, but gave the impression that the course wanted to impart a working knowledge of ML, which was definitely not the case.

I don't know how much this is a common problem with MOOCs. It seems easily fixable but the incentives might be against it happening (being unclear about the course, just like aiming for students with minimal background, can be useful in attracting more people). Like johnswentworth I had more luck with open course ware, with the caveat that sometimes very good courses build on other ones which are not available or have insufficient online material.

comment by Jozdien · 2021-08-07T19:41:20.752Z · LW(p) · GW(p)

I agree with your points on practical programming in the course, but I also think that's not even Andrew Ng's core intent with his courses.  As Teja Prabhu mentioned in his comment, learning through taking on projects of your own is a method that I can't think of many good alternatives to, as far as practical usage goes.  But getting there requires that you cast a wide net breadth-wise to at least know what's possible and what you can use, in machine learning.  You can, and probably will, learn the math depth-wise as you try working on your own projects, but to get there?  I think he throws just the right amount of technical math at you.  Trying to fit all the math involved in all the different ML methods he covers, from the ground up, is probably infeasible as anything but a year-long degree, and you don't need that to start learning it yourself depth-wise.

That, and a working understanding of ML theory are what I think his primary intent is, with his courses.  I did his Deep Learning specialization a couple months ago, and while the programming is slightly more hands-on there, it's still massively aided by hints and the like.  But he even says in one of those videos that the point of doing the programming exercises is only to further your understanding of theory, not as practice for building your own projects - writing code from scratch for a predefined goal in a course wouldn't be a great way of motivating people to learn that stuff.  Incidentally, this is why I think MOOCs for learning programming actually are pointless.

comment by Alex Mikhalev (alex-mikhalev) · 2021-08-08T15:21:15.740Z · LW(p) · GW(p)

 I would recommend structuring applied maths learning differently: start with Computational Beauty of Nature https://mitpress.mit.edu/books/computational-beauty-nature and then go deep in relevant areas + graph theory + graph algebra. Also a deep understanding of multi-objective optimisation techniques: NSGA-3, Pareto front/Pareto surface.