A couple of us at IARC have been following the course on machine learning from Stanford University taught by Andrew Ng. It really is an excellent course and we are now recommending it to our colleagues. The course has just finished, but it starts again in January and you can register at jan2012.ml-class.org.
The course lasts 10 weeks, with about 2 hours of video lectures per week. It is interactive, with review questions and programming exercises in Octave that you can submit for grading. The programming exercises were a little challenging to start with, since I had to unlearn all the tricks that I use for vectorized calculations in R. But they do not require mastery of the language. Much of the work is done for you and you only need to implement the methods described in the lectures.
What really made the course stand out for me is the emphasis on model building and criticism. In many applications of machine learning, data collection is cheap but the time spent by engineers building a machine learning system is expensive. One of the key lessons of the course is that if you do not follow a good methodology for model criticism or do not properly prioritize your efforts to improve the model, then you will waste a lot of time.
This is obviously very different from biostatistics and epidemiology, where data collection is expensive and we are generally trying to squeeze the most information out of a small data set. Machine learning also emphasizes good prediction rather than summarizing the data succinctly or understanding the data-generating process. Thus the outlook is rather different even when the techniques are familiar (for example, the course starts with linear regression and logistic regression).
The reason we started following the course is that machine learning techniques are entering biomedical science through the increasing use of high-throughput laboratory methods. I noticed that when junior lab scientists at IARC gave seminars, they would casually drop machine-learning terms and I had no idea what they meant. It was time for a little retraining, if only to learn the vocabulary.
Stanford will also offer a course on probabilistic graphical models, starting in January, which may be of interest to readers of this blog. We plan to try that one next.