Question: What's missing in Microsoft's data science professional degree?
Hint: It's relational
Comment Microsoft grabbed the headlines this week when it announced a Professional Degree Program at its annual partner conference. It starts with data science.
Microsoft claims to have consulted data scientists and the companies that employ them to ensure students gain the core skills needed for a job in this extremely hot field of employment.
The curriculum is extensive and covers the sort of topics you would expect in a data science course. The course material seems to be of excellent quality, and I’ve no doubt that if you take the courses you will get an awful lot out of them.
It’s not clear from the website, but you can enrol for the courses now and start working through the material, and courses (with the exception of the orientation course) are being run four times a year. You can take the courses for free, but if you want the professional degree you will need to pay for a certificate for each module you take, between $25 and $99 depending on the module.
It will certainly work out cheaper and quicker than a traditional university MSc programme.
But there are some drawbacks.
First, it’s an online course. The great strength of attending in person is the joy of interacting face to face with the tutor and - just as importantly - other students. A good tutor will allow a session to drift off topic if the students are interested and bring it back on topic when it goes too far. Digression and deviation can throw up all kinds of interesting discussions leading to realisations and breakthroughs.
The use of videos as the primary teaching method here means this just isn’t possible.
This sharing of knowledge and information is important when learning at this level – other students sharing their experience is vital to the learning experience and can often be more important than the learning material itself. Microsoft is trying to solve this with its Born to Learn forums, which go some way to stimulating conversation but can in no way simulate the experience of a good technical discussion in the pub after a learning session.
The next issue is something that’s shared by a number of courses set up by technology companies. With the best will in the world, companies can’t help but treat courses as adverts for their own technology. As an example, this program offers a module “Query Relational Data” which is actually a course in T-SQL, Microsoft’s own version of the SQL database query language. That would be OK if this were a product training course, but if it’s meant to be training data science professionals, why not teach generic (or standard) SQL?
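To make the portability point concrete, here is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for any engine that isn't SQL Server. GETDATE() is a T-SQL function, while CURRENT_TIMESTAMP is the standard SQL spelling, which SQL Server also accepts. The helper name runs is mine and purely illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for "some other engine".
conn = sqlite3.connect(":memory:")

def runs(sql):
    """Return True if the statement executes on this engine, False otherwise."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.OperationalError:
        return False

# T-SQL spelling: SQLite has no GETDATE() function, so this fails.
tsql_only = runs("SELECT GETDATE()")

# Standard SQL spelling: understood by SQLite (and by SQL Server too).
standard = runs("SELECT CURRENT_TIMESTAMP")

print("GETDATE() works here:", tsql_only)
print("CURRENT_TIMESTAMP works here:", standard)
```

A student who has learned only the vendor spelling finds out about the difference the first time they sit in front of another database.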
There is the matter of the final project, too. A university MSc course will typically have a three-month intensive research project, and some universities will arrange for the project to be taken with an industrial partner.
Microsoft’s answer is to have students take part in the Cortana Intelligence Competition, something I have no doubt is tough and fun, but there is a nagging feeling it will be superficial compared to a three-month research project. Also, yes, Cortana is a Microsoft product so - once more - it's a little self-serving.
Arguably the biggest concern, however, is that the module doesn’t teach relational database theory or relational data modelling. Both are surely vitally important to a good data scientist but, as we know, relational is something that has historically proved disposable in big data, an area this qualification no doubt seeks to serve. Without this grounding it’s hard to understand why NoSQL databases are different, and what advantages and disadvantages they bring. More importantly, without a good understanding of relational theory, the data scientist misses out on a huge and well-tested bag of tricks that avoids a whole host of analytical problems.
There is a suggestion that the student can go elsewhere to learn this material, but it’s not clear exactly where the student should go. Without this basic understanding (exactly why data is normalised, for instance) the student could be storing up problems for the future, and potentially for anyone employing them. Designing databases that work correctly, both traditional and for big data, is never an easy job: understanding insert, update and delete anomalies is only one part of the design; the need for transactions (not strictly part of relational theory, I’ll grant) is another; and indexing strategies need to be considered too. These don’t go away in the big-data world; they are simply more hidden and still need to be understood.
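For anyone who hasn't met the update and delete problems that normalisation is meant to prevent, here is a minimal sketch using Python's built-in sqlite3 module. The tables, names and cities are invented for illustration and are not taken from the Microsoft course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalised design: the customer's city is repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT)")
cur.executemany("INSERT INTO orders_flat VALUES (?, ?, ?)",
                [(1, "Ada", "London"), (2, "Ada", "London"), (3, "Bob", "Leeds")])

# Update problem: Ada moves, but only one of her rows gets changed.
cur.execute("UPDATE orders_flat SET city = 'Paris' WHERE order_id = 1")
flat_cities = sorted(row[0] for row in cur.execute(
    "SELECT DISTINCT city FROM orders_flat WHERE customer = 'Ada'"))

# Delete problem: removing Bob's only order also removes all trace of Bob.
cur.execute("DELETE FROM orders_flat WHERE order_id = 3")
bob_rows_flat = cur.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE customer = 'Bob'").fetchone()[0]

# Normalised design: each fact about a customer is stored exactly once.
cur.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "customer TEXT REFERENCES customers(name))")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [("Ada", "London"), ("Bob", "Leeds")])
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Ada"), (2, "Ada"), (3, "Bob")])

# The same two operations against the normalised tables.
cur.execute("UPDATE customers SET city = 'Paris' WHERE name = 'Ada'")
cur.execute("DELETE FROM orders WHERE order_id = 3")
norm_cities = sorted(row[0] for row in cur.execute(
    "SELECT DISTINCT c.city FROM orders o JOIN customers c ON c.name = o.customer "
    "WHERE o.customer = 'Ada'"))
bob_city_norm = cur.execute("SELECT city FROM customers WHERE name = 'Bob'").fetchone()[0]

print("Flat table, Ada's cities:", flat_cities)        # two contradictory answers
print("Flat table, rows left for Bob:", bob_rows_flat)  # Bob has vanished
print("Normalised, Ada's cities:", norm_cities)         # one consistent answer
print("Normalised, Bob's city:", bob_city_norm)         # still on record
```

In the flat table the same question now has two answers for Ada and none for Bob; in the normalised pair the data stays consistent. The same risk exists in a denormalised big-data store, where it is simply harder to see.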
As the courses have no prerequisites (they can easily be taken in any order as well), it can’t be assumed that students will already have this knowledge or know where to get it. This democratisation of education is surely a good thing, but it’s a worry that basic theoretical understanding is thrown out of the window like this.
Ultimately there is an important role for this sort of online training, and this course provided by Microsoft is no doubt an excellent introduction to data science.
It will certainly give anyone who takes it a good head start in data science and, like any good course, is an excellent jumping-off point for further study. If you view the program as a training course on how to use Microsoft tools (OK, Python and Spark are included as well), then you’ll have a good idea of what to expect – it’s certainly a cheap way to gain knowledge quickly.
But if you want to prove to potential employers that you have taken the course and you have answered the exam questions yourself, then there remains no substitute for verified certificates. ®