Big Data's big issue: Where are all the data scientists coming from?

This personnel gap isn't just a job-title change

Analysis Plug “data scientist” into Google and it is clear the job title has finally come of age and, suddenly, there is a huge skills shortage.

An oft-quoted source about this shortage is a McKinsey Global Institute study, which predicts a talent gap of 140,000 to 190,000 people by 2018 in the US alone. I am always sceptical of IT projections more than 18 months ahead (let alone six years) but I am convinced there is currently a huge skills shortage that is not going away in the next 17 months and 29 days.

So, what is a data scientist? My favourite description comes from Twitter: “Yeah, so I'm actually a data scientist. I just do this barista thing in between gigs.” More cynically: “A data scientist is just an analyst who lives in California.”

Possibly more accurate is that a data scientist (DS) is “a better software engineer than any statistician and a better statistician than any software engineer”. In other words, an important part of the job is to be able to design novel analytical algorithms for specific sets of data and then be able to implement those algorithms in the appropriate computer language.

Data scientists excel at analysing data, particularly large amounts of data that do not fit easily into tabular structures: so-called “Big Data”.

For example, you should be able to point a data scientist at a web log and say: “Find the different patterns of behaviour in our users.” Or think about oil rigs for a moment. Breaking a drill bit during DIY work is irritating; in the middle of the North Sea it is annoying and very, very expensive. But if you collect enough sensor data (such as temperature, vibrations and RPM) you eventually have data for both normal running and breakages. You then point a data scientist at the data and say: “Build a system that predicts breakages before they happen.”
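To make the drill-bit example concrete, here is a minimal sketch in Python of the simplest possible starting point: learn what "normal running" looks like from sensor data and flag readings that stray far from it. Everything in it is an assumption for illustration: the sample readings, the function names and the threshold of three standard deviations are all hypothetical. It is also a plain z-score check, not the predictive system described above, which would have to learn from recorded breakages as well as from normal running.

```python
from statistics import mean, stdev

# Hypothetical readings taken during normal running:
# (temperature in deg C, vibration in mm/s, RPM). Invented values.
NORMAL_RUNS = [
    (70.0, 2.1, 120.0),
    (71.5, 2.3, 118.0),
    (69.8, 2.0, 121.0),
    (70.6, 2.2, 119.5),
    (71.0, 2.4, 120.5),
]


def fit_baseline(readings):
    """Return a (mean, standard deviation) pair for each sensor."""
    columns = zip(*readings)  # one tuple of values per sensor
    return [(mean(col), stdev(col)) for col in columns]


def anomaly_score(reading, baseline):
    """Largest absolute z-score across the sensors for one reading."""
    return max(
        abs(value - mu) / sigma
        for value, (mu, sigma) in zip(reading, baseline)
    )


def breakage_warning(reading, baseline, threshold=3.0):
    """True if any sensor is more than `threshold` deviations from normal."""
    return anomaly_score(reading, baseline) > threshold


baseline = fit_baseline(NORMAL_RUNS)
print(breakage_warning((70.5, 2.2, 120.0), baseline))  # a typical reading
print(breakage_warning((95.0, 6.0, 60.0), baseline))   # hot, shaking, slowing
```

A real system would need far more data, readings over time rather than single snapshots, and a model trained on the breakage records too; the sketch only shows where the work begins.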

Data scientists are part artist and part engineer. They need a toolbox of techniques, skills, processes and abilities from which to construct novel solutions. And they need the ability to create a user interface that turns their abstract findings into something that the users of the system can understand, so data scientists also need the skills to create elegant visualisations that turn raw data into information. And they need to be able to communicate well with people. There is little use in creating a superb analytical process if you can’t communicate how and why it works to the board members.

And then there is the curiosity. Duncan Ross, director of data sciences at Teradata, characterised data scientists well: “The first and most important trait is curiosity. Insane curiosity. In many walks of life evolution selects against the kind of person who decides to find out what happens 'if I push that button'. Data Science selects for it.”

So, what are the general characteristics of a DS?

They include: insatiable curiosity (see above), interdisciplinary interests, excellent communication skills and excellent analytical capabilities. Data scientists also need a good working knowledge of machine learning techniques, data mining, statistics, maths, algorithm development, code development, data visualisation and multi-dimensional database design and implementation.

Specific skills include the technologies to handle Big Data: NoSQL databases, Hadoop and related technologies, and MapReduce and its implementations on differing software platforms. Data scientists also have an intimate knowledge of languages such as SQL, MDX and R, and of functional and object-oriented languages such as Erlang and Java.

Data scientists will be required wherever large sets of data need to be analysed. This is true in the scientific world of course, but that is where the title is somewhat misleading because they are also needed in commercial organisations, in organisations like the NHS, government departments, defence and so on.

So where are all the data scientists going to come from? We’ve been "doing" data science at the School of Computing at the University of Dundee where I am chair of analytics, working with sets of Big Data as diverse as the output from mass spectrometers, image processing, web logs, data collected by games companies and so on.

This year, to run in parallel with our existing part-time Masters in BI, we are introducing a part-time Masters in Data Science. Most of the course is remote study because it is specifically designed for people already in employment in the database/analytical world who want to move into data science.

Fashions come and fashions go, but data scientists (whatever they may be called in the future) will endure. They will endure for the simple reasons that data is complex, the patterns within it are valuable, and spotting the patterns is difficult and requires an unusual mix of skills. ®

Mark Whitehorn holds the chair of analytics at the University of Dundee. His role involves working on data output from mass spectrometers, two-dimensional graphical traces of three-dimensional peaks that must be detected and their volumes calculated. The trick isn’t to do the sums; it’s to do them rapidly because another 8Gbyte output file is always coming.
