Big Data's big issue: Where are all the data scientists coming from?

This personnel gap isn't just a job-title change

Analysis Plug “data scientist” into Google and it is clear the job title has finally come of age and, suddenly, there is a huge skills shortage.

An oft-quoted source about this shortage is a McKinsey Global Institute study, which predicts a talent gap of 140,000 to 190,000 people by 2018 in the US alone. I am always sceptical of IT projections more than 18 months ahead (let alone six years) but I am convinced there is currently a huge skills shortage that is not going away in the next 17 months and 29 days.

So, what is a data scientist? My favourite description comes from Twitter: “Yeah, so I'm actually a data scientist. I just do this barista thing in between gigs.” More cynically: “A data scientist is just an analyst who lives in California.”

Possibly more accurate is that a data scientist (DS) is “a better software engineer than any statistician and a better statistician than any software engineer”. In other words, an important part of the job is to be able to design novel analytical algorithms for specific sets of data and then be able to implement that algorithm in the appropriate computer language.

Data scientists excel at analysing data, particularly large amounts of data that do not fit easily into tabular structures: so-called "Big Data".

For example, you should be able to point a data scientist at a web log and say: “Find the different patterns of behaviour in our users.” Or think about oil rigs for a moment. Breaking a drill bit during DIY work is irritating; in the middle of the North Sea it is annoying and very, very expensive. But if you collect enough sensor data (such as temperature, vibrations and RPM) you eventually have data for both normal running and breakages. You then point a data scientist at the data and say: “Build a system that predicts breakages before they happen.”
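As a rough illustration of the drill-bit brief, here is a minimal Python sketch. It is not a real predictive model: it swaps in a much simpler technique, learning the mean and standard deviation of each sensor from normal running and raising a warning when any reading drifts more than a few standard deviations away. The sensor values, the function names and the threshold of 3.0 are all invented for the example.

```python
from statistics import mean, stdev

# Hypothetical readings taken during normal running:
# (temperature in C, vibration in mm/s, RPM). All values are invented.
NORMAL_RUNS = [
    (70.0, 2.0, 120.0),
    (72.0, 2.2, 118.0),
    (71.0, 1.9, 121.0),
    (69.0, 2.1, 119.0),
    (73.0, 2.3, 122.0),
]


def fit_baseline(readings):
    """Return a (mean, standard deviation) pair for each sensor."""
    columns = list(zip(*readings))
    return [(mean(col), stdev(col)) for col in columns]


def anomaly_score(reading, baseline):
    """Largest absolute z-score across the sensors in one reading."""
    return max(
        abs(value - mu) / sigma
        for value, (mu, sigma) in zip(reading, baseline)
    )


def is_warning(reading, baseline, threshold=3.0):
    """Flag a reading whose worst sensor is beyond `threshold` sigmas."""
    return anomaly_score(reading, baseline) > threshold


baseline = fit_baseline(NORMAL_RUNS)
print(is_warning((71.0, 2.1, 120.0), baseline))  # a typical reading
print(is_warning((95.0, 6.5, 60.0), baseline))   # far outside normal running
```

A real system would also learn from the recorded breakages, not just from normal running, and would look at how the readings change over time rather than judging each one in isolation.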

Data scientists are part artist and part engineer. They need a toolbox of techniques, skills, processes and abilities from which to construct novel solutions. They also need to be able to create a user interface that turns their abstract findings into something the users of the system can understand, which means having the skills to create elegant visualisations that turn raw data into information. And they need to be able to communicate well with people. There is little use in creating a superb analytical process if you can’t communicate how and why it works to the board members.

And then there is the curiosity. Duncan Ross, director of data sciences at Teradata, characterised data scientists well: “The first and most important trait is curiosity. Insane curiosity. In many walks of life evolution selects against the kind of person who decides to find out what happens 'if I push that button'. Data Science selects for it.”

So, what are the general characteristics of a DS?

They include: insatiable curiosity (see above), interdisciplinary interests, excellent communication skills and excellent analytical capabilities. Data scientists also need a good working knowledge of machine learning techniques, data mining, statistics, maths, algorithm development, code development, data visualisation and multi-dimensional database design and implementation.

Specific skills include the technologies used to handle Big Data: NoSQL databases, Hadoop and related technologies, and MapReduce and its implementations on differing software platforms. Data scientists also have an intimate knowledge of languages such as SQL, MDX and R, and of functional and object-oriented languages such as Erlang and Java.

Data scientists will be required wherever large sets of data need to be analysed. This is true in the scientific world, of course, but that is where the title is somewhat misleading: they are also needed in commercial organisations, in organisations like the NHS, in government departments, in defence and so on.

So where are all the data scientists going to come from? We’ve been "doing" data science at the School of Computing at the University of Dundee, where I am chair of analytics, working with sets of Big Data as diverse as the output from mass spectrometers, image processing, web logs, data collected by games companies and so on.

This year, to run in parallel with our existing part-time Masters in BI, we are introducing a part-time Masters in Data Science. Most of the course is remote study because it is specifically designed for people already in employment in the database/analytical world who want to move into data science.

Fashions come and fashions go, but data scientists (whatever they may be called in the future) will endure. They will endure for the simple reasons that data is complex, the patterns within it are valuable, and spotting the patterns is difficult and requires an unusual mix of skills. ®

Mark Whitehorn holds the chair of analytics at the University of Dundee. His role involves working on data output from mass spectrometers: two-dimensional graphical traces of three-dimensional peaks that must be detected and their volumes calculated. The trick isn’t to do the sums; it’s to do them rapidly because another 8Gbyte output file is always coming.
