Spotting a Big Data faker as you set up Big Data for someone

It says here on your CV....

Having read my last Big Data piece, I fear that some of you will try to blag your way out of the declining Oracle/Java/VB market without the legs to support what’s on your CV.

This article is not for you: it’s for the poor souls who have to catch you out whilst trying to get in someone who’s at least mildly competent. There are no qualifications yet that even pretend to establish the bona fides of people applying in this field, so you’re into the world of pattern matching and instinct. Your HR department has negotiated hard to reduce what they pay recruiters, which means you pay more than you should for less service than you need (a headhunter writes - Ed). Few recruiters have ever coded anything so a trawl for Big Data will turn up a wide spread of CVs which have the right buzzwords. Your job is to lose 99 per cent of them.

Degeneration Game

Since you’re a manager, you’re probably a generation or maybe two behind what the footsoldiers are actually doing with Big Data. You won’t have hands on experience with the tech so you may have to fall back on the audit approach.

People rarely tell just one lie on their CV, and they find it hard to look you in the eye and say something they know to be false, rather than merely bigged up a little. That allows you to filter by asking detailed questions about things they claim to have done which you do understand. Then you can work on the principle that if they exaggerate their Python skills, odds are they’re doing that with what you need as well.

But that just filters out the incompetent bullshitters (not the good ones) so you need to look for positives amongst the survivors. To be useful a BD pro has to work at more levels and across more data sources than with traditional client/server databases. In most cases it is not the sheer volume of data that they have to overcome, it is the fact that you’re sucking in information from more places: eg social media, the accounts system and web clicks.

BD is also a state of mind as much as a collection of skills. One day these sorts of analytics will come in a more pre-canned format and there’s already a Big Data for Dummies, but for now it is an experimental science. Even if you have bought into a package like Talend and need people to drive the thing, they need to be ones who can dive deeper into the stack than someone who’s spent their career insulated from network and storage issues in a legacy Oracle setup.

That’s the opposite of how we are used to thinking about developers. Previously you’d actually worry about a Client/Server guy who cared very much about where on the network his data lived or how the topology affected response times.

If you’ve been hiring BD people, that’s probably led you to the next issue … which is that there is almost no one out there who can actually do the job properly. It may even be the case that if you think you’ve hired one of them, you’re wrong. The harsh fact is that hardcore referential integrity issues will blank the eyes of someone whose core is topology. Since I come from the software direction, there are network issues where I read the words but don’t always hear the music, and that sort of gap is going to hurt a team built of “smart people”.

I reckon myself one of them and, just like you, I define “smart” as holding your own in a tech pissing contest. That’s suboptimal in Big Data. You need to get them to explain their core competence and use the “ink blot” style of team building and hope the smudges join up with an acceptably small area of holes. But if you find someone is pushing the idea that they know the whole stack, odds are that they’re trying it on. What you really need in this game is people who pick stuff up quickly, so if you’re not sure where someone fits on the “normal CV bigging / utter bullshitter” spectrum, pick some tech item that they don’t know and work through what they’d do to understand it.

This stack issue goes all the way to data analysis skills and that’s something in short supply in the market and in your own head, else you wouldn’t need to read this.

You can start with the identification of entities across different platforms. Financial systems mostly work off accounts, which may be firms or individuals, and these won’t have a one-to-one mapping with people’s activity on your site. That matters especially if what you’re trying to find out is why some customers buy more than others and what can be done about it. Is the “customer” the firm or the person who orders? Can you identify points at which they stopped ordering? Was it after some support event? Dell would have far more customers if their systems did that.
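To make the account-versus-person question concrete, here is a minimal sketch of the sort of join a candidate might talk you through. Everything in it is invented for illustration: the sample data, the `user_to_account` mapping and the `lapsed_after_support` helper are assumptions, not anything from a real system.

```python
from datetime import date

# Hypothetical data: the finance system keys on accounts,
# the website and support desk key on individual users.
user_to_account = {"alice": "ACME", "bob": "ACME", "carol": "CAROL-01"}
orders = [
    ("ACME", date(2014, 1, 10)),
    ("ACME", date(2014, 3, 2)),
    ("CAROL-01", date(2014, 2, 1)),
    ("CAROL-01", date(2014, 6, 15)),
]
support_events = [("bob", date(2014, 3, 20)), ("carol", date(2014, 2, 10))]


def lapsed_after_support(orders, support_events, user_to_account):
    """Return accounts that placed no order after their first support event."""
    # Latest order date per account.
    last_order = {}
    for account, day in orders:
        last_order[account] = max(day, last_order.get(account, day))

    # Earliest support event per account, rolled up from individual users.
    first_event = {}
    for user, day in support_events:
        account = user_to_account.get(user)
        if account is None:
            continue  # a user we cannot tie to any account
        first_event[account] = min(day, first_event.get(account, day))

    # Accounts that never ordered at all are left out of the result.
    return sorted(
        account
        for account, event_day in first_event.items()
        if account in last_order and last_order[account] < event_day
    )


lapsed = lapsed_after_support(orders, support_events, user_to_account)
```

The interesting interview material is everything the sketch glosses over: where the user-to-account mapping comes from, and what happens to the users who fall through it.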

This is not so disjointed from traditional database work and you can pick some data sources at your firm that don’t work well together and see how the candidate would deal with the sort of setup you have. An important part of recruitment isn’t just “are they smart?” but “do they fit?”. If you don’t make that trade-off consciously you’re not doing it right and it can end badly.

That also means that an irony of this particular new set of buzzwords is that older candidates are often better, even though we associate new tech with youth. Over time, good IT pros absorb ways of dealing with the shit that accumulates in the gaps between systems. Others just forget stuff. The way you partition that set is to go back in time and ask them how they’d use ancient tech like REXX or VB6 to do this job. Neither is likely to be useful to you, but you will learn whether they absorb disparate skills and can apply tools that aren’t designed for a task when that’s all you have.

It’s also not unknown for those legacy skills to be directly useful, since the mature core systems that you’re going to be sucking data from to create Bigness are best accessed using tech you thought was dead and buried.

One irritating question I sometimes throw at candidates is “how do you parse CSV?”. I can write (and have written) two pages on error handling, CR/LF, CHR(26) and/or 4, Unicode and escape chars. CSV is a shitty old tech, but if you’re a seasoned IT pro you will know sometimes it’s the only way to get data from system A to system B even if they’re on the same server. Another thing you need to filter for is whether they can actually think up the analytics that your setup needs, which adds creativity and statistics to the mix and spreads people even thinner.
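For a flavour of what a decent answer to the CSV question covers, here is a minimal sketch in Python using the standard `csv` module. The `parse_csv` helper and the sample bytes are made up for illustration, and cutting the input at the first CHR(26) is an assumption that only suits old DOS-style exports where Ctrl-Z marks end of file.

```python
import csv
import io


def parse_csv(raw: bytes, encoding: str = "utf-8-sig"):
    """Parse raw CSV bytes into a list of rows, each a list of strings."""
    # Assumption: a Ctrl-Z, CHR(26), marks end of file, as in old DOS exports.
    raw = raw.split(b"\x1a", 1)[0]
    # The "utf-8-sig" codec drops a leading byte-order mark if there is one.
    text = raw.decode(encoding)
    # newline="" leaves CR/LF untouched so the csv module can handle line
    # breaks itself, including ones embedded inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row for row in reader if row]  # skip completely blank lines


# A BOM, a comma inside quotes, doubled quotes, an embedded CR/LF and a
# trailing Ctrl-Z: all things that break a naive split on commas.
sample = (
    b"\xef\xbb\xbfname,comment\r\n"
    b'"Smith, J","said ""hi""\r\nthen left"\r\n'
    b"\x1a"
)
rows = parse_csv(sample)
```

A candidate who reaches for a naive split on commas has told you most of what you need to know; one who asks about the encoding and who produced the file is worth a second look.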

A good question to lob at them is how they separate causality from correlation; quite a few stumble on that.
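One answer you might hope to hear involves a confounder: a third factor driving both numbers. The sketch below uses simulated data, with invented variables (customer size, orders, support tickets) and arbitrary noise levels, to show a strong raw correlation shrinking once size is controlled for with the standard first-order partial correlation formula. It illustrates the trap and nothing more; controlling for one known confounder does not establish causality.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys with the linear effect of zs removed."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated customers: size drives both orders and support tickets,
# but tickets have no direct effect on orders in this made-up world.
rng = random.Random(42)
size = [rng.uniform(1, 100) for _ in range(5000)]
orders = [2 * s + rng.gauss(0, 20) for s in size]
tickets = [s + rng.gauss(0, 20) for s in size]

raw = pearson(orders, tickets)                   # looks like a strong link
controlled = partial_corr(orders, tickets, size)  # shrinks towards zero
```

Someone who sees only the raw figure will tell you support tickets drive sales. Someone who asks what else differs between the customers is the one to hire.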

Different ways of thinking

One of the layers within a good BD skillset is the ability to think about things in a distributed and asynchronous way. A purpose-built BD system scatters the data and processing across N nodes. So if they have the experience they claim, any explanation of their work will talk about the flow/storage/processing trade-offs, why they chose the set they did and the problems they overcame. Drilling down through the way they handled problems is the gold standard in evaluating a BD pro. Syntax skills aren’t worth all that much; what you should be paying for is good judgement, either through bloodily earned experience or from raw brainpower.

Most BD setups are at least partly based upon somewhere between two and seven base systems, and the sort of skills you’re looking for aren’t just mastery of the DB drivers or SQL. Given that these systems do critical work for other parts of the business, you want to hear how the candidate avoided overloading them and how they decided which data to trust.

That’s not to say you don’t check their Aster-H SQL coding and/or switch configuration skills but, between ourselves, let’s be honest here: a moderately smart blagger can learn syntax and Google potential interview questions. Judgement is harder to fake. Testing for it shows you who has done it for real, and it also finds the most useful people for your team.

Lastly, there is one way to cut the chances of hiring a dud, even if it’s not one approved of by HR. Let your contractors do the interviewing. Freelancers know how to spot a blagger better than most, yet few firms let them interview staff - which is a mistake. When all else fails, spend a couple of hundred on a contractor for the final interview, which is worth it even if only to protect your back against hiring a dud. ®

Dominic Connor is a City headhunter who can also blag his way in C++, pricing exotic options, chemistry and journalism.
