Intel beckons SMBs aboard Big Data bandwagon
Small fry firms can be Big Data fish with Hadoop
Cutting-edge Big Data projects might seem the sole preserve of big-name multinationals and government organisations, but the democratisation of these next-gen analytics capabilities is coming soon to an SMB near you, according to Intel.
Speaking at the APAC launch of the Intel Distribution for Apache Hadoop, Chipzilla’s global director of enterprise computing, Patrick Buddenbaum, told The Reg that the firm’s OEMs, SIs and software vendors will help to push analytics capabilities down to a wider base of organisations than might currently be using such advanced tools.
“The partner and ecosystem aspect will drive the scale of this thing to the next set of customers – software vendors incorporating Hadoop into their packaged ERP or BI applications,” he said.
The “other element of the dissemination curve” lies with cloud players like Amazon and Savvis effectively offering Hadoop-as-a-service – giving yet more firms the opportunity to experiment with the tech, added Buddenbaum.
It will happen “over time, not overnight”, and a range of business, technical and cultural issues may slow the rate of adoption of such capabilities among small and medium-sized firms, he said.
OEMs and SIs can help customers unfamiliar with running large-scale clusters. However, many companies aren’t asking the right “what if” questions of their data, while elsewhere CIOs are failing to support requests for broader cross-organisational data access to generate real-time intelligence, said Buddenbaum.
Intel was also at pains to point out its Hadoop distribution is not the “be all and end all” and that Apache Hadoop itself is not the answer to all Big Data problems but just “one piece of the puzzle”.
However, Forrester research director Dane Anderson was more positive, arguing that the Hadoop-based distributed computing paradigm is already collapsing Big Data barriers to entry for smaller firms.
“A lot of what is holding large companies back is that it’s hard for them to change. The majority of analytics spend is in the traditional, data warehousing / batch processing way of analysis,” he explained.
“Vendors and enterprises overseeing that don’t want to let it go but there’s a new, younger shift based on Hadoop and open source. We have generational change and the possibility of a lot of companies coming in and transforming their industries with this technology.”
The focus will shift towards this idea of “cutting edge compute and software resources at pay-per-use prices” using Hadoop, but it will take time, according to Anderson, who referenced the 5-6 years it took for SaaS to hit the mainstream after getting a foot in the door among early adopters.
He highlighted a dearth of in-house skills, organisational roadblocks and lack of clear ownership on Big Data projects, as well as the problems of navigating vendor hype, as key additional barriers.
At the moment there are few Big Data case studies being bandied around by the major IT players that aren’t large-scale projects, such as Intel’s work with the world's biggest network operator, China Mobile, designed to give customers real-time access to billing data.
However, Chipzilla’s APAC marketing bod Takashi Tokunaga said the firm was working with several sub-100 employee online retailers in the region who have been harnessing Hadoop to help customise and enhance the shopping experience à la Amazon.
Another example comes from EMC and research outfit the Singapore-MIT Alliance for Research and Technology, which embarked on a project to discover why queues for taxis in Singapore get so long when it rains.
After crunching weather satellite data and GPS data from local taxis, they discovered that it wasn't because cabs were more in demand at these times, but because drivers were refusing fares in the wet due to the risk of accidents, for which they were liable to pay an upfront $800 excess.
In a few years, as the hype blows away and adoption goes mainstream, the industry will probably not even be talking about “Big Data” anymore, but just "data analytics".
However, if it means always being able to grab a taxi, even in the middle of a thunderstorm, this may yet be one of the few technology trends which just about lives up to the hype. ®
The idea that Hadoop is "cheaper" is a myth. Hadoop solves the "expensive server" problem by throwing a whole bunch of shitty consumer-grade hardware at the problem. If you do the research into the subject and talk to the right people, there is rather a lot of dissension as to whether or not this actually results in an overall price drop.
You see, the expensive databases (Oracle, DB2, etc) are really tightly coded to the hardware for performance. They aren't perfect, but they are a hell of a lot more efficient than Hadoop. Plus, you can generally get away with doing what you need to do on a single (or smallish number of) exceptionally powerful boxes. This drives down your power, cooling, space and networking bills by quite a bit.
You can overcome some of the inherent limitations with Hadoop if you have shit-hot programmers, but as you pointed out, SMBs don't. What's more, as the traditional DB folks are being kicked out of the higher end positions thanks to Hadoop actually being useful (and cheaper) when you get to petascale, the cost of the expertise required to do Neat Things with traditional databases is plummeting.
I have on hand a handful of systems that could theoretically be Hadoop nodes. They would be exceptionally shitty Hadoop nodes and they wouldn't come anywhere close to providing the compute, IOPS or network bandwidth required to do the imagery analysis discussed above. Assuming, of course, I could find a dev to program it.
The ability to use consumer hardware doesn't mean it's cheaper. It means it scales out in a more linear fashion. When you have a small scale budget, limited space, limited cooling and big requirements, Hadoop just isn't the thing.
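For context on what a Hadoop job actually asks of those "shit-hot programmers", here is a minimal map/reduce sketch in plain Python. The two functions mirror the shape of a Hadoop Streaming job, but everything runs locally; the input lines are invented for illustration:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (word, 1) pairs -- the 'map' half of a word-count job."""
    for word in line.lower().split():
        yield word, 1

def reducer(pairs):
    """Sum the counts per key -- the 'reduce' half. Hadoop would sort
    and group the pairs across the cluster; here we do it in-process."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)

# Toy input standing in for log files spread across many nodes.
lines = ["big data big hype", "big cluster"]
pairs = [kv for line in lines for kv in mapper(line)]
counts = dict(reducer(pairs))
print(counts)  # {'big': 3, 'cluster': 1, 'data': 1, 'hype': 1}
```

The point is that even the "hello world" of Hadoop forces you to restructure a one-line SQL `GROUP BY` into two cooperating functions, which is exactly the expertise gap the commenters above are arguing about.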
That's the beauty, Trevor: Hadoop-class hardware is exactly what you do have, because it scales linearly with hardware, so you can use what you have. You're right on the value of the analytics, but the cost of a man to write the stuff is so prohibitive that it's a joke at the moment for an SMB to even consider. The exception being if they can find a placement student (intern in your speak? Sorry, don't know Canada; over here we often take a year out from university for work experience) who happens to be enthusiastic, very clever, and doesn't realise their true value yet. In fact, people who don't know their value will likely be the catalyst for this whole shebang.
But most businesses do better if they have analytics. Chicken and egg. So:
Step 1) Collect all the data you can
Step 2) Start interrogating it
Step 3) Alter your business/marketing practices based on what you discover
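The three steps above need nothing heavier than SQLite at SMB scale. A sketch, assuming a hypothetical `sales` table (the table name, columns and figures are illustrative, not from the article):

```python
import sqlite3

# Step 1: collect all the data you can (here, a toy in-memory sales table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "widget", 120.0),
     ("north", "gadget", 80.0),
     ("south", "widget", 200.0),
     ("south", "widget", 50.0)],
)

# Step 2: start interrogating it -- which region/product pairs earn the most?
rows = conn.execute(
    "SELECT region, product, SUM(revenue) AS total "
    "FROM sales GROUP BY region, product ORDER BY total DESC"
).fetchall()

# Step 3: alter practice based on what you discover, e.g. point marketing
# spend at the top-grossing combination.
top_region, top_product, top_total = rows[0]
print(top_region, top_product, top_total)  # south widget 250.0
```

Nothing here needs a cluster, which is rather the commenters' point: the chicken-and-egg problem is asking the questions, not the horsepower.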
I have a dozen companies Microsoft could use for case studies. (Were they willing to front some hardware! I don't have Hadoop-class anything lying around.) That said... a lot of these companies already do analytics. Using PHP. And MySQL. Dear god, I am about to move the FIRST of these SMBs to an SSD for the MySQL database! Standard SQL databases will hold pretty much all the data these companies actually use.
You've got a long way to go to sell me on the necessity of that. Sure, the same company we're moving to the SSD for the MySQL database has potentially 100TB of data per year coming in. Most of it, however, is imagery. Can you even imagine what you'd need to do image-based analysis to extract things like "what are most people taking pictures of" etc?
Yeah, so we stick to sales data, geographics....if we get really ambitious we could pull metadata from the images and analyse that. But where's the ROI in pulling apart the images, scanning for "pictures of babies, pictures of landscapes, pictures of cars" etc. Will knowing what people are shooting produce more of a revenue bump than the cost of the nuclear substation and small shopping mall we'd need to crunch the data?
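Pulling apart metadata rather than pixels is the cheap option the comment above alludes to. A sketch, assuming the EXIF tags have already been extracted into plain dicts (the field names and values are hypothetical; real extraction would need a library such as Pillow):

```python
from collections import Counter

# Hypothetical EXIF metadata already pulled from uploaded images.
photos = [
    {"camera": "Canon 550D", "flash": False, "hour": 14},
    {"camera": "iPhone 4S",  "flash": True,  "hour": 21},
    {"camera": "iPhone 4S",  "flash": False, "hour": 13},
]

# Questions metadata alone can answer -- no image recognition required:
cameras = Counter(p["camera"] for p in photos)
night_shots = sum(1 for p in photos if p["hour"] >= 20 or p["flash"])

print(cameras.most_common(1))  # [('iPhone 4S', 2)]
print(night_shots)             # 1
```

Counting camera models or shooting times costs pennies on commodity kit; classifying the image content itself is where the "nuclear substation" budget comes in.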
Hadoop for SMB? WHY?
So is Intel going to configure it all for the SMBs as well? From what I can tell the barrier to using Big Data techniques is not being able to run the free open source software but rather the complete lack of people with the skills and knowledge to use the software stack. Putting it in the cloud won't help, and neither will bundling it into a distribution.
Hadoop is like Crystal Reports - it's only useful if you have a goal. Telling a CEO that you can do analytics will get you nowhere if you can't back it up with examples.