Microsoft opens trenchcoat, reveals 'in-memory' Big Data column
Just the (100 billion) facts, man
If there’s one thing scarier than the big-data tsunami, tech vendors tell us, it’s getting left out of the big-data conversation.
Microsoft is the latest software maker to crowbar itself into the debate on big data, this time claiming a place at the table on in-memory databases.
And guess what? Microsoft is poised to exploit this. SQL Server Technical Fellow Dave Campbell writes:
"Microsoft has been investing in, and shipping, in-memory database technologies for some time."
Campbell identified “in-memory” in the Microsoft world as a column-based storage engine in PowerPivot for Excel and SharePoint. This has now shipped with the newly released SQL Server 2012 as the xVelocity in-memory analytics engine that's part of SQL Server Analysis Services.
Campbell claimed a 200 times performance gain for one SQL Server 2012 customer “through the use of this new in-memory optimized columnstore index type.”
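The columnstore claim comes down to a simple layout trick. A toy Python sketch (nothing to do with SQL Server's actual engine; data invented for illustration) shows why: an analytic query over a column store touches only the column it needs, instead of dragging every field of every row through memory.

```python
# Toy illustration of row store vs column store for an analytic scan.
# The table and values are made up; real columnstores add compression,
# batch execution and much more on top of this basic layout idea.

# Row store: each record is a tuple of (id, region, amount).
rows = [(i, i % 4, float(i % 100)) for i in range(100_000)]

# Column store: the same data, one contiguous list per column.
columns = {
    "id":     [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# SELECT SUM(amount): the row store visits every row in full...
row_total = sum(r[2] for r in rows)

# ...while the column store scans a single contiguous array.
col_total = sum(columns["amount"])

assert row_total == col_total
```

Both scans return the same answer; the difference is how much memory the query has to drag past the CPU to get it.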
Microsoft’s man promised more from Redmond’s labs.
“Microsoft is also investing in other in-memory database technologies which will ship as the technology and opportunities mature,” he said. He didn’t reveal details but said that this includes an in-memory database solution in the company’s lab “and building our real-world scenarios to demonstrate the potential.”
“One such scenario, based upon one of Microsoft’s online services businesses, contains a fact table of 100 billion rows. In this scenario we can perform three calculations per fact – 300 billion calculations in total, with a query response time of 1/3 of a second. There are no user defined aggregations in this implementation; we actually scan over the compressed column store in real time,” he said.
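The arithmetic checks out (100 billion facts at three calculations each is 300 billion), and scanning "over the compressed column store in real time" is less magic than it sounds. A hedged, toy-scale Python sketch of one common technique, run-length encoding, shows how aggregates can be computed on the compressed form without expanding it. The encoding and functions here are illustrative, not Microsoft's implementation.

```python
# Toy sketch: with run-length encoding, "N calculations per fact" need
# not mean N memory touches per row, because each run of repeated
# values is handled once. Data here is a made-up, tiny fact column.

def rle_encode(values):
    """Compress a column into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_aggregates(runs):
    """Three calculations per fact -- count, sum, min -- done per run."""
    count = sum(n for _, n in runs)
    total = sum(v * n for v, n in runs)
    minimum = min(v for v, _ in runs)
    return count, total, minimum

column = [5, 5, 5, 7, 7, 3, 3, 3, 3]   # nine facts
runs = rle_encode(column)              # three runs: [[5,3],[7,2],[3,4]]
print(rle_aggregates(runs))            # (9, 41, 3)
```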
When it comes to in-memory, database giant Oracle at least has some legitimacy. Years before Big Data was a blob on the horizon, Larry Ellison’s database beast swallowed tiny TimesTen in 2005. TimesTen uses replication and memory-optimised access techniques to keep data in a system’s RAM rather than writing it out to disk.
Last week, Ellison’s great white whale SAP revived its own HANA in-memory platform, announcing a $337m database adoption program and a $155m SAP HANA Real-Time Fund for startups and entrepreneurs to develop real-time apps.
SAP Ventures, the ERP giant’s venture-capital wing, has also joined Toshiba, Juniper Networks and others putting $50m into flash array start-up Violin Memory.
All that money and momentum might account for Microsoft's sudden eagerness to stake its own in-memory claims.
While Oracle might be the in-memory leader, though, it isn't above a little shameless bandwagon-jumping when it needs to. In January 2011 the database giant was laying some tenuous claims of its own, this time on NoSQL.
"Is Berkeley DB a 'NoSQL' solution today?" Oracle asked here of the embedded database it bought in 2006.
"Nope. Could Berkeley DB grow into a NoSQL solution? Absolutely" - given the right changes. ®
Quite recently...

... someone asked for a "sqlite-type nosql" solution. Because, you know, databases are hot shit. Or because the data was all about nested objects and such. And how much data was it? "50-100 objects". Ah. Use the built-in serialisation and drop the result in a file, there's a good developer.
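The advice above translates to a few lines of standard-library Python. A sketch using `json` (the objects and filename are invented for illustration; `pickle` works for richer Python types):

```python
# For 50-100 nested objects you don't need a database at all:
# serialise to a flat file with the standard library and move on.
import json

objects = [{"id": i, "tags": ["a", "b"], "meta": {"score": i * 2}}
           for i in range(60)]

# "Drop the result in a file"...
with open("objects.json", "w") as f:
    json.dump(objects, f)

# ...and read it back when needed.
with open("objects.json") as f:
    restored = json.load(f)

assert restored == objects
```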
We have computing power that approaches "free", there's yet another database hype cycle going on, and this time we have so much data it's not funny any longer. What does it all mean? Most of it is crud; how much truly interesting stuff is there to be found on Facebook, Google+, and so on? Not that much. But we like to believe that with enough massaging the lead turns into gold.
How many users of various "alternative databases" actually need them? Quite a lot fewer than are actually trying to use the things. Nothing wrong with using whatever tool works and doesn't have nasty side effects like subtly corrupting your data, but that doesn't automatically make for a "best" or even "good enough" solution. Sometimes it's good to remember that there are still problems around best solved by using a key/value store without any additional "NoSQL" or "XML" or other buzzword-du-jour layers. What was it again, the master shows himself in the choosing?
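The plain key/value store meant above ships in Python's standard library. A minimal sketch using `dbm` (filename and keys invented for illustration):

```python
# A key/value store with no buzzword layers: dbm gives an on-disk
# string-to-bytes mapping -- no server, no query language, no fads.
import dbm

# Write: keys and values are stored as bytes.
with dbm.open("settings.db", "c") as db:
    db["theme"] = "dark"
    db["retries"] = "3"

# Read it back in a later process.
with dbm.open("settings.db", "r") as db:
    assert db["theme"] == b"dark"
```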
Re: Quite recently...
There are no alternative databases here... just research into (and exploitation of, decades later) alternative storage and indexing structures. Think about it: the columnar storage in SQL Server 2012 is built on top of regular row-oriented structures but exploits column compression along with smart ordering. As for this new thing, just try to be a bit more creative: in-memory databases don't necessarily have to drop any ACID property, nor do they have to use B-trees...
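The "column compression along with smart ordering" point is easy to demonstrate. A toy Python run-length encoder (not SQL Server's actual format; data invented) compresses the same column far better once equal values are ordered together:

```python
# Smart ordering makes compression work: the same column shrinks from
# 8 runs to 3 once rows are reordered so equal values sit together.

def rle(values):
    """Run-length encode a column into (value, count) pairs."""
    runs, prev, n = [], object(), 0
    for v in values:
        if v == prev:
            n += 1
        else:
            if n:
                runs.append((prev, n))
            prev, n = v, 1
    if n:
        runs.append((prev, n))
    return runs

column = ["UK", "US", "UK", "DE", "US", "UK", "DE", "US"]
print(len(rle(column)))          # 8 runs: no compression as stored
print(len(rle(sorted(column))))  # 3 runs after reordering
```

A real engine picks the ordering across many columns at once, but the principle is the same.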
Typical NoSQL thingamajigs simply want to store-fast-think-later. But storing things as keys/values doesn't make the conceptual data model disappear. Sooner or later people will have to analyse the stored data, and being so closely tied to a (simple) storage mechanism immediately backfires. They'll have to understand the real data model behind the property bags, and they'll need a powerful language to be able to make requests. Maybe a language that combines the power of algebra and calculus. So they'll end up building new MaybeSQL fads until they realise that it would have been better to start off with a proper RDBMS, and learn about Data Dependent Routing techniques, instead of using map-reduce to do basic things like hash joins.
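The hash join name-checked above really is a basic thing. A minimal Python sketch with made-up tables: index the smaller relation by the join key, then probe it while streaming the larger one.

```python
# A classic in-memory hash join over two made-up relations.

orders = [(1, "widget"), (2, "gadget"), (1, "sprocket")]
customers = [(1, "Alice"), (2, "Bob")]

# Build phase: hash the smaller relation on the join key.
by_id = {cust_id: name for cust_id, name in customers}

# Probe phase: a single pass over the larger relation.
joined = [(by_id[cid], item) for cid, item in orders if cid in by_id]
print(joined)  # [('Alice', 'widget'), ('Bob', 'gadget'), ('Alice', 'sprocket')]
```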