Google's MapReduce suddenly not so backward
SQL tools plug gaps
What was seen as a major hole in Google's MapReduce database technology has been plugged, not once but twice. In the same week.
The lack of SQL tools was one of the main criticisms levelled at MapReduce in January 2008 by database gurus Michael Stonebraker and David DeWitt. They hammered MapReduce for its failure to offer SQL, describing - to the consternation of many - Google's offering as "a major step backwards" in database technology.
Aster Data, founded in 2005 by three ex-Stanford post-graduate students, brought its Aster nCluster massively parallel processing (MPP) database technology to market in May 2008. It counts MySpace and Aggregate Knowledge as customers. Aster chief executive Mayank Bawa wrote in a blog post that nCluster brings the advantages of relational SQL to MapReduce's large-scale data processing.
Greenplum takes a slightly different tack, emphasising the "next-generation data warehouse" credentials of its database technology. Founded in 2003, its customers include Nasdaq, LinkedIn and Indian telco Reliance Communications. ®
"select min(id) keep (dense_rank first order by ((x-p_x)*(x-p_x) + (y-p_y)*(y-p_y) + (z-p_z)*(z-p_z))) from points"
That is brute force. (x,y,z) is what you query this db against, so keeping an index doesn't help you: x,y,z change with every query, and the distance would be recalculated each time.
Map reduce. No point in putting it in the db either; just bung the raw data into a map reduce.
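For what it's worth, here is a rough sketch in Python of what "bung the raw data into a map reduce" could look like for that nearest-point query. It is not Google's MapReduce API, just plain Python standing in for it. The sample rows, the function names and the round-robin partitioning are all made up for illustration. Each map step scans one partition by brute force and emits its closest (distance, id) pair, and the reduce step keeps the smallest pair, so ties on distance go to the lowest id, much like min(id) in the query.

```python
from functools import reduce

# Hypothetical sample rows standing in for the `points` table: (id, x, y, z).
POINTS = [
    (1, 0.0, 0.0, 0.0),
    (2, 1.0, 1.0, 1.0),
    (3, 5.0, 5.0, 5.0),
    (4, 2.0, 0.0, 1.0),
]


def squared_distance(row, query):
    """Squared Euclidean distance from one row to the query point."""
    _, x, y, z = row
    qx, qy, qz = query
    return (x - qx) ** 2 + (y - qy) ** 2 + (z - qz) ** 2


def map_partition(rows, query):
    """Map step: brute-force scan of one partition, emitting (distance, id)."""
    return min((squared_distance(row, query), row[0]) for row in rows)


def reduce_nearest(a, b):
    """Reduce step: keep the closer candidate; ties fall to the lower id."""
    return min(a, b)


def nearest_id(points, query, partitions=2):
    """Split the rows, map each chunk, then reduce to a single winner.

    Assumes `points` is non-empty; an empty input is not handled here.
    """
    chunks = [points[i::partitions] for i in range(partitions)]
    candidates = [map_partition(chunk, query) for chunk in chunks if chunk]
    return reduce(reduce_nearest, candidates)[1]


if __name__ == "__main__":
    print(nearest_id(POINTS, (0.9, 0.9, 0.9)))
```

No index is involved anywhere: every row is touched once per query, which is exactly the brute-force cost being described, only spread across partitions.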
Google has made its own query language, called Sawzall, on top of MapReduce. It doesn't look much like SQL (it looks more like a "normal" programming language), but it seems quite nice.