The Register® — Biting the hand that feeds IT

Comments on: Google's MapReduce suddenly not so backward

Ahem 

Posted Thursday 28th August 2008 07:00 GMT

Happy

Has anyone looked at Oracle clusters or DB2 Sysplex recently?

They obviously use some variation of MapReduce internally to spread the load over several machines; it's just not exposed in the API.
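As a rough illustration of the idea (not of how Oracle or DB2 actually work internally, which the comment only speculates about), here is a toy, single-process sketch of how a `SELECT region, SUM(amount) ... GROUP BY region` could be split into map, shuffle and reduce steps across partitions. The partition data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows spread across three "machines" (partitions).
PARTITIONS = [
    [("uk", 3), ("us", 5)],
    [("uk", 2), ("de", 7)],
    [("us", 1), ("de", 1), ("uk", 4)],
]

def map_partition(rows):
    """Map step: each machine emits (key, value) pairs from its own rows."""
    for region, amount in rows:
        yield region, amount

def shuffle(pairs):
    """Shuffle step: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Reduce step: collapse each key's values into one result."""
    return key, sum(values)

def map_reduce(partitions):
    mapped = chain.from_iterable(map_partition(p) for p in partitions)
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())

print(map_reduce(PARTITIONS))  # {'uk': 9, 'us': 6, 'de': 8}
```

In a real cluster the map calls would run on the machines holding each partition, and the shuffle would move data over the network. That data movement is the part a database would hide behind its SQL API.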

And, as I said at the time, Google's BigTable does support an SQL variant, among several options for accessing the database.

Great, now miss out the SQL 

Posted Thursday 28th August 2008 08:11 GMT

That's great. Now you can make it faster by not putting the data in the SQL database in the first place, skipping the query optimization needed to get it out again, and shoving it straight into your MapReduce.

As Jeremy recently discovered:

http://jeremy.zawodny.com/blog/archives/010523.html

Remember, folks, SQL databases are for data where:

1. The query can be expressed in SQL syntax

2. That query can be improved by one or more indexes

3. The cost of building the index is less than the time it saves in your queries.
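Points 2 and 3 can be seen with any SQL engine's query planner. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in. The table, column and index names are hypothetical. Before the index exists the planner has to scan the whole table. Afterwards it searches the index instead, but only because we paid to build that index first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [(f"s{i % 100}", float(i)) for i in range(10_000)],
)

QUERY = "SELECT value FROM readings WHERE sensor = ?"

def plan(sql, params=()):
    # EXPLAIN QUERY PLAN returns four-column rows; the last column is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(QUERY, ("s42",))  # full table scan
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")  # the up-front cost
after = plan(QUERY, ("s42",))   # index search
print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions. The difference that matters is a scan before and a search "USING INDEX" after.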

If it can't be expressed in SQL, then the query can't be reduced, and you end up reducing the data instead... which is the point of MapReduce.

Test: given a changing set of points in 3D, run a query against a snapshot of that data that returns the nearest point to (x,y,z)... feel free to express that in SQL...

@AC 

Posted Thursday 28th August 2008 10:01 GMT

select min(id) keep (dense_rank first order by ((x-p_x)*(x-p_x) + (y-p_y)*(y-p_y) + (z-p_z)*(z-p_z)) ) from points

(Oracle syntax)
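On engines without Oracle's `KEEP (DENSE_RANK FIRST ...)`, the same query can be written portably with `ORDER BY ... LIMIT 1`. This sketch runs it on SQLite from Python, with named parameters standing in for `p_x`, `p_y` and `p_z`. The sample points are made up. Sorting on `id` as a secondary key reproduces the `min(id)` tie-break among equally near points. Like the Oracle version, it still computes the distance for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL)")
conn.executemany(
    "INSERT INTO points (id, x, y, z) VALUES (?, ?, ?, ?)",
    [(1, 0.0, 0.0, 0.0), (2, 5.0, 5.0, 5.0), (3, 1.0, 1.0, 0.0), (4, -1.0, 1.0, 0.0)],
)

# Order by squared Euclidean distance to the query point, then by id to break ties.
NEAREST = """
    SELECT id FROM points
    ORDER BY (x - :px) * (x - :px) + (y - :py) * (y - :py) + (z - :pz) * (z - :pz), id
    LIMIT 1
"""

def nearest(px, py, pz):
    row = conn.execute(NEAREST, {"px": px, "py": py, "pz": pz}).fetchone()
    return row[0] if row else None

print(nearest(0.9, 0.9, 0.1))  # 3
print(nearest(0.0, 2.0, 0.0))  # 3 and 4 tie; lowest id wins, so 3
```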

Sawzall 

Posted Thursday 28th August 2008 12:32 GMT

Google has made its own query language, called Sawzall, on top of MapReduce. It doesn't look much like SQL (it looks more like a "normal" programming language), but it seems quite nice.
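Sawzall's published design has a short program run once per record. That program can only emit values into aggregators (such as sums) declared up front, and the aggregation itself happens outside the program. The following is a rough Python imitation of that model, not Sawzall syntax. `SumTable`, `per_record` and the log records are hypothetical.

```python
from collections import Counter

# Hypothetical log records; Sawzall would hand these to the program one at a time.
RECORDS = [
    {"country": "uk", "bytes": 120},
    {"country": "us", "bytes": 300},
    {"country": "uk", "bytes": 80},
]

class SumTable:
    """Stand-in for a keyed 'sum' aggregator; the per-record code can only emit into it."""

    def __init__(self):
        self.totals = Counter()

    def emit(self, key, value):
        self.totals[key] += value

def per_record(record, count, total_bytes):
    # Sees only the current record; keeps no state between records.
    count.emit(record["country"], 1)
    total_bytes.emit(record["country"], record["bytes"])

count, total_bytes = SumTable(), SumTable()
for rec in RECORDS:
    per_record(rec, count, total_bytes)

print(dict(count.totals))        # {'uk': 2, 'us': 1}
print(dict(total_bytes.totals))  # {'uk': 200, 'us': 300}
```

Because each record is handled independently and the aggregators are combinable sums, the per-record work maps naturally onto the map phase and the aggregation onto the reduce phase.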

@Alistair 

Posted Thursday 28th August 2008 14:44 GMT

"select min(id) keep (dense_rank first order by ((x-p_x)*(x-p_x) + (y-p_y)*(y-p_y) + (z-p_z)*(z-p_z)) ) from points"

i.e. brute force. (x,y,z) is what you query this DB against, so keeping the index doesn't help you, because (x,y,z) changes and the index would have to be recalculated each time.

MapReduce. There's no point in putting it in the DB either; just bung the raw data into a MapReduce job.
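A minimal sketch of that suggestion, assuming the point snapshot is already split into chunks as if stored on different machines. `CHUNKS`, `map_chunk` and `reduce_best` are hypothetical names, and everything runs in one process here. Each "mapper" brute-forces its own chunk and emits a single local best as a `(distance squared, id)` pair. The reducer keeps the smaller pair, so ties go to the lowest id, matching the SQL's `min(id)`.

```python
import math
from functools import reduce

# Hypothetical snapshot of 3D points as (id, x, y, z), split into per-machine chunks.
CHUNKS = [
    [(1, 0.0, 0.0, 0.0), (2, 5.0, 5.0, 5.0)],
    [(3, 1.0, 1.0, 0.0)],
    [(4, -1.0, 1.0, 0.0), (5, 9.0, 0.0, 2.0)],
]

def map_chunk(chunk, q):
    """Map: brute-force one chunk and emit its local best as (dist_sq, id)."""
    qx, qy, qz = q
    return min(
        ((x - qx) ** 2 + (y - qy) ** 2 + (z - qz) ** 2, pid) for pid, x, y, z in chunk
    )

def reduce_best(a, b):
    """Reduce: keep the closer candidate. min is associative, so order doesn't matter."""
    return min(a, b)

def nearest(chunks, q):
    local_bests = [map_chunk(c, q) for c in chunks if c]  # skip empty chunks
    dist_sq, pid = reduce(reduce_best, local_bests)
    return pid, math.sqrt(dist_sq)

print(nearest(CHUNKS, (0.9, 0.9, 0.1)))
```

Each mapper sends only one candidate to the reducer, so almost no data has to move between machines, whatever the size of the snapshot.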
