MongoDB speaks elephantese with Hadoop Connector upgrades

10Gen proves square JSON pegs can be inserted into round HDFS holes

MongoDB steward 10Gen has increased the capabilities of its Hadoop Connector, which lets administrators shuttle data between MongoDB and HDFS and other Hadoop services.

The updates, announced on Tuesday, add support for MongoDB's Binary JSON (BSON) backup files to the connector, along with support for Apache Hive and incremental MapReduce jobs.

The Hadoop Connector puts MongoDB data in a Hadoop Distributed File System (HDFS) costume, letting MapReduce jobs fiddle with the datastores. This tech lets organizations manipulate MongoDB data without having to move it through the data center, saving bandwidth.

Combined, these enhancements help 10Gen push MongoDB into being more than a NoSQL datastore, and into its own platform for minor analytics, data storage, and cross-platform querying. It follows on from IBM implementing support for MongoDB's JSON-oriented query method inside DB2 and WebSphere.

Apache Hive is a query engine for Hadoop that lets people probe HDFS datasets with a SQL-like query language instead of writing MapReduce jobs. Hive's tabular model does not map perfectly onto MongoDB's documents, which created some challenges.

"Figuring out a way to express field mappings for fields in Hive to fields in MongoDB in a way that covers the edge cases users may encounter is tricky," 10Gen software engineer Mike O'Brien told The Register via email. "Also, there are data types in MongoDB that do not have analogous counterparts in Hive (for example, ObjectId) so there are some design decisions around how to handle those as well."

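The mapping problem O'Brien describes can be illustrated with a small Python sketch. It is not the connector's code: the `COLUMN_MAPPING` table, the helper names, and the choice to render an ObjectId as a hex string are all assumptions made for illustration, and a raw 12-byte value stands in for a real ObjectId.

```python
from typing import Any, Dict

# Hypothetical mapping from Hive column names to MongoDB field paths.
COLUMN_MAPPING = {
    "id": "_id",
    "name": "name",
    "city": "address.city",
}


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if a part is missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def to_hive_value(value: Any) -> Any:
    """Assumed conversion: a 12-byte value (our ObjectId stand-in) becomes a hex string."""
    if isinstance(value, bytes) and len(value) == 12:
        return value.hex()
    return value


def to_hive_row(doc: Dict[str, Any], mapping: Dict[str, str] = COLUMN_MAPPING) -> Dict[str, Any]:
    """Flatten one document into a flat row keyed by Hive column name."""
    return {col: to_hive_value(get_path(doc, path)) for col, path in mapping.items()}
```

Fed a document with a nested `address` field, `to_hive_row` returns a flat row with `id`, `name` and `city` columns. A missing field becomes `None`, which is one of the edge cases a real mapping scheme has to decide how to handle.
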
The BSON file format is also not native to Hadoop, so work had to be done to get the system to churn through the objects without introducing errors.

"To handle splitting for parallelism, it crawls through a BSON file and calculates byte-offsets in the files to create a list of fixed size chunks which are then processed in parallel," O'Brien writes. "Or, the splits can be pre-built locally with a provided script. When reading the bson off disk, it decodes the bson documents on the fly and passes them into the Mapper as a 'BSONObject' which is the base class used to represent a simple document in the mongo java driver."

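The byte-offset crawl O'Brien describes can be sketched in a few lines of Python. This is an illustration, not the connector's implementation: the `compute_splits` function and its `target_size` parameter are hypothetical. It relies only on the fact that every BSON document begins with a little-endian 32-bit length covering the whole document, and it assumes a well-formed file. Splits are cut at document boundaries, so each is roughly rather than exactly `target_size` bytes.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def compute_splits(stream: BinaryIO, target_size: int) -> List[Tuple[int, int]]:
    """Return (start_offset, length) pairs, each covering whole BSON documents."""
    splits: List[Tuple[int, int]] = []
    split_start = 0  # byte offset where the current split begins
    offset = 0       # byte offset of the next unread document
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # end of file
        (doc_len,) = struct.unpack("<i", header)
        if doc_len < 5:  # smallest valid document: 4-byte length + trailing NUL
            raise ValueError(f"invalid BSON length {doc_len} at offset {offset}")
        stream.seek(doc_len - 4, io.SEEK_CUR)  # skip the body without decoding it
        offset += doc_len
        if offset - split_start >= target_size:
            splits.append((split_start, offset - split_start))
            split_start = offset
    if offset > split_start:
        splits.append((split_start, offset - split_start))  # leftover tail
    return splits
```

Each resulting pair could then be handed to a separate worker, which seeks to the start offset and decodes only the documents inside its own range.
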
In the future, the company plans to boost performance, improve integration with various Hadoop APIs, and "expose some more fine-grained control options to the user on how jobs run and read/write data," O'Brien said.

As more and more companies invite Hadoop into their data centers, compatibility with the technology will be crucial for new databases, lest developers forsake them for more HDFS-friendly systems. With the Hadoop Connector, 10Gen is working to make sure this problem doesn't appear, and that DBAs can dance with the elephant, wherever their data is stored. ®
