The Register® — Biting the hand that feeds IT

MongoDB speaks elephantese with Hadoop Connector upgrades

10Gen proves square JSON pegs can be inserted into round HDFS holes

MongoDB steward 10Gen has increased the capabilities of its Hadoop Connector, which lets administrators shuttle data between MongoDB and HDFS and other Hadoop services.

The updates, announced on Tuesday, add support for MongoDB's Binary JSON (BSON) backup files to the connector, along with support for Apache Hive and incremental MapReduce jobs.

The Hadoop Connector puts MongoDB data in a Hadoop Distributed File System (HDFS) costume, letting MapReduce jobs fiddle with the datastores. This tech lets organizations manipulate MongoDB data without having to move it through the data center, saving bandwidth.

Combined, these enhancements help 10Gen push MongoDB beyond its role as a NoSQL datastore and toward a platform in its own right for minor analytics, data storage, and cross-platform querying. The move follows IBM's implementation of support for MongoDB's JSON-oriented query method inside DB2 and WebSphere.

Apache Hive is a query engine for Hadoop that lets people probe HDFS datasets with a SQL-like query language instead of writing MapReduce jobs. Hive's tabular model does not map perfectly onto MongoDB's documents, which created some challenges.

"Figuring out a way to express field mappings for fields in Hive to fields in MongoDB in a way that covers the edge cases users may encounter is tricky," 10Gen software engineer Mike O'Brien told The Register via email. "Also, there are data types in MongoDB that do not have analogous counterparts in Hive (for example, ObjectId) so there are some design decisions around how to handle those as well."

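To make the mapping problem concrete, here is a minimal Python sketch of projecting a nested document onto flat, Hive-style columns. It is an illustration only, not the connector's code: the function name `to_hive_row`, the dotted-path mapping format, and the use of a 12-byte value as a stand-in for ObjectId are all assumptions made for this example.

```python
# Hypothetical sketch: none of these names come from the Hadoop Connector.

def to_hive_row(doc, column_paths):
    """Project a nested document onto flat columns.

    column_paths maps a column name to a dotted field path, for example
    {"user_name": "user.name"}.
    """
    row = {}
    for column, path in column_paths.items():
        value = doc
        for key in path.split("."):
            # A missing or non-document field becomes None (NULL in a table)
            # rather than raising an error.
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        # Assumption: a 12-byte value stands in for ObjectId, which has no
        # direct tabular counterpart, so it is rendered as a hex string.
        if isinstance(value, (bytes, bytearray)):
            value = bytes(value).hex()
        row[column] = value
    return row
```

Even this toy version has to make the kind of design decisions O'Brien mentions: what to do with missing fields, and how to render a type the target system does not have.
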
The BSON file type is also not native to Hadoop, so work had to be done to get the system to churn through the objects without introducing errors.

"To handle splitting for parallelism, it crawls through a BSON file and calculates byte-offsets in the files to create a list of fixed size chunks which are then processed in parallel," O'Brien writes. "Or, the splits can be pre-built locally with a provided script. When reading the bson off disk, it decodes the bson documents on the fly and passes them into the Mapper as a 'BSONObject' which is the base class used to represent a simple document in the mongo java driver."

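The byte-offset crawl O'Brien describes can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the connector's implementation: it works on a dump already loaded into memory rather than a file on HDFS, the names `bson_document_offsets` and `compute_splits` are invented here, and splits are cut on document boundaries, so their sizes only approximate the target. The one fact it relies on is that every BSON document starts with a 4-byte little-endian integer giving its total length.

```python
import struct


def bson_document_offsets(data):
    """Yield (offset, length) for each document in a concatenated BSON dump."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated length prefix at offset %d" % offset)
        # Each BSON document begins with its total size as a little-endian int32.
        (length,) = struct.unpack_from("<i", data, offset)
        # The smallest valid document is 5 bytes: the prefix plus a trailing NUL.
        if length < 5 or offset + length > len(data):
            raise ValueError("bad document length at offset %d" % offset)
        yield offset, length
        offset += length


def compute_splits(data, target_size):
    """Group whole documents into (start, size) splits of about target_size bytes."""
    splits = []
    start, size = None, 0
    for offset, length in bson_document_offsets(data):
        if start is None:
            start = offset
        size += length
        if size >= target_size:
            splits.append((start, size))
            start, size = None, 0
    if start is not None:
        # Whatever is left over forms a final, smaller split.
        splits.append((start, size))
    return splits
```

Each resulting `(start, size)` pair could then be handed to a separate worker, which seeks to `start` and decodes documents until it has consumed `size` bytes.
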
In the future, the company plans to boost performance, improve integration with various Hadoop APIs, and "expose some more fine-grained control options to the user on how jobs run and read/write data," O'Brien said.

As more and more companies invite Hadoop into their data centers, gaining compatibility with the technology will be crucial for new databases, lest developers start forsaking the data stores for more HDFS-friendly systems. With the Hadoop Connector, 10Gen is working to make sure this problem doesn't appear, and that DBAs can dance with the elephant, wherever their data is stored. ®