MongoDB 2.0 debuts with shrunken indexes
'25% smaller, 25% faster'
The MongoDB community and its 10gen overseers have released version 2.0 of the distributed "NoSQL" database, saying the new incarnation improves concurrency while reducing the size and boosting the speed of indexes.
According to a blog post from Eliot Horowitz – chief technology officer at 10gen, the outfit that founded MongoDB and provides commercial support for the platform – this is a "significant new release", but he adds that it's no more significant than the previous leap from version 1.6 to version 1.8.
Horowitz says that the concurrency improvements in version 2.0 are the beginning of a much larger effort in this area. Basically, the idea is to ensure that a server will not hold a write lock when it's reading data off disk – i.e., when there's a "page fault".
"If you hold a lock during a page fault, then for the next five milliseconds or so, no one else can do anything even though you know you're not going to do anything but hit the disk," Horowitz tells The Register. "Before you hit disk, we unlock. Then you do the page fault. And then we reapply the lock and continue working."
With version 2.0, this doesn't happen with all page faults. But according to Horowitz, it happens with many of the "hot" scenarios that were causing users problems.
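As a rough illustration of the unlock, fault, relock pattern Horowitz describes, here is a minimal Python sketch. It is not MongoDB's code: the Store class, its method names, and the use of a threading.Event as a stand-in for a disk read are all assumptions made for the example.

```python
import threading


class Store:
    """Toy in-memory store with a single write lock (hypothetical, not MongoDB)."""

    def __init__(self):
        self.write_lock = threading.Lock()
        self.data = {}
        self.fault_started = threading.Event()  # set when the simulated fault begins
        self.disk_done = threading.Event()      # set when the simulated disk read ends

    def _page_fault(self):
        # Stand-in for a slow disk read: block until the "disk" finishes.
        self.fault_started.set()
        self.disk_done.wait()

    def write(self, key, value):
        self.write_lock.acquire()
        try:
            # The needed page is not in memory, so give up the lock first...
            self.write_lock.release()
            try:
                self._page_fault()
            finally:
                # ...and take it back once the data has been read.
                self.write_lock.acquire()
            self.data[key] = value
        finally:
            self.write_lock.release()


def demo():
    store = Store()
    writer = threading.Thread(target=store.write, args=("user", "alice"))
    writer.start()

    # Wait until the writer is inside its simulated page fault.
    store.fault_started.wait()

    # Another client can take the lock while the writer waits on disk.
    other_got_lock = store.write_lock.acquire(blocking=False)
    if other_got_lock:
        store.write_lock.release()

    store.disk_done.set()  # let the simulated disk read complete
    writer.join()
    return other_got_lock, store.data
```

Calling demo() shows a second client acquiring the lock while the writer is stalled on the simulated disk read, after which the writer's update still lands. Had the writer held the lock across the fault, that non-blocking acquire would have failed.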
Separately, the new version is meant to boost the overall performance of the database by making it easier to keep indexes in memory. "We've basically optimized the way we store indexes on disk," Horowitz says. "We've made them about 25 per cent smaller in most cases. That can have a major impact, because any indexes that are larger than RAM can cause problems."
At the same time, developers have optimized the actual processing of indexes, so that simple index lookups are 25 per cent faster on average, according to Horowitz. But he acknowledges the index size and performance will vary depending on the situation. When you upgrade to the new version, you benefit from these index enhancements only if you create a new index or re-index an old one.
With version 2.0, authentication works with sharded clusters, and there are two changes to "replica sets", groups of nodes that work together and share data. You can now set a priority for each node within a set, and you can tag members with their physical location, to lock down rules for individual data centers, racks, and servers.
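To make priorities and tags concrete, here is a sketch of what a replica set configuration document might look like, written as a Python dictionary. The priority, tags, and settings.getLastErrorModes fields are the ones MongoDB uses for these features. The set name, host names, tag values, and the "twoDataCenters" mode name are invented for illustration, and the mode_satisfied helper is our own simulation of the rule, not MongoDB code.

```python
# Hypothetical replica set configuration: two members in New York, one in San Francisco.
config = {
    "_id": "rs0",
    "members": [
        # Higher priority marks the member preferred as primary.
        {"_id": 0, "host": "ny1.example.net:27017", "priority": 2, "tags": {"dc": "ny"}},
        {"_id": 1, "host": "ny2.example.net:27017", "priority": 1, "tags": {"dc": "ny"}},
        {"_id": 2, "host": "sf1.example.net:27017", "priority": 1, "tags": {"dc": "sf"}},
    ],
    "settings": {
        # A write in this mode must reach members covering 2 distinct "dc" values.
        "getLastErrorModes": {"twoDataCenters": {"dc": 2}},
    },
}


def mode_satisfied(config, mode_name, acked_member_ids):
    """Simulate the rule: has the write reached enough distinct tag values?"""
    mode = config["settings"]["getLastErrorModes"][mode_name]
    members = {m["_id"]: m for m in config["members"]}
    for tag, required in mode.items():
        distinct_values = {
            members[i]["tags"][tag]
            for i in acked_member_ids
            if tag in members[i].get("tags", {})
        }
        if len(distinct_values) < required:
            return False
    return True
```

Under this configuration, a write acknowledged only by members 0 and 1 has reached a single data center and does not satisfy the mode, while one acknowledged by members 0 and 2 does.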
"Priorities let you have nodes that you prefer to be primary if you have a non homogeneous environment. Tagging lets you guarantee writes hit certain groups of servers. One use case for this is guaranteeing a new user registration is written to two data centers before acknowledging to a user," Horowitz says in his post.
Today, 10gen also announced that it has raised an additional $20 million, including funds from venture capital outfits Sequoia Capital, Flybridge Capital, and Union Square Ventures. This brings the company's total funding to $31 million. The outfit claims 400 customers, and 100,000 downloads of the open source database a month.
Like CouchDB and other NoSQL offerings, MongoDB discards the familiar relational database model in favor of a distributed platform tailored for today's web applications. "By reducing transactional semantics, we could still solve an interesting set of problems, but we could also scale," 10gen CEO Dwight Merriman has said. 10gen offers support, training, and consulting services for the database as well as serving as the open source project's primary steward. ®
Update: This story has been updated with additional comment from 10gen's Eliot Horowitz.