Original URL: http://www.theregister.co.uk/2012/07/24/translattice_elastic_database_postgres/

Postgres-on-steroids wields bare metal in Oracle, IBM skirmish

TransLattice's database doesn't need no stinkin' OS

By Timothy Prickett Morgan

Posted in Applications, 24th July 2012 13:26 GMT

Distributed database provider TransLattice is taking the fight to Oracle and IBM: it's breaking its TransLattice Elastic Database, or TED, free of its database appliances and selling it to run on bare metal or on virtual machine instances.

TransLattice was founded in November 2007 and came out of stealth mode in August 2010. The company was founded by Frank Huerta (CEO), Michael Lyle (CTO), and Robert Geiger (VP of engineering) - all veteran IT execs who formed Recourse Technologies a decade earlier to create a distributed threat security and network intrusion system, which Symantec bought in 2002 for $135m in cash.

All three have deep expertise in distributed systems, and have applied those skills to distributed databases. The founders set out to turn the open-source PostgreSQL database into an inherently scalable and resilient database and application runtime platform, and having seen what they were up to, private equity firm DCM kicked $9.5m in Series A funding into the 25-person startup in August 2008.

TransLattice started ramping up enterprise sales last July with the debut of its TransLattice Application Platform 2.0, an appliance built on Dell PowerEdge servers, along with a Xen-based virtual machine version of the distributed database that could be deployed on Amazon's EC2 compute cloud.

Having pushed the appliance variant for a bit and beefed up the database's SQL capabilities, TransLattice is now ready to sell its distributed database as free-standing code, Lyle tells El Reg. Companies can deploy it on their own x86 boxes, either on bare metal or on virtualised instances running atop VMware's ESXi hypervisor.

Lyle says that about half of TransLattice's internal development systems run on Red Hat's KVM hypervisor, so getting the distributed database certified on KVM is just a matter of time and paperwork, since the company already knows it works.


The TransLattice Elastic Database announced today is at the 2.5 release level, and Lyle says that the company has made "major strides" in the range of SQL and database primitives that the database supports.

He's comfortable selling the database as a standalone product rather than as a foundational element of the appliance platform, which was sold as a J2EE framework based on Enterprise JavaBeans and JavaServer Pages. Not everyone wants to re-certify their applications on that Java environment, so TransLattice is offering the underlying database as a product in its own right.

TransLattice Elastic Database 2.5 has PostgreSQL 9.0 as its foundation, and the upper levels of the database management system look and smell like normal Postgres.

But the underlying guts of the database, including the locking mechanisms and data storage methods, have been completely replaced with a new set of code.

This new software can create a database cluster like Oracle's Real Application Clusters (RAC) or IBM's PureScale for DB2, and do policy-driven database sharding - effectively partitioning databases and spreading bits of them over multiple servers - as well as replication over a wide-area network.
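Sharding plus replication, as described above, can be illustrated in a few lines. The Python below is a minimal sketch under stated assumptions, not TransLattice's code: it hashes a hypothetical partition key to one of a fixed number of shards, then assigns each shard to a set of distinct nodes in round-robin fashion. The node names, shard count and replication factor are all invented for the example.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a row's partition key to a shard number.

    Uses SHA-256 rather than Python's built-in hash(), which is randomised
    per process for strings and so would not give a stable mapping.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place_replicas(num_shards: int, nodes: list[str], replicas: int) -> dict[int, list[str]]:
    """Assign each shard to `replicas` distinct nodes, rotating round-robin."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(replicas)]
        for shard in range(num_shards)
    }


# Hypothetical four-node cluster keeping two copies of each of eight shards.
nodes = ["london", "frankfurt", "virginia", "tokyo"]
placement = place_replicas(8, nodes, replicas=2)
shard = shard_for_key("customer:42", 8)
print(shard, placement[shard])  # the row's shard and the two nodes holding it
```

A fixed modulo mapping like this cannot move individual shards around; the article says TED relocates shards based on policy and observed traffic, which would need a directory or similar indirection on top.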

Companies can therefore craft a geographically distributed processing environment that can also stay in compliance with government and company regulations about what data can be where and what transactions can be done in what locations in the world.

TransLattice shards, replicates, and provides distributed access to data

Protecting sensitive information

The modified Postgres database created by TransLattice has what the company calls a distributed global consensus protocol, which is used to commit transactions across the nodes in the cluster and to deal with dependencies between them.

The first important thing is that the data underneath the tweaked Postgres is sharded, with pieces of tables stored on many different nodes across the cluster. The nodes can be in the same room, like RAC or PureScale clusters, or they can be spread around the world and linked by wide-area networks.

With normal database clusters, you have to keep the nodes within 200 feet or so of each other, or latency drags performance down dramatically, says Lyle.

The TransLattice Elastic Database uses rules set by system and database administrators to determine the level of replication and resilience the company's applications need. It then watches how transactions actually run and moves database shards so they are local to where they are processed in each geographical region.
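The article does not say how TED decides where a shard should live. As a purely hypothetical illustration, a placement loop might count which region issues transactions against each shard and move the shard's home copy there once one region clearly dominates. `rebalance` and its `min_share` threshold are invented for this sketch, which also ignores the replication level and residency rules a real system would have to respect.

```python
from collections import Counter, defaultdict


def rebalance(access_log, home, min_share=0.5):
    """Suggest a new home region for each shard from observed accesses.

    access_log: iterable of (shard_id, region) pairs, one per transaction.
    home: current mapping of shard_id -> region.
    A shard moves only when a single other region issued more than
    `min_share` of its accesses, which damps flip-flopping between regions.
    """
    per_shard = defaultdict(Counter)
    for shard, region in access_log:
        per_shard[shard][region] += 1

    new_home = dict(home)
    for shard, counts in per_shard.items():
        region, hits = counts.most_common(1)[0]
        if region != home.get(shard) and hits / sum(counts.values()) > min_share:
            new_home[shard] = region
    return new_home


# Shard 1 is mostly used from Tokyo; shard 2 is split evenly, so it stays put.
log = [(1, "tokyo")] * 8 + [(1, "london")] * 2 + [(2, "london")] * 3 + [(2, "tokyo")] * 3
print(rebalance(log, {1: "london", 2: "london"}))  # {1: 'tokyo', 2: 'london'}
```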

This is all done transparently to the users and applications, which just see one great big honking PostgreSQL database.

An important thing about the TED database is that it also has policy controls that limit where data and transactions can be pushed to across the geographically distributed database cluster. So, for instance, you can say that a certain subset of data can never leave Germany and any transactions against that data have to be done on physical systems located in Germany.
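A residency rule of that kind boils down to filtering candidate nodes before any data or transaction is placed. The sketch below is an assumption-laden illustration, not TED's policy language: the `RESIDENCY` table, the data class names and the `Node` type are all hypothetical. It refuses unclassified data outright rather than letting it go anywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    country: str  # ISO 3166 code of the country where the physical server sits


# Hypothetical policy table: data class -> countries allowed to store and
# process it; None means no geographic restriction.
RESIDENCY = {
    "de_customer": {"DE"},
    "public_catalog": None,
}


def eligible_nodes(data_class: str, nodes: list[Node]) -> list[Node]:
    """Return the nodes allowed to hold, or run transactions against, a data class.

    Unknown data classes raise KeyError, so unclassified data is refused
    rather than silently allowed anywhere.
    """
    allowed = RESIDENCY[data_class]
    eligible = [n for n in nodes if allowed is None or n.country in allowed]
    if not eligible:
        raise PermissionError(f"no node may hold {data_class!r}")
    return eligible


cluster = [Node("fra1", "DE"), Node("nyc1", "US"), Node("sin1", "SG")]
print([n.name for n in eligible_nodes("de_customer", cluster)])  # ['fra1']
```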

Clusters, clusters everywhere, but how to keep them in synch?

The TED database adheres to the normal ACID properties of a database, so don't think it is a NoSQL database like other sharded data stores that are coming out to support web-style applications these days.

TED does have a different means of handling access to data, however. In a normal database, there are row and column locking mechanisms that make all of the nodes in a cluster stop working while one node commits its transactions.

With TED, the consensus protocol watches all of the transactions flowing around the cluster, noting dependencies much as a compiler does when it compiles code on an explicitly parallel machine like an Itanium server. And then the consensus protocol spreads transactions across the cluster, ordering up the transactions, and commits them all on their respective nodes in a giant batch in one fell swoop.
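The compiler analogy can be made concrete. The sketch below is a hypothetical stand-in, not TransLattice's consensus protocol, whose internals the article does not describe: it treats each transaction's read and write sets the way an instruction scheduler treats registers, places every transaction one wave after the latest earlier transaction it conflicts with, and so groups non-conflicting transactions into waves that could be applied in parallel before the whole batch commits. `Txn` and `schedule_waves` are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def conflicts(a: Txn, b: Txn) -> bool:
    """True if the two transactions touch a common key and at least one writes it."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def schedule_waves(txns: list[Txn]) -> list[list[str]]:
    """Group transactions, given in agreed arrival order, into conflict-free waves.

    A transaction goes one wave after the latest earlier transaction it
    conflicts with, so nothing in a wave depends on anything else in it.
    """
    level: dict[str, int] = {}
    for i, t in enumerate(txns):
        deps = [level[p.name] for p in txns[:i] if conflicts(p, t)]
        level[t.name] = max(deps, default=-1) + 1
    waves: list[list[str]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for t in txns:
        waves[level[t.name]].append(t.name)
    return waves


batch = [
    Txn("T1", writes={"x"}),
    Txn("T2", writes={"y"}),
    Txn("T3", reads={"x"}, writes={"z"}),
    Txn("T4", reads={"y", "z"}),
]
print(schedule_waves(batch))  # [['T1', 'T2'], ['T3'], ['T4']]
```

T1 and T2 touch different keys, so they share a wave; T3 reads what T1 wrote, and T4 reads what both T2 and T3 wrote, so each has to wait for the wave before it.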

The effect is that the database nodes are working all the time, and the substantial geographical latencies are masked by the clever data sharding and transaction commit methodology. A typical customer might have tens of thousands of database shards spread around their distributed cluster, all being kept in synch by the distributed global consensus protocol.

The TED management system also has a distributed query planner that knows the physical locations of the database nodes and of the replicated data shards. It takes in ad hoc queries and works out, in real time, the fastest way to run them across the nodes.
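One ingredient of such a planner is replica selection: for each shard a query needs, read from the copy with the shortest round trip. The sketch below assumes a hypothetical latency table (the millisecond figures are made up) and ignores CPU cost, data volume and cross-shard joins, all of which a real planner would weigh.

```python
def rtt(a: str, b: str, table: dict[tuple[str, str], float]) -> float:
    """Round-trip time in ms between two sites; the table lists each pair once."""
    if a == b:
        return 0.0
    if (a, b) in table:
        return table[(a, b)]
    return table[(b, a)]


def plan_query(shards, replicas, table, origin):
    """Pick, for each shard a query needs, the replica closest to the origin node.

    Returns (plan, estimate_ms). Shards are read in parallel, so the estimate
    is the slowest single round trip rather than the sum.
    """
    plan = {s: min(replicas[s], key=lambda n: rtt(origin, n, table)) for s in shards}
    estimate = max((rtt(origin, n, table) for n in plan.values()), default=0.0)
    return plan, estimate


# Hypothetical replica map and latencies, as seen from a London coordinator.
replicas = {1: ["tokyo", "frankfurt"], 2: ["london", "virginia"]}
table = {("london", "tokyo"): 230.0, ("london", "frankfurt"): 15.0, ("london", "virginia"): 75.0}
print(plan_query([1, 2], replicas, table, "london"))  # ({1: 'frankfurt', 2: 'london'}, 15.0)
```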

Obviously, high availability and database scalability are inherent in the TED design. If you need more capacity, you just fire up more nodes in the places where transaction loads are heaviest. You also have to beef up the locations where data is replicated for safekeeping.

Cost of TransLattice Elastic Database versus clustered DB2 and Oracle databases

The TransLattice application platform appliance and the free-standing database both run on a hardened version of Canonical's Ubuntu Server that has had everything but the necessary features stripped out of it. This Linux image runs in a RAM disk on the server so it cannot be corrupted or changed in any way, claims Lyle.

The TED database doesn't need a full-fat operating system below it: the suite can run on bare metal servers or be packaged in Open Virtualization Format (OVF) and deployed atop the ESXi hypervisor. If you really want to, you could manually hack TED onto Microsoft's Hyper-V hypervisor. TransLattice has also done a lot of tweaking to the virtual instance so it can be packaged as an Amazon Machine Image (AMI) and run across virtual server instances on the EC2 compute cloud.

Lyle says that the database clustering runs atop TCP/IP and uses SSL encryption, and that gigabit Ethernet links between the nodes are sufficient for normal transaction processing. You don't need to go to 10 or 40 Gigabit Ethernet, or to QDR or FDR InfiniBand.

"Even with 500 kilobit satellite transponders with 700 millisecond latencies in the field, the database still works," says Lyle.
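A back-of-envelope model shows why committing in batches, rather than one transaction at a time, matters on a link like that. This is a simplified sketch, not a measurement of TED: it treats the quoted 700ms as one round trip, assumes each commit needs exactly one round trip, and invents a 2KB transaction size and a batch of 100 transactions.

```python
def sequential_seconds(n, rtt_s, bits_per_txn, link_bps):
    """n commits, each waiting for its own round trip before the next starts."""
    return n * (rtt_s + bits_per_txn / link_bps)


def batched_seconds(n, rtt_s, bits_per_txn, link_bps):
    """One round trip for the whole batch, plus time to push all the bytes."""
    return rtt_s + n * bits_per_txn / link_bps


# Link figures from the quote; transaction size and batch size are assumptions.
RTT_S = 0.7         # 700 ms latency, treated here as one round trip
LINK_BPS = 500_000  # 500 kilobit/s
TXN_BITS = 16_000   # hypothetical 2 KB of change data per transaction
N = 100             # hypothetical batch size

print(round(sequential_seconds(N, RTT_S, TXN_BITS, LINK_BPS), 3))  # 73.2
print(round(batched_seconds(N, RTT_S, TXN_BITS, LINK_BPS), 3))     # 3.9
```

Under these assumptions, the latency is paid once per batch instead of once per transaction, so the 700ms delay stops dominating and bandwidth becomes the limit.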

Hewlett-Packard and Dell need a database of their own to build their software empires, and TransLattice might just fit the bill.

Pricing for the TED 2.5 distributed database is based on a node licence, whether the node is physical or virtual. A single licence covers 16 cores (physical or virtual), up to 16 LUNs on the file system, and up to 96GB of main memory. Any time you bust through one of those limits on a box or a VM, you need to buy another licence. That licence costs $79,000 per node and includes one year of tech support and updates for the database.
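The licensing rule is simple arithmetic. The sketch below assumes that each extra block of 16 cores, 16 LUNs or 96GB on a node needs one more $79,000 licence, and that the most-exceeded limit sets the count. The article only says that busting a limit means buying another licence, not exactly how repeated overages stack, and the cluster in the example is hypothetical.

```python
# Per-licence limits and price as reported in the article.
CORES_PER_LICENCE = 16
LUNS_PER_LICENCE = 16
MEMORY_GB_PER_LICENCE = 96
PRICE_PER_LICENCE_USD = 79_000


def ceil_div(n: int, d: int) -> int:
    """Integer ceiling division without going through floats."""
    return -(-n // d)


def licences_for_node(cores: int, luns: int, memory_gb: int) -> int:
    """Licences one physical or virtual node needs: the worst-exceeded limit wins.

    Assumption: each additional block of 16 cores, 16 LUNs or 96GB of RAM
    requires one more licence.
    """
    return max(
        1,
        ceil_div(cores, CORES_PER_LICENCE),
        ceil_div(luns, LUNS_PER_LICENCE),
        ceil_div(memory_gb, MEMORY_GB_PER_LICENCE),
    )


def cluster_cost_usd(nodes: list[tuple[int, int, int]]) -> int:
    """Total list price for (cores, luns, memory_gb) tuples, one per node."""
    return PRICE_PER_LICENCE_USD * sum(licences_for_node(*n) for n in nodes)


# Hypothetical cluster: a 32-core, 128GB box plus an 8-core, 64GB VM.
print(cluster_cost_usd([(32, 10, 128), (8, 4, 64)]))  # 237000
```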

If you want to run it on the TAP Appliance, which has specific tunings for PowerEdge server iron and related storage plus the tuned J2EE runtime, add another $18,000 for the physical box.

A five-node TED licence for a development cluster, which has no CPU, LUN, or memory restrictions but which cannot be used to run production workloads, costs $31,600. ®