ParAccel flashes data warehouses

Thinking in columns


Redmondian roots

ParAccel got its start as an appliance maker, front-ending Microsoft's SQL Server database to speed up queries, and has gradually transformed itself into a seller of a free-standing database for analytics and data warehousing. The software runs on a stripped-down version of Red Hat Linux, from which ParAccel has cut all the fat, leaving just the features needed to run the database.

The software is supported on just about any x64 server, and starting at the end of 2008, the company tapped EMC's Clariion CX4 arrays as the preferred SAN storage for companies that wanted to use a mix of local and SAN storage for their data warehouses. That combination was called the Scalable Analytic Appliance.

In May of this year, ParAccel announced the SAA II appliance, also built on Clariion CX4 arrays. The base configuration comes with eight x64 server nodes and a CX4 model 240 array. (This is a soft bundle, meaning you have to buy the parts yourself, but they are certified to work together.) With today's announcement, customers can plug in any flash-based storage device that slots directly into the server's PCI-Express bus, where both the operating system and the database can see it. You can't use a disk controller with lots of flash drives hanging off it, because the database doesn't know how to talk to the controllers; it wants to talk directly to flash.

In the sample rack configuration of the SAA II appliance, ParAccel uses eight two-socket Dell 2U PowerEdge servers as compute nodes, each with 24 small form factor 500 GB SAS disks and either four Gigabit Ethernet or two 10 Gigabit Ethernet ports. The rack also includes one leader node for managing the database cluster nodes and a hot standby server, plus a CX4-240 or CX4-480 array that can house up to sixty 2 TB disks. With compression on the data, this setup has an effective capacity north of 500 TB.

This setup can deliver database scans on the order of 2,400 MB/sec per server, according to Stanick. A flash configuration that uses eighteen dual-socket Xeon 5500 1U PowerEdge servers, each with three SAS drives and two 640 GB Fusion-io flash drives, plus the hot spares and the CX4 SAN, also has an effective capacity of 500 TB (compressed). In that setup, a single SAA II server node can deliver 2,800 MB/sec in database scans, and the configuration takes up a smaller footprint and uses a lot less energy, too. Because there are also more than twice as many nodes, switching to flash for some of the local storage gets you 2.6 times as much oomph chewing on that 500 TB of data.
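The 2.6x figure appears to be an aggregate, rack-level number rather than a per-server one. A quick back-of-the-envelope check, assuming (as a simplification) that scan rates simply add up across compute nodes; the function name is illustrative only:

```python
# Rack-level scan throughput for the two sample SAA II configurations.
# Simplifying assumption: throughput scales linearly with the number of compute nodes.

def aggregate_scan_rate(nodes: int, mb_per_sec_per_node: int) -> int:
    """Total scan rate in MB/sec if every node scans at the same rate."""
    return nodes * mb_per_sec_per_node

disk_rack = aggregate_scan_rate(nodes=8, mb_per_sec_per_node=2_400)    # disk-based rack
flash_rack = aggregate_scan_rate(nodes=18, mb_per_sec_per_node=2_800)  # flash-based rack
speedup = flash_rack / disk_rack

print(f"disk rack:  {disk_rack:,} MB/sec")
print(f"flash rack: {flash_rack:,} MB/sec")
print(f"speedup:    {speedup:.3f}x")
```

This gives 19,200 MB/sec versus 50,400 MB/sec, a ratio of 2.625. Most of that gain comes from the node count (18/8 = 2.25), with the per-node improvement (2,800/2,400, about 1.17) supplying the rest.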

In June, with the launch of PADB 2.0, ParAccel started shipping a feature called blended scan, which is one of the reasons why adding lots of flash doesn't boost performance much on an individual server: the feature is already boosting performance on disk. Here's how it works. In a typical server node in a data warehouse cluster, each disk is mirrored (RAID 1) so that a disk failure doesn't result in the loss of data. So in a typical four-node database cluster with eight drives per node, only half of the drives are doing useful work, yielding a scan rate of about 800 MB/sec.
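The mirroring arithmetic can be made explicit. The per-drive rate below is not quoted by ParAccel; it is simply what the 800 MB/sec cluster figure implies if the scan is spread evenly over the drives that are doing useful work:

```python
# Back-of-the-envelope model of a mirrored (RAID 1) four-node cluster.
# Assumption: only one drive of each mirrored pair serves a given scan,
# and the scan load is spread evenly across those drives.

NODES = 4
DRIVES_PER_NODE = 8
CLUSTER_SCAN_MB_PER_SEC = 800  # cluster-wide figure quoted in the article

total_drives = NODES * DRIVES_PER_NODE  # 32 drives in the cluster
useful_drives = total_drives // 2       # 16; the other half hold mirror copies
per_drive_mb_per_sec = CLUSTER_SCAN_MB_PER_SEC / useful_drives  # implied rate

print(total_drives, useful_drives, per_drive_mb_per_sec)
```

Under those assumptions, each useful drive contributes roughly 50 MB/sec, and the 16 mirror drives contribute nothing to the scan.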

If you hook the four nodes up to a SAN that has 56 mirrored disks, you might see a scan rate of 1,200 MB/sec. With blended scanning, which ParAccel is trying to patent, you designate the disks out on the SAN as holding the authoritative copy of the data, do the mirroring there, and use the local disks on the server nodes as a cache. Scans then run across a mix of the local and SAN disks, yielding a scan rate of 2,800 MB/sec: the local disks deliver twice their mirrored rate, 1,600 MB/sec, because all of them are now doing useful work, and the SAN adds its 1,200 MB/sec of bandwidth on top.
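The 2,800 MB/sec figure can be reproduced with a simple additive model. This is a sketch of the arithmetic only, not of how PADB actually schedules a blended scan; the function name is hypothetical, and it assumes local and SAN bandwidth add up cleanly when a scan is split across both:

```python
# Rough model of blended scan versus a mirrored-only four-node cluster.
# Assumption: when a scan is split across local and SAN disks, their bandwidths add.

MIRRORED_LOCAL_MB_PER_SEC = 800  # half the local drives doing useful work (RAID 1)
SAN_MB_PER_SEC = 1_200           # four nodes attached to 56 mirrored SAN disks

def blended_scan_rate(mirrored_local: float, san: float) -> float:
    """Local drives no longer hold mirror copies (the SAN keeps the authoritative,
    mirrored copy), so all of them scan at twice the mirrored rate; the SAN's
    bandwidth is then added on top."""
    all_local = 2 * mirrored_local
    return all_local + san

rate = blended_scan_rate(MIRRORED_LOCAL_MB_PER_SEC, SAN_MB_PER_SEC)
print(f"blended scan: {rate:,.0f} MB/sec")
```

Doubling the local rate to 1,600 MB/sec and adding the SAN's 1,200 MB/sec gives the 2,800 MB/sec quoted, versus 800 MB/sec for the mirrored local disks alone.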

The PADB 2.0 analytics database has a list price of $100,000 per TB, but discounts are available for volume purchases. ParAccel, which has several dozen paying customers (including PriceChopper, OfficeMax, Merkle, and Autometrics), also sells the software under a subscription model for $5,000 per TB per month. ®
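For a rough sense of how the two pricing models compare, here is the break-even point at list price. This is a simple illustration that ignores volume discounts, any maintenance or support charges (the article does not mention them) and the time value of money:

```python
# Months of subscription that add up to the perpetual list price, per TB.
# Ignores discounts, any maintenance/support charges, and the time value of money.

LIST_PRICE_PER_TB = 100_000            # USD, one-off list price
SUBSCRIPTION_PER_TB_PER_MONTH = 5_000  # USD per TB per month

breakeven_months = LIST_PRICE_PER_TB / SUBSCRIPTION_PER_TB_PER_MONTH
print(f"break-even after {breakeven_months:.0f} months")
```

At list price, 20 months of subscription costs the same as buying the licence outright.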
